npm - @bhargavvc/sdd-cc - Versions diffs - 1.30.1 → 1.35.0 - Mend

@bhargavvc/sdd-cc 1.30.1 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (242)

package/README.ja-JP.md +144 -110
package/README.ko-KR.md +143 -107
package/README.md +183 -112
package/README.pt-BR.md +90 -52
package/README.zh-CN.md +141 -101
package/agents/sdd-advisor-researcher.md +23 -0
package/agents/sdd-ai-researcher.md +133 -0
package/agents/sdd-code-fixer.md +516 -0
package/agents/sdd-code-reviewer.md +355 -0
package/agents/sdd-codebase-mapper.md +3 -3
package/agents/sdd-debugger.md +17 -5
package/agents/sdd-doc-verifier.md +201 -0
package/agents/sdd-doc-writer.md +602 -0
package/agents/sdd-domain-researcher.md +153 -0
package/agents/sdd-eval-auditor.md +164 -0
package/agents/sdd-eval-planner.md +154 -0
package/agents/sdd-executor.md +87 -4
package/agents/sdd-framework-selector.md +160 -0
package/agents/sdd-intel-updater.md +314 -0
package/agents/sdd-nyquist-auditor.md +1 -1
package/agents/sdd-phase-researcher.md +71 -4
package/agents/sdd-plan-checker.md +100 -6
package/agents/sdd-planner.md +145 -206
package/agents/sdd-project-researcher.md +25 -2
package/agents/sdd-research-synthesizer.md +3 -3
package/agents/sdd-roadmapper.md +6 -6
package/agents/sdd-security-auditor.md +128 -0
package/agents/sdd-ui-auditor.md +43 -3
package/agents/sdd-ui-checker.md +5 -5
package/agents/sdd-ui-researcher.md +27 -4
package/agents/sdd-user-profiler.md +2 -2
package/agents/sdd-verifier.md +142 -22
package/bin/install.js +2145 -545
package/commands/sdd/add-backlog.md +5 -5
package/commands/sdd/add-tests.md +2 -2
package/commands/sdd/ai-integration-phase.md +36 -0
package/commands/sdd/analyze-dependencies.md +34 -0
package/commands/sdd/audit-fix.md +33 -0
package/commands/sdd/autonomous.md +7 -2
package/commands/sdd/cleanup.md +5 -0
package/commands/sdd/code-review-fix.md +52 -0
package/commands/sdd/code-review.md +55 -0
package/commands/sdd/complete-milestone.md +6 -6
package/commands/sdd/debug.md +22 -9
package/commands/sdd/discuss-phase.md +7 -2
package/commands/sdd/do.md +1 -1
package/commands/sdd/docs-update.md +48 -0
package/commands/sdd/eval-review.md +32 -0
package/commands/sdd/execute-phase.md +4 -0
package/commands/sdd/explore.md +27 -0
package/commands/sdd/fast.md +2 -2
package/commands/sdd/from-sdd2.md +45 -0
package/commands/sdd/help.md +2 -0
package/commands/sdd/import.md +36 -0
package/commands/sdd/intel.md +179 -0
package/commands/sdd/join-discord.md +2 -1
package/commands/sdd/manager.md +1 -0
package/commands/sdd/map-codebase.md +3 -3
package/commands/sdd/new-milestone.md +1 -1
package/commands/sdd/new-project.md +5 -1
package/commands/sdd/new-workspace.md +1 -1
package/commands/sdd/next.md +2 -0
package/commands/sdd/plan-milestone-gaps.md +2 -2
package/commands/sdd/plan-phase.md +6 -1
package/commands/sdd/plant-seed.md +1 -1
package/commands/sdd/profile-user.md +1 -1
package/commands/sdd/quick.md +5 -3
package/commands/sdd/reapply-patches.md +230 -42
package/commands/sdd/research-phase.md +3 -3
package/commands/sdd/review-backlog.md +1 -0
package/commands/sdd/review.md +6 -3
package/commands/sdd/scan.md +26 -0
package/commands/sdd/secure-phase.md +35 -0
package/commands/sdd/ship.md +1 -1
package/commands/sdd/thread.md +5 -5
package/commands/sdd/undo.md +34 -0
package/commands/sdd/verify-work.md +1 -1
package/commands/sdd/workstreams.md +17 -11
package/hooks/dist/sdd-check-update.js +33 -8
package/hooks/dist/sdd-context-monitor.js +17 -8
package/hooks/dist/sdd-phase-boundary.sh +27 -0
package/hooks/dist/sdd-prompt-guard.js +1 -0
package/hooks/dist/sdd-read-guard.js +82 -0
package/hooks/dist/sdd-session-state.sh +33 -0
package/hooks/dist/sdd-statusline.js +137 -15
package/hooks/dist/sdd-validate-commit.sh +47 -0
package/hooks/dist/sdd-workflow-guard.js +4 -4
package/hooks/sdd-check-update.js +139 -0
package/hooks/sdd-context-monitor.js +165 -0
package/hooks/sdd-phase-boundary.sh +27 -0
package/hooks/sdd-prompt-guard.js +97 -0
package/hooks/sdd-read-guard.js +82 -0
package/hooks/sdd-session-state.sh +33 -0
package/hooks/sdd-statusline.js +241 -0
package/hooks/sdd-validate-commit.sh +47 -0
package/hooks/sdd-workflow-guard.js +94 -0
package/package.json +3 -3
package/scripts/build-hooks.js +18 -7
package/scripts/prompt-injection-scan.sh +1 -0
package/scripts/rebrand-gsd-to-sdd.sh +221 -220
package/scripts/run-tests.cjs +5 -1
package/scripts/sync-upstream.sh +1 -1
package/sdd/bin/lib/commands.cjs +79 -17
package/sdd/bin/lib/config.cjs +90 -48
package/sdd/bin/lib/core.cjs +452 -87
package/sdd/bin/lib/docs.cjs +267 -0
package/sdd/bin/lib/frontmatter.cjs +381 -336
package/sdd/bin/lib/init.cjs +110 -16
package/sdd/bin/lib/intel.cjs +660 -0
package/sdd/bin/lib/learnings.cjs +378 -0
package/sdd/bin/lib/milestone.cjs +42 -11
package/sdd/bin/lib/model-profiles.cjs +17 -15
package/sdd/bin/lib/phase.cjs +367 -288
package/sdd/bin/lib/profile-output.cjs +106 -10
package/sdd/bin/lib/roadmap.cjs +146 -115
package/sdd/bin/lib/schema-detect.cjs +238 -0
package/sdd/bin/lib/sdd2-import.cjs +511 -0
package/sdd/bin/lib/security.cjs +124 -3
package/sdd/bin/lib/state.cjs +648 -264
package/sdd/bin/lib/template.cjs +8 -4
package/sdd/bin/lib/verify.cjs +209 -28
package/sdd/bin/lib/workstream.cjs +7 -3
package/sdd/bin/sdd-tools.cjs +184 -12
package/sdd/contexts/dev.md +21 -0
package/sdd/contexts/research.md +22 -0
package/sdd/contexts/review.md +22 -0
package/sdd/references/agent-contracts.md +79 -0
package/sdd/references/ai-evals.md +156 -0
package/sdd/references/ai-frameworks.md +186 -0
package/sdd/references/artifact-types.md +113 -0
package/sdd/references/common-bug-patterns.md +114 -0
package/sdd/references/context-budget.md +49 -0
package/sdd/references/continuation-format.md +25 -25
package/sdd/references/domain-probes.md +125 -0
package/sdd/references/few-shot-examples/plan-checker.md +73 -0
package/sdd/references/few-shot-examples/verifier.md +109 -0
package/sdd/references/gate-prompts.md +100 -0
package/sdd/references/gates.md +70 -0
package/sdd/references/git-integration.md +1 -1
package/sdd/references/ios-scaffold.md +123 -0
package/sdd/references/model-profile-resolution.md +2 -0
package/sdd/references/model-profiles.md +24 -18
package/sdd/references/planner-gap-closure.md +62 -0
package/sdd/references/planner-reviews.md +39 -0
package/sdd/references/planner-revision.md +87 -0
package/sdd/references/planning-config.md +252 -0
package/sdd/references/revision-loop.md +97 -0
package/sdd/references/thinking-models-debug.md +44 -0
package/sdd/references/thinking-models-execution.md +50 -0
package/sdd/references/thinking-models-planning.md +62 -0
package/sdd/references/thinking-models-research.md +50 -0
package/sdd/references/thinking-models-verification.md +55 -0
package/sdd/references/thinking-partner.md +96 -0
package/sdd/references/ui-brand.md +4 -4
package/sdd/references/universal-anti-patterns.md +63 -0
package/sdd/references/verification-overrides.md +227 -0
package/sdd/references/workstream-flag.md +56 -3
package/sdd/templates/AI-SPEC.md +246 -0
package/sdd/templates/DEBUG.md +1 -1
package/sdd/templates/SECURITY.md +61 -0
package/sdd/templates/UAT.md +4 -4
package/sdd/templates/VALIDATION.md +4 -4
package/sdd/templates/claude-md.md +32 -9
package/sdd/templates/config.json +4 -0
package/sdd/templates/debug-subagent-prompt.md +1 -1
package/sdd/templates/dev-preferences.md +1 -1
package/sdd/templates/discovery.md +2 -2
package/sdd/templates/phase-prompt.md +1 -1
package/sdd/templates/planner-subagent-prompt.md +3 -3
package/sdd/templates/project.md +1 -1
package/sdd/templates/research.md +1 -1
package/sdd/templates/state.md +2 -2
package/sdd/workflows/add-phase.md +8 -8
package/sdd/workflows/add-tests.md +12 -9
package/sdd/workflows/add-todo.md +5 -3
package/sdd/workflows/ai-integration-phase.md +284 -0
package/sdd/workflows/analyze-dependencies.md +96 -0
package/sdd/workflows/audit-fix.md +157 -0
package/sdd/workflows/audit-milestone.md +11 -11
package/sdd/workflows/audit-uat.md +2 -2
package/sdd/workflows/autonomous.md +195 -27
package/sdd/workflows/check-todos.md +12 -10
package/sdd/workflows/cleanup.md +2 -0
package/sdd/workflows/code-review-fix.md +497 -0
package/sdd/workflows/code-review.md +515 -0
package/sdd/workflows/complete-milestone.md +56 -22
package/sdd/workflows/diagnose-issues.md +10 -3
package/sdd/workflows/discovery-phase.md +5 -3
package/sdd/workflows/discuss-phase-assumptions.md +24 -6
package/sdd/workflows/discuss-phase-power.md +291 -0
package/sdd/workflows/discuss-phase.md +173 -21
package/sdd/workflows/do.md +23 -21
package/sdd/workflows/docs-update.md +1155 -0
package/sdd/workflows/eval-review.md +155 -0
package/sdd/workflows/execute-phase.md +594 -38
package/sdd/workflows/execute-plan.md +67 -96
package/sdd/workflows/explore.md +139 -0
package/sdd/workflows/fast.md +5 -5
package/sdd/workflows/forensics.md +2 -2
package/sdd/workflows/health.md +4 -4
package/sdd/workflows/help.md +122 -119
package/sdd/workflows/import.md +276 -0
package/sdd/workflows/inbox.md +387 -0
package/sdd/workflows/insert-phase.md +7 -7
package/sdd/workflows/list-phase-assumptions.md +4 -4
package/sdd/workflows/list-workspaces.md +2 -2
package/sdd/workflows/manager.md +35 -32
package/sdd/workflows/map-codebase.md +7 -5
package/sdd/workflows/milestone-summary.md +2 -2
package/sdd/workflows/new-milestone.md +17 -9
package/sdd/workflows/new-project.md +50 -25
package/sdd/workflows/new-workspace.md +7 -5
package/sdd/workflows/next.md +67 -11
package/sdd/workflows/note.md +9 -7
package/sdd/workflows/pause-work.md +75 -12
package/sdd/workflows/plan-milestone-gaps.md +8 -8
package/sdd/workflows/plan-phase.md +294 -42
package/sdd/workflows/plant-seed.md +6 -3
package/sdd/workflows/pr-branch.md +42 -14
package/sdd/workflows/profile-user.md +9 -7
package/sdd/workflows/progress.md +45 -45
package/sdd/workflows/quick.md +195 -47
package/sdd/workflows/remove-phase.md +6 -6
package/sdd/workflows/remove-workspace.md +3 -1
package/sdd/workflows/research-phase.md +2 -2
package/sdd/workflows/resume-project.md +12 -12
package/sdd/workflows/review.md +109 -9
package/sdd/workflows/scan.md +102 -0
package/sdd/workflows/secure-phase.md +166 -0
package/sdd/workflows/session-report.md +2 -2
package/sdd/workflows/settings.md +38 -12
package/sdd/workflows/ship.md +21 -9
package/sdd/workflows/stats.md +1 -1
package/sdd/workflows/transition.md +23 -23
package/sdd/workflows/ui-phase.md +15 -7
package/sdd/workflows/ui-review.md +29 -4
package/sdd/workflows/undo.md +314 -0
package/sdd/workflows/update.md +171 -20
package/sdd/workflows/validate-phase.md +6 -4
package/sdd/workflows/verify-phase.md +210 -6
package/sdd/workflows/verify-work.md +83 -9
package/sdd/commands/sdd/workstreams.md +0 -63

package/sdd/references/ai-frameworks.md ADDED

@@ -0,0 +1,186 @@
+# AI Framework Decision Matrix
+> Reference used by `sdd-framework-selector` and `sdd-ai-researcher`.
+> Distilled from official docs, benchmarks, and developer reports (2026).
+---
+## Quick Picks
+| Situation | Pick |
+|-----------|------|
+| Simplest path to a working agent (OpenAI) | OpenAI Agents SDK |
+| Simplest path to a working agent (model-agnostic) | CrewAI |
+| Production RAG / document Q&A | LlamaIndex |
+| Complex stateful workflows with branching | LangGraph |
+| Multi-agent teams with defined roles | CrewAI |
+| Code-aware autonomous agents (Anthropic) | Claude Agent SDK |
+| "I don't know my requirements yet" | LangChain |
+| Regulated / audit-trail required | LangGraph |
+| Enterprise Microsoft/.NET shops | AutoGen/AG2 |
+| Google Cloud / Gemini-committed teams | Google ADK |
+| Pure NLP pipelines with explicit control | Haystack |
+---
+## Framework Profiles
+### CrewAI
+- **Type:** Multi-agent orchestration
+- **Language:** Python only
+- **Model support:** Model-agnostic
+- **Learning curve:** Beginner (role/task/crew maps to real teams)
+- **Best for:** Content pipelines, research automation, business process workflows, rapid prototyping
+- **Avoid if:** Fine-grained state management, TypeScript, fault-tolerant checkpointing, complex conditional branching
+- **Strengths:** Fastest multi-agent prototyping, 5.76x faster than LangGraph on QA tasks, built-in memory (short/long/entity/contextual), Flows architecture, standalone (no LangChain dep)
+- **Weaknesses:** Limited checkpointing, coarse error handling, Python only
+- **Eval concerns:** Task decomposition accuracy, inter-agent handoff, goal completion rate, loop detection
+### LlamaIndex
+- **Type:** RAG and data ingestion
+- **Language:** Python + TypeScript
+- **Model support:** Model-agnostic
+- **Learning curve:** Intermediate
+- **Best for:** Legal research, internal knowledge assistants, enterprise document search, any system where retrieval quality is the #1 priority
+- **Avoid if:** Primary need is agent orchestration, multi-agent collaboration, or chatbot conversation flow
+- **Strengths:** Best-in-class document parsing (LlamaParse), 35% retrieval accuracy improvement, 20-30% faster queries, mixed retrieval strategies (vector + graph + reranker)
+- **Weaknesses:** Data framework first — agent orchestration is secondary
+- **Eval concerns:** Context faithfulness, hallucination, answer relevance, retrieval precision/recall
+### LangChain
+- **Type:** General-purpose LLM framework
+- **Language:** Python + TypeScript
+- **Model support:** Model-agnostic (widest ecosystem)
+- **Learning curve:** Intermediate–Advanced
+- **Best for:** Evolving requirements, many third-party integrations, teams wanting one framework for everything, RAG + agents + chains
+- **Avoid if:** Simple well-defined use case, RAG-primary (use LlamaIndex), complex stateful workflows (use LangGraph), performance at scale is critical
+- **Strengths:** Largest community and integration ecosystem, 25% faster development vs scratch, covers RAG/agents/chains/memory
+- **Weaknesses:** Abstraction overhead, p99 latency degrades under load, complexity creep risk
+- **Eval concerns:** End-to-end task completion, chain correctness, retrieval quality
+### LangGraph
+- **Type:** Stateful agent workflows (graph-based)
+- **Language:** Python + TypeScript (full parity)
+- **Model support:** Model-agnostic (inherits LangChain integrations)
+- **Learning curve:** Intermediate–Advanced (graph mental model)
+- **Best for:** Production-grade stateful workflows, regulated industries, audit trails, human-in-the-loop flows, fault-tolerant multi-step agents
+- **Avoid if:** Simple chatbot, purely linear workflow, rapid prototyping
+- **Strengths:** Best checkpointing (every node), time-travel debugging, native Postgres/Redis persistence, streaming support, chosen by 62% of developers for stateful agent work (2026)
+- **Weaknesses:** More upfront scaffolding, steeper curve, overkill for simple cases
+- **Eval concerns:** State transition correctness, goal completion rate, tool use accuracy, safety guardrails
+### OpenAI Agents SDK
+- **Type:** Native OpenAI agent framework
+- **Language:** Python + TypeScript
+- **Model support:** Optimized for OpenAI (supports 100+ other models via Chat Completions compatibility)
+- **Learning curve:** Beginner (4 primitives: Agents, Handoffs, Guardrails, Tracing)
+- **Best for:** OpenAI-committed teams, rapid agent prototyping, voice agents (gpt-realtime), teams wanting visual builder (AgentKit)
+- **Avoid if:** Model flexibility needed, complex multi-agent collaboration, persistent state management required, vendor lock-in concern
+- **Strengths:** Simplest mental model, built-in tracing and guardrails, Handoffs for agent delegation, Realtime Agents for voice
+- **Weaknesses:** OpenAI vendor lock-in, no built-in persistent state, younger ecosystem
+- **Eval concerns:** Instruction following, safety guardrails, escalation accuracy, tone consistency
+### Claude Agent SDK (Anthropic)
+- **Type:** Code-aware autonomous agent framework
+- **Language:** Python + TypeScript
+- **Model support:** Claude models only
+- **Learning curve:** Intermediate (18 hook events, MCP, tool decorators)
+- **Best for:** Developer tooling, code generation/review agents, autonomous coding assistants, MCP-heavy architectures, safety-critical applications
+- **Avoid if:** Model flexibility needed, stable/mature API required, use case unrelated to code/tool-use
+- **Strengths:** Deepest MCP integration, built-in filesystem/shell access, 18 lifecycle hooks, automatic context compaction, extended thinking, safety-first design
+- **Weaknesses:** Claude-only vendor lock-in, newer/evolving API, smaller community
+- **Eval concerns:** Tool use correctness, safety, code quality, instruction following
+### AutoGen / AG2 / Microsoft Agent Framework
+- **Type:** Multi-agent conversational framework
+- **Language:** Python (AG2), Python + .NET (Microsoft Agent Framework)
+- **Model support:** Model-agnostic
+- **Learning curve:** Intermediate–Advanced
+- **Best for:** Research applications, conversational problem-solving, code generation + execution loops, Microsoft/.NET shops
+- **Avoid if:** You want ecosystem stability, deterministic workflows, or the safest long-term bet (fragmentation risk)
+- **Strengths:** Most sophisticated conversational agent patterns, code generation + execution loop, async event-driven (v0.4+), cross-language interop (Microsoft Agent Framework)
+- **Weaknesses:** Ecosystem fragmented (AutoGen maintenance mode, AG2 fork, Microsoft Agent Framework preview) — genuine long-term risk
+- **Eval concerns:** Conversation goal completion, consensus quality, code execution correctness
+### Google ADK (Agent Development Kit)
+- **Type:** Multi-agent orchestration framework
+- **Language:** Python + Java
+- **Model support:** Optimized for Gemini; supports other models via LiteLLM
+- **Learning curve:** Intermediate (agent/tool/session model, familiar if you know LangGraph)
+- **Best for:** Google Cloud / Vertex AI shops, multi-agent workflows needing built-in session management and memory, teams already committed to Gemini, agent pipelines that need Google Search / BigQuery tool integration
+- **Avoid if:** Model flexibility is required beyond Gemini, a Google Cloud dependency is unacceptable, or the stack is TypeScript-only
+- **Strengths:** First-party Google support, built-in session/memory/artifact management, tight Vertex AI and Google Search integration, own eval framework (RAGAS-compatible), multi-agent by design (sequential, parallel, loop patterns), Java SDK for enterprise teams
+- **Weaknesses:** Gemini vendor lock-in in practice, younger community than LangChain/LlamaIndex, less third-party integration depth
+- **Eval concerns:** Multi-agent task decomposition, tool use correctness, session state consistency, goal completion rate
+### Haystack
+- **Type:** NLP pipeline framework
+- **Language:** Python
+- **Model support:** Model-agnostic
+- **Learning curve:** Intermediate
+- **Best for:** Explicit, auditable NLP pipelines, document processing with fine-grained control, enterprise search, regulated industries needing transparency
+- **Avoid if:** Rapid prototyping, multi-agent workflows, or you want a large community
+- **Strengths:** Explicit pipeline control, strong for structured data pipelines, good documentation
+- **Weaknesses:** Smaller community, less agent-oriented than alternatives
+- **Eval concerns:** Extraction accuracy, pipeline output validity, retrieval quality
+---
+## Decision Dimensions
+### By System Type
+| System Type | Primary Framework(s) | Key Eval Concerns |
+|-------------|---------------------|-------------------|
+| RAG / Knowledge Q&A | LlamaIndex, LangChain | Context faithfulness, hallucination, retrieval precision/recall |
+| Multi-agent orchestration | CrewAI, LangGraph, Google ADK | Task decomposition, handoff quality, goal completion |
+| Conversational assistants | OpenAI Agents SDK, Claude Agent SDK | Tone, safety, instruction following, escalation |
+| Structured data extraction | LangChain, LlamaIndex | Schema compliance, extraction accuracy |
+| Autonomous task agents | LangGraph, OpenAI Agents SDK | Safety guardrails, tool correctness, cost adherence |
+| Content generation | Claude Agent SDK, OpenAI Agents SDK | Brand voice, factual accuracy, tone |
+| Code automation | Claude Agent SDK | Code correctness, safety, test pass rate |
+### By Team Size and Stage
+| Context | Recommendation |
+|---------|----------------|
+| Solo dev, prototyping | OpenAI Agents SDK or CrewAI (fastest to running) |
+| Solo dev, RAG | LlamaIndex (batteries included) |
+| Team, production, stateful | LangGraph (best fault tolerance) |
+| Team, evolving requirements | LangChain (broadest escape hatches) |
+| Team, multi-agent | CrewAI (simplest role abstraction) |
+| Enterprise, .NET | AutoGen AG2 / Microsoft Agent Framework |
+### By Model Commitment
+| Preference | Framework |
+|-----------|-----------|
+| OpenAI-only | OpenAI Agents SDK |
+| Anthropic/Claude-only | Claude Agent SDK |
+| Google/Gemini-committed | Google ADK |
+| Model-agnostic (full flexibility) | LangChain, LlamaIndex, CrewAI, LangGraph, Haystack |
+---
+## Anti-Patterns
+1. **Using LangChain for simple chatbots** — Direct SDK call is less code, faster, and easier to debug
+2. **Using CrewAI for complex stateful workflows** — Checkpointing gaps will bite you in production
+3. **Using OpenAI Agents SDK with non-OpenAI models** — Loses the integration benefits you chose it for
+4. **Using LlamaIndex as a multi-agent framework** — It can do agents, but that's not its strength
+5. **Defaulting to LangChain without evaluating alternatives** — "Everyone uses it" ≠ right for your use case
+6. **Starting a new project on AutoGen (not AG2)** — AutoGen is in maintenance mode; use AG2 or wait for Microsoft Agent Framework GA
+7. **Choosing LangGraph for simple linear flows** — The graph overhead is not worth it; use LangChain chains instead
+8. **Ignoring vendor lock-in** — Provider-native SDKs (OpenAI, Claude) trade flexibility for integration depth; decide consciously
+---
+## Combination Plays (Multi-Framework Stacks)
+| Production Pattern | Stack |
+|-------------------|-------|
+| RAG with observability | LlamaIndex + LangSmith or Langfuse |
+| Stateful agent with RAG | LangGraph + LlamaIndex |
+| Multi-agent with tracing | CrewAI + Langfuse |
+| OpenAI agents with evals | OpenAI Agents SDK + Promptfoo or Braintrust |
+| Claude agents with MCP | Claude Agent SDK + LangSmith or Arize Phoenix |
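
The decision tables in `ai-frameworks.md` can be read as lookups over independent dimensions. The sketch below is a hypothetical illustration, not the `sdd-framework-selector` agent's actual logic: it encodes the "By System Type" and "By Model Commitment" tables as plain data and intersects them. The key strings (`"rag"`, `"google"`, and so on) and the function name `shortlist` are assumptions introduced here.

```javascript
// Hypothetical encoding of two decision tables from ai-frameworks.md.
// Keys are illustrative labels, not identifiers used by the package.
const BY_SYSTEM_TYPE = {
  rag: ["LlamaIndex", "LangChain"],
  "multi-agent": ["CrewAI", "LangGraph", "Google ADK"],
  conversational: ["OpenAI Agents SDK", "Claude Agent SDK"],
  extraction: ["LangChain", "LlamaIndex"],
  autonomous: ["LangGraph", "OpenAI Agents SDK"],
  content: ["Claude Agent SDK", "OpenAI Agents SDK"],
  code: ["Claude Agent SDK"],
};

const BY_MODEL_COMMITMENT = {
  openai: ["OpenAI Agents SDK"],
  anthropic: ["Claude Agent SDK"],
  google: ["Google ADK"],
  agnostic: ["LangChain", "LlamaIndex", "CrewAI", "LangGraph", "Haystack"],
};

// Keep only frameworks that satisfy both dimensions, preserving the
// order in which the system-type table lists them. Unknown keys yield [].
function shortlist(systemType, modelCommitment) {
  const bySystem = BY_SYSTEM_TYPE[systemType] ?? [];
  const allowed = new Set(BY_MODEL_COMMITMENT[modelCommitment] ?? []);
  return bySystem.filter((name) => allowed.has(name));
}
```

For example, `shortlist("rag", "agnostic")` keeps both LlamaIndex and LangChain, while `shortlist("code", "openai")` is empty. Under this reading, an empty result means the two preferences conflict in the tables, and the Quick Picks table or a human decision would have to break the tie.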

package/sdd/references/artifact-types.md ADDED

@@ -0,0 +1,113 @@
+# SDD Artifact Types
+This reference documents all artifact types in the SDD planning taxonomy. Each type has a defined
+shape, lifecycle, location, and consumption mechanism. A well-formatted artifact that no workflow
+reads is inert — the consumption mechanism is what gives an artifact meaning.
+---
+## Core Artifacts
+### ROADMAP.md
+- **Shape**: Milestone + phase listing with goals and canonical refs
+- **Lifecycle**: Created → Updated per milestone → Archived
+- **Location**: `.planning/ROADMAP.md`
+- **Consumed by**: `plan-phase`, `discuss-phase`, `execute-phase`, `progress`, `state` commands
+### STATE.md
+- **Shape**: Current position tracker (phase, plan, progress, decisions)
+- **Lifecycle**: Continuously updated throughout the project
+- **Location**: `.planning/STATE.md`
+- **Consumed by**: All orchestration workflows; `resume-project`, `progress`, `next` commands
+### REQUIREMENTS.md
+- **Shape**: Numbered acceptance criteria with traceability table
+- **Lifecycle**: Created at project start → Updated as requirements are satisfied
+- **Location**: `.planning/REQUIREMENTS.md`
+- **Consumed by**: `discuss-phase`, `plan-phase`, CONTEXT.md generation; executor marks complete
+### CONTEXT.md (per-phase)
+- **Shape**: 6-section format: domain, decisions, canonical_refs, code_context, specifics, deferred
+- **Lifecycle**: Created before planning → Used during planning and execution → Superseded by next phase
+- **Location**: `.planning/phases/XX-name/XX-CONTEXT.md`
+- **Consumed by**: `plan-phase` (reads decisions), `execute-phase` (reads code_context and canonical_refs)
+### PLAN.md (per-plan)
+- **Shape**: Frontmatter + objective + tasks with types + success criteria + output spec
+- **Lifecycle**: Created by planner → Executed → SUMMARY.md produced
+- **Location**: `.planning/phases/XX-name/XX-YY-PLAN.md`
+- **Consumed by**: `execute-phase` executor; task commits reference plan IDs
+### SUMMARY.md (per-plan)
+- **Shape**: Frontmatter with dependency graph + narrative + deviations + self-check
+- **Lifecycle**: Created at plan completion → Read by subsequent plans in same phase
+- **Location**: `.planning/phases/XX-name/XX-YY-SUMMARY.md`
+- **Consumed by**: Orchestrator (progress), planner (context for future plans), `milestone-summary`
+### HANDOFF.json / .continue-here.md
+- **Shape**: Structured pause state (JSON machine-readable + Markdown human-readable)
+- **Lifecycle**: Created on pause → Consumed on resume → Replaced by next pause
+- **Location**: `.planning/HANDOFF.json` + `.planning/phases/XX-name/.continue-here.md` (or spike/deliberation path)
+- **Consumed by**: `resume-project` workflow
+---
+## Extended Artifacts
+### DISCUSSION-LOG.md (per-phase)
+- **Shape**: Audit trail of assumptions and corrections from discuss-phase
+- **Lifecycle**: Created at discussion time → Read-only audit record
+- **Location**: `.planning/phases/XX-name/XX-DISCUSSION-LOG.md`
+- **Consumed by**: Human review; not read by automated workflows
+### USER-PROFILE.md
+- **Shape**: Calibration tier and preferences profile
+- **Lifecycle**: Created by `profile-user` → Updated as preferences are observed
+- **Location**: `~/.claude/sdd/USER-PROFILE.md`
+- **Consumed by**: `discuss-phase-assumptions` (calibration tier), `plan-phase`
+### SPIKE.md / DESIGN.md (per-spike)
+- **Shape**: Research question + methodology + findings + recommendation
+- **Lifecycle**: Created → Investigated → Decided → Archived
+- **Location**: `.planning/spikes/SPIKE-NNN/`
+- **Consumed by**: Planner when spike is referenced; `pause-work` for spike context handoff
+---
+## Standing Reference Artifacts
+### METHODOLOGY.md
+- **Shape**: Standing reference — reusable interpretive frameworks (lenses) that apply across phases
+- **Lifecycle**: Created → Active → Superseded (when a lens is replaced by a better one)
+- **Location**: `.planning/METHODOLOGY.md` (project-scoped, not phase-scoped)
+- **Contents**: Named lenses, each documenting:
+  - What it diagnoses (the class of problem it detects)
+  - What it recommends (the class of response it prescribes)
+  - When to apply (triggering conditions)
+  - Example: Bayesian updating, STRIDE threat modeling, Cost-of-delay prioritization
+- **Consumed by**:
+  - `discuss-phase-assumptions` — reads METHODOLOGY.md (if it exists) and applies active lenses
+    to the current assumption analysis before surfacing findings to the user
+  - `plan-phase` — reads METHODOLOGY.md to inform methodology selection for each plan
+  - `pause-work` — includes METHODOLOGY.md in the Required Reading section of `.continue-here.md`
+    so resuming agents inherit the project's analytical orientation
+**Why consumption matters:** A METHODOLOGY.md that no workflow reads is inert. The lenses only
+take effect when an agent loads them into its reasoning context before analysis. This is why
+both the discuss-phase-assumptions and pause-work workflows explicitly reference this file.
+**Example lens entry:**
+```markdown
+## Bayesian Updating
+**Diagnoses:** Decisions made with stale priors — assumptions formed early that evidence has since
+contradicted, but which remain embedded in the plan.
+**Recommends:** Before confirming an assumption, ask: "What evidence would make me change this?"
+If no evidence could change it, it's a belief, not an assumption. Flag for user review.
+**Apply when:** Any assumption carries a Confident label but was formed before recent architectural
+changes, library upgrades, or scope corrections.
+```
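
The per-phase locations in `artifact-types.md` follow a fixed naming scheme (`XX-name/XX-CONTEXT.md`, `XX-YY-PLAN.md`, `XX-YY-SUMMARY.md`). As a minimal sketch, assuming `XX` and `YY` are two-digit zero-padded phase and plan numbers (the placeholder width suggests this, but the reference does not state it), a hypothetical helper `phaseArtifacts` could derive those paths. It uses `path.posix` so separators match the forward-slash paths in the reference regardless of host OS.

```javascript
const path = require("node:path");

// Assumption: XX / YY placeholders mean two-digit zero padding.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical helper, not part of the package: builds the per-phase
// artifact paths listed in artifact-types.md. Pass `plan` to also get
// the per-plan PLAN.md and SUMMARY.md paths.
function phaseArtifacts(phase, slug, plan) {
  const p = pad2(phase);
  const dir = path.posix.join(".planning", "phases", `${p}-${slug}`);
  const result = {
    dir,
    context: path.posix.join(dir, `${p}-CONTEXT.md`),
    discussionLog: path.posix.join(dir, `${p}-DISCUSSION-LOG.md`),
    continueHere: path.posix.join(dir, ".continue-here.md"),
  };
  if (plan !== undefined) {
    const y = pad2(plan);
    result.plan = path.posix.join(dir, `${p}-${y}-PLAN.md`);
    result.summary = path.posix.join(dir, `${p}-${y}-SUMMARY.md`);
  }
  return result;
}
```

Centralizing the naming in one function means a producer (the planner writing PLAN.md) and a consumer (the executor reading it) cannot drift apart on the convention, which is the consumption concern the reference raises.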

package/sdd/references/common-bug-patterns.md ADDED

@@ -0,0 +1,114 @@
+# Common Bug Patterns
+Checklist of frequent bug patterns to scan before forming hypotheses. Ordered by frequency. Check these FIRST — they cover ~80% of bugs across all technology stacks.
+<patterns>
+## Null / Undefined Access
+- **Null property access** — accessing property on `null` or `undefined`, missing null check or optional chaining
+- **Missing return value** — function returns `undefined` instead of expected value, missing `return` statement or wrong branch
+- **Destructuring null** — array/object destructuring on `null`/`undefined`, API returned error shape instead of data
+- **Undefaulted optional** — optional parameter used without default, caller omitted argument
+## Off-by-One / Boundary
+- **Wrong loop bound** — loop starts at 1 instead of 0, or ends at `length` instead of `length - 1`
+- **Fence-post error** — "N items need N-1 separators" miscounted
+- **Inclusive vs exclusive** — range boundary `<` vs `<=`, slice/substring end index
+- **Empty collection** — `.length === 0` falls through to logic assuming items exist
+## Async / Timing
+- **Missing await** — async function called without `await`, gets Promise object instead of resolved value
+- **Race condition** — two async operations read/write same state without coordination
+- **Stale closure** — callback captures old variable value, not current one
+- **Initialization order** — event handler fires before setup complete
+- **Leaked timer** — timeout/interval not cleaned up, fires after component/context destroyed
+## State Management
+- **Shared mutation** — object/array modified in place affects other consumers
+- **Stale render** — state updated but UI not re-rendered, missing reactive trigger or wrong reference
+- **Stale handler state** — closure captures state at bind time, not current value
+- **Dual source of truth** — same data stored in two places, one gets out of sync
+- **Invalid transition** — state machine allows transition missing guard condition
+## Import / Module
+- **Circular dependency** — module A imports B, B imports A, one gets `undefined`
+- **Export mismatch** — default vs named export, `import X` vs `import { X }`
+- **Wrong extension** — `.js` vs `.cjs` vs `.mjs`, `.ts` vs `.tsx`
+- **Path case sensitivity** — works on Windows/macOS, fails on Linux
+- **Missing extension** — ESM requires explicit file extensions in imports
+## Type / Coercion
+- **String vs number compare** — `"5" > "10"` is `true` (lexicographic), `5 > 10` is `false`
+- **Implicit coercion** — `==` instead of `===`, truthy/falsy surprises (`0`, `""`, `[]`)
+- **Numeric precision** — `0.1 + 0.2 !== 0.3`, large integers lose precision
+- **Falsy valid value** — value is `0` or `""` which is valid but falsy
+## Environment / Config
+- **Missing env var** — environment variable missing or wrong value in dev vs prod vs CI
+- **Hardcoded path** — works on one machine, fails on another
+- **Port conflict** — port already in use, previous process still running
+- **Permission denied** — different user/group in deployment
+- **Missing dependency** — not in package.json or not installed
+## Data Shape / API Contract
+- **Changed response shape** — backend updated, frontend expects old format
+- **Wrong container type** — array where object expected or vice versa, `data` vs `data.results` vs `data[0]`
+- **Missing required field** — required field omitted in payload, backend returns validation error
+- **Date format mismatch** — ISO string vs timestamp vs locale string
+- **Encoding mismatch** — UTF-8 vs Latin-1, URL encoding, HTML entities
+## Regex / String
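+- **Sticky lastIndex** — a regex with the `g` (or `y`) flag keeps `lastIndex` between `.test()`/`.exec()` calls, so the next call on new input starts mid-string
+- **Missing escape** — `.` matches any character and `$` is an end anchor unless escaped; backslashes must be doubled in string literals passed to `new RegExp()`
+- **Greedy overmatch** — `.*` eats through delimiters; use lazy `.*?` or a negated class such as `[^>]*`
+- **Wrong quote type** — `${...}` interpolation only works in backtick template literals, not in `'...'` or `"..."` strings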
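Each Regex / String pattern can be reproduced in a few lines. `escapeRegExp` follows the escaping pattern shown in MDN's regular expressions guide. The variable names are illustrative.

```javascript
// Sticky lastIndex: a /g regex carries lastIndex from one call into the next.
const globalDigit = /\d/g;
console.log(globalDigit.test("a1")); // true; lastIndex is now 2
console.log(globalDigit.test("b2")); // false; the search started at index 2
const digit = /\d/; // no g flag: every .test() starts at index 0
console.log(digit.test("a1"), digit.test("b2")); // true true

// Missing escape: an unescaped "." matches any character.
console.log(/1.5/.test("125")); // true
console.log(/1\.5/.test("125")); // false
// Escape untrusted text before building a RegExp from it.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
console.log(new RegExp(escapeRegExp("1.5")).test("125")); // false

// Greedy overmatch: .* runs to the last ">", .*? stops at the first.
console.log("<a><b>".match(/<.*>/)[0]); // <a><b>
console.log("<a><b>".match(/<.*?>/)[0]); // <a>

// Wrong quote type: only backtick template literals interpolate ${...}.
const user = "Ada";
console.log('Hi ${user}'); // Hi ${user}
console.log(`Hi ${user}`); // Hi Ada
```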
+## Error Handling
+- **Swallowed error** — empty `catch {}` or logs but doesn't rethrow/handle
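+- **Wrong error type** — `catch` handles every error, including ones that should propagate; check the type with `instanceof` and rethrow the rest
+- **Error in handler** — cleanup code (for example in `finally`) throws, masking the original error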
+- **Unhandled rejection** — missing `.catch()` or try/catch around `await`
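A minimal sketch of **Error in handler** and one way to avoid it. `maskedFailure` and `runWithCleanup` are hypothetical names. The fix is one reasonable shape, not the only one. Cleanup runs in its own `try/catch`. A cleanup failure is reported only when there is no original error that it would otherwise mask.

```javascript
// Error in handler: a throw inside finally replaces the error already in flight.
function maskedFailure() {
  try {
    throw new Error("original failure");
  } finally {
    throw new Error("cleanup failed"); // the caller never sees "original failure"
  }
}

// Safer shape: isolate cleanup so the original error wins.
function runWithCleanup(work, cleanup) {
  let failed = false;
  try {
    return work();
  } catch (err) {
    failed = true;
    throw err;
  } finally {
    try {
      cleanup();
    } catch (cleanupErr) {
      if (!failed) throw cleanupErr; // nothing to mask, so surface the cleanup error
      console.error("cleanup also failed:", cleanupErr.message);
    }
  }
}
```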
+## Scope / Closure
+- **Variable shadowing** — inner scope declares same name, hides outer variable
+- **Loop variable capture** — all closures share the same `var i`; use `let` for a fresh binding per iteration, or bind the value explicitly
+- **Lost this binding** — callback loses context, need `.bind()` or arrow function
+- **Scope confusion** — `var` is hoisted to function scope; `let`/`const` are block-scoped and throw a `ReferenceError` if accessed before their declaration
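Two of the Scope / Closure patterns in runnable form. `Counter` is an illustrative class. The detached call throws because class bodies always run in strict mode, so `this` is `undefined` there rather than the global object.

```javascript
// Loop variable capture: var is function-scoped, so every closure sees the final i.
const withVar = [];
for (var i = 0; i < 3; i++) withVar.push(() => i);
console.log(withVar.map((f) => f())); // [ 3, 3, 3 ]

// let creates a fresh binding per iteration.
const withLet = [];
for (let j = 0; j < 3; j++) withLet.push(() => j);
console.log(withLet.map((f) => f())); // [ 0, 1, 2 ]

// Lost this binding: a method pulled off its object has no receiver.
class Counter {
  constructor() {
    this.count = 0;
  }
  increment() {
    this.count += 1;
    return this.count;
  }
}
const counter = new Counter();
const detached = counter.increment; // calling detached() throws a TypeError

// Fixes: bind the receiver, or call through an arrow function.
const bound = counter.increment.bind(counter);
console.log(bound()); // 1
const viaArrow = () => counter.increment();
console.log(viaArrow()); // 2
```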
+</patterns>
+<usage>
+## How to Use This Checklist
+1. **Before forming any hypothesis**, scan the relevant categories based on the symptom
+2. **Match symptom to pattern** — if the bug involves "undefined is not an object", check Null/Undefined first
+3. **Each checked pattern is a hypothesis candidate** — verify or eliminate with evidence
+4. **If no pattern matches**, proceed to open-ended investigation
+### Symptom-to-Category Quick Map
+| Symptom | Check First |
+|---------|------------|
+| "Cannot read property of undefined/null" | Null/Undefined Access |
+| "X is not a function" | Import/Module, Type/Coercion |
+| Works sometimes, fails sometimes | Async/Timing, State Management |
+| Works locally, fails in CI/prod | Environment/Config |
+| Wrong data displayed | Data Shape, State Management |
+| Off by one item / missing last item | Off-by-One/Boundary |
+| "Unexpected token" / parse error | Data Shape, Type/Coercion |
+| Memory leak / growing resource usage | Async/Timing (cleanup), Scope/Closure |
+| Infinite loop / max call stack | State Management, Async/Timing |
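The quick map can also be applied mechanically to an error message. This is a sketch under stated assumptions. `SYMPTOM_MAP` and `categoriesFor` are hypothetical names. Only the table rows with a recognizable message text are encoded. Behavioral symptoms such as "works sometimes, fails sometimes" still need a human read. An empty result corresponds to step 4, open-ended investigation.

```javascript
// Hypothetical encoding of the message-based rows of the quick map, in table order.
const SYMPTOM_MAP = [
  { pattern: /cannot read propert(?:y|ies)\b.*\b(?:undefined|null)/i, categories: ["Null/Undefined Access"] },
  { pattern: /is not a function/i, categories: ["Import/Module", "Type/Coercion"] },
  { pattern: /unexpected token/i, categories: ["Data Shape", "Type/Coercion"] },
  { pattern: /maximum call stack/i, categories: ["State Management", "Async/Timing"] },
];

// Returns the categories to scan first, without duplicates.
// The patterns have no g flag, so repeated .test() calls are stateless.
function categoriesFor(message) {
  const hits = SYMPTOM_MAP
    .filter(({ pattern }) => pattern.test(message))
    .flatMap(({ categories }) => categories);
  return [...new Set(hits)];
}
```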
+</usage>

package/sdd/references/context-budget.md ADDED

@@ -0,0 +1,49 @@
+# Context Budget Rules
+Standard rules for keeping orchestrator context lean. Reference this in workflows that spawn subagents or read significant content.
+See also: `references/universal-anti-patterns.md` for the complete set of universal rules.
+---
+## Universal Rules
+Every workflow that spawns agents or reads significant content must follow these rules:
+1. **Never** read agent definition files (`agents/*.md`) -- `subagent_type` auto-loads them
+2. **Never** inline large files into subagent prompts -- tell agents to read files from disk instead
+3. **Read depth scales with context window** -- check `context_window_tokens` in `.planning/config.json`:
+   - At < 500000 tokens (default 200k): read only frontmatter, status fields, or summaries. Never read full SUMMARY.md, VERIFICATION.md, or RESEARCH.md bodies.
+   - At >= 500000 tokens (1M model): MAY read full subagent output bodies when the content is needed for inline presentation or decision-making. Still avoid unnecessary reads.
+4. **Delegate** heavy work to subagents -- the orchestrator routes, it doesn't execute
+5. **Proactive warning**: If you've already consumed significant context (large file reads, multiple subagent results), warn the user: "Context budget is getting heavy. Consider checkpointing progress."
+## Read Depth by Context Window
+| Context Window | Subagent Output Reading | SUMMARY.md | VERIFICATION.md | PLAN.md (other phases) |
+|---------------|------------------------|------------|-----------------|------------------------|
+| < 500k (200k model) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
+| >= 500k (1M model) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
+**How to check:** Read `.planning/config.json` and inspect `context_window_tokens`. If the field is absent, treat as 200k (conservative default).
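The **How to check** step and the read-depth table reduce to a threshold test. This is a minimal Node.js sketch using hypothetical helper names (`contextWindowTokens`, `readDepth`) that are not part of the package. The 500k threshold and 200k default come from the rules above. The sketch also assumes that a missing config file should behave like a missing field, which the rules do not state. Malformed JSON still throws rather than being silently defaulted. The PLAN.md column ("current phase only") does not vary with window size, so it is not modeled.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const FULL_READ_THRESHOLD = 500000; // >= 500k tokens: full bodies permitted
const DEFAULT_CONTEXT_WINDOW = 200000; // conservative default when the field is absent

// Returns context_window_tokens from .planning/config.json, or the default.
function contextWindowTokens(projectRoot) {
  const configPath = path.join(projectRoot, ".planning", "config.json");
  let raw;
  try {
    raw = fs.readFileSync(configPath, "utf8");
  } catch (err) {
    // Assumption: a missing config file is treated like a missing field.
    if (err.code === "ENOENT") return DEFAULT_CONTEXT_WINDOW;
    throw err;
  }
  const value = JSON.parse(raw).context_window_tokens;
  return typeof value === "number" ? value : DEFAULT_CONTEXT_WINDOW;
}

// Read depth for subagent output, SUMMARY.md, and VERIFICATION.md.
function readDepth(windowTokens) {
  return windowTokens >= FULL_READ_THRESHOLD ? "full-body" : "frontmatter-only";
}
```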
+## Context Degradation Tiers
+Monitor context usage and adjust behavior accordingly:
+| Tier | Usage | Behavior |
+|------|-------|----------|
+| PEAK | 0-30% | Full operations. Read bodies, spawn multiple agents, inline results. |
+| GOOD | 30-50% | Normal operations. Prefer frontmatter reads, delegate aggressively. |
+| DEGRADING | 50-70% | Economize. Frontmatter-only reads, minimal inlining, warn user about budget. |
+| POOR | 70%+ | Emergency mode. Checkpoint progress immediately. No new reads unless critical. |
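The tier table can be expressed as an ordered threshold lookup. In this sketch, the `TIERS` constant and the `contextTier` function are hypothetical. The table's ranges share their endpoints (30%, 50%, 70%), so the sketch assumes each boundary belongs to the higher tier. For example, exactly 30% counts as GOOD.

```javascript
// Tier floors from the table, highest first.
// Assumption: a boundary value belongs to the higher tier (30% is GOOD, not PEAK).
const TIERS = [
  { name: "POOR", minPercent: 70 },
  { name: "DEGRADING", minPercent: 50 },
  { name: "GOOD", minPercent: 30 },
  { name: "PEAK", minPercent: 0 },
];

function contextTier(usedTokens, windowTokens) {
  if (!(windowTokens > 0)) throw new RangeError("windowTokens must be positive");
  if (!(usedTokens >= 0)) throw new RangeError("usedTokens must be non-negative");
  // Multiply before dividing so round token counts give exact percentages.
  const percent = (usedTokens * 100) / windowTokens;
  return TIERS.find((tier) => percent >= tier.minPercent).name;
}
```

For example, 120k tokens used in a 200k window is 60%, which lands in DEGRADING.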
+## Context Degradation Warning Signs
+Quality degrades gradually before panic thresholds fire. Watch for these early signals:
+- **Silent partial completion** -- agent claims task is done but implementation is incomplete. Self-check catches file existence but not semantic completeness. Always verify agent output meets the plan's must_haves, not just that files exist.
+- **Increasing vagueness** -- agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This indicates context pressure even before budget warnings fire.
+- **Skipped steps** -- agent omits protocol steps it would normally follow. If an agent's success criteria list 8 items but it reports only 5, suspect context pressure.
+When delegating to agents, the orchestrator cannot verify semantic correctness of agent output -- only structural completeness. This is a fundamental limitation. Mitigate with must_haves.truths and spot-check verification.

package/sdd/references/continuation-format.md CHANGED

@@ -11,9 +11,9 @@ Standard format for presenting next steps after completing a command or workflow
 **{identifier}: {name}** — {one-line description}
-`{command to copy-paste}`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`{command to copy-paste}`
 ---
@@ -29,7 +29,7 @@ Standard format for presenting next steps after completing a command or workflow
 1. **Always show what it is** — name + description, never just a command path
 2. **Pull context from source** — ROADMAP.md for phases, PLAN.md `<objective>` for plans
 3. **Command in inline code** — backticks, easy to copy-paste, renders as clickable link
-4. **`/clear` explanation** — always include, keeps it concise but explains why
+4. **`/clear` first** — always show `/clear` before the command so users run it in the correct order
 5. **"Also available" not "Other options"** — sounds more app-like
 6. **Visual separators** — `---` above and below to make it stand out
@@ -44,15 +44,15 @@ Standard format for presenting next steps after completing a command or workflow
 **02-03: Refresh Token Rotation** — Add /api/auth/refresh with sliding expiry
-`/sdd:execute-phase 2`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`/sdd-execute-phase 2`
 ---
 **Also available:**
 - Review plan before executing
-- `/sdd:list-phase-assumptions 2` — check assumptions
+- `/sdd-list-phase-assumptions 2` — check assumptions
 ---
 ```
@@ -69,9 +69,9 @@ Add note that this is the last plan and what comes after:
 **02-03: Refresh Token Rotation** — Add /api/auth/refresh with sliding expiry
 <sub>Final plan in Phase 2</sub>
-`/sdd:execute-phase 2`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`/sdd-execute-phase 2`
 ---
@@ -91,15 +91,15 @@ Add note that this is the last plan and what comes after:
 **Phase 2: Authentication** — JWT login flow with refresh tokens
-`/sdd:plan-phase 2`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`/sdd-plan-phase 2`
 ---
 **Also available:**
-- `/sdd:discuss-phase 2` — gather context first
-- `/sdd:research-phase 2` — investigate unknowns
+- `/sdd-discuss-phase 2` — gather context first
+- `/sdd-research-phase 2` — investigate unknowns
 - Review roadmap
 ---
@@ -120,15 +120,15 @@ Show completion status before next action:
 **Phase 3: Core Features** — User dashboard, settings, and data export
-`/sdd:plan-phase 3`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`/sdd-plan-phase 3`
 ---
 **Also available:**
-- `/sdd:discuss-phase 3` — gather context first
-- `/sdd:research-phase 3` — investigate unknowns
+- `/sdd-discuss-phase 3` — gather context first
+- `/sdd-research-phase 3` — investigate unknowns
 - Review what Phase 2 built
 ---
@@ -145,13 +145,13 @@ When there's no clear primary action:
 **Phase 3: Core Features** — User dashboard, settings, and data export
-**To plan directly:** `/sdd:plan-phase 3`
+`/clear` then one of:
-**To discuss context first:** `/sdd:discuss-phase 3`
+**To plan directly:** `/sdd-plan-phase 3`
-**To research unknowns:** `/sdd:research-phase 3`
+**To discuss context first:** `/sdd-discuss-phase 3`
-<sub>`/clear` first → fresh context window</sub>
+**To research unknowns:** `/sdd-research-phase 3`
 ---
 ```
@@ -169,9 +169,9 @@ All 4 phases shipped
 **Start v1.1** — questioning → research → requirements → roadmap
-`/sdd:new-milestone`
+`/clear` then:
-<sub>`/clear` first → fresh context window</sub>
+`/sdd-new-milestone`
 ---
 ```
@@ -214,7 +214,7 @@ Extract: `**02-03: Refresh Token Rotation** — Add /api/auth/refresh with slidi
 ## To Continue
 Run `/clear`, then paste:
-/sdd:execute-phase 2
+/sdd-execute-phase 2
 ```
 User has no idea what 02-03 is about.
@@ -222,7 +222,7 @@ User has no idea what 02-03 is about.
 ### Don't: Missing /clear explanation
 ```
-`/sdd:plan-phase 3`
+`/sdd-plan-phase 3`
 Run /clear first.
 ```
@@ -242,7 +242,7 @@ Sounds like an afterthought. Use "Also available:" instead.
 ```
 ```
-/sdd:plan-phase 3
+/sdd-plan-phase 3
 ```
 ```