npm - buildflow-dev - Versions diffs - 1.0.7 → 4.0.2 - Mend

buildflow-dev 1.0.7 → 4.0.2

Files changed (15)

package/README.md +138 -51
package/package.json +4 -2
package/src/commands/init.js +19 -0
package/src/commands/install.js +2 -2
package/templates/CLAUDE.md +49 -33
package/templates/commands/build.md +237 -66
package/templates/commands/check.md +59 -24
package/templates/commands/hotfix.md +119 -0
package/templates/commands/modify.md +212 -38
package/templates/commands/onboard.md +246 -52
package/templates/commands/plan.md +204 -36
package/templates/commands/ship.md +109 -47
package/templates/commands/spec.md +251 -0
package/templates/commands/start.md +38 -8
package/templates/commands/think.md +186 -38

package/templates/commands/spec.md ADDED

@@ -0,0 +1,251 @@
+---
+name: buildflow-spec
+description: Generate user-story-backed PRD, Technical Design, and Acceptance Criteria with self-critique pass
+allowed-tools: Read, Write, WebSearch
+agent: strategist
+---
+# /buildflow-spec
+Spec-Driven Development layer. Produces formally structured, self-critiqued spec artifacts before any code is planned or written. Every plan task, every build output, and every ship gate references these docs.
+Run after `/buildflow-start`, before `/buildflow-plan`.
+## Usage
+- `/buildflow-spec` — full spec from vision
+- `/buildflow-spec prd` — regenerate PRD only
+- `/buildflow-spec tdd` — regenerate TDD only
+- `/buildflow-spec acceptance` — regenerate ACs only
+- `/buildflow-spec --fast` — minimal spec for small features (single screen / endpoint)
+- `/buildflow-spec --review` — critique existing specs without regenerating
+## Context Packet
+- `.buildflow/core/vision.md`
+- `.buildflow/codebase/PATTERNS.md` (if exists — align spec with existing architecture)
+- `.buildflow/memory/light.md` (app_name, framework, phase only)
+- `.buildflow/specs/` (if regenerating)
+---
+## Step 1: Validate Vision
+Read `.buildflow/core/vision.md`.
+If empty: "Run `/buildflow-start` first."
+If `PATTERNS.md` exists: note the existing architectural style (component structure, naming, API patterns).
+The TDD must align with these — don't invent new patterns unless explicitly asked.
+---
+## Step 2: Clarify (one round, all questions at once)
+Ask only what vision.md left unanswered. Max 5 questions:
+- What does success look like for the **user** — as a measurable outcome, not a feature list?
+- What is explicitly **out of scope** this phase?
+- Any **hard constraints**: tech stack, deadline, team size, compliance?
+- Any **third-party integrations** required?
+- If `--fast`: "Describe the single feature in one sentence."
+---
+## Step 3: Generate PRD
+Write `.buildflow/specs/PRD.md`:
+```markdown
+# Product Requirements Document
+**App:** [name]  **Phase:** [N]  **Date:** [today]  **Status:** DRAFT
+## Problem Statement
+[One paragraph. Must answer: what exists today that's broken/missing, who suffers, why current solutions fail.]
+## Users & Goals
+| User Type | Job-to-be-Done | Current Pain |
+|-----------|----------------|--------------|
+| [type] | [goal — verb phrase] | [specific friction today] |
+## User Stories
+| ID | Story | Acceptance Signal |
+|----|-------|------------------|
+| US-01 | As a [user], I want [goal] so that [outcome] | [one-line measurable signal] |
+| US-02 | ... | ... |
+## Features — In Scope
+| ID | Feature | Linked Stories | Priority | Success Metric |
+|----|---------|---------------|----------|----------------|
+| F-01 | [name] | US-01, US-02 | Must / Should / Could | [number or binary] |
+## Explicitly Out of Scope
+- [item] — reason: [why excluded this phase]
+## Constraints
+| Type | Constraint |
+|------|-----------|
+| Tech | [stack lock-in, version requirements] |
+| Timeline | [deadline if any] |
+| Compliance | [GDPR, HIPAA, accessibility level, etc.] |
+| Team | [size, skills] |
+## Phase Complete When
+- [ ] [measurable outcome — must map to a Success Metric above]
+```
+---
+## Step 4: Generate TDD
+If `PATTERNS.md` exists: components and API shapes must follow existing conventions.
+Write `.buildflow/specs/TDD.md`:
+```markdown
+# Technical Design Document
+**App:** [name]  **Phase:** [N]  **Date:** [today]  **Status:** DRAFT
+## Architecture Overview
+[2–3 sentences. What components exist, how they communicate, what changes this phase.]
+## Component Map
+| Component | Responsibility | Linked Feature | Interface Type |
+|-----------|---------------|----------------|----------------|
+| [name] | [single responsibility] | F-01 | REST / gRPC / event / function |
+## Data Model Changes
+| Entity | Fields Added/Changed | Reason |
+|--------|---------------------|--------|
+| [entity] | [field: type] | [F-01] |
+## API Contracts
+| Endpoint / Function | Method | Request Shape | Response Shape | Status Codes | Auth |
+|--------------------|--------|---------------|----------------|-------------|------|
+| /api/[path] | POST | `{ field: type }` | `{ field: type }` | 200, 400, 401, 500 | yes/no |
+## Error Response Format
+All errors follow:
+~~~json
+{ "error": { "code": "ERROR_CODE", "message": "human readable", "field": "optional" } }
+~~~
+## Technology Decisions
+| Decision | Choice | Rationale | Alternatives Rejected | Reversibility |
+|----------|--------|-----------|----------------------|--------------|
+| [area] | [choice] | [why — cite constraint or principle] | [what else, why not] | easy / hard |
+## Non-Functional Requirements
+| Type | Requirement | Measurement Method |
+|------|------------|-------------------|
+| Performance | [endpoint] responds in < [Nms] at [N] rps | Load test / APM |
+| Security | [specific requirement — not "secure"] | Audit / pen test |
+| Accessibility | WCAG [2.1 AA / 2.1 AAA] | Automated + manual |
+| Availability | [N]% uptime | Monitoring |
+| Data retention | [N] days | Automated policy |
+## Known Risks
+| Risk | Likelihood | Impact | Mitigation | Owner |
+|------|-----------|--------|-----------|-------|
+| [risk] | low/med/high | low/med/high | [concrete action] | dev / PM / ops |
+```
+---
+## Step 5: Generate Acceptance Criteria
+Write `.buildflow/specs/acceptance.md`:
+**Rules for every AC:**
+- Binary — pass or fail only, no partial credit
+- Testable — an automated test or explicit manual step can verify it
+- Specific — no vague words (see Critic pass below)
+- Covers both happy path AND at least one failure/edge case per feature
+```markdown
+# Acceptance Criteria
+**App:** [name]  **Phase:** [N]  **Date:** [today]
+## [Feature F-01: name]  →  US-01, US-02
+**Happy path:**
+- AC-001: Given [setup state], when [user/system action], then [exact observable outcome]
+- AC-002: Given [...], when [...], then [...]
+**Failure / edge cases (required — minimum 1 per feature):**
+- AC-003: Given [invalid/boundary/error input], when [...], then [specific error behavior]
+- AC-004: Given [concurrent/race/timeout scenario], when [...], then [...]
+## [Feature F-02: name]  →  US-03
+- AC-005: Given [...], when [...], then [...]
+- AC-006 (edge): Given [...], when [...], then [...]
+## Non-Functional
+- AC-NF-001: [endpoint] under [N] concurrent users responds within [Nms] (p95)
+- AC-NF-002: [page/flow] achieves WCAG 2.1 AA with zero automated violations
+- AC-NF-003: No secrets present in committed code (automated scan passes)
+```
+---
+## Step 6: Spec Critic Pass (automatic — runs before showing user)
+Before presenting specs to the user, self-review all three docs as a Spec Critic:
+### Vague Language Scan
+Search every AC for these banned words. Flag any found:
+`correctly` `properly` `works` `fast` `quickly` `slow` `good` `bad` `easy` `easily`
+`should` `appropriate` `reasonable` `nice` `clean` `simple` `obvious` `intuitive`
+For each flagged word: replace with a specific, measurable alternative or mark `[NEEDS SPECIFICITY]`.
+### Coverage Check
+- Every feature in PRD has at least 2 ACs (1 happy + 1 error/edge) — flag if not
+- Every user story US-XX is referenced in at least one AC's feature section — flag orphans
+- Every component in TDD maps to at least one PRD feature — flag orphans
+- Every NFR in TDD has a corresponding AC-NF — flag gaps
+### Testability Check
+For each AC, verify it can be answered as a pass/fail automated test or explicit manual step.
+Flag any AC that requires human judgment to evaluate.
+### Consistency Check
+- API contracts in TDD match any referenced endpoints in ACs
+- Data model changes in TDD are sufficient to support all AC outcomes
+- Technology decisions don't contradict any constraints in PRD
+### Critic Report
+Show the user:
+```
+Spec Critic Report
+──────────────────
+Vague language:   [N found — fixed N, flagged N]
+Coverage gaps:    [list any orphaned features/stories/components]
+Testability:      [list any ACs needing rework]
+Consistency:      [any TDD/PRD conflicts]
+Overall quality:  STRONG / NEEDS REVISION
+```
+---
+## Step 7: User Review Gate
+Show summary of all three specs + Critic Report. Ask:
+> "Specs ready for review. Critic score: [STRONG / NEEDS REVISION]
+> Approve to lock, or tell me what to revise."
+- **Approve:** lock specs. Update `light.md`:
+  ```yaml
+  spec_status: locked
+  spec_phase: [N]
+  ac_count: [N]
+  us_count: [N]
+  spec_critic: strong/revised
+  ```
+- **Revise:** apply changes to the named section, re-run Critic pass, repeat Step 7.
+Do not proceed until user approves.
+---
+## --fast Mode
+For single-feature additions:
+- Skip User Stories table (inline in feature row)
+- Skip Technology Decisions (use existing stack)
+- Generate 3 ACs minimum: 1 happy + 1 error + 1 NFR
+- Skip Critic coverage check (only vague language scan)
+- Token budget: ~8K
+---
+## Token Budget: ~20K (full) / ~8K (--fast)
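
Reviewer sketch (not part of the package): the Vague Language Scan in Step 6 of `spec.md` is described in prose only. Below is a minimal JavaScript sketch of how such a scan could work, assuming each AC occupies a single line that starts with `- AC-`. The function name `scanVagueLanguage` and the shape of its return value are hypothetical; only the banned-word list is taken from the template.

```javascript
// Banned words copied from the Step 6 list in spec.md.
const BANNED_WORDS = [
  "correctly", "properly", "works", "fast", "quickly", "slow", "good", "bad",
  "easy", "easily", "should", "appropriate", "reasonable", "nice", "clean",
  "simple", "obvious", "intuitive",
];

// Hypothetical helper: returns one finding per AC line that contains a banned word.
// Assumption: every AC is a single line beginning with "- AC-".
function scanVagueLanguage(acceptanceMarkdown) {
  const pattern = new RegExp(`\\b(${BANNED_WORDS.join("|")})\\b`, "gi");
  const findings = [];
  for (const line of acceptanceMarkdown.split("\n")) {
    const idMatch = line.match(/^- (AC-[A-Z0-9-]+)/);
    if (!idMatch) continue; // skip headings and prose lines
    const words = (line.match(pattern) || []).map((w) => w.toLowerCase());
    if (words.length > 0) {
      findings.push({ id: idMatch[1], words: [...new Set(words)] });
    }
  }
  return findings;
}

// Example input: the first AC uses a banned word, the second does not.
const sampleAcs = [
  "- AC-001: Given a valid login, when the user submits, then the page loads quickly",
  "- AC-002: Given a wrong password, when the user submits, then a 401 is returned",
].join("\n");

console.log(scanVagueLanguage(sampleAcs));
```

Each finding would then be rewritten with a measurable alternative or marked `[NEEDS SPECIFICITY]`, as the template describes.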

package/templates/commands/start.md CHANGED

@@ -9,31 +9,61 @@ agent: strategist
 Begin your project. Works for both greenfield and existing codebases.
+## Context Packet (load only these)
+- `.buildflow/memory/light.md`
+- `.buildflow/you/preferences.md`
+Do NOT load: specs, phases, codebase files — this is vision only.
 ## Step 1: Load Memory
 Read `.buildflow/memory/light.md` and `.buildflow/you/preferences.md`.
+If `light.md` is over 3K tokens: prune it now (see pruning rules below).
 ## Step 2: Detect Mode
-**Greenfield (no src/ code):**
+**Greenfield (no src/ code yet):**
 Ask vision questions one at a time:
 1. What are you building?
 2. Who is it for?
 3. What problem does it solve?
 4. What's the simplest useful version?
-5. Timeline, team size, budget?
+5. Timeline, team size, constraints?
 6. Confidence in the idea (1-5)?
 **Existing codebase (src/ exists):**
 Check if `.buildflow/codebase/MAP.md` exists.
-- If NO: "Run /buildflow-onboard first to analyze your codebase."
-- If YES: Load MAP.md. Ask about goals for this work session.
+- If NO: "Run `/buildflow-onboard` first to analyze your codebase."
+- If YES: Load MAP.md summary only (not full file). Ask about goals for this session.
 ## Step 3: Save Vision
-Write to `.buildflow/core/vision.md` and update `.buildflow/core/state.md`.
+Write to `.buildflow/core/vision.md`.
+Initialize `light.md` with:
+```yaml
+app_name: [name]
+framework: [detected or stated]
+language: [detected or stated]
+phase: 0
+spec_status: none
+plan_status: none
+onboard_status: [yes/no]
+last_session: [today]
+```
 ## Step 4: Recommend Next Step
-- High confidence (4-5) + greenfield: "Run /buildflow-think"
-- Low confidence: "Let's research first with /buildflow-think"
-- Existing project, onboarded: "Ready! Try /buildflow-modify or /buildflow-think"
+| Situation | Next command |
+|-----------|-------------|
+| Greenfield, confidence 4–5 | `/buildflow-spec` — define what to build formally |
+| Greenfield, confidence 1–3 | `/buildflow-think [topic]` — research first |
+| Existing project, not onboarded | `/buildflow-onboard` — map codebase first |
+| Existing project, onboarded | `/buildflow-spec` or `/buildflow-modify` |
+| Emergency fix needed | `/buildflow-hotfix "description"` |
+## light.md Pruning Rules
+If `light.md` exceeds 3K tokens on session start:
+- Remove: completed phase task lists, wave details, build timestamps older than last phase
+- Archive these to the most recent `phases/[N]/retro.md`
+- Keep: app_name, framework, language, phase, spec_status, style_fingerprint, last 2 decisions
+- After pruning: report "Context pruned: light.md reduced from [X] → [Y] tokens"
 ## Token Budget: ~8K
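
Reviewer sketch (not part of the package): the new Step 4 table in `start.md` is effectively a small decision function. A minimal JavaScript sketch under stated assumptions: the input field names (`greenfield`, `confidence`, `onboarded`, `emergency`) are hypothetical, and the table does not say which row wins when several apply, so the sketch assumes an emergency takes precedence.

```javascript
// Hypothetical session shape; these field names are illustrative, not part of buildflow-dev.
function recommendNextStep({ greenfield, confidence = 0, onboarded = false, emergency = false }) {
  // Assumption: an emergency overrides every other row; the table states no ordering.
  if (emergency) return '/buildflow-hotfix "description"';
  if (greenfield) {
    // Confidence 4-5 goes straight to spec; 1-3 researches first.
    return confidence >= 4 ? "/buildflow-spec" : "/buildflow-think [topic]";
  }
  // Existing codebase: onboarding must happen before spec or modify.
  return onboarded ? "/buildflow-spec or /buildflow-modify" : "/buildflow-onboard";
}

console.log(recommendNextStep({ greenfield: true, confidence: 2 }));
console.log(recommendNextStep({ greenfield: false, onboarded: false }));
```

Writing the table this way makes the one unstated choice, row precedence, explicit.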

package/templates/commands/think.md CHANGED

@@ -1,49 +1,197 @@
 ---
 name: buildflow-think
-description: Research and discuss with parallel Researcher agents
+description: Deep research, architecture review, build-vs-buy reasoning, and engineering cognition
 allowed-tools: Read, Write, WebSearch
 agents: strategist, researcher, synthesizer
 ---
 # /buildflow-think
-Deep research and discussion mode. Spawns parallel Researcher agents then synthesizes findings.
+Research, reasoning, and architecture review. Spawns parallel Researchers when web evidence is needed. Synthesizes conflicting information into a concrete recommendation with confidence score.
+Goes beyond research — includes engineering cognition modes for architecture review, build-vs-buy analysis, technical debt assessment, and complexity budgeting.
 ## Usage
-- `/buildflow-think` — open-ended discussion
-- `/buildflow-think <topic>` — research a specific topic
+- `/buildflow-think <topic>` — research a specific topic or technology
 - `/buildflow-think tech-stack` — compare technology options
-- `/buildflow-think risks` — identify project risks
-## Step 1: Load Context
-Read `.buildflow/memory/light.md`, `.buildflow/core/vision.md`.
-## Step 2: Clarify Research Goal
-Ask: "What do you want to explore or decide?"
-If already specified in the command, confirm understanding.
-## Step 3: Parallel Research (if web research needed)
-Spawn up to 3 Researcher agents in parallel, each with:
-- A specific research question
-- Instructions to find 2-3 sources
-- Trust score (1-5) for each source
-- Key findings in bullet points
-## Step 4: Synthesize
-Synthesizer agent combines findings:
-- Points of agreement across sources
-- Conflicting information (flag explicitly)
-- Recommendation with confidence (1-5)
-- Open questions remaining
-## Step 5: Discussion
-Ask confidence check: "How confident are you in this direction? (1-5)"
-- 1-2: Explore alternatives
-- 3: Identify what would increase confidence
-- 4-5: Move forward, suggest next step
-## Step 6: Save Insights
-Write to `.buildflow/research/[topic]-[date].md`
-Update `.buildflow/memory/light.md` with key decisions.
-## Token Budget: ~30K (parallel research)
+- `/buildflow-think risks` — surface technical and product risks
+- `/buildflow-think --arch` — architecture review of current codebase or proposed design
+- `/buildflow-think --build-vs-buy <capability>` — should we build it or use a library/service?
+- `/buildflow-think --debt` — assess current technical debt and prioritize
+- `/buildflow-think --complexity` — is the proposed plan too complex for the team/timeline?
+## Context Packet
+- `.buildflow/core/vision.md`
+- `.buildflow/memory/light.md` (app_name, framework, key decisions only)
+- `.buildflow/codebase/MAP.md` (for --arch, --debt, --complexity modes)
+- `.buildflow/specs/TDD.md` (for --arch mode, if exists)
+---
+## Standard Research Mode (default)
+### Step 1: Clarify Research Goal
+If topic is specified, confirm understanding in one sentence.
+If open-ended: "What are you trying to decide or understand?"
+### Step 2: Decompose into Research Questions
+Break the topic into 2–3 specific, answerable sub-questions.
+Assign one to each Researcher agent.
+### Step 3: Parallel Research
+Spawn up to 3 Researcher agents simultaneously, each:
+- Answering their specific sub-question
+- Finding 2–3 sources
+- Rating each source trust: 1 (blog opinion) → 5 (official docs / peer-reviewed)
+- Summarizing key findings in bullet points
+### Step 4: Synthesize
+Synthesizer combines all findings:
+- **Consensus:** what all sources agree on
+- **Conflicts:** where sources disagree — flag explicitly with each position
+- **Gaps:** what the research didn't answer
+- **Recommendation:** concrete, actionable, with confidence (1–5)
+- **Risks:** what could go wrong with the recommendation
+### Step 5: Confidence Gate
+If confidence ≤ 3: "Low confidence. Here's what would increase it: [specific gaps to fill]"
+If confidence ≥ 4: suggest next step (spec / plan / build)
+### Step 6: Save
+Write `.buildflow/research/[topic]-[date].md`
+Update `light.md` key decisions if a choice was made.
+---
+## Architecture Review Mode (`--arch`)
+Triggered when: designing a new system, evaluating a proposed approach, or onboarding to a codebase.
+### Step 1: Load Architecture Context
+Read `MAP.md`, `TDD.md` (if exists), `PATTERNS.md`.
+If greenfield: work from vision + proposed TDD.
+### Step 2: Structural Analysis
+Evaluate:
+- **Separation of concerns** — do modules have single, clear responsibilities?
+- **Coupling** — are modules tightly bound in ways that make changes expensive?
+- **Cohesion** — does each module contain related things?
+- **Boundaries** — are module boundaries enforced or leaky?
+- **Scalability** — will this design hold under 10× the current load/data?
+### Step 3: Pattern Consistency
+Does the proposed design follow existing patterns in the codebase?
+If introducing new patterns: is there a good reason, or is it accidental inconsistency?
+### Step 4: Failure Mode Analysis
+For each major component, ask: "What happens when this fails?"
+- Does the failure cascade?
+- Is there a recovery path?
+- Will the user see a clear error or silent corruption?
+### Step 5: Engineering Smell Detection
+Flag any of these if present:
+- **God object** — one class/module doing too many things
+- **Shotgun surgery** — a single logical change requires edits across many files
+- **Primitive obsession** — using raw strings/numbers where domain types would be clearer
+- **Anemic model** — data objects with no behavior, all logic in services
+- **Circular dependency** — A imports B imports A
+- **Distributed monolith** — microservices that can't deploy independently
+### Step 6: Architecture Report
+```
+Architecture Review
+───────────────────
+Strengths:       [what's well-designed]
+Concerns:        [issues with severity: HIGH / MEDIUM / LOW]
+Smells detected: [list or NONE]
+Failure modes:   [unhandled scenarios]
+Recommendation:  [concrete changes or "proceed as-is"]
+Confidence:      [1–5]
+```
+---
+## Build vs Buy Mode (`--build-vs-buy <capability>`)
+Triggered when evaluating whether to implement a capability in-house or use an external library/service.
+### Step 1: Define the Capability
+Exact scope: what does this need to do? What are the boundaries?
+### Step 2: Research Options
+Parallel Researchers investigate:
+- **Build** — what would implementation cost? What's the maintenance burden?
+- **Buy (OSS)** — what libraries exist? License, maintenance status, community health?
+- **Buy (SaaS)** — what services exist? Cost, reliability, vendor lock-in risk?
+### Step 3: Evaluation Matrix
+| Factor | Build | OSS Library | SaaS |
+|--------|-------|-------------|------|
+| Time to working | [est] | [est] | [est] |
+| Ongoing maintenance | high | low–med | none |
+| Customization | full | partial | limited |
+| Cost | dev time | free (usually) | $/mo |
+| Vendor lock-in | none | low | HIGH |
+| Compliance fit | full control | depends | verify |
+| Team expertise needed | yes | some | low |
+### Step 4: Recommendation
+Given project constraints (team size, timeline, compliance from PRD):
+- **Recommend:** [build / OSS / SaaS]
+- **Reason:** [top 2 factors that drove the decision]
+- **Risk:** [biggest downside of this choice]
+- **Confidence:** [1–5]
+---
+## Technical Debt Mode (`--debt`)
+### Step 1: Load Hotspots
+Read `HOTSPOTS.md`. These are the known high-risk files.
+### Step 2: Debt Classification
+For each hotspot and any other known issues:
+| Item | Type | Impact | Cost to Fix | ROI |
+|------|------|--------|------------|-----|
+| [issue] | CODE / ARCH / TEST / INFRA | HIGH/MED/LOW | S/M/L/XL | high/med/low |
+Debt types:
+- **CODE** — complexity, duplication, poor naming
+- **ARCH** — wrong abstraction, bad module boundary, circular dep
+- **TEST** — missing or shallow test coverage on critical paths
+- **INFRA** — outdated deps, missing CI, manual steps that should be automated
+### Step 3: Priority Recommendation
+Sort by ROI (impact of fixing ÷ cost to fix). Top 3 items to address next.
+---
+## Complexity Budget Mode (`--complexity`)
+Used before a plan is executed to ask: "Is this too much for the team/timeline?"
+### Step 1: Load Plan
+Read `phases/[N]/PLAN.md`. Sum effort estimates.
+### Step 2: Complexity Assessment
+```
+Complexity Budget Check
+───────────────────────
+Total estimated effort: [sum]
+XL tasks: [N] — [list them]  ← each XL is a risk
+External dependencies: [N]   ← each is a coordination cost
+New patterns introduced: [N] ← each needs learning time
+Files touching hotspots: [N] ← each is higher risk
+Verdict: FEASIBLE / RISKY / OVER-SCOPED
+```
+- **FEASIBLE** — proceed
+- **RISKY** — flag XL tasks for `/buildflow-think` before building, consider splitting
+- **OVER-SCOPED** — recommend cutting scope, suggest which features to defer
+---
+## Token Budget: ~30K (standard) / ~35K (--arch or --build-vs-buy) / ~20K (--debt or --complexity)
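
Reviewer sketch (not part of the package): `--debt` Step 3 sorts items by ROI (impact of fixing ÷ cost to fix) and keeps the top 3, but the template only defines labels, not numbers. A minimal JavaScript sketch, assuming purely illustrative weights for the `HIGH/MED/LOW` and `S/M/L/XL` labels; the function name `topDebtItems` and both weight tables are hypothetical.

```javascript
// Hypothetical numeric weights; think.md defines only the labels.
const IMPACT_WEIGHT = { HIGH: 3, MED: 2, LOW: 1 };
const COST_WEIGHT = { S: 1, M: 2, L: 3, XL: 4 };

// Returns the `limit` items with the highest ROI, where ROI = impact / cost.
function topDebtItems(items, limit = 3) {
  return items
    .map((item) => ({
      ...item,
      roi: IMPACT_WEIGHT[item.impact] / COST_WEIGHT[item.cost],
    }))
    .sort((a, b) => b.roi - a.roi) // highest ROI first
    .slice(0, limit);
}

// Example debt table rows (made-up items for illustration only).
const debt = [
  { item: "circular dependency in core", type: "ARCH", impact: "HIGH", cost: "XL" },
  { item: "duplicated validation logic", type: "CODE", impact: "HIGH", cost: "S" },
  { item: "inconsistent naming", type: "CODE", impact: "LOW", cost: "M" },
  { item: "missing CI pipeline", type: "INFRA", impact: "MED", cost: "M" },
];

for (const d of topDebtItems(debt)) {
  console.log(`${d.item}: ROI ${d.roi.toFixed(2)}`);
}
```

With other weights the ranking can change, so the weights themselves are worth agreeing on before relying on the top 3.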