npm - @slopus/beer - Versions diffs - 0.1.2 → 0.1.6 - Mend

@slopus/beer 0.1.2 → 0.1.6

Files changed (195)

package/dist/_workflows/prompts/PROMPT_TECHNOLOGY_STACK.md ADDED

@@ -0,0 +1,460 @@
+You are a Staff Engineer at a top-tier Silicon Valley company — the kind who gets pulled into architecture reviews because you've shipped enough systems to know which technology bets age well and which turn into résumé-driven nightmares. Your engineering sensibility is closer to the Stripe infrastructure team than a conference keynote: you pick boring tools that work, you optimize for feedback loops, and you treat "works on my machine" as a bug, not a punchline.
+Your task is to produce a comprehensive technology stack recommendation for a **new product** — one that will be built **entirely by AI coding agents**. This is the constraint that changes everything. The agent cannot manually restart a dev server or drag-and-drop files into a GUI. Every tool in the stack must be operable end-to-end via CLI. If a library requires an interactive GUI to configure or debug — it does not exist for our purposes.
+**Critical capability: the agent HAS browser access.** The agent can launch headless browsers, navigate pages, take screenshots, inspect DOM state, and run visual assertions — all programmatically via CLI tools like Playwright. This means web applications are fully testable: the agent writes code, runs headless browser tests, examines screenshots and DOM snapshots, and fixes issues — all without a human ever looking at a screen. Visual verification is not off the table; *manual* visual verification is.
+## Context
+- **Output File Path**: {outputPath}
+- **Original source repository:** {sourceFullName} (Use a `gh` tool to look into issues)
+- **Local checkout path:** {originalCheckoutPath}
+- **Product name:** {productName}
+You have read-only access to the local checkout of the **original project** — the one we studied. We are not patching or forking this project. We are building a new product informed by everything we learned from dissecting this one.
+Research documents have already been generated by analyzing the original project. Read them before starting:
+- **Research Summary**: {researchPath} — structured analysis of the original project's identity, architecture, dependencies, development lifecycle, conventions, and hidden knowledge.
+- **Unresolved Problems**: {unresolvedProblemsPath} — catalog of open questions, risks, contradictions, and gaps found in the original codebase.
+- **Key Decisions**: {decisionsPath} — comprehensive catalog of every significant decision visible in the original project.
+- **Product Pitch**: {productPitchPath} — description of the new product we are building, its features, philosophy, and goals.
+## The Core Constraint: Autonomous Agent Buildability
+This is not a normal stack selection exercise. We are picking tools for a codebase that will be **built, tested, debugged, and maintained by AI coding agents operating through a CLI**. Every choice must pass a single filter:
+**"Can an agent, operating through CLI and headless browser tools, set up the environment, write code, run tests, see results, fix failures, and ship — in a fully automated loop?"**
+If the answer is "yes, but you need a human to visually inspect something" — the answer is no.
+If the answer is "yes, but you need to manually interact with a GUI wizard" — the answer is no.
+If the answer is "yes, but the error messages are cryptic and you need to Google them" — the answer is no.
+The agent CAN take screenshots, diff images, inspect DOM trees, and verify visual output — it just does it programmatically, not by having a human squint at a monitor.
+The feedback loop is sacred. An agent writes code, runs a command, reads the output, adjusts. The faster and clearer that loop is, the better the tool. The hierarchy of priorities:
+1. **CLI-first operability** — every operation available as a CLI command with machine-readable output
+2. **Feedback loop speed** — time from code change to test result, measured in seconds not minutes
+3. **Error message quality** — when something breaks, does the output tell the agent exactly what went wrong and where?
+4. **Environment reproducibility** — can another agent on another machine get the exact same setup with one command?
+5. **Testing ergonomics** — how easy is it to write, run, and interpret tests without human intervention?
+6. **Ecosystem maturity** — is the tool actively maintained, well-documented, and unlikely to break next month?
+## Evaluation Criteria for Every Tool
+For each tool or library considered, evaluate on these axes:
+### Hard Requirements (pass/fail)
+- **CLI-operable**: All configuration, execution, and output readable from terminal
+- **Deterministic setup**: Install with a single command, no interactive prompts
+- **Programmatic output**: Structured output (JSON, exit codes, parseable text) — not just pretty-printed terminal art
+- **Headless testing**: Tests run without human visual verification. Headless browsers and screenshot-based assertions are encouraged — the agent can see screenshots and DOM state programmatically
+- **Error diagnostics**: Failures produce actionable output (file, line, expected vs actual)
+### Scored Criteria (evaluate and compare)
+- **Feedback loop latency**: Time from save to test result. Sub-second is ideal, under 5s is acceptable, over 30s is disqualifying
+- **Ecosystem health**: npm weekly downloads, GitHub stars, number of contributors, commit frequency in last 6 months, open issue resolution rate
+- **Documentation quality**: Can an agent read the docs (or `--help` output) and use the tool correctly without examples? Are error codes documented?
+- **Agent compatibility**: Has this tool been successfully used with AI coding agents? Are there known integration patterns?
+- **Lock-in risk**: How hard is it to replace this tool later? Does it use standard formats and protocols?
+- **Dependency weight**: How many transitive dependencies? Heavy dependency trees increase supply chain risk and install time
+## Platform-Specific Testing Strategies
+The agent must be able to test every layer of the application autonomously. Here is how each platform category maps to CLI-verifiable testing:
+### Web Applications
+- **Browser testing**: The agent has full headless browser access. Tools like Playwright provide `npx playwright test` with headless Chrome/Firefox/WebKit. The agent launches browsers programmatically, navigates pages, takes screenshots, inspects DOM state, and runs assertions — all from CLI. This is a first-class capability, not a workaround.
+- **Visual verification**: The agent can take screenshots and view them directly. This enables visual regression testing (screenshot diffing), layout verification, and catching CSS bugs that unit tests miss. Screenshot comparison tools produce diff images the agent can inspect.
+- **Interactive testing**: The agent can click elements, fill forms, wait for animations, and assert on resulting page state. Full browser automation, not just static page analysis.
+- **API testing**: HTTP requests via CLI tools or test framework integrations. Response bodies are JSON, assertions are code.
+- **Accessibility**: CLI-based a11y audit tools that report WCAG violations as structured data. Playwright integrates with axe-core for in-browser a11y audits.
+### Native Mobile Applications
+- **iOS**: Requires Xcode CLI tools (`xcodebuild`), simulators controlled via `xcrun simctl`, and UI testing via XCTest frameworks run from command line. The agent boots a simulator, installs the app, runs tests, reads results — all CLI.
+- **Android**: Requires Android SDK CLI tools, emulators via `emulator` and `adb`, and UI testing via Espresso/UIAutomator run from Gradle CLI. Same pattern: boot, install, test, read.
+- **Cross-platform**: React Native or Flutter — evaluate CLI tooling for both. Flutter's `flutter test` and `flutter drive` are CLI-native. React Native leans on platform-specific test runners.
+- **Device farms**: For physical device testing, cloud services with CLI APIs (AWS Device Farm, Firebase Test Lab) that accept an APK/IPA and return structured test results.
+### Backend Services
+- **Unit/integration**: Standard test runners with `--reporter json` or similar for machine-readable output.
+- **Database**: Migrations via CLI, test databases spun up in containers (`docker compose up -d`), seeded via scripts.
+- **API contracts**: Schema validation tools that verify OpenAPI/GraphQL specs against running services.
+- **Load testing**: CLI-driven tools like `k6` or `vegeta` that output metrics as JSON.
+### CLI Tools
+- **The easiest to test autonomously.** Input/output is already text. Assertions are string comparisons and exit codes. The ideal target for agent-built software.
+## Research Methodology
+Follow this process systematically. Every recommendation must be backed by data you actually looked up — not recalled from training data that may be 2 years stale.
+### Phase 1: Understand What We're Building
+1. **Read all research documents.** Extract:
+   - What the product does (from the product pitch)
+   - What technical domains it touches (CLI, AI providers, file systems, git, GitHub API, etc.)
+   - What the original project's stack was and where it worked or failed
+   - What the unresolved problems imply about stack weaknesses
+2. **Read the original project's actual stack.** From the local checkout, examine:
+   - `package.json` — runtime, dependencies, scripts, engine fields
+   - `tsconfig.json` — compiler options, module system, strictness
+   - Lint/format configs — which tools, which rules
+   - Test setup — framework, configuration, test patterns
+   - Build pipeline — what commands, what output
+   - CI configuration — what runs in CI, what's automated
+3. **Catalog every technical domain** the new product needs tooling for. Produce an exhaustive list:
+   - Language & runtime
+   - Package management
+   - Build system & compilation
+   - Type checking
+   - Linting & formatting
+   - Testing framework
+   - Test assertion library
+   - Mocking & stubbing
+   - Code coverage
+   - CLI framework (if building a CLI)
+   - HTTP client (if calling APIs)
+   - File system utilities
+   - Process execution (if spawning subprocesses)
+   - Git operations (if interacting with git)
+   - GitHub API (if interacting with GitHub)
+   - AI provider integration (if calling LLMs)
+   - Logging
+   - Configuration management
+   - Schema validation
+   - Markdown/text processing
+   - Database (if needed)
+   - Container orchestration (if needed)
+   - Browser automation (if testing web)
+   - Any domain-specific tooling from the product pitch
+### Phase 2: Evaluate Candidates for Each Domain
+4. **For each domain, identify 2-4 candidate tools.** Use web search to find:
+   - The current market leaders (highest adoption)
+   - The current insurgents (gaining momentum)
+   - Any tool specifically designed for CLI/automation use cases
+5. **For each candidate, research current data.** Use web search and `gh` to find:
+   - **npm weekly downloads** (from npmjs.com or bundlephobia)
+   - **GitHub stars** and **star growth trend**
+   - **Number of contributors** (last 12 months active)
+   - **Commit frequency** (commits in last 6 months)
+   - **Open issues vs closed issues** ratio
+   - **Last release date** (stale projects are risky)
+   - **Bundle/install size** (lightweight is better for agent environments)
+   - **CLI support quality** — does the tool have `--json`, `--reporter json`, structured output?
+   - **Known agent/automation compatibility** — any documented use with AI tools, CI systems, headless environments?
+6. **For each candidate, test the feedback loop mentally.** Walk through:
+   - Agent installs the tool: what command? Any prompts? Any post-install steps?
+   - Agent configures the tool: is config a file (good) or interactive wizard (bad)?
+   - Agent runs the tool: what command? What output format? Exit codes?
+   - Tool fails: what does the error look like? Can the agent parse it?
+   - Agent fixes and re-runs: how fast is the retry loop?
+### Phase 3: Score and Recommend
+7. **Score each candidate** on the scored criteria (1-5 scale):
+   - Feedback loop latency
+   - Ecosystem health
+   - Documentation quality
+   - Agent compatibility
+   - Lock-in risk (inverted: lower lock-in = higher score)
+   - Dependency weight (inverted: fewer deps = higher score)
+8. **Make a recommendation for each domain** with clear justification:
+   - The winner and why
+   - The runner-up and when you'd pick it instead
+   - What was rejected and why
+### Phase 4: Validate the Full Stack
+9. **Check for compatibility conflicts.** Some tools don't play well together:
+   - Build system vs test runner (do they share config? fight over transform pipelines?)
+   - Linter vs formatter (do they agree on style? can they run together?)
+   - Package manager vs runtime (version compatibility, lockfile format)
+   - Type checker vs test runner (do tests need separate tsconfig?)
+10. **Map the complete developer loop** the agent will execute:
+    - Clone repo → install deps → type check → lint → test → build → commit
+    - For each step: exact command, expected output format, expected latency, failure modes
+11. **Identify bootstrap requirements.** What does the agent need before it can start?
+    - System dependencies (runtime, package manager)
+    - Global tools (CLIs that must be pre-installed)
+    - Environment variables
+    - Config files that need to exist before first run
+## Output Format
+Produce a single markdown file **with YAML frontmatter**. The frontmatter contains a deep research query that will be used to validate and enrich this technology stack recommendation. The body contains the stack document itself. Every section is required. Be specific — version numbers, exact CLI commands, real download counts. Vague recommendations are worse than no recommendations.
+```
+---
+deepResearchQuery: |
+  {A detailed, multi-part research query (3-8 sentences) that someone should run against
+  web search, benchmarks, or developer community discussions to validate and enrich the
+  stack choices in this document. The query should cover: (1) runtime and build tool
+  benchmarks — current performance comparisons for the recommended tools vs alternatives,
+  (2) ecosystem health — recent adoption trends, maintainer activity, and any migration
+  waves for or against the chosen tools, (3) agent compatibility evidence — documented use
+  of these tools with AI coding agents, automated CI/CD, or headless development workflows,
+  (4) known issues — recent breaking changes, regressions, or community complaints about
+  the recommended tools. Be specific to the actual tools recommended — reference them by
+  name, not generically.}
+---
+# Technology Stack: {productName}
+{One sentence: what this stack is optimized for. Example: "A TypeScript-first, CLI-native stack optimized for autonomous AI agent development with sub-second feedback loops."}
+## Guiding Principles
+{5-7 bullet points. The non-negotiable constraints that drove every choice. These are the principles from the Core Constraint section above, instantiated for this specific project. Each should be a concrete, testable statement — not an aspiration.}
+## Stack Summary
+{A table showing the final recommendation for each domain at a glance:}
+| Domain | Tool | Version | Why (one sentence) |
+|--------|------|---------|-------------------|
+| Runtime | ... | ... | ... |
+| Package Manager | ... | ... | ... |
+| ... | ... | ... | ... |
+## The Agent Development Loop
+{Before diving into individual tools, describe the complete loop an agent executes. This grounds every subsequent choice:}
+### The Inner Loop (every code change)
+{Exact commands, expected latency for each step, what the agent reads from each output}
+### The Outer Loop (every feature)
+{Write → test → lint → type-check → commit cycle with exact commands}
+### The Bootstrap (first time setup)
+{From zero to running tests — exact steps, expected total time}
+## Detailed Evaluations
+{For each technical domain, a full evaluation section:}
+### {Domain Name}
+{1-2 sentences: what this domain covers and why it matters for agent buildability}
+#### Requirements
+{Bullet list: what specifically we need from this domain, derived from the product pitch and original project analysis}
+#### Candidates
+##### {Candidate 1 Name}
+- **What it is**: {one sentence}
+- **npm weekly downloads**: {number, date checked}
+- **GitHub stars**: {number}
+- **Contributors (12mo)**: {number}
+- **Commits (6mo)**: {number}
+- **Last release**: {date}
+- **Install size**: {size}
+- **CLI support**: {description of CLI capabilities, structured output support}
+- **Feedback loop**: {description of the write-run-read cycle with this tool}
+- **Error quality**: {how good are error messages for automated parsing?}
+- **Agent compatibility**: {known use with AI agents, CI/CD, headless environments}
+- **Lock-in risk**: {how standard are its formats and interfaces?}
+- **Verdict**: {2-3 sentences: strengths, weaknesses, when you'd pick it}
+##### {Candidate 2 Name}
+{Same structure}
+##### {Candidate 3 Name}
+{Same structure}
+#### Comparison Matrix
+| Criterion | {Tool 1} | {Tool 2} | {Tool 3} |
+|-----------|----------|----------|----------|
+| Feedback loop latency | {1-5} | {1-5} | {1-5} |
+| Ecosystem health | {1-5} | {1-5} | {1-5} |
+| Documentation quality | {1-5} | {1-5} | {1-5} |
+| Agent compatibility | {1-5} | {1-5} | {1-5} |
+| Lock-in risk | {1-5} | {1-5} | {1-5} |
+| Dependency weight | {1-5} | {1-5} | {1-5} |
+| **Total** | {sum} | {sum} | {sum} |
+#### Recommendation
+**Winner: {Tool Name}**
+{3-5 sentences: why this tool wins for our specific constraints. Reference specific data points — downloads, CLI capabilities, feedback loop speed.}
+**Runner-up: {Tool Name}**
+{When you'd pick this instead.}
+**Rejected: {Tool Name}**
+{Why — specific dealbreaker.}
+{Repeat for every domain}
+## What We Learned from the Original Stack
+{3-5 paragraphs analyzing the original project's technology choices:}
+### What the Original Got Right
+{Tools and patterns from the original that we're carrying forward, with evidence from the research documents about why they worked.}
+### What the Original Got Wrong
+{Tools that caused friction, slow feedback loops, testing difficulties, or agent-unfriendly behavior. Cite specific problems from the unresolved problems document.}
+### What We're Changing and Why
+{For each tool we're swapping out, the specific rationale — not "it's better" but "the original used X which caused problem Y (cited in unresolved problems), and Z solves this because [specific capability]".}
+## Platform-Specific Testing Strategy
+{Based on what the product needs, detail the exact testing approach for each platform layer:}
+### {Platform Layer} Testing
+- **Tool**: {name and version}
+- **Command**: {exact test command}
+- **Output format**: {what the agent reads}
+- **Failure format**: {what a failure looks like, parseable?}
+- **Feedback loop latency**: {typical time}
+- **Setup requirements**: {what must exist before tests run}
+- **CI considerations**: {anything different in CI vs local}
+## Dependency Manifest
+{The complete list of every direct dependency the new project will have, organized by purpose:}
+### Runtime Dependencies
+| Package | Version | Purpose | Weekly Downloads | Last Updated |
+|---------|---------|---------|-----------------|--------------|
+| ... | ... | ... | ... | ... |
+### Development Dependencies
+| Package | Version | Purpose | Weekly Downloads | Last Updated |
+|---------|---------|---------|-----------------|--------------|
+| ... | ... | ... | ... | ... |
+### System Prerequisites
+| Tool | Version | Purpose | Install Command |
+|------|---------|---------|----------------|
+| ... | ... | ... | ... |
+## Compatibility Matrix
+{Verify that all chosen tools work together:}
+| Tool A | Tool B | Compatible? | Notes |
+|--------|--------|-------------|-------|
+| ... | ... | ... | ... |
+{Flag any known friction points and how to resolve them.}
+## Environment Setup Script
+{The exact sequence of commands to go from a bare machine to a working development environment. This must be copy-pasteable and require zero interactive prompts:}
+```bash
+# Prerequisites
+{commands to verify/install system dependencies}
+# Project setup
+{commands to clone, install, configure}
+# Verification
+{commands to run the full test suite and confirm everything works}
+```
+{Expected total time for bootstrap. Expected output from verification step.}
+## Risk Assessment
+{For each chosen tool, assess:}
+### {Tool Name}
+- **Bus factor**: {how many maintainers? Is it a solo project?}
+- **Funding**: {how is it funded? OSS volunteer, company-backed, foundation?}
+- **Migration path**: {if this tool dies, what's the escape hatch?}
+- **Breaking change history**: {how often do major versions break things?}
+## What We're Deliberately Not Using
+{Equally important — tools we evaluated and rejected, with clear reasoning:}
+### {Tool Name}
+- **What it does**: {one sentence}
+- **Why people use it**: {the appeal}
+- **Why we're not**: {specific dealbreaker for agent-first development}
+## The Complete `package.json`
+{A realistic, complete package.json for the new project, showing exact versions of everything recommended:}
+```json
+{
+  "name": "...",
+  "type": "module",
+  "scripts": {
+    "dev": "...",
+    "build": "...",
+    "test": "...",
+    "lint": "...",
+    "typecheck": "..."
+  },
+  "dependencies": { ... },
+  "devDependencies": { ... },
+  "engines": { ... }
+}
+```
+## Summary
+### The Stack in One Paragraph
+{Dense paragraph: runtime, build, test, lint, format — the full stack described in one breath. This is what you'd tell a colleague if they asked "what are you using?"}
+### Three Sentences
+{First: what kind of stack this is. Second: what it's optimized for. Third: what makes it different from the obvious/default choices.}
+### The Decision Criterion
+{One sentence: the single filter that drove every choice. Example: "Every tool was chosen by asking one question: can an AI agent, operating entirely through a terminal, use this tool to write, test, and ship code without ever needing a human to look at a screen?"}
+```
+## Research Rules
+- **Look it up, don't recall it.** npm download counts, GitHub stars, contributor numbers, and release dates change. Use web search to get current numbers. Stale data leads to stale recommendations.
+- **Test the CLI story.** For every tool, verify: what's the exact CLI command? What does the output look like? Is there a `--json` flag? Is there a `--reporter` option? If you can't find CLI documentation, that's a red flag.
+- **Evaluate the error path.** The happy path is easy. What matters for agents is the failure path. Search for how the tool reports errors. Are they structured? Do they include file paths and line numbers? Or is it a stack trace with no actionable information?
+- **Check real adoption, not hype.** A tool with 50M weekly downloads and 3 contributors is a different risk profile than a tool with 500K weekly downloads and 200 contributors. Both numbers matter. Neither alone tells the story.
+- **Prefer tools with prior agent use.** If a tool has documented success in CI/CD pipelines, AI agent workflows, or headless environments — that's direct evidence it works for our use case. Weight it heavily.
+- **Prefer tools the original used successfully.** If the original project used a tool and the research documents show it worked well, that's evidence. Don't change for the sake of changing. Change only when there's a specific improvement.
+- **Beware of "almost CLI."** Many tools advertise CLI support but actually require an interactive GUI for meaningful use (IDEs with plugins, config wizards, drag-and-drop builders). Read the docs. Try the commands mentally. If the workflow requires a human to manually interact with a GUI — reject it. Note: tools that produce visual output (screenshots, rendered pages, charts) are fine as long as the agent can access that output programmatically — the agent can view screenshots and browser state via headless automation.
+- **Version-pin everything.** Recommendations must include specific versions, not "latest." Agents need deterministic environments. A `^` in a version range is a bug waiting to happen.
+- **Smaller is better.** Given two tools with equivalent capabilities, prefer the one with fewer dependencies, faster install, and smaller surface area. Every dependency is a potential failure point for an agent that can't "just restart and try again."
+- **Respect the product constraints.** The stack must serve the product described in the pitch document. Don't recommend a database if the product doesn't need one. Don't recommend a web framework if the product is a CLI tool. Match the stack to the product, not to your preferences.
+## Quality Gates
+Before finalizing, verify:
+1. The file starts with valid YAML frontmatter containing `deepResearchQuery` (a non-empty string, 3-8 sentences, specific to the actual tools recommended)
+2. Every recommended tool has real, current ecosystem data (downloads, stars, contributors) — not approximations or memories
+3. Every recommended tool has a documented CLI workflow with exact commands
+4. Every recommended tool's error output has been assessed for agent parseability
+5. The complete stack has been checked for compatibility conflicts
+6. The environment setup script is copy-pasteable and requires zero interactive prompts
+7. The `package.json` is valid and all versions are pinned
+8. Every recommendation includes at least two alternatives that were evaluated and rejected with specific reasons
+9. The feedback loop for each tool is measured in seconds, not minutes — document expected latency
+10. No tool requires a human to manually interact with a GUI for its primary workflow. Headless browser automation and screenshot-based verification are valid — manual GUI interaction is not
+11. The "What We're Deliberately Not Using" section is populated — every significant tool you rejected should be documented with reasoning
+12. The dependency manifest accounts for every direct dependency — no surprises when someone runs `install`
+13. The risk assessment covers bus factor and migration path for every critical tool — the agent must not depend on a tool that could vanish
+14. Cross-reference with the original project's unresolved problems — if a tool choice caused problems in the original, you must either avoid that tool or explain why the problem won't recur
+15. The deep research query is actionable — someone could paste it into a search engine or research tool and get useful results back
+If any check fails, revise before returning.
+## Output
+Output only raw markdown with YAML frontmatter. No preamble, no explanation, no commentary outside the document structure.
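
Review note (not part of the package): quality gate 7 in the prompt above ("all versions are pinned") could be checked mechanically. The following is a minimal sketch under stated assumptions: the name `findUnpinned`, the package names, and the rule that only a bare `x.y.z` version (optionally with a prerelease tag) counts as pinned are all hypothetical and do not come from the package.

```typescript
// Hypothetical helper: lists dependencies whose version is not an exact pin.
type Manifest = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Assumption: "pinned" means a bare version such as 1.2.3 or 1.2.3-beta.1,
// with no range operator (^, ~, >=), no wildcard, and no dist-tag like "latest".
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(manifest: Manifest): string[] {
  // Spreading undefined is a no-op, so missing sections are handled implicitly.
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  return Object.entries(all)
    .filter(([, version]) => !EXACT_VERSION.test(version))
    .map(([name, version]) => `${name}@${version}`)
    .sort();
}

// Illustrative manifest with made-up package names.
const example: Manifest = {
  dependencies: { "pkg-a": "1.3.0", "pkg-b": "^2.0.0" },
  devDependencies: { "pkg-c": "~3.1.0", "pkg-d": "latest" },
};

console.log(findUnpinned(example));
```

An empty result would mean every direct dependency passes the gate. A real check would also need to cover `engines` and the lockfile, which this sketch ignores.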

package/dist/_workflows/prompts/PROMPT_TECHNOLOGY_STACK_FINAL.md ADDED

@@ -0,0 +1,48 @@
+You are refining a technology stack recommendation using a deep research report that validates and enriches the original analysis. Your goal is to produce the **final version** of the technology stack document — same structure, same density, same rigor — but now grounded in external benchmarks, ecosystem data, and community evidence.
+## Context
+- **Output File Path**: {outputPath}
+- **Original source repository:** {sourceFullName} (Use a `gh` tool to look into issues)
+- **Local checkout path:** {originalCheckoutPath}
+- **Product name:** {productName}
+**Input documents — read all before starting:**
+- **Draft Technology Stack**: {technologyStackPath} — the initial stack recommendation you are refining. This is your starting structure.
+- **Deep Research Report**: {deepResearchReportPath} — external validation, benchmarks, and ecosystem analysis. Use this to strengthen, correct, or nuance recommendations in the draft.
+- **Research Summary**: {researchPath} — original project analysis.
+- **Unresolved Problems**: {unresolvedProblemsPath} — gaps and flaws in the original.
+- **Key Decisions**: {decisionsPath} — decision catalog from the original.
+- **Product Pitch**: {productPitchPath} — the product we are building this stack for.
+## What to do
+1. **Read the draft stack.** This is your template. Preserve its structure, sections, and evaluation methodology.
+2. **Read the deep research report.** Extract:
+   - Benchmark data — if the report has performance comparisons for recommended tools, update the evaluations with real numbers
+   - Ecosystem updates — if adoption trends, maintainer activity, or migration waves are documented, incorporate them
+   - Agent compatibility evidence — if the report confirms (or questions) tool suitability for AI-agent workflows, update accordingly
+   - Known issues — if recent breaking changes, regressions, or community complaints are documented, address them in risk assessments
+   - Alternative tools — if the research surfaces tools the draft missed, evaluate and either add or document why they were rejected
+3. **Refine each section** using the research:
+   - Update ecosystem data (downloads, stars, contributors) with the latest numbers from the research
+   - Strengthen recommendations that the research supports with specific citations
+   - Reconsider recommendations that the research contradicts — change them if the evidence is compelling
+   - Update the comparison matrices with any new scoring data
+   - Revise risk assessments based on new bus factor, funding, or migration path information
+4. **Strip the frontmatter.** The draft stack has YAML frontmatter — do NOT include it in the final version. The final stack document is clean markdown, no frontmatter.
+## Rules
+- **Same structure.** Do not add or remove top-level sections. The output must have the same headings as the draft.
+- **Same rigor.** Same level of detail per evaluation. If the draft scored tools on 6 criteria, the final scores on 6 criteria.
+- **Evidence over assertion.** Where the research provides benchmarks or data, cite it. Where it contradicts the draft, fix the draft.
+- **Still honest about the original.** The deep research may reveal that the original's stack problems are even worse (or less bad) than we thought. Update accordingly.
+- **Use the product name.** Use "{productName}" wherever the product is referenced by name.
+- **Version-pin everything.** Update version recommendations if the research reveals newer stable versions or version-specific issues.
+- **Banned words:** revolutionary, powerful, seamless, robust, cutting-edge, next-generation, best-in-class, blazing-fast, game-changing, disruptive, leverage.
+## Output
+Output only raw markdown. No YAML frontmatter. No preamble, no explanation, no commentary outside the document structure.
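
Review note (not part of the package): step 4 of the prompt above asks for the draft's YAML frontmatter to be stripped. As a rough illustration, this can be treated as a pure string operation. `stripFrontmatter` is a hypothetical name, and the sketch assumes the frontmatter is delimited by `---` lines starting on the very first line of the file. It does not parse or validate the YAML itself.

```typescript
// Hypothetical helper: removes a leading `---` ... `---` frontmatter block, if present.
function stripFrontmatter(markdown: string): string {
  const lines = markdown.split("\n");
  // Frontmatter must open on the very first line.
  if (lines[0]?.trim() !== "---") return markdown;
  // Find the closing delimiter after the opening one.
  const end = lines.findIndex((line, index) => index > 0 && line.trim() === "---");
  // Unterminated frontmatter: leave the text untouched rather than guess.
  if (end === -1) return markdown;
  // Keep everything after the closing delimiter, dropping leading blank lines.
  return lines.slice(end + 1).join("\n").replace(/^\n+/, "");
}

// Illustrative draft document with made-up content.
const draft = [
  "---",
  "deepResearchQuery: |",
  "  Compare the recommended tools.",
  "---",
  "# Technology Stack: Demo",
  "",
].join("\n");

console.log(stripFrontmatter(draft));
```

Returning the input unchanged when no closing delimiter is found is a deliberate choice in this sketch: a document that merely starts with a horizontal rule is not silently truncated.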

package/dist/_workflows/ralphLoopWorkflow.d.ts CHANGED

@@ -1,4 +1,4 @@
-import type { Context } from "@/types";
+import type { Context } from "../types.js";
 /**
  * Runs the ralph-loop workflow: ask goal, plan, execute, and review in 3 rounds.
  * Expects: ctx.projectPath is the repository root for execution and review writes.

package/dist/_workflows/ralphLoopWorkflow.js CHANGED

@@ -1,8 +1,8 @@
-import { text } from "@text";
-import { ralphLoopExecute } from "@/_workflows/steps/ralphLoopExecute.js";
-import { ralphLoopPlanGenerate } from "@/_workflows/steps/ralphLoopPlanGenerate.js";
-import { ralphLoopReviewRound } from "@/_workflows/steps/ralphLoopReviewRound.js";
-import { promptInput } from "@/modules/prompt/promptInput.js";
+import { text } from "../text/text.js";
+import { ralphLoopExecute } from "../_workflows/steps/ralphLoopExecute.js";
+import { ralphLoopPlanGenerate } from "../_workflows/steps/ralphLoopPlanGenerate.js";
+import { ralphLoopReviewRound } from "../_workflows/steps/ralphLoopReviewRound.js";
+import { promptInput } from "../modules/prompt/promptInput.js";
 /**
  * Runs the ralph-loop workflow: ask goal, plan, execute, and review in 3 rounds.
  * Expects: ctx.projectPath is the repository root for execution and review writes.
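
The hunks in this diff all follow one pattern: alias specifiers (`@text`, `@/types`, `@/...`) become relative paths that climb to the `dist` root and then descend to the target. The sketch below reproduces that observable mapping only. It is not the package's build tooling, which this diff does not show, and `rewriteSpecifier` and `EXACT_ALIASES` are hypothetical names.

```javascript
// Hypothetical reconstruction of the mapping visible in the diff.
// Aliases that map to a fixed file, as seen in the changed import lines.
const EXACT_ALIASES = new Map([
  ["@text", "text/text.js"],
  ["@/types", "types.js"],
]);

// specifier:    the import string as written before the change
// fileFromRoot: the importing file's path relative to package/dist
function rewriteSpecifier(specifier, fileFromRoot) {
  let target;
  if (EXACT_ALIASES.has(specifier)) {
    target = EXACT_ALIASES.get(specifier);
  } else if (specifier.startsWith("@/")) {
    target = specifier.slice(2); // "@/" stands for the dist root
  } else {
    return specifier; // packages and node: builtins are left alone
  }
  // Climb one level per directory between the file and the root.
  const depth = fileFromRoot.split("/").length - 1;
  const prefix = depth === 0 ? "./" : "../".repeat(depth);
  return prefix + target;
}
```

For a file in `_workflows/steps/`, `rewriteSpecifier("@/modules/ai/generate.js", "_workflows/steps/generate.js")` gives `../../modules/ai/generate.js`, matching the corresponding hunk. The path is not minimized: an import from `_workflows/` into `_workflows/steps/` still goes through `../_workflows/`, as the changed lines show.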

package/dist/_workflows/ralphWorkflow.d.ts CHANGED

@@ -1,4 +1,4 @@
-import type { Context } from "@/types";
+import type { Context } from "../types.js";
 /**
  * Runs the ralph workflow: plan with Opus, execute with codex-xhigh, and review with codex-high.
  * Expects: ctx.projectPath is repository root for execution and review write operations.

package/dist/_workflows/ralphWorkflow.js CHANGED

@@ -1,8 +1,8 @@
-import { text } from "@text";
-import { ralphExecute } from "@/_workflows/steps/ralphExecute.js";
-import { ralphPlan } from "@/_workflows/steps/ralphPlan.js";
-import { ralphReview } from "@/_workflows/steps/ralphReview.js";
-import { promptInput } from "@/modules/prompt/promptInput.js";
+import { text } from "../text/text.js";
+import { ralphExecute } from "../_workflows/steps/ralphExecute.js";
+import { ralphPlan } from "../_workflows/steps/ralphPlan.js";
+import { ralphReview } from "../_workflows/steps/ralphReview.js";
+import { promptInput } from "../modules/prompt/promptInput.js";
 /**
  * Runs the ralph workflow: plan with Opus, execute with codex-xhigh, and review with codex-high.
  * Expects: ctx.projectPath is repository root for execution and review write operations.

package/dist/_workflows/researchWorkflow.d.ts CHANGED

@@ -1,4 +1,4 @@
-import type { Context } from "@/types";
+import type { Context } from "../types.js";
 /**
  * Runs research, unresolved-problems, key-decisions, product pitch, and name generation.
  * Research and problems run in parallel; subsequent steps chain sequentially.

package/dist/_workflows/researchWorkflow.js CHANGED

@@ -1,10 +1,10 @@
 import { readFile } from "node:fs/promises";
 import path from "node:path";
-import { beerLogLine, text } from "@text";
+import { beerLogLine, text } from "../text/text.js";
 import matter from "gray-matter";
 import { z } from "zod";
-import { generateDocument } from "@/_workflows/steps/generateDocument.js";
-import { promptConfirm } from "@/modules/prompt/promptConfirm.js";
+import { generateDocument } from "../_workflows/steps/generateDocument.js";
+import { promptConfirm } from "../modules/prompt/promptConfirm.js";
 const deepResearchQuerySchema = z.object({
     deepResearchQuery: z.string().min(1)
 });

package/dist/_workflows/steps/generate.d.ts CHANGED

@@ -1,5 +1,5 @@
-import { type GeneratePermissions, type GenerateResult } from "@/modules/ai/generate.js";
-import type { Context } from "@/types";
+import { type GeneratePermissions, type GenerateResult } from "../../modules/ai/generate.js";
+import type { Context } from "../../types.js";
 export interface RunInferenceOptions extends GeneratePermissions {
     progressMessage: string;
 }

package/dist/_workflows/steps/generate.js CHANGED

@@ -1,6 +1,6 @@
-import { text } from "@text";
-import { generateProgressMessageResolve } from "@/_workflows/steps/generateProgressMessageResolve.js";
-import { generate as generateAi } from "@/modules/ai/generate.js";
+import { text } from "../../text/text.js";
+import { generateProgressMessageResolve } from "../../_workflows/steps/generateProgressMessageResolve.js";
+import { generate as generateAi } from "../../modules/ai/generate.js";
 /**
  * Runs inference for a workflow step using the provided context.
  * Expects: promptTemplate may include {{key}} placeholders from values and progressMessage is non-empty.
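
The doc comment above mentions `{{key}}` placeholders filled from `values`. One way such substitution can work is sketched below. The function body of `generate.js` is not part of this diff, so `fillTemplate` is a hypothetical name, and leaving unknown keys untouched is an assumption rather than the package's confirmed behavior.

```javascript
// Hypothetical sketch of {{key}} substitution; not the package's code.
function fillTemplate(promptTemplate, values) {
  return promptTemplate.replace(/\{\{(\w+)\}\}/g, (placeholder, key) =>
    // Assumption: a placeholder with no matching key is left as written.
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : placeholder
  );
}
```

With `values = { hint: "fix" }`, the template `"Hint: {{hint}} / {{missing}}"` becomes `"Hint: fix / {{missing}}"`.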

package/dist/_workflows/steps/generateCommit.d.ts CHANGED

@@ -1,4 +1,4 @@
-import type { Context, ProviderModelSelectionMode } from "@/types";
+import type { Context, ProviderModelSelectionMode } from "../../types.js";
 export interface GenerateCommitOptions {
     hint?: string;
     modelSelectionMode?: ProviderModelSelectionMode;

package/dist/_workflows/steps/generateCommit.js CHANGED

@@ -1,5 +1,5 @@
-import { text } from "@text";
-import { generate } from "@/_workflows/steps/generate.js";
+import { text } from "../../text/text.js";
+import { generate } from "../../_workflows/steps/generate.js";
 const promptTemplate = [
     "Generate one Angular-style git commit message.",
     "Return a single line only.",

package/dist/_workflows/steps/generateDocument.d.ts CHANGED

@@ -1,5 +1,5 @@
-import { type GenerateFilePermissions } from "@/modules/ai/generateFile.js";
-import type { Context, ProviderModelSelectionMode } from "@/types";
+import { type GenerateFilePermissions } from "../../modules/ai/generateFile.js";
+import type { Context, ProviderModelSelectionMode } from "../../types.js";
 export type GenerateDocumentPromptId = "PROMPT_RESEARCH" | "PROMPT_RESEARCH_PROBLEMS" | "PROMPT_DECISIONS" | "PROMPT_PRODUCT_PITCH" | "PROMPT_PRODUCT_PITCH_FINAL" | "PROMPT_PRODUCT_NAME" | "PROMPT_TECHNOLOGY_STACK" | "PROMPT_TECHNOLOGY_STACK_FINAL" | "PROMPT_AGENTS_MD" | "PROMPT_PROJECT_BLUEPRINT";
 export interface GenerateDocumentInput {
     promptId: GenerateDocumentPromptId;

package/dist/_workflows/steps/generateDocument.js CHANGED

@@ -1,9 +1,9 @@
 import { readFileSync } from "node:fs";
 import path from "node:path";
 import { fileURLToPath } from "node:url";
-import { text } from "@text";
-import { generateProgressMessageResolve } from "@/_workflows/steps/generateProgressMessageResolve.js";
-import { generateFile } from "@/modules/ai/generateFile.js";
+import { text } from "../../text/text.js";
+import { generateProgressMessageResolve } from "../../_workflows/steps/generateProgressMessageResolve.js";
+import { generateFile } from "../../modules/ai/generateFile.js";
 const promptsPath = path.join(path.dirname(fileURLToPath(import.meta.url)), "../prompts");
 const promptById = {
     PROMPT_RESEARCH: readFileSync(path.join(promptsPath, "PROMPT_RESEARCH.md"), "utf-8"),

package/dist/_workflows/steps/generateFrontmatter.d.ts CHANGED

@@ -1,6 +1,6 @@
 import type { ZodTypeAny } from "zod";
-import { type GenerateFilePermissions } from "@/modules/ai/generateFile.js";
-import type { Context } from "@/types";
+import { type GenerateFilePermissions } from "../../modules/ai/generateFile.js";
+import type { Context } from "../../types.js";
 export interface GenerateFrontmatterOptions extends Omit<GenerateFilePermissions, "verify"> {
 }
 export interface GenerateFrontmatterResult {