@axis-bootstrap/cli 0.1.0

Files changed (38)

package/README.md +90 -0
package/package.json +42 -0
package/src/commands/audit.js +53 -0
package/src/commands/cleanup.js +42 -0
package/src/commands/doctor.js +137 -0
package/src/commands/init.js +297 -0
package/src/commands/link.js +31 -0
package/src/commands/spdd.js +139 -0
package/src/commands/state.js +21 -0
package/src/index.js +113 -0
package/src/lib/copy.js +19 -0
package/src/lib/detect.js +70 -0
package/src/lib/i18n.js +147 -0
package/src/lib/paths.js +45 -0
package/src/lib/ui.js +29 -0
package/templates/CANVAS.md +48 -0
package/templates/CONVENTIONS.md +43 -0
package/templates/INSTRUCTIONS.md +49 -0
package/templates/STATE.md +27 -0
package/templates/bootstrap-skill/PLANNER.md +221 -0
package/templates/bootstrap-skill/PROMPT-TEMPLATE.md +128 -0
package/templates/bootstrap-skill/SKILL.md +56 -0
package/templates/bootstrap-skill/references/CANVAS-REASONS.md +111 -0
package/templates/bootstrap-skill/references/PATTERNS.md +372 -0
package/templates/bootstrap-skill/references/PHASE-1-DISCOVERY.md +120 -0
package/templates/bootstrap-skill/references/PHASE-2-SPEC.md +250 -0
package/templates/bootstrap-skill/references/PHASE-3-HARNESS.md +331 -0
package/templates/bootstrap-skill/references/PHASE-4-MEMORY.md +187 -0
package/templates/bootstrap-skill/references/PHASE-5-VALIDATION.md +194 -0
package/templates/bootstrap-skill/references/QUICKSTART.md +144 -0
package/templates/bootstrap-skill/references/TEMPLATES.md +602 -0
package/templates/bootstrap-skill/references/UNIVERSAL-MAP.md +216 -0
package/templates/settings.json +29 -0
package/templates/setup-ide-links.sh +33 -0
package/templates/skills/abstraction-first.md +55 -0
package/templates/skills/alignment.md +53 -0
package/templates/skills/iterative-review.md +55 -0
package/templates/skills/story-decompose.md +54 -0

package/templates/bootstrap-skill/references/PATTERNS.md ADDED

@@ -0,0 +1,372 @@
+# Patterns — Technical Patterns of the Framework
+Reusable patterns that the bootstrap applies and that should be followed in any post-bootstrap project evolution.
+## Index
+| #   | Pattern                            | Layer      |
+| --- | ---------------------------------- | ---------- |
+| 1   | Progressive Disclosure             | Spec       |
+| 2   | Token Budget                       | Spec       |
+| 3   | Knowledge Verification Chain       | Harness    |
+| 4   | Auto-Sizing by Complexity          | Harness    |
+| 5   | Skill Granularity                  | Spec       |
+| 6   | Description Quality                | Spec       |
+| 7   | Flows vs State                     | Spec       |
+| 8   | Composability Between Skills       | Spec       |
+| 9   | Maintenance Loop                   | Memory     |
+| 10  | Anthropic Three-Agent Pattern      | Harness    |
+| 11  | Usage Scenarios                    | (examples) |
+| 12  | ACE — Memory as Evolving Playbook  | Memory     |
+| 13  | K-Trial Reliability                | Harness    |
+| 14  | Testable Spec (Anti-Verbosity)     | Spec       |
+| 15  | Bidirectional Spec-Code Sync       | Memory     |
+---
+## 1. Progressive Disclosure (3 Layers)
+The spec loads in three distinct moments to minimize tokens:
+| Layer             | When it loads    | Content                                 | Cost           |
+| ----------------- | ---------------- | --------------------------------------- | -------------- |
+| **1 — Discovery** | Always (startup) | `name` + `description` from frontmatter | ~3 lines/skill |
+| **2 — Index**     | When relevant    | Full `SKILL.md`                         | ~40-60 lines   |
+| **3 — On-demand** | When needed      | `references/*.md`                       | on demand      |
+The agent knows a skill exists (layer 1), decides whether it needs it (layer 2), and pulls deep details only when implementing (layer 3).
+---
+## 2. Token Budget
+Progressive Disclosure only works with explicit limits. Define:
+| Category                                    | Budget               | Rule                            |
+| ------------------------------------------- | -------------------- | ------------------------------- |
+| **Fixed base** (INSTRUCTIONS + frontmatter) | ~1,500-2,000 tokens  | Always loaded                   |
+| **Active skills** (full SKILL.md)           | ~3,000-5,000 tokens  | Only relevant ones for the task |
+| **References on-demand**                    | ~5,000-10,000 tokens | Only when necessary             |
+| **Target total**                            | <15,000 tokens       | Reserve maximum for reasoning   |
+**Loading rules:**
+- Never load multiple `SKILL.md` files at once unless they serve the same task
+- Never load multiple architecture docs at the same time
+- When reaching the limit, unload the oldest content
+- Warn when the docs context exceeds the budget (signal of oversized skills)
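The budget table and loading rules above can be sketched as a small checker. This is a minimal illustration, not part of the package: `BUDGETS`, `estimate_tokens`, and `check_budget` are hypothetical names, the limits use the upper end of each range in the table, and the 4-characters-per-token estimate is a rough assumption that a real tokenizer would replace.

```python
# Hypothetical budget checker; names and the token heuristic are assumptions.
BUDGETS = {
    "fixed_base": 2_000,     # upper end of ~1,500-2,000
    "active_skills": 5_000,  # upper end of ~3,000-5,000
    "references": 10_000,    # upper end of ~5,000-10,000
}
TARGET_TOTAL = 15_000        # total should stay under this


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (assumption), rounded up."""
    return (len(text) + 3) // 4


def check_budget(loaded: dict[str, list[str]]) -> list[str]:
    """Return one warning per category over budget, plus one if the total is not under target."""
    warnings = []
    total = 0
    for category, docs in loaded.items():
        used = sum(estimate_tokens(doc) for doc in docs)
        total += used
        limit = BUDGETS.get(category)
        if limit is not None and used > limit:
            warnings.append(f"{category}: {used} tokens exceeds budget of {limit}")
    if total >= TARGET_TOTAL:
        warnings.append(f"total: {total} tokens is not under target of {TARGET_TOTAL}")
    return warnings
```

A warning from `check_budget` corresponds to the last loading rule: it signals oversized skills rather than blocking the load.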
+---
+## 3. Knowledge Verification Chain
+**Before asserting anything**, the agent **must** follow this order:
+```text
+Step 1: Codebase  → verify existing code, conventions and patterns
+Step 2: Project docs → README, .ai/docs/, inline comments, skills
+Step 3: Official docs (MCP/Context7) → resolve lib ID, consult current API
+Step 4: Web search → official docs, reliable sources, community standards
+Step 5: Mark as uncertain → "I'm not sure about X — please verify"
+```
+**Inviolable rules:**
+- Never skip to Step 5 if Steps 1-4 are available
+- Step 5 is **always** signaled as uncertain — never presented as fact
+- **Never assume or fabricate.** If not found, say "I found no documentation"
+- Inventing APIs, patterns or behaviors causes cascading failures: wrong design → wrong tasks → wrong implementation
+- Uncertainty is always preferable to fabrication
+This chain must be referenced in `CONVENTIONS.md` and can be an independent rule in `.ai/rules/knowledge-verification.md`.
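One way to picture the chain is an ordered list of lookups that stops at the first hit and otherwise returns an explicitly uncertain result. The sketch below is illustrative only: `verify` and the `Source` alias are hypothetical, and the lookups stand in for whatever codebase search, docs reader, MCP client, or web search the agent actually has.

```python
from typing import Callable, Optional

# Each source is a (label, lookup) pair; a lookup returns an answer or None.
Source = tuple[str, Callable[[str], Optional[str]]]


def verify(question: str, sources: list[Source]) -> dict:
    """Consult sources in order (Steps 1-4); fall back to the uncertain step."""
    for step, (label, lookup) in enumerate(sources, start=1):
        answer = lookup(question)
        if answer is not None:
            return {"step": step, "source": label, "answer": answer, "certain": True}
    # Final step: never fabricate; flag the gap explicitly.
    return {
        "step": len(sources) + 1,
        "source": None,
        "answer": f"I'm not sure about {question!r}: please verify",
        "certain": False,
    }
```

Because the fallback is only reachable after every lookup has returned `None`, the sketch cannot skip to the uncertain step while an earlier source still has an answer.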
+---
+## 4. Auto-Sizing by Complexity
+Not every task needs the same level of planning. Before starting, the agent evaluates:
+| Complexity  | Indicators                    | Documentation                 | What to skip                     |
+| ----------- | ----------------------------- | ----------------------------- | -------------------------------- |
+| **Small**   | ≤3 files, scope in 1 sentence | Describe → Implement → Verify | Spec, design, task breakdown     |
+| **Medium**  | Clear feature, <10 steps      | Brief spec + inline design    | Formal design                    |
+| **Large**   | Multi-component, >10 steps    | Full spec + design + tasks    | Nothing                          |
+| **Complex** | Ambiguity, new domain         | Spec + discussion + research  | Nothing + interactive validation |
+**Rules:**
+- **Specify** and **Execute** are always mandatory — always know WHAT and DO IT
+- **Design** is skipped if the change is direct (no new architectural decisions)
+- **Task breakdown** is skipped if there are ≤3 obvious steps
+- **Safety valve:** even when tasks are skipped, the agent lists steps inline. If the listing reveals >5 steps or complex dependencies, **STOP** and create a formal task breakdown
+This avoids two extremes: over-engineering simple tasks (burning tokens on ceremony) and under-planning complex ones (generating rework).
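The sizing table and the safety valve can be expressed as two small functions. This is a sketch under stated assumptions: `classify_task` and `needs_formal_breakdown` are hypothetical names, the inputs are the agent's own estimates, and a task of exactly 10 steps (which the table leaves unspecified) is treated as Medium here.

```python
def classify_task(files_touched: int, estimated_steps: int, one_sentence_scope: bool,
                  multi_component: bool = False, ambiguous: bool = False) -> str:
    """Map the indicators from the sizing table to a complexity label."""
    if ambiguous:
        return "complex"  # ambiguity or a new domain dominates everything else
    if multi_component or estimated_steps > 10:
        return "large"
    if files_touched <= 3 and one_sentence_scope:
        return "small"
    return "medium"


def needs_formal_breakdown(inline_steps: list[str], complex_dependencies: bool = False) -> bool:
    """Safety valve: more than 5 inline steps or complex dependencies means stop and plan."""
    return len(inline_steps) > 5 or complex_dependencies
```

Checking `ambiguous` first is a design choice of this sketch: a two-file change in an unfamiliar domain still gets the research and interactive validation the Complex row calls for.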
+---
+## 5. Skill Granularity
+**When to create a new skill:**
+- Domain has >5 specific concepts
+- Has its own workflow
+- Information not derivable from the code without external context
+- Agent makes recurring errors without the skill
+**When to expand an existing one:**
+- Information is complementary
+- `SKILL.md` is still <60 lines after addition
+- Usage scenario is the same
+**When to use `docs/` instead of a skill:**
+- It is pure reference documentation (schema, contracts)
+- Does not involve a workflow
+- Will be referenced by multiple skills
+---
+## 6. Description Quality
+The `description` in the frontmatter is the **only information** the agent reads in 100% of sessions. It determines whether the skill is used or ignored.
+**Checklist:**
+- [ ] Contains exact domain terms that appear in developer questions
+- [ ] Lists scenarios with action verbs ("implementing", "debugging", "understanding")
+- [ ] Has 2-4 lines (1 is vague, 5+ is excessive)
+- [ ] A new developer understands when to use it just by reading the description
+**Example:**
+```yaml
+# Weak
+description: Reference for the payments API integration.
+# Strong
+description: Complete reference for the Payments API integration.
+  Use when implementing API calls (endpoints, auth, payload format),
+  debugging API responses (error codes, rate limits),
+  or understanding the retry strategy and idempotency rules.
+```
+---
+## 7. Flows vs State
+| Type            | Where                               | Example                   |
+| --------------- | ----------------------------------- | ------------------------- |
+| Workflow        | `skills/<name>/SKILL.md`            | "How to collect API data" |
+| Algorithm/logic | `skills/<name>/references/GUIDE.md` | "Deduplication logic"     |
+| Schema/contract | `docs/database-schema.md`           | "Transactions table"      |
+| Current state   | `docs/STATE.md`                     | "Feature X in progress"   |
+**Rule:** skills document **flows**; docs document **state**.
+---
+## 8. Composability Between Skills
+Skills often need each other. Protocol:
+1. **Check availability** — before using another skill's functionality, check if it exists
+2. **Delegate if available** — use the complementary skill, do not reimplement
+3. **Graceful fallback** — if not available, use the standard approach
+4. **Recommend once** — suggest installation at most once per session
+**How to document in SKILL.md:**
+```markdown
+## Integrations
+- **Diagrams:** If `mermaid-studio` available, delegate diagram creation.
+  Fallback: inline Mermaid code blocks.
+- **Exploration:** If `codenavi` available, delegate navigation.
+  Fallback: built-in search tools.
+```
+This creates a modular ecosystem without rigid dependencies.
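The four-step protocol above can be sketched as a small resolver. `SkillRouter` is a hypothetical class, not something the package ships; the set of installed skills and the fallback labels are supplied by the caller.

```python
from typing import Optional


class SkillRouter:
    """Delegate to an installed skill, else fall back; recommend each missing skill once."""

    def __init__(self, installed: set[str]):
        self.installed = installed
        self._recommended: set[str] = set()  # skills already suggested this session

    def resolve(self, skill: str, fallback: str) -> tuple[str, Optional[str]]:
        """Return (handler, recommendation); recommendation is None after the first time."""
        if skill in self.installed:
            return skill, None              # steps 1-2: available, so delegate
        note = None
        if skill not in self._recommended:  # step 4: recommend at most once
            self._recommended.add(skill)
            note = f"Consider installing `{skill}`"
        return fallback, note               # step 3: graceful fallback
```

Creating one `SkillRouter` per session gives the "at most once per session" behavior without any persistent state.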
+---
+## 9. Maintenance Loop
+Unmaintained documentation becomes misinformation, which is worse than no documentation because the agent acts with confidence on wrong information.
+**Fundamental rule:** every relevant behavioral change in the code must be reflected in the skills/docs **in the same session**.
+**Triggers:**
+| Event                                | Expected action                              |
+| ------------------------------------ | -------------------------------------------- |
+| Code changes a skill's flow          | Propose skill update before closing session  |
+| Business rule emerges                | Ask if it should be documented in skill/docs |
+| Bug reveals undocumented behavior    | Propose documenting it                       |
+| New integration                      | Evaluate new skill or expansion              |
+| Session paused with work in progress | Update `STATE.md`                            |
+**Closing protocol:**
+At the end of a session with changes, the agent:
+1. Lists behavioral changes in the code
+2. Identifies affected skills/docs
+3. Asks: *"The following documentation needs updating: [list]. Update now?"*
+This creates a habit without being intrusive: the agent does not update automatically, but it does not let stale docs pass without warning.
+**The agent as documentation guardian:** with `CONVENTIONS.md` in context, the agent actively identifies when code contradicts docs and reports it, even when not asked.
+---
+## 10. Anthropic Three-Agent Pattern
+For long tasks, separate into sub-agents:
+- **Planner** decomposes spec into tasks (does not execute)
+- **Generator** implements tasks (does not decide)
+- **Evaluator** validates output against spec (does not consult implementation history)
+Application in the framework:
+- `PLANNER.md` orchestrates phases (does not create artifacts)
+- Each `PHASE-N.md` generates artifacts (does not decide the next phase)
+- `PHASE-5-VALIDATION.md` validates (does not correct without confirmation)
+This separation avoids the classic problem where the agent starts "adjusting" decisions while implementing, losing the original path.
+---
+## 11. Usage Scenarios (examples)
+### Scenario 1 — Implement an integration
+```text
+Dev: "Implement data sending to API X"
+Startup: Agent reads INSTRUCTIONS.md, identifies via frontmatter
+         that skills api-integration and field-mapping are relevant.
+Trigger: Reads the 2 SKILL.md (~80 lines total). Knows endpoints,
+         normalization flow, retry pattern.
+On-demand: Needs full payload → reads API-REFERENCE.md.
+           Needs the mapping table → reads MAPPING-TABLE.md.
+Total: ~400 lines, vs ~2,000+ if everything were in a monolithic file.
+```
+### Scenario 2 — New feature
+```text
+Dev: "Add support for processing marketplace data"
+1. Agent reads collection skill → understands existing pattern
+2. Reads architecture-patterns.md → knows how to create strategies
+3. Reads code-style.md → follows naming conventions
+4. Reads detailed guide → understands pagination, dedup, enrichment
+Result: implements following patterns without asking.
+```
+### Scenario 3 — Multi-IDE
+A new dev uses Windsurf while the team uses Cursor. Without additional configuration, Windsurf reads `AGENTS.md` (symlink to `.ai/INSTRUCTIONS.md`) and skills in `.agents/skills/` (symlink to `.ai/skills/`). The new dev receives exactly the same context as the rest of the team.
+---
+## 12. ACE — Memory as Evolving Playbook
+> Based on: **Agentic Context Engineering** (arXiv:2510.04618). +10.6% in agent benchmarks, +8.6% in finance.
+The ACE approach treats `STATE.md` as a **self-curating playbook** — not as a history log. Three operations per session:
+| Operation      | What it does                              | Frequency                  |
+| -------------- | ----------------------------------------- | -------------------------- |
+| **Generation** | Adds new learning, decision, blocker      | Every session with changes |
+| **Reflection** | Identifies what is resolved or obsolete   | Every session              |
+| **Curation**   | Removes the obsolete, elevates the useful | Every session              |
+**Curation rules:**
+- STATE.md ≤ 80 lines → if larger, something was not removed
+- An entry in "Lessons Learned" only enters if it is *non-obvious* — insights a new dev would not know
+- "Deferred" is the organized trash can: ideas that do not die but do not block work
+**Why it works:** prevents the problem documented in Spec Kit issue #75 — specs that grow indefinitely generate noise, not context. Curated context > voluminous context.
+---
+## 13. K-Trial Reliability
+> Based on: **ReliabilityBench** (arXiv:2601.06112). pass@1 overestimates reliability by 20-40%.
+Instead of measuring only "did the agent pass this task?", AXIS recommends measuring consistency:
+| Metric           | What it measures               | How to apply                              |
+| ---------------- | ------------------------------ | ----------------------------------------- |
+| **pass@1**       | Passed on the first attempt    | Baseline — do not use alone               |
+| **pass@k**       | Passed in k independent runs   | Smoke test the same flow 3x               |
+| **ε-robustness** | Passed with variation in input | Test with slight reformulation of request |
+**Minimum protocol for smoke test in Phase 5:**
+```bash
+# Run the same bootstrap command 3x on a test project
+# If the result is identical in all 3 runs: reliable
+# If it varies: document the variation point in STATE.md as a blocker
+```
+**Warning signal:** if the generated structure varies between sessions, the harness is under-configured; the skill is probably missing an explicit template or acceptance criteria.
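The "identical in all 3 runs" check can be made concrete by snapshotting the generated tree after each run and comparing the snapshots. The sketch below assumes each run writes into its own output directory; `snapshot_tree` and `varying_paths` are hypothetical helpers, and how the bootstrap command is invoked is left to the caller.

```python
import hashlib
from pathlib import Path


def snapshot_tree(root: str) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 hex digest of its contents."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def varying_paths(snapshots: list[dict[str, str]]) -> list[str]:
    """Return paths whose presence or content differs between any two runs."""
    all_paths = set().union(*snapshots)
    return sorted(
        path for path in all_paths
        if len({snap.get(path) for snap in snapshots}) > 1
    )
```

An empty list from `varying_paths` means the runs agree; any returned path is a candidate "variation point" to record in `STATE.md`.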
+---
+## 14. Testable Spec (Anti-Verbosity)
+> Based on: GitHub Spec Kit issue #75 ("creates illusion of work") and ReliabilityBench findings.
+Long specs are not better specs. AXIS enforces:
+| Artifact          | Limit         | Consequence of exceeding                |
+| ----------------- | ------------- | --------------------------------------- |
+| `INSTRUCTIONS.md` | 100-180 lines | Context loaded always — direct noise    |
+| `SKILL.md`        | ≤ 60 lines    | Indexed always — each line costs tokens |
+| `STATE.md`        | ≤ 80 lines    | Read at session start — must be focused |
+**Testability criterion for a spec:** a spec item is testable if you can answer "how would I know the agent followed this?". If you can't answer, the item is too vague.
+Examples:
+| Vague (noise)           | Testable (signal)                                                  |
+| ----------------------- | ------------------------------------------------------------------ |
+| "Follow best practices" | "Use `createQueryBuilder` for bulk insert >100 records"            |
+| "Be careful with data"  | "Never execute `DROP` or `TRUNCATE` without explicit confirmation" |
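The line limits in the artifact table are themselves testable. A minimal sketch, assuming the upper bound of each range is the hard limit; `LINE_LIMITS` and `over_limit` are hypothetical names, not part of the package's `audit` or `doctor` commands.

```python
# Upper bounds taken from the artifact table; treating them as hard limits is an assumption.
LINE_LIMITS = {
    "INSTRUCTIONS.md": 180,
    "SKILL.md": 60,
    "STATE.md": 80,
}


def over_limit(filename: str, text: str) -> bool:
    """True if a known artifact has more lines than its limit; unknown files never fail."""
    limit = LINE_LIMITS.get(filename)
    return limit is not None and len(text.splitlines()) > limit
```

This is the kind of check that answers "how would I know the agent followed this?" for the limits themselves.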
+---
+## 15. Bidirectional Spec-Code Sync
+When code and spec diverge, the direction of the fix depends on the **type of change**:
+| Change type                                               | Direction   | Rule                                                                                                           |
+| --------------------------------------------------------- | ----------- | -------------------------------------------------------------------------------------------------------------- |
+| **Requirements changed** (new AC, business rule modified) | spec → code | Update the Canvas/skill/STATE first. Then regenerate or modify code guided by the updated spec.                |
+| **Refactoring** (structure/style, no behavior change)     | code → spec | Refactor code first. Then sync the spec back to reflect the new structure.                                     |
+| **Bug fix** (behavior was wrong)                          | spec → code | Clarify the correct behavior in the spec. Then fix the code. Never patch code without closing the intent loop. |
+**The golden rule:** when reality diverges from the spec, fix the spec first — then the code. The only exception is refactoring: clean the code, then sync.
+**Why it matters:** if you patch code without updating the spec, the next session starts with wrong context. The agent "rediscovers" the bug. The spec is the upstream source of truth — code is its output.
+**In AXIS terms:**
+- `STATE.md` is updated before implementation when requirements change
+- Skills are updated in the same session as the code that makes them stale
+- The Maintenance Loop (Pattern #9) triggers at session end — this pattern triggers at change time
+**Practical signals that divergence happened:**
+- Agent proposes something the spec explicitly contradicts → spec is stale
+- Code review reveals a pattern not in any rule → rule is missing
+- Bug surfaces that a Safeguard should have caught → Safeguard was absent or vague
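The direction table for this pattern reduces to a lookup from change type to the order in which spec and code are updated. A small sketch with hypothetical names (`SYNC_DIRECTION`, `fix_order`); the three keys mirror the three rows of the table.

```python
# (update first, update second) for each change type in the table.
SYNC_DIRECTION = {
    "requirements_changed": ("spec", "code"),
    "refactoring": ("code", "spec"),
    "bug_fix": ("spec", "code"),
}


def fix_order(change_type: str) -> tuple[str, str]:
    """Return which artifact to update first and which second."""
    try:
        return SYNC_DIRECTION[change_type]
    except KeyError:
        # Unknown change types are rejected rather than guessed.
        raise ValueError(f"unknown change type: {change_type!r}") from None
```

Refactoring is the only entry where code comes first, which matches the single exception named in the golden rule.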

package/templates/bootstrap-skill/references/PHASE-1-DISCOVERY.md ADDED

@@ -0,0 +1,120 @@
+# Phase 1 — Discovery
+**Goal:** understand the project deeply enough to generate a correct spec in Phase 2 without needing to go back.
+**Typical duration:** 5-15 minutes of interview.
+**Output of this phase:** a mental *Project Profile* (or text draft) covering type, tools, domains, constraints, quality target, and IDEs.
+---
+## Principle: Read Before Asking
+Before the first question, the agent:
+1. Lists the target project files (up to 2 levels deep)
+2. Reads `README.md`, `package.json`/`pyproject.toml`/equivalent, and any pre-existing AI file (`CLAUDE.md`, `AGENTS.md`)
+3. Identifies the stack if possible
+4. **Only then asks** — and never asks what is already in the files
+This reduces friction and demonstrates attention to context.
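The first two steps of "Read Before Asking" can be sketched as a bounded directory walk followed by a filter for the files worth reading first. The function names and the `READ_FIRST` set are hypothetical; the set contains only the filenames named in the list above, and "equivalent" manifests would need to be added per stack.

```python
from pathlib import Path

# Filenames named in the steps above; other manifests are left to the caller.
READ_FIRST = {"README.md", "package.json", "pyproject.toml", "CLAUDE.md", "AGENTS.md"}


def list_project_files(root: str, max_depth: int = 2) -> list[str]:
    """List files at most `max_depth` levels below `root`, as POSIX-style relative paths."""
    base = Path(root)
    found: list[str] = []

    def walk(directory: Path, depth: int) -> None:
        for entry in sorted(directory.iterdir()):
            if entry.is_file():
                found.append(entry.relative_to(base).as_posix())
            elif entry.is_dir() and depth < max_depth:
                walk(entry, depth + 1)  # descend only while within the depth limit

    walk(base, 1)
    return found


def files_to_read_first(paths: list[str]) -> list[str]:
    """Keep only the paths whose filename is in READ_FIRST."""
    return [p for p in paths if p.rsplit("/", 1)[-1] in READ_FIRST]
```

Bounding the walk instead of filtering a full recursive listing keeps the scan cheap on projects with large dependency directories.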
+---
+## Block 1 — Universal Questions (always ask)
+```text
+1. In one sentence: what does this project do and for whom?
+2. Is it a software project, or another type (content, research, business, legal, educational)?
+3. How many people will work on it and for how long?
+4. Which agents/IDEs will be used? (Claude Code, Cursor, Windsurf, Copilot, others)
+5. Are there critical constraints? (compliance, deadline, budget, security)
+```
+Confirm the answers in a summary before advancing to Block 2.
+---
+## Block 2 — Branching by Type
+Use the answer to question 2 to choose the sub-block below. More than one sub-block may apply if the project is hybrid (e.g., research + content).
+### If SOFTWARE
+```text
+6a. What is the main stack? (language, framework, runtime)
+7a. How does the project run? (exact command — npm run dev, python main.py, go run, etc.)
+8a. Is there a database, queue, cache, or external services?
+9a. Is there an adopted architecture pattern? (DI, hexagonal, monolith, microservices, MVC, etc.)
+10a. Are there tests? What framework? Coverage of what?
+11a. Is there CI/CD? Where? (GitHub Actions, GitLab CI, etc.)
+12a. Which 3-5 areas/modules of the code have specific rules that deserve to become a skill?
+```
+### If CONTENT (articles, marketing, technical docs)
+```text
+6b. What is the format and distribution channel? (blog, LinkedIn, newsletter, book, video script)
+7b. Tone of voice and target audience?
+8b. Is there an established SEO, branding, or style guideline?
+9b. What is the workflow? (briefing → draft → review → publish)
+10b. Which skills would help? (e.g., "tone of voice", "article structure", "SEO checklist", "fact-checking")
+```
+### If RESEARCH / ACADEMIC
+```text
+6c. What is the discipline and central research question?
+7c. What methodology? (qualitative, quantitative, experimental, review)
+8c. What artifacts will be produced? (paper, dataset, analysis code, slides)
+9c. What conventions/norms? (APA, MLA, Chicago; citation format)
+10c. Which skills? (e.g., "methodology", "data collection", "statistical analysis", "academic writing")
+```
+### If BUSINESS / MANAGEMENT
+```text
+6d. What is the goal? (strategic planning, OKRs, reports, market analysis)
+7d. What are the expected artifacts? (deck, report, spreadsheet, BSC)
+8d. Who are the stakeholders and what is their technical level?
+9d. Are there adopted frameworks? (OKR, BSC, lean canvas, SWOT)
+10d. Which skills? (e.g., "executive report structure", "SWOT analysis", "tone for board")
+```
+### If LEGAL / COMPLIANCE
+```text
+6e. What jurisdiction and area? (labor, tax, GDPR/LGPD, contractual)
+7e. What artifacts? (contracts, legal opinions, DPIA, policies)
+8e. Are there official templates to follow?
+9e. What critical risks to avoid?
+10e. Which skills? (e.g., "contract drafting", "clause analysis", "compliance checklist")
+```
+### If EDUCATIONAL
+```text
+6f. What is the target audience and level?
+7f. What artifacts? (course, lesson plan, instructional material, assessment)
+8f. Is there a pedagogical methodology? (PBL, Bloom, flipped classroom)
+9f. Which skills? (e.g., "instructional design", "assessment design", "language for level X")
+```
+### If OTHER
+Apply principles from [UNIVERSAL-MAP.md](UNIVERSAL-MAP.md) and adapt. Ultimately, every activity has:
+- Knowledge domains (→ skills)
+- Quality standards (→ rules)
+- Final artifacts (→ templates)
+- Continuity between sessions (→ memory)
+---
+## Block 3 — Quality Calibration
+```text
+13. Is this a proof-of-concept, MVP, or production?
+14. What level of validation is acceptable? (vibe-check, human review, automated gates, all)
+15. Is there a history of problems the framework should prevent? (e.g., "we lose context whenever the dev changes", "AI responses diverge between IDEs")
+```