npm - nimai-core - Versions diffs - 0.4.7 → 0.4.9 - Mend

nimai-core 0.4.7 → 0.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/data/FORGE-quickref.md +326 -0
package/data/FORGE-spec-template.md +454 -0
package/dist/forge-root.d.ts +2 -0
package/dist/forge-root.d.ts.map +1 -0
package/dist/forge-root.js +61 -0
package/dist/forge-root.js.map +1 -0
package/dist/index.d.ts +1 -0
package/dist/index.d.ts.map +1 -1
package/dist/index.js +1 -0
package/dist/index.js.map +1 -1
package/dist/prompts.d.ts.map +1 -1
package/dist/prompts.js +11 -6
package/dist/prompts.js.map +1 -1
package/package.json +3 -2

package/data/FORGE-quickref.md ADDED Viewed

@@ -0,0 +1,326 @@
+# FORGE Quick Reference
+> **You are here:** This is the operational tool — the doc an agent uses during active work.
+>
+> **The other docs in this system:**
+> - **Canonical Framework** (`FORGE-canonical.md`) — deep explanation of every concept. Go there to understand *why* something works, not just *what* to do.
+> - **Spec Template** (`FORGE-spec-template.md`) — fill-in-the-blank form for a concrete project. Go there when you have a direction and need to turn it into an agent-ready brief.
+---
+## Agent Routing: What To Do Based On The Request
+*Read the request and match it to a route before doing anything else.*
+| If the request is… | Route |
+|---|---|
+| A new idea or creative problem — no direction yet | Run **Divergence → Convergence Loop** below. Stay in this doc. |
+| A loose or vague request that needs structuring | Run **Self-Spec Agent** prompt (see below) to generate a draft spec. |
+| A specific project ready to execute | Go to **Spec Template** → fill it out → deploy. |
+| How to approach or frame something | Use **4 Layers + 5 Primitives** below. Deeper detail in Canonical doc. |
+| Reviewing, validating, or debugging existing work | Run **Failure Mode Taxonomy** red-team below. |
+| Something feels wrong mid-project | Check **Failure Mode Taxonomy** first, then check Intent Layer for drift. |
+| A concept here that needs deeper explanation | Go to **Canonical Framework** for the full treatment. |
+| Building a multi-agent or long-running system | Go to **Canonical Framework** → Execution Architecture + Extended Patterns. |
+**Default rule:** Start here. Go to Canonical to understand. Go to Spec Template to execute.
+---
+## The Deeper Thesis
+Agents fail not from lack of intelligence but from **underspecified control surfaces**. The four control surfaces are specification, intent, context, and prompt. Engineer all four.
+---
+## The 4 Engineering Disciplines / Control Surfaces
+| Discipline | Control Surface | Skip It And… |
+|---|---|---|
+| **Prompt Craft** | Sub-task trigger | Agent doesn't know what to do right now |
+| **Context Engineering** | Information environment | Agent works from noise or stale data |
+| **Intent Engineering** | Values, trade-offs, deployment purpose | Agent resolves ambiguity the wrong way |
+| **Specification Engineering** | Complete blueprint for "done" | Agent can't define done and never stops |
+---
+## The 4 Layers (in order of leverage)
+| Layer | Role | Anti-Pattern |
+|---|---|---|
+| **1. Specification** | Defines exact "done" state + scope | Vague goals ("improve this") |
+| **2. Intent** | Agent's deployment purpose + trade-off hierarchy | Agent doesn't know its own role |
+| **3. Context** | Curated information environment | Dumping everything you have |
+| **4. Prompt** | Fires sub-task + assigns cognitive mode | Over-engineering this layer |
+---
+## The 5 Primitives (every sub-task must pass all 5)
+1. **Self-Contained** — Zero questions needed to start
+2. **Constrained** — Must / Must-Not / Prefer / Escalate defined
+3. **Modular** — Under 2 hours; independently verifiable
+4. **Acceptance Criteria** — Binary, measurable (not "looks good")
+5. **Evaluation** — Built-in check before reporting complete
+---
+## Pre-Execution Decisions (set BEFORE decomposition)
+**Risk Tier:**
+| Tier | Validation |
+|---|---|
+| Low | Self-check only |
+| Medium | Validator pass |
+| High | Validator + Adversarial Reflection + Human gate |
+**Resource Governance:** Model tier · Max runtime · Cost budget · Retry limit · Cost escalation trigger
+---
+## Cognitive Modes (reasoning postures, not personas)
+| Mode | Use When |
+|---|---|
+| **Deterministic** | Implementation; answer is knowable |
+| **Exploratory** | Research, brainstorming, hypothesis generation |
+| **Adversarial** | Security, risk, stress-testing |
+| **Synthesis** | Integrating multiple outputs |
+| **Audit** | Validation only — do not produce |
+**Over-specification warning:** Constrain evaluation, not divergence. In Exploratory phases, tight constraints on generation defeat the purpose. Reserve hard constraints for Adversarial and Deterministic phases.
+**Mode × Risk compatibility:** Exploratory mode is a pre-commitment mode. High risk tasks should be in Deterministic or Audit by final delivery. Exploratory + High risk as a final-delivery combination is a design error.
+---
+## Divergence → Convergence Loop (brainstorming)
+```
+Exploratory  →  Synthesis  →  Adversarial  →  Deterministic
+  DIVERGE        CLUSTER      STRESS-TEST      FORMALIZE
+```
+Generate wide → cluster patterns → attack top candidates → execute selected direction.
+---
+## Failure Mode Taxonomy (red-team before deployment)
+| Failure | Cause | Fix |
+|---|---|---|
+| Scope Creep | Ambiguous boundaries | Explicit Must-Not list |
+| Hallucinated Completion | Subjective criteria | Binary acceptance criteria |
+| Intent Drift | Unclear trade-offs or role | Ranked priorities + deployment purpose |
+| Context Collapse | Too much noise | Aggressive curation; MCP for live sources |
+| Runaway Cost | No resource ceiling | Hard caps before decomposition |
+| Overconfident Output | No uncertainty surfacing | Uncertainty reporting for high-stakes tasks |
+*If a failure doesn't fit these categories, extend the taxonomy — log it, identify root cause, add prevention.*
+---
+## Intent = Deployment Purpose + Trade-offs
+Every agent must know: **what it is · what it isn't · who consumes its output · what to optimize for when uncertain**
+## Adversarial Reflection (Medium/High risk)
+Produce → steelman critique → evaluate → revise → re-validate
+## Uncertainty Reporting (non-deterministic domains)
+Confidence % · uncertainty drivers · what would change the answer · alternative interpretations
+## Escalation Contract
+WHEN to stop · WHAT to report (include confidence) · WHO decides (named + SLA) · WHAT if no response
+## Context Hygiene
+2k high-signal tokens > 20k low-signal tokens · MCP for live sources · specify when to re-fetch
+---
+## When To Use This Framework
+The overhead should be proportional to the cost of failure. Use this as your guide:
+| Situation | What to use |
+|---|---|
+| Task is reversible, low-stakes, under 1 hour | Routing table above + nothing else. Just start. |
+| Task is consequential or takes more than 1 hour | Quickref checklist minimum. Self-Spec agent if unsure. |
+| Task is high-risk, irreversible, or affects others | Full Spec Template + Adversarial Reflection + Validator |
+| You're not sure | Ask: "What's the cost if this goes wrong?" Scale accordingly. |
+**The philosophy:** This framework exists to absorb any project — coding, science, writing, business, anything — and give it the structure needed to succeed without fail. Use as much or as little as the project demands. The principles apply universally; the overhead scales with stakes.
+---
+## Solo Operator Mode
+If you are working alone without a team, validators, or named escalation contacts, adapt as follows:
+**Escalation contract:** You are the reviewer. Set a time delay instead of an SLA — *"I will not proceed on this decision for 24 hours."* Distance and time create the same circuit-break that a second person would.
+**Validation:** For Medium risk, run the Adversarial Reflection yourself or have a second agent session do it cold — paste the output into a fresh context with no prior history and ask it to find flaws.
+**High risk solo:** Add a mandatory pause before any irreversible action. Write out the decision, the alternatives you considered, and your confidence level before executing. This is your human gate.
+**The Self-Spec + Reviewer pipeline replaces team infrastructure** — see below.
+---
+## Self-Spec Agent + Reviewer Pipeline
+This is the solo operator's complete workflow. Two prompts replace an entire team process.
+---
+### Prompt 1 — Self-Spec Agent
+*Use when you have a loose idea or vague request and need it turned into a complete spec.*
+Paste this prompt to any capable model along with your loose request:
+```
+You are a Specification Engineering agent operating under the FORGE.
+Your job is to take the loose request below and generate a complete draft spec
+using the framework's structure. Do not execute the request — only spec it.
+For each section, fill in what you can infer and mark anything uncertain
+with [NEEDS HUMAN INPUT: reason].
+Generate:
+1. Final deliverable (precise, format, measurable quality bar)
+2. Scope boundaries (in / out)
+3. Agent deployment purpose (what it is, is not, who consumes output)
+4. Trade-off hierarchy (ranked: accuracy / speed / cost / safety / other)
+5. Constraint architecture (Must / Must-Not / Prefer / Escalate)
+6. Task decomposition (sub-tasks under 2 hours, with acceptance criteria)
+7. Risk tier (Low / Medium / High with reasoning)
+8. Cognitive mode per sub-task
+9. Context needed (what the executing agent requires)
+10. Architecture Lock details for medium/high coding specs (Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, Auth implementation)
+11. Proposed validator prompt (what a reviewer should check)
+Loose request: [PASTE YOUR REQUEST HERE]
+```
+Review the output. Resolve all [NEEDS HUMAN INPUT] flags. Adjust anything that
+doesn't match your intent. Approved spec = your deployment brief.
+---
+### Prompt 1.5 — Spec-Quality Reviewer
+*Run this on a draft spec before approving it. The reviewing LLM checks whether the spec is agent-ready.*
+```
+You are the independent reviewer for this draft spec. Evaluate it now.
+You are a Specification Quality Reviewer operating under the FORGE framework.
+Your job is to evaluate the draft spec below for spec quality only — not implementation correctness.
+Do NOT assess any code, code diffs, or implementation output.
+Evaluate each dimension. Cite evidence for every verdict — a verdict without citation is treated as NO_GO.
+Classify each issue as HARD_FAIL, SOFT_FAIL, or NOTE. passed: true requires zero HARD_FAILs.
+1. Binary acceptance criteria — are all sub-task ACs measurable and unambiguous? Pre-checked [x] ACs are always invalid.
+2. Scope coherence — are in-scope/out-of-scope boundaries clear and non-contradictory? Terminology mismatches between conceptual and persisted representations are a HARD_FAIL.
+3. Constraint sufficiency — do Must/Must-Not/Prefer/Escalate constraints cover the key risks?
+4. Decomposition realism — can each sub-task be done within 2 hours? Sub-task dependencies must be explicit.
+5. Start-without-clarification viability — can an agent begin immediately with the context provided?
+6. Internal consistency — are terms, names, and concepts used consistently throughout?
+7. Mechanism lock — does every core flow commit to exactly ONE implementation path? Any "e.g./or/could/might/such as" in Deliverable, Scope, Constraints, or Tasks is a HARD_FAIL.
+8. Convergence declaration — does the spec include a Spec Convergence section with open_questions: 0 and ambiguities_remaining: 0? Absent, non-zero, or ready_for_build: no is a HARD_FAIL.
+9. Adversarial gap scan — does the spec include a substantively filled Edge Cases and Failure Modes section? Steelman as a builder following it exactly: what failure modes are unaddressed? Absent or placeholder-only is a HARD_FAIL for medium/high risk specs.
+10. Architecture completeness — for medium/high coding specs, does the spec include `## 1.6 Architecture Lock` with resolved fields for Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, and Auth implementation? Missing, blank, `___`, or `TBD` is a HARD_FAIL.
+Always end your response with:
+## Verdict
+```json
+{"passed": <true|false>, "schema_version": "2", "issues": [{"dimension": "<name>", "severity": "<HARD_FAIL|SOFT_FAIL|NOTE>", "detail": "<cited evidence>"}]}
+```
+Draft spec: [PASTE DRAFT SPEC HERE]
+```
+The host agent parses the `## Verdict` JSON block to drive the review loop:
+- `passed: true` → proceed to implementation
+- `passed: false` → refine the spec using the `issues` list, then re-review
+- Block absent or malformed → escalate to the human
+Use `nimai_spec_review` to generate this prompt automatically from a spec file.
+When Prompt 1.5 fails, the reviewer should append a `## Builder Brief` section after the verdict with one actionable fix per issue (section to edit + concrete fix direction).
+---
+### Prompt 2 — Reviewer / Validator Prompt Generator
+*Run this after Self-Spec is approved. Paste along with the approved spec.*
+```
+You are a Specification Engineering agent.
+Given the approved spec below, generate a Reviewer Prompt — precise instructions
+for a validator agent or solo reviewer to check the executing agent's output.
+The Reviewer Prompt must:
+- State exactly what is being checked and why
+- List binary pass/fail criteria from the spec's acceptance criteria
+- Include Adversarial Reflection sequence if risk tier is Medium or High
+- Include uncertainty reporting requirements if domain is non-deterministic
+- Specify what PASS looks like and what FAIL triggers (revise / escalate / abort)
+- Be usable by a solo operator with no additional context
+Approved spec: [PASTE APPROVED SPEC HERE]
+```
+---
+### The Complete Solo Pipeline
+```
+Loose request
+    ↓
+Self-Spec Agent (Prompt 1) → draft spec
+    ↓
+Spec-Quality Reviewer (Prompt 1.5) → passed? → NO → refine spec → loop
+    ↓ YES
+Human reviews + resolves [NEEDS HUMAN INPUT] flags
+    ↓
+Approved spec → Reviewer Prompt Generator (Prompt 2) → validator prompt
+    ↓
+Deploy executing agent with approved spec
+    ↓
+Paste output + validator prompt into fresh agent session
+    ↓
+PASS → done  |  FAIL → revise or escalate
+```
+---
+## Pre-Deployment Checklist
+**Specification**
+- [ ] Deliverable precise; scope boundaries explicit
+- [ ] Sub-tasks satisfy all 5 Primitives
+- [ ] For medium/high coding specs, `## 1.6 Architecture Lock` is present and fully resolved (no blank/`___`/`TBD` fields)
+**Intent**
+- [ ] Each agent has explicit deployment purpose
+- [ ] Trade-off hierarchy ranked; escalation contract complete
+**Context**
+- [ ] Context curated; MCP for live sources; freshness strategy set
+**Execution** *(set before decomposition)*
+- [ ] Risk tier assigned
+- [ ] Resource Governance parameters set
+- [ ] Cognitive mode per sub-task assigned
+- [ ] Validation routes by risk tier
+- [ ] Adversarial Reflection for medium/high risk
+- [ ] Uncertainty reporting for non-deterministic domains
+**Meta**
+- [ ] Spec red-teamed against Failure Mode Taxonomy
+- [ ] Planner execution plan will be saved as artifact
+- [ ] Living Specification log set up (multi-session)
+- [ ] Spec Convergence section filled — `open_questions: 0`, `ambiguities_remaining: 0`, `ready_for_build: yes`
+---
+*FORGE v1.0 — Framework for Orchestrating Reliable Generative Execution*
+*System docs: Quick Reference (this doc) · Canonical Framework · Spec Template*

package/data/FORGE-spec-template.md ADDED Viewed

@@ -0,0 +1,454 @@
+# FORGE Spec Template
+> **You are here:** This is the Spec Template — the execution tool. Use it to turn a project direction into a complete agent-ready brief.
+>
+> **The other docs in this system:**
+> - **Quick Reference** (`FORGE-quickref.md`) — start here if you don't have a project direction yet. If the request is a new idea or open-ended, go to the Quick Reference and run the Divergence → Convergence Loop or the Self-Spec Agent first. Come back here once you have a chosen direction.
+> - **Canonical Framework** (`FORGE-canonical.md`) — go there if you encounter a concept in this template you don't fully understand. Each section maps to a section in the canonical doc.
+>
+> **When to be in this doc:** You have a specific project or direction. You are ready to define it completely before handing it to an agent. If you can't fill in a field, that is an unresolved decision — resolve it here, not mid-run.
+>
+> **Solo operator shortcut:** Instead of filling this manually, use the Self-Spec Agent prompt in the Quick Reference to generate a draft, then review and approve it here.
+---
+> Fill in every field before handing to an agent. A blank field is an unresolved decision — resolve it here, not mid-run.
+> *Based on the FORGE*
+---
+## 0. Pre-Flight Decisions
+> These must be set first. Everything else is built on top of them.
+**Risk Tier:** [ ] Low  [ ] Medium  [ ] High
+*(If unsure, go one tier higher)*
+**Primary Cognitive Mode for this project:**
+[ ] Deterministic  [ ] Exploratory  [ ] Adversarial  [ ] Synthesis  [ ] Audit
+[ ] Multi-phase — using Divergence → Convergence Loop (see Section 6)
+*→ Unsure which mode? See Cognitive Mode table in Quick Reference or Canonical doc.*
+**Resource Governance:**
+- Model tier (Planner): `_______________`
+- Model tier (Workers): `_______________`
+- Max runtime per sub-task: `_______________`
+- Total compute / cost budget: `_______________`
+- Retry limit before escalation: `_______________`
+- Cost threshold that triggers stop-and-report: `_______________`
+---
+## 1. Specification Layer — The Blueprint
+### 1.1 Final Deliverable
+*Describe precisely. Not "a good analysis" — describe the exact artifact, format, and measurable quality.*
+```
+Deliverable: _______________________________________________
+Format: ___________________________________________________
+Length / Size: _____________________________________________
+Measurable quality bar: ____________________________________
+Benchmark dataset (if quality bar requires measurement):
+  path: _______________  (e.g. tests/fixtures/receipts/ or N/A)
+  schema: _____________  (field names and types of each labeled example, or N/A)
+  threshold computation rule: ___  (e.g. "pass rate = correct / total labeled examples", or N/A)
+```
+### 1.2 Scope Boundaries
+*Be explicit. Ambiguity here becomes scope creep mid-run.*
+**In scope:**
+- `_______________`
+- `_______________`
+- `_______________`
+**Out of scope (Must-Not):**
+- `_______________`
+- `_______________`
+- `_______________`
+### 1.3 Task Decomposition
+*Break the project into sub-tasks. Each must satisfy the 5 Primitives and complete in under 2 hours.*
+*→ Need a reminder of the 5 Primitives? See Quick Reference or Canonical doc Section "5 Primitives."*
+| # | Sub-task | Cognitive Mode | Risk Tier | Acceptance Criteria | Eval Method |
+|---|---|---|---|---|---|
+| 1 | | | | | |
+| 2 | | | | | |
+| 3 | | | | | |
+| 4 | | | | | |
+| 5 | | | | | |
+*Add rows as needed. Every sub-task needs all five columns filled.*
+*Note: Sub-task risk tiers may differ from the overall project tier. A research sub-task and a production deployment sub-task in the same project have different risk profiles — assign each independently. The project-level tier sets the floor for escalation; sub-task tiers govern validation routing.*
+### 1.4 Acceptance Criteria — Master Definition of Done
+*Binary. Measurable. No subjective criteria.*
+The project is complete when ALL of the following are true:
+- [ ] `_______________`
+- [ ] `_______________`
+- [ ] `_______________`
+---
+## 1.5 Mechanism Decision
+*One block per core architectural or algorithmic choice. If no core mechanism applies, write "N/A" and justify.*
+*A builder must be able to start coding without choosing missing architecture. If any mechanism is undecided, resolve it here first.*
+---
+**Decision: [short name, e.g., "Primary parsing method"]**
+```
+Chosen approach: _______________________________________________
+Rejected alternatives: _________________________________________
+Why rejected: __________________________________________________
+Impact on ACs: _________________________________________________
+```
+---
+## 1.6 Architecture Lock
+*Required for medium/high coding specs. Advisory for low/unknown/non-coding specs.*
+*All fields must be resolved before builder handoff. Blank placeholders and "TBD" are unresolved.*
+```
+Persistence layer: _______________________________________________
+File/object storage: _____________________________________________
+External model/service + env var: ________________________________
+API trigger flow: ________________________________________________
+Entity status state machine: _____________________________________
+Auth implementation: _____________________________________________
+```
+---
+## 2. Intent Layer — The Compass
+### 2.1 Agent Deployment Purpose
+*Tell the agent what it is, what it is not, and who consumes its output. One clear paragraph.*
+```
+You are: ___________________________________________________
+You are NOT responsible for: _______________________________
+Your output is consumed by: ________________________________
+  (human decision-maker / another agent / automated pipeline)
+Your output feeds into: ____________________________________
+```
+### 2.2 Trade-off Hierarchy
+*Rank these in order of priority. The agent will use this when it hits a fork.*
+Rank the following from 1 (highest) to n (lowest) for this task:
+| Priority | Value |
+|---|---|
+| `___` | Accuracy / Correctness |
+| `___` | Speed / Efficiency |
+| `___` | Cost / Token economy |
+| `___` | Safety / Risk avoidance |
+| `___` | Novelty / Creativity |
+| `___` | Completeness |
+| `___` | Simplicity / Readability |
+| `___` | *(add domain-specific value)* |
+### 2.3 Constraint Architecture
+**Must (non-negotiable requirements):**
+- `_______________`
+- `_______________`
+**Must-Not (hard prohibitions):**
+- `_______________`
+- `_______________`
+**Prefer (soft preferences when trade-offs arise):**
+- `_______________`
+- `_______________`
+**Escalate (stop and surface to human when):**
+- `_______________`
+- `_______________`
+### 2.4 Forbidden Approaches
+*Specific methods, tools, frameworks, or reasoning patterns to avoid.*
+- `_______________`
+- `_______________`
+---
+## 3. Context Layer — The Environment
+### 3.1 Provided Context
+*List all documents, artifacts, or data sources being provided. Mark each as AUTHORITATIVE or REFERENCE.*
+| Source | Type | Authority Level | Notes |
+|---|---|---|---|
+| | | AUTHORITATIVE / REFERENCE | |
+| | | AUTHORITATIVE / REFERENCE | |
+| | | AUTHORITATIVE / REFERENCE | |
+### 3.2 Known State
+*What has already been tried? What failed? What is known?*
+```
+Prior attempts: ____________________________________________
+Known failures / dead ends: ________________________________
+Known constraints: _________________________________________
+Current blockers: __________________________________________
+```
+### 3.3 Context Freshness
+*Tell the agent when to trust the provided context vs. when to re-fetch or verify.*
+[ ] All provided context is current — use as ground truth
+[ ] The following sources may be stale and should be verified: `_______________`
+[ ] The agent must re-fetch live data for: `_______________`
+[ ] MCP connections available: `_______________`
+### 3.4 Domain Conventions
+*Terminology, style, standards, or norms the agent must match.*
+```
+Terminology to use: ________________________________________
+Terminology to avoid: ______________________________________
+Style / format standards: __________________________________
+Domain-specific conventions: _______________________________
+```
+---
+## 4. Prompt Layer — The Trigger
+### 4.1 Opening System Instruction
+*The master instruction that frames the entire run. Reference Sections 1–3 explicitly.*
+```
+You are [deployment purpose from 2.1].
+Your task is to produce [deliverable from 1.1].
+You are operating within the following constraints: [from 2.3].
+Your trade-off priority order is: [from 2.2].
+The context you have been provided is: [from 3.1].
+You will complete this task by executing the following sub-tasks in order: [from 1.3].
+For each sub-task, your completion criterion is: [from 1.4].
+```
+### 4.2 Per-Sub-Task Prompt Template
+*Use this template for each sub-task trigger.*
+```
+Sub-task [#]: [name]
+Cognitive mode: [mode]
+Your input: [what you're working from]
+Your output: [exact format and content required]
+Acceptance criteria: [binary criteria from 1.3]
+Evaluation: [how this will be checked]
+Resource cap: [max runtime / tokens for this sub-task]
+```
+---
+## 5. Governance & Validation
+### 5.1 Escalation Contract
+**Escalation triggers** *(from 2.3 — repeated here for executor clarity)*:
+- `_______________`
+- `_______________`
+**Who reviews escalations:**
+```
+Name / Role: _______________________________________________
+Contact: __________________________________________________
+Response SLA: ______________________________________________
+```
+**If no response arrives within SLA, the agent should:**
+[ ] Hold and wait
+[ ] Attempt alternative path: `_______________`
+[ ] Abort and report
+### 5.2 Adversarial Reflection Trigger
+*(Required for Medium and High risk tasks)*
+[ ] Not required (Low risk)
+[ ] Required after sub-task(s): `_______________`
+[ ] Required before final delivery
+The agent should critique: `_______________`
+The revision threshold is: `_______________` *(what level of critique warrants a revision?)*
+### 5.3 Uncertainty Reporting
+*(Required for non-deterministic domains)*
+[ ] Not required
+[ ] Required — agent must report:
+  - Confidence estimate (0–100%) with justification
+  - Primary uncertainty drivers
+  - What data would most reduce uncertainty
+  - Alternative plausible interpretations
+---
+## 6. Brainstorming Mode (Divergence → Convergence)
+*Complete this section only if primary mode is multi-phase brainstorming.*
+**Phase 1 — DIVERGE (Exploratory Mode)**
+```
+Generate: _________________________________________________
+Constraint: No filtering or judgment at this stage.
+Output format: ____________________________________________
+Volume target: ____________________________________________
+```
+**Phase 2 — CLUSTER (Synthesis Mode)**
+```
+Organize the output of Phase 1 by: ________________________
+Identify: _________________________________________________
+Output format: ____________________________________________
+```
+**Phase 3 — STRESS-TEST (Adversarial Mode)**
+```
+Attack the top [n] candidates from Phase 2.
+Criteria to stress-test against: __________________________
+Output format: ____________________________________________
+What constitutes a fatal flaw: ____________________________
+```
+**Phase 4 — FORMALIZE (Deterministic Mode)**
+```
+Selected direction: ______________________________________
+Formalize as: ____________________________________________
+Acceptance criteria: _____________________________________
+```
+---
+## 7. Domain-Specific Additions
+### If Coding:
+- Language / framework / version: `_______________`
+- Performance targets: `_______________`
+- API / interface contracts: `_______________`
+- Security requirements: `_______________`
+- Test coverage expectation: `_______________`
+### If Science / Research:
+- Null hypothesis: `_______________`
+- Falsification criteria: `_______________`
+- Authorized data sources: `_______________`
+- Forbidden data sources: `_______________`
+- Statistical significance threshold: `_______________`
+- Correction method: `_______________`
+- Reproducibility requirements: `_______________`
+### If Writing / Content:
+- Audience: `_______________`
+- Purpose: `_______________`
+- Structure: `_______________`
+- Word count / length: `_______________`
+- Style reference: `_______________`
+- Mandatory inclusions: `_______________`
+- Mandatory exclusions: `_______________`
+### If Business / Strategy:
+- Decision-maker: `_______________`
+- Decision to be made: `_______________`
+- Options to evaluate: `_______________`
+- Recommendation format: `_______________`
+- Regulatory Must-Nots: `_______________`
+- Stakeholder sensitivities: `_______________`
+---
+## 8. Spec Validation (Red-Team Checklist)
+*Complete before handing to the agent. Every unchecked box is a known risk.*
+**5 Primitives — does every sub-task have:**
+- [ ] A self-contained problem statement (zero questions needed to start)
+- [ ] A Constraint Architecture (Must / Must-Not / Prefer / Escalate)
+- [ ] A runtime under 2 hours
+- [ ] Binary acceptance criteria
+- [ ] A built-in evaluation method
+**Failure Mode Taxonomy — does this spec prevent:**
+- [ ] Scope Creep — explicit Must-Not list and scope boundaries
+- [ ] Hallucinated Completion — binary acceptance criteria defined
+- [ ] Intent Drift — ranked trade-offs and deployment purpose explicit
+- [ ] Context Collapse — context curated; noise removed
+- [ ] Runaway Cost — resource caps set before decomposition
+- [ ] Overconfident Output — uncertainty reporting required (if applicable)
+**Final gate:**
+- [ ] Risk tier and resource governance set *before* decomposition
+- [ ] Every agent has a deployment purpose statement
+- [ ] Escalation contract complete (who, when, SLA, no-response behavior)
+- [ ] Planner execution plan will be saved as artifact
+---
+## 8.5 Edge Cases and Failure Modes
+> Required for medium/high risk specs. Red-team the spec before builder handoff.
+> For each scenario, document what the spec says the builder must do. A blank list is a hard stop.
+**FORGE failure mode coverage:**
+- Scope Creep — which Must-Not explicitly prevents it: `_______________`
+- Hallucinated Completion — which AC is hardest to fake: `_______________`
+- Intent Drift — which decision will the builder most likely get wrong: `_______________`
+- Context Collapse — what context is load-bearing and must not be dropped: `_______________`
+- Runaway Cost — what resource cap prevents unbounded execution: `_______________`
+- Overconfident Output — what requires uncertainty reporting: `_______________`
+**Domain-specific edge cases:**
+- `_______________`
+- `_______________`
+---
+## 9. Spec Convergence
+> Fill before handing to a builder. A non-zero count or "no" is a hard stop.
+> Every [NEEDS HUMAN INPUT] flag from spec drafting must be resolved before this section is filled.
+```
+open_questions: 0
+ambiguities_remaining: 0
+ready_for_build: yes
+convergence_notes: _______________________________________________
+```
+---
+*FORGE Spec Template v1.0 — companion to the FORGE system. A blank field is an unresolved decision.*
+*System docs: Quick Reference · Canonical Framework · Spec Template (this doc)*
+<!-- nimai-spec -->

package/dist/forge-root.d.ts ADDED Viewed

	@@ -0,0 +1,2 @@
1	+ export declare const FORGE_ROOT: string;
2	+ //# sourceMappingURL=forge-root.d.ts.map

package/dist/forge-root.d.ts.map ADDED Viewed

	@@ -0,0 +1 @@
1	+ {"version":3,"file":"forge-root.d.ts","sourceRoot":"","sources":["../src/forge-root.ts"],"names":[],"mappings":"AAyBA,eAAO,MAAM,UAAU,QAAkB,CAAC"}

package/dist/forge-root.js ADDED Viewed

@@ -0,0 +1,61 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+      desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.FORGE_ROOT = void 0;
+const fs = __importStar(require("fs"));
+const path = __importStar(require("path"));
+/**
+ * Locate the FORGE docs root by walking up from __dirname until we find
+ * FORGE-quickref.md. Works regardless of where the package is installed.
+ */
+function findForgeRoot() {
+    // 1. Check bundled data/ directory (works when installed via npm)
+    const bundled = path.join(__dirname, '..', 'data');
+    if (fs.existsSync(path.join(bundled, 'FORGE-quickref.md')))
+        return bundled;
+    // 2. Walk up from __dirname (works in monorepo dev)
+    let dir = __dirname;
+    for (let i = 0; i < 10; i++) {
+        if (fs.existsSync(path.join(dir, 'FORGE-quickref.md')))
+            return dir;
+        const parent = path.dirname(dir);
+        if (parent === dir)
+            break;
+        dir = parent;
+    }
+    throw new Error('Cannot locate FORGE-quickref.md. Ensure the FORGE docs are present at the repo root.');
+}
+exports.FORGE_ROOT = findForgeRoot();
+//# sourceMappingURL=forge-root.js.map

package/dist/forge-root.js.map ADDED Viewed

@@ -0,0 +1 @@

+ {"version":3,"file":"forge-root.js","sourceRoot":"","sources":["../src/forge-root.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,uCAAyB;AACzB,2CAA6B;AAE7B;;;GAGG;AACH,SAAS,aAAa;IACpB,kEAAkE;IAClE,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,CAAC,SAAS,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;IACnD,IAAI,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,OAAO,EAAE,mBAAmB,CAAC,CAAC;QAAE,OAAO,OAAO,CAAC;IAE3E,oDAAoD;IACpD,IAAI,GAAG,GAAG,SAAS,CAAC;IACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,EAAE,EAAE,CAAC,EAAE,EAAE,CAAC;QAC5B,IAAI,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,mBAAmB,CAAC,CAAC;YAAE,OAAO,GAAG,CAAC;QACnE,MAAM,MAAM,GAAG,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC;QACjC,IAAI,MAAM,KAAK,GAAG;YAAE,MAAM;QAC1B,GAAG,GAAG,MAAM,CAAC;IACf,CAAC;IACD,MAAM,IAAI,KAAK,CACb,sFAAsF,CACvF,CAAC;AACJ,CAAC;AAEY,QAAA,UAAU,GAAG,aAAa,EAAE,CAAC"}

package/dist/index.d.ts CHANGED Viewed

@@ -6,4 +6,5 @@ export * from './prompts';
 export * from './clarification';
 export * from './discovery';
 export * from './verdict';
+export * from './forge-root';
 //# sourceMappingURL=index.d.ts.map

package/dist/index.d.ts.map CHANGED Viewed

	@@ -1 +1 @@
1	- {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,SAAS,CAAC;AACxB,cAAc,YAAY,CAAC;AAC3B,cAAc,QAAQ,CAAC;AACvB,cAAc,WAAW,CAAC;AAC1B,cAAc,WAAW,CAAC;AAC1B,cAAc,iBAAiB,CAAC;AAChC,cAAc,aAAa,CAAC;AAC5B,cAAc,WAAW,CAAC"}
1	+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,SAAS,CAAC;AACxB,cAAc,YAAY,CAAC;AAC3B,cAAc,QAAQ,CAAC;AACvB,cAAc,WAAW,CAAC;AAC1B,cAAc,WAAW,CAAC;AAC1B,cAAc,iBAAiB,CAAC;AAChC,cAAc,aAAa,CAAC;AAC5B,cAAc,WAAW,CAAC;AAC1B,cAAc,cAAc,CAAC"}

package/dist/index.js CHANGED Viewed

@@ -22,4 +22,5 @@ __exportStar(require("./prompts"), exports);
 __exportStar(require("./clarification"), exports);
 __exportStar(require("./discovery"), exports);
 __exportStar(require("./verdict"), exports);
+__exportStar(require("./forge-root"), exports);
 //# sourceMappingURL=index.js.map

package/dist/index.js.map CHANGED Viewed

	@@ -1 +1 @@
1	- {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;AAAA,0CAAwB;AACxB,6CAA2B;AAC3B,yCAAuB;AACvB,4CAA0B;AAC1B,4CAA0B;AAC1B,kDAAgC;AAChC,8CAA4B;AAC5B,4CAA0B"}
1	+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;AAAA,0CAAwB;AACxB,6CAA2B;AAC3B,yCAAuB;AACvB,4CAA0B;AAC1B,4CAA0B;AAC1B,kDAAgC;AAChC,8CAA4B;AAC5B,4CAA0B;AAC1B,+CAA6B"}

package/dist/prompts.d.ts.map CHANGED Viewed

	@@ -1 +1 @@
1	- {"version":3,"file":"prompts.d.ts","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH;;;GAGG;AACH,wBAAgB,YAAY,CAAC,OAAO,EAAE,MAAM,EAAE,cAAc,EAAE,MAAM,GAAG,MAAM,CAkC5E;AAED;;;;;;;;;;;;GAYG;AACH,wBAAgB,aAAa,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,~~CA6FzD~~;AAED;;;GAGG;AACH,wBAAgB,YAAY,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAgBxD"}
1	+ {"version":3,"file":"prompts.d.ts","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH;;;GAGG;AACH,wBAAgB,YAAY,CAAC,OAAO,EAAE,MAAM,EAAE,cAAc,EAAE,MAAM,GAAG,MAAM,CAkC5E;AAED;;;;;;;;;;;;GAYG;AACH,wBAAgB,aAAa,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAkGzD;AAED;;;GAGG;AACH,wBAAgB,YAAY,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAgBxD"}

package/dist/prompts.js CHANGED Viewed

@@ -74,15 +74,15 @@ A verdict without citation is treated as NO_GO — do not assert PASS without ev
 **Dimensions:**
-1. **Binary acceptance criteria** — are all sub-task ACs measurable and unambiguous? Are any ACs pre-checked (- [x]) in the draft, which is always invalid?
+1. **Binary acceptance criteria** — are all sub-task ACs measurable and unambiguous? Are any ACs pre-checked (- [x]) in the draft, which is always invalid? If Section 1.1 defines a quantitative quality bar, Section 1.4 must include at least one AC that directly enforces that bar (same metric family and threshold intent). Otherwise, FAIL.
 2. **Scope coherence** — are in-scope and out-of-scope boundaries clearly stated and non-contradictory? Check for conflicts between conceptual terminology (e.g., state names, entity names used in descriptions) and persisted/modelled representations (e.g., enums, schemas, data shapes). Any mismatch is a HARD_FAIL.
 3. **Constraint sufficiency** — do Must / Must-Not / Prefer / Escalate constraints cover the key risks?
-4. **Decomposition realism** — can each sub-task be completed within the stated 2-hour limit by a skilled agent? Check that sub-task dependencies are stated explicitly (if task B requires task A's output, that must be documented).
+4. **Decomposition realism** — can each sub-task be completed within the stated 2-hour limit by a skilled agent? Check that sub-task dependencies are stated explicitly (if task B requires task A's output, that must be documented). If Section 1.1 defines a benchmark/dataset path, at least one sub-task must explicitly own creating or curating that benchmark artifact.
 5. **Start-without-clarification viability** — can an agent begin immediately with the context provided, without asking the human for more information?
-6. **Internal consistency** — are terms, names, and concepts used consistently throughout the spec? Flag any case where the same entity is described differently in different sections (e.g., "webhook event" in scope, "push notification" in ACs — are these the same thing?).
-7. **Mechanism lock** — does every core flow (data pipeline, primary algorithm, key architecture choice) commit to exactly ONE implementation approach? Scan for "e.g.", "or", "could", "might", "such as" in Deliverable, Scope, Constraints, and Task decomposition sections. If any of these offer multiple options without making an explicit decision, that is a HARD_FAIL — a spec that leaves the mechanism unresolved forces the builder to choose architecture, creating drift in multi-builder workflows.
+6. **Internal consistency** — are terms, names, and concepts used consistently throughout the spec? Flag any case where the same entity is described differently in different sections (e.g., "webhook event" in scope, "push notification" in ACs — are these the same thing?). Treat data-type/precision mismatches as consistency defects (e.g., float vs decimal/cents representation). Also FAIL if obvious template artifact text remains (example guidance that should have been replaced by project-specific content).
+7. **Mechanism lock** — does every core flow (data pipeline, primary algorithm, key architecture choice) commit to exactly ONE implementation approach? Scan for "e.g.", "or", "could", "might", "such as" in Deliverable, Scope, Constraints, and Task decomposition sections. If any of these offer multiple options without making an explicit decision, that is a HARD_FAIL — a spec that leaves the mechanism unresolved forces the builder to choose architecture, creating drift in multi-builder workflows. Additional lock checks: (a) external service lock requires concrete vendor/model (env var names alone are not sufficient), (b) auth lock requires token/session type plus expiry policy, (c) endpoint lists without request/response field shapes are SOFT_FAIL.
 8. **Convergence declaration** — does the spec include a \`## Spec Convergence\` section with \`open_questions: 0\` and \`ambiguities_remaining: 0\`? If the section is absent, or either value is non-zero, or \`ready_for_build\` is "no", that is a HARD_FAIL. This is the formal GO gate — a spec with declared open questions is not agent-ready regardless of how well the other sections are written.
-9. **Adversarial gap scan** — does the spec include an \`## Edge Cases and Failure Modes\` section that is substantively filled? Steelman the spec as a builder who will follow it exactly: what will you hit mid-execution that the spec does not answer? What implicit assumption is baked in but undeclared? What failure mode from the FORGE taxonomy (scope creep, hallucinated completion, intent drift, context collapse, runaway cost, overconfident output) does the spec not address? An absent or blank section is a HARD_FAIL for medium/high risk specs. Placeholder-only content (\`___\`) is treated as absent.
+9. **Adversarial gap scan** — does the spec include an \`## Edge Cases and Failure Modes\` section that is substantively filled? Steelman the spec as a builder who will follow it exactly: what will you hit mid-execution that the spec does not answer? What implicit assumption is baked in but undeclared? What failure mode from the FORGE taxonomy (scope creep, hallucinated completion, intent drift, context collapse, runaway cost, overconfident output) does the spec not address? For mobile/capture flows, explicitly check offline/no-network behavior at capture time and retry/sync handling. An absent or blank section is a HARD_FAIL for medium/high risk specs. Placeholder-only content (\`___\`) is treated as absent.
 10. **Architecture completeness** — for medium/high coding specs, does the spec include \`## 1.6 Architecture Lock\` with all required fields resolved: Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, Auth implementation? Any missing field, blank placeholder, or \`TBD\` is a HARD_FAIL. For low/unknown/non-coding specs, this is NOTE-only advisory.
 ## Severity classification
@@ -103,6 +103,11 @@ Keep it brief: one sentence + one evidence citation per dimension. No preamble,
 Then write a consolidated remediation list for all FAIL dimensions.
+After writing the 10 dimensions, run one extra ambiguity sweep:
+- Ask: "Could two competent builders implement this spec in materially different ways while both claiming compliance?"
+- If yes, add those gaps as issues (SOFT_FAIL or HARD_FAIL based on impact) and include exact sections to fix.
+- If no, state one sentence: "No additional ambiguity gaps found in final sweep."
 Note: implementation correctness is explicitly out of scope for this review.
 ## Re-review protocol
@@ -139,7 +144,7 @@ If the spec passes all dimensions with no HARD_FAIL issues, use:
 {"passed": true, "schema_version": "2", "issues": []}
 \`\`\`
-If and only if \`passed\` is false, immediately append this section after the verdict block:
+If \`issues\` is non-empty (including SOFT_FAIL/NOTE with \`passed: true\`), immediately append this section after the verdict block:
 ## Paste this to your builder session:

package/dist/prompts.js.map CHANGED Viewed

	@@ -1 +1 @@
1	- {"version":3,"file":"prompts.js","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":";AAAA;;;;GAIG;;AAMH,oCAkCC;AAeD,~~sCA6FC~~;AAMD,oCAgBC;~~AAxKD~~;;;GAGG;AACH,SAAgB,YAAY,CAAC,OAAe,EAAE,cAAsB;IAClE,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA6BP,cAAc,IAAI,2FAA2F;;;iBAG9F,OAAO,EAAE,CAAC;AAC3B,CAAC;AAED;;;;;;;;;;;;GAYG;AACH,SAAgB,aAAa,CAAC,WAAmB;IAC/C,OAAO~~;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA2FP~~,WAAW,EAAE,CAAC;AAChB,CAAC;AAED;;;GAGG;AACH,SAAgB,YAAY,CAAC,WAAmB;IAC9C,OAAO;;;;;;;;;;;;;;EAcP,WAAW,EAAE,CAAC;AAChB,CAAC"}
1	+ {"version":3,"file":"prompts.js","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":";AAAA;;;;GAIG;;AAMH,oCAkCC;AAeD,sCAkGC;AAMD,oCAgBC;AA7KD;;;GAGG;AACH,SAAgB,YAAY,CAAC,OAAe,EAAE,cAAsB;IAClE,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA6BP,cAAc,IAAI,2FAA2F;;;iBAG9F,OAAO,EAAE,CAAC;AAC3B,CAAC;AAED;;;;;;;;;;;;GAYG;AACH,SAAgB,aAAa,CAAC,WAAmB;IAC/C,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EAgGP,WAAW,EAAE,CAAC;AAChB,CAAC;AAED;;;GAGG;AACH,SAAgB,YAAY,CAAC,WAAmB;IAC9C,OAAO;;;;;;;;;;;;;;EAcP,WAAW,EAAE,CAAC;AAChB,CAAC"}

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "nimai-core",
-  "version": "0.4.7",
+  "version": "0.4.9",
   "description": "Nimai core library — template loading, lint engine, context extraction. No LLM dependencies.",
   "keywords": [
     "nimai",
@@ -19,7 +19,8 @@
     "directory": "packages/core"
   },
   "files": [
-    "dist/"
+    "dist/",
+    "data/"
   ],
   "main": "dist/index.js",
   "types": "dist/index.d.ts",