nimai-core 0.4.7 → 0.4.9

@@ -0,0 +1,326 @@
1
+ # FORGE Quick Reference
2
+ > **You are here:** This is the operational tool — the doc an agent uses during active work.
3
+ >
4
+ > **The other docs in this system:**
5
+ > - **Canonical Framework** (`FORGE-canonical.md`) — deep explanation of every concept. Go there to understand *why* something works, not just *what* to do.
6
+ > - **Spec Template** (`FORGE-spec-template.md`) — fill-in-the-blank form for a concrete project. Go there when you have a direction and need to turn it into an agent-ready brief.
7
+
8
+ ---
9
+
10
+ ## Agent Routing: What To Do Based On The Request
11
+
12
+ *Read the request and match it to a route before doing anything else.*
13
+
14
+ | If the request is… | Route |
15
+ |---|---|
16
+ | A new idea or creative problem — no direction yet | Run **Divergence → Convergence Loop** below. Stay in this doc. |
17
+ | A loose or vague request that needs structuring | Run **Self-Spec Agent** prompt (see below) to generate a draft spec. |
18
+ | A specific project ready to execute | Go to **Spec Template** → fill it out → deploy. |
19
+ | How to approach or frame something | Use **4 Layers + 5 Primitives** below. Deeper detail in Canonical doc. |
20
+ | Reviewing, validating, or debugging existing work | Run **Failure Mode Taxonomy** red-team below. |
21
+ | Something feels wrong mid-project | Check **Failure Mode Taxonomy** first, then check Intent Layer for drift. |
22
+ | A concept here that needs deeper explanation | Go to **Canonical Framework** for the full treatment. |
23
+ | Building a multi-agent or long-running system | Go to **Canonical Framework** → Execution Architecture + Extended Patterns. |
24
+
25
+ **Default rule:** Start here. Go to Canonical to understand. Go to Spec Template to execute.
26
+
27
+ ---
28
+
29
+ ## The Deeper Thesis
30
+ Agents fail not from lack of intelligence but from **underspecified control surfaces**. The four control surfaces are specification, intent, context, and prompt. Engineer all four.
31
+
32
+ ---
33
+
34
+ ## The 4 Engineering Disciplines / Control Surfaces
35
+
36
+ | Discipline | Control Surface | Skip It And… |
37
+ |---|---|---|
38
+ | **Prompt Craft** | Sub-task trigger | Agent doesn't know what to do right now |
39
+ | **Context Engineering** | Information environment | Agent works from noise or stale data |
40
+ | **Intent Engineering** | Values, trade-offs, deployment purpose | Agent resolves ambiguity the wrong way |
41
+ | **Specification Engineering** | Complete blueprint for "done" | Agent can't define done and never stops |
42
+
43
+ ---
44
+
45
+ ## The 4 Layers (in order of leverage)
46
+
47
+ | Layer | Role | Anti-Pattern |
48
+ |---|---|---|
49
+ | **1. Specification** | Defines exact "done" state + scope | Vague goals ("improve this") |
50
+ | **2. Intent** | Agent's deployment purpose + trade-off hierarchy | Agent doesn't know its own role |
51
+ | **3. Context** | Curated information environment | Dumping everything you have |
52
+ | **4. Prompt** | Fires sub-task + assigns cognitive mode | Over-engineering this layer |
53
+
54
+ ---
55
+
56
+ ## The 5 Primitives (every sub-task must pass all 5)
57
+
58
+ 1. **Self-Contained** — Zero questions needed to start
59
+ 2. **Constrained** — Must / Must-Not / Prefer / Escalate defined
60
+ 3. **Modular** — Under 2 hours; independently verifiable
61
+ 4. **Acceptance Criteria** — Binary, measurable (not "looks good")
62
+ 5. **Evaluation** — Built-in check before reporting complete
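The five checks above are mechanical enough to lint before a sub-task goes to an agent. This is a minimal sketch built on a hypothetical `SubTask` record. The list of subjective terms is an illustrative heuristic, and the "independently verifiable" half of Modular is left to human review.

```python
from dataclasses import dataclass, field

# Crude, illustrative markers of subjective (non-binary) acceptance criteria.
SUBJECTIVE_TERMS = ("looks good", "clean", "nice", "reasonable", "appropriate")


@dataclass
class SubTask:
    name: str
    open_questions: list[str] = field(default_factory=list)
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    prefer: list[str] = field(default_factory=list)
    escalate: list[str] = field(default_factory=list)
    estimated_hours: float = 0.0
    acceptance_criteria: list[str] = field(default_factory=list)
    eval_method: str = ""


def primitive_failures(task: SubTask) -> list[str]:
    """Return the names of the 5 Primitives this sub-task fails."""
    failures = []
    if task.open_questions:
        failures.append("Self-Contained")
    if not all([task.must, task.must_not, task.prefer, task.escalate]):
        failures.append("Constrained")
    if not 0 < task.estimated_hours < 2:
        failures.append("Modular")
    criteria = [ac.lower() for ac in task.acceptance_criteria]
    if not criteria or any(term in ac for ac in criteria for term in SUBJECTIVE_TERMS):
        failures.append("Acceptance Criteria")
    if not task.eval_method.strip():
        failures.append("Evaluation")
    return failures


print(primitive_failures(SubTask("polish UI", estimated_hours=3, acceptance_criteria=["UI looks good"])))
# ['Constrained', 'Modular', 'Acceptance Criteria', 'Evaluation']
```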
63
+
64
+ ---
65
+
66
+ ## Pre-Execution Decisions (set BEFORE decomposition)
67
+
68
+ **Risk Tier:**
69
+ | Tier | Validation |
70
+ |---|---|
71
+ | Low | Self-check only |
72
+ | Medium | Validator pass + Adversarial Reflection |
73
+ | High | Validator + Adversarial Reflection + Human gate |
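Here is a minimal sketch of the tier table as a lookup. The step labels are hypothetical. Medium also carries Adversarial Reflection because FORGE requires it for Medium and High risk. `assign_tier` encodes the Spec Template's "if unsure, go one tier higher" rule.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Validation steps per tier. Step names are illustrative labels, not a real API.
VALIDATION_ROUTES = {
    RiskTier.LOW: ["self_check"],
    RiskTier.MEDIUM: ["validator", "adversarial_reflection"],
    RiskTier.HIGH: ["validator", "adversarial_reflection", "human_gate"],
}


def assign_tier(estimate: RiskTier, unsure: bool = False) -> RiskTier:
    """Apply 'if unsure, go one tier higher', capped at HIGH."""
    if unsure:
        return RiskTier(min(estimate + 1, RiskTier.HIGH))
    return estimate


def validation_route(tier: RiskTier) -> list[str]:
    """Return a copy so callers cannot mutate the shared routing table."""
    return list(VALIDATION_ROUTES[tier])


print(validation_route(assign_tier(RiskTier.MEDIUM, unsure=True)))
# ['validator', 'adversarial_reflection', 'human_gate']
```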
74
+
75
+ **Resource Governance:** Model tier · Max runtime · Cost budget · Retry limit · Cost escalation trigger
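A small guard called between steps can enforce the governance parameters. This sketch assumes seconds and US dollars as units and uses a hypothetical `governance_action` helper. It does not cover how a real orchestrator meters runtime or spend.

```python
from dataclasses import dataclass


@dataclass
class ResourceGovernance:
    model_tier: str
    max_runtime_s: float
    cost_budget_usd: float
    retry_limit: int
    cost_escalation_usd: float  # spend that triggers stop-and-report, set at or below the budget

    def __post_init__(self) -> None:
        if self.cost_escalation_usd > self.cost_budget_usd:
            raise ValueError("cost escalation trigger must not exceed the cost budget")


def governance_action(gov: ResourceGovernance, runtime_s: float,
                      spent_usd: float, retries_used: int) -> str:
    """Map the current run state to 'continue', 'stop_and_report', or 'escalate'."""
    if runtime_s >= gov.max_runtime_s:
        return "stop_and_report"  # hard runtime cap reached
    if spent_usd >= gov.cost_escalation_usd:
        return "stop_and_report"  # cost trigger fires before the hard budget is spent
    if retries_used >= gov.retry_limit:
        return "escalate"  # no retries left
    return "continue"


gov = ResourceGovernance("mid-tier", max_runtime_s=3600, cost_budget_usd=20.0,
                         retry_limit=3, cost_escalation_usd=15.0)
print(governance_action(gov, runtime_s=600, spent_usd=16.0, retries_used=0))  # stop_and_report
```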
76
+
77
+ ---
78
+
79
+ ## Cognitive Modes (reasoning postures, not personas)
80
+
81
+ | Mode | Use When |
82
+ |---|---|
83
+ | **Deterministic** | Implementation; answer is knowable |
84
+ | **Exploratory** | Research, brainstorming, hypothesis generation |
85
+ | **Adversarial** | Security, risk, stress-testing |
86
+ | **Synthesis** | Integrating multiple outputs |
87
+ | **Audit** | Validation only — do not produce |
88
+
89
+ **Over-specification warning:** Constrain evaluation, not divergence. In Exploratory phases, tight constraints on generation defeat the purpose. Reserve hard constraints for Adversarial and Deterministic phases.
90
+
91
+ **Mode × Risk compatibility:** Exploratory mode is a pre-commitment mode. High-risk tasks should be in Deterministic or Audit mode by final delivery. Exploratory + High risk as a final-delivery combination is a design error.
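The compatibility rule can be expressed as a plan-time check. In this sketch, the explicitly named Exploratory + High combination is treated as an error, and any other non-Deterministic, non-Audit mode at High-risk final delivery is treated as a warning. That split is an assumed reading of "should", and `check_mode_risk` is hypothetical.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DETERMINISTIC = "deterministic"
    EXPLORATORY = "exploratory"
    ADVERSARIAL = "adversarial"
    SYNTHESIS = "synthesis"
    AUDIT = "audit"


# Modes a High-risk task may be in at final delivery.
HIGH_RISK_DELIVERY_MODES = {Mode.DETERMINISTIC, Mode.AUDIT}


def check_mode_risk(mode: Mode, risk: str, final_delivery: bool) -> Optional[str]:
    """Return None if the pairing is fine, else 'error' or 'warning'."""
    if risk != "high" or not final_delivery or mode in HIGH_RISK_DELIVERY_MODES:
        return None
    if mode is Mode.EXPLORATORY:
        return "error"  # named explicitly as a design error
    return "warning"  # should have converged to Deterministic or Audit


print(check_mode_risk(Mode.EXPLORATORY, "high", final_delivery=True))  # error
```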
92
+
93
+ ---
94
+
95
+ ## Divergence → Convergence Loop (brainstorming)
96
+
97
+ ```
98
+ Exploratory → Synthesis → Adversarial → Deterministic
99
+ DIVERGE CLUSTER STRESS-TEST FORMALIZE
100
+ ```
101
+ Generate wide → cluster patterns → attack top candidates → execute selected direction.
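The loop can be read as a four-stage pipeline. In this sketch, plain callables stand in for the clustering, scoring, and fatal-flaw judgments, which would normally be model calls in each cognitive mode. `diverge_converge` and the sample ideas are hypothetical.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable
from typing import Optional


def diverge_converge(
    ideas: list[str],
    cluster_key: Callable[[str], Hashable],
    score: Callable[[str], float],
    has_fatal_flaw: Callable[[str], bool],
    top_n: int = 3,
) -> Optional[str]:
    # DIVERGE (Exploratory): `ideas` is the raw, unfiltered generation output.
    # CLUSTER (Synthesis): group related ideas and keep the best one per group.
    clusters: dict[Hashable, list[str]] = defaultdict(list)
    for idea in ideas:
        clusters[cluster_key(idea)].append(idea)
    representatives = [max(group, key=score) for group in clusters.values()]

    # STRESS-TEST (Adversarial): attack only the top candidates.
    top = sorted(representatives, key=score, reverse=True)[:top_n]
    survivors = [idea for idea in top if not has_fatal_flaw(idea)]

    # FORMALIZE (Deterministic): the best survivor becomes the selected direction.
    return survivors[0] if survivors else None


scores = {"cache reads": 0.8, "cache writes": 0.6, "shard db": 0.7, "rewrite in rust": 0.9}
choice = diverge_converge(
    list(scores),
    cluster_key=lambda idea: idea.split()[0],
    score=lambda idea: scores[idea],
    has_fatal_flaw=lambda idea: idea == "rewrite in rust",  # e.g. blows the budget
)
print(choice)  # cache reads
```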
102
+
103
+ ---
104
+
105
+ ## Failure Mode Taxonomy (red-team before deployment)
106
+
107
+ | Failure | Cause | Fix |
108
+ |---|---|---|
109
+ | Scope Creep | Ambiguous boundaries | Explicit Must-Not list |
110
+ | Hallucinated Completion | Subjective criteria | Binary acceptance criteria |
111
+ | Intent Drift | Unclear trade-offs or role | Ranked priorities + deployment purpose |
112
+ | Context Collapse | Too much noise | Aggressive curation; MCP (Model Context Protocol) for live sources |
113
+ | Runaway Cost | No resource ceiling | Hard caps before decomposition |
114
+ | Overconfident Output | No uncertainty surfacing | Uncertainty reporting for high-stakes tasks |
115
+
116
+ *If a failure doesn't fit these categories, extend the taxonomy — log it, identify root cause, add prevention.*
117
+
118
+ ---
119
+
120
+ ## Intent = Deployment Purpose + Trade-offs
121
+ Every agent must know: **what it is · what it isn't · who consumes its output · what to optimize for when uncertain**
122
+
123
+ ## Adversarial Reflection (Medium/High risk)
124
+ Produce → steelman critique → evaluate → revise → re-validate
125
+
126
+ ## Uncertainty Reporting (non-deterministic domains)
127
+ Confidence % · uncertainty drivers · what would change the answer · alternative interpretations
128
+
129
+ ## Escalation Contract
130
+ WHEN to stop · WHAT to report (include confidence) · WHO decides (named + SLA) · WHAT if no response
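The four parts of the contract map onto a small record plus a timeout rule. This is a sketch with assumed names (`EscalationContract`, `Escalation`). The example follows the Solo Operator pattern: you name yourself as the reviewer and use a 24-hour delay as the SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EscalationContract:
    trigger: str          # WHEN to stop
    reviewer: str         # WHO decides (a named person, or yourself when solo)
    sla: timedelta        # how long to wait for a decision
    on_no_response: str   # WHAT if no response: "hold", "alternative_path", or "abort"


@dataclass
class Escalation:
    contract: EscalationContract
    report: str           # WHAT to report
    confidence: float     # reported confidence, 0.0 to 1.0
    raised_at: datetime

    def next_action(self, now: datetime, response: Optional[str] = None) -> str:
        if response is not None:
            return f"follow reviewer decision: {response}"
        if now - self.raised_at < self.contract.sla:
            return "wait"
        return self.contract.on_no_response


contract = EscalationContract("schema change needed", "self", timedelta(hours=24), "abort")
esc = Escalation(contract, "Migration would drop a column", 0.6, datetime(2024, 1, 1, 9, 0))
print(esc.next_action(datetime(2024, 1, 2, 10, 0)))  # abort (25 h exceeds the 24 h SLA)
```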
131
+
132
+ ## Context Hygiene
133
+ 2k high-signal tokens > 20k low-signal tokens · MCP for live sources · specify when to re-fetch
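One way to apply "2k high-signal > 20k low-signal" is to filter chunks by a relevance score and then pack the densest ones into a token budget. This greedy packing is a swapped-in heuristic that FORGE does not prescribe. It also assumes you already have per-chunk token counts and signal scores.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str
    tokens: int
    signal: float  # relevance in [0, 1], assumed to come from your own ranking step


def curate(chunks: list[Chunk], budget_tokens: int, min_signal: float = 0.5) -> list[Chunk]:
    """Drop low-signal chunks, then greedily pack the densest ones into the budget."""
    candidates = [c for c in chunks if c.signal >= min_signal]
    candidates.sort(key=lambda c: c.signal / max(c.tokens, 1), reverse=True)
    selected, used = [], 0
    for chunk in candidates:
        if used + chunk.tokens <= budget_tokens:
            selected.append(chunk)
            used += chunk.tokens
    return selected


chunks = [
    Chunk("spec.md", 1200, 0.95),
    Chunk("old_thread.txt", 15000, 0.3),
    Chunk("api.md", 800, 0.8),
    Chunk("changelog.md", 3000, 0.6),
]
print([c.source for c in curate(chunks, budget_tokens=2000)])  # ['api.md', 'spec.md']
```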
134
+
135
+ ---
136
+
137
+ ## When To Use This Framework
138
+
139
+ The overhead should be proportional to the cost of failure. Use this as your guide:
140
+
141
+ | Situation | What to use |
142
+ |---|---|
143
+ | Task is reversible, low-stakes, under 1 hour | Routing table above + nothing else. Just start. |
144
+ | Task is consequential or takes more than 1 hour | Pre-Deployment Checklist at minimum. Self-Spec agent if unsure. |
145
+ | Task is high-risk, irreversible, or affects others | Full Spec Template + Adversarial Reflection + Validator |
146
+ | You're not sure | Ask: "What's the cost if this goes wrong?" Scale accordingly. |
147
+
148
+ **The philosophy:** This framework exists to absorb any project (coding, science, writing, business, anything) and give it the structure it needs to succeed. Use as much or as little as the project demands. The principles apply universally; the overhead scales with stakes.
149
+
150
+ ---
151
+
152
+ ## Solo Operator Mode
153
+
154
+ If you are working alone without a team, validators, or named escalation contacts, adapt as follows:
155
+
156
+ **Escalation contract:** You are the reviewer. Set a time delay instead of an SLA: *"I will not proceed on this decision for 24 hours."* Distance and time act as the same circuit breaker that a second person would.
157
+
158
+ **Validation:** For Medium risk, run the Adversarial Reflection yourself or have a second agent session do it cold — paste the output into a fresh context with no prior history and ask it to find flaws.
159
+
160
+ **High risk solo:** Add a mandatory pause before any irreversible action. Write out the decision, the alternatives you considered, and your confidence level before executing. This is your human gate.
161
+
162
+ **The Self-Spec + Reviewer pipeline replaces team infrastructure** — see below.
163
+
164
+ ---
165
+
166
+ ## Self-Spec Agent + Reviewer Pipeline
167
+
168
+ This is the solo operator's complete workflow. Two prompts replace an entire team process.
169
+
170
+ ---
171
+
172
+ ### Prompt 1 — Self-Spec Agent
173
+ *Use when you have a loose idea or vague request and need it turned into a complete spec.*
174
+
175
+ Paste this prompt into any capable model, replacing the placeholder with your loose request:
176
+
177
+ ```
178
+ You are a Specification Engineering agent operating under the FORGE framework.
179
+
180
+ Your job is to take the loose request below and generate a complete draft spec
181
+ using the framework's structure. Do not execute the request — only spec it.
182
+
183
+ For each section, fill in what you can infer and mark anything uncertain
184
+ with [NEEDS HUMAN INPUT: reason].
185
+
186
+ Generate:
187
+ 1. Final deliverable (precise, format, measurable quality bar)
188
+ 2. Scope boundaries (in / out)
189
+ 3. Agent deployment purpose (what it is, is not, who consumes output)
190
+ 4. Trade-off hierarchy (ranked: accuracy / speed / cost / safety / other)
191
+ 5. Constraint architecture (Must / Must-Not / Prefer / Escalate)
192
+ 6. Task decomposition (sub-tasks under 2 hours, with acceptance criteria)
193
+ 7. Risk tier (Low / Medium / High with reasoning)
194
+ 8. Cognitive mode per sub-task
195
+ 9. Context needed (what the executing agent requires)
196
+ 10. Architecture Lock details for medium/high coding specs (Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, Auth implementation)
197
+ 11. Proposed validator prompt (what a reviewer should check)
198
+
199
+ Loose request: [PASTE YOUR REQUEST HERE]
200
+ ```
201
+
202
+ Review the output. Resolve all [NEEDS HUMAN INPUT] flags. Adjust anything that
203
+ doesn't match your intent. Approved spec = your deployment brief.
204
+
205
+ ---
206
+
207
+ ### Prompt 1.5 — Spec-Quality Reviewer
208
+ *Run this on a draft spec before approving it. The reviewing LLM checks whether the spec is agent-ready.*
209
+
210
+ ````
211
+ You are the independent reviewer for this draft spec. Evaluate it now.
212
+ You are a Specification Quality Reviewer operating under the FORGE framework.
213
+
214
+ Your job is to evaluate the draft spec below for spec quality only — not implementation correctness.
215
+ Do NOT assess any code, code diffs, or implementation output.
216
+
217
+ Evaluate each dimension. Cite evidence for every verdict — a verdict without citation is treated as NO_GO.
218
+ Classify each issue as HARD_FAIL, SOFT_FAIL, or NOTE. passed: true requires zero HARD_FAILs.
219
+
220
+ 1. Binary acceptance criteria — are all sub-task ACs measurable and unambiguous? Pre-checked [x] ACs are always invalid.
221
+ 2. Scope coherence — are in-scope/out-of-scope boundaries clear and non-contradictory? Terminology mismatches between conceptual and persisted representations are a HARD_FAIL.
222
+ 3. Constraint sufficiency — do Must/Must-Not/Prefer/Escalate constraints cover the key risks?
223
+ 4. Decomposition realism — can each sub-task be done within 2 hours? Sub-task dependencies must be explicit.
224
+ 5. Start-without-clarification viability — can an agent begin immediately with the context provided?
225
+ 6. Internal consistency — are terms, names, and concepts used consistently throughout?
226
+ 7. Mechanism lock — does every core flow commit to exactly ONE implementation path? Any "e.g./or/could/might/such as" in Deliverable, Scope, Constraints, or Tasks is a HARD_FAIL.
227
+ 8. Convergence declaration — does the spec include a Spec Convergence section with open_questions: 0 and ambiguities_remaining: 0? Absent, non-zero, or ready_for_build: no is a HARD_FAIL.
228
+ 9. Adversarial gap scan — does the spec include a substantively filled Edge Cases and Failure Modes section? Steelman as a builder following it exactly: what failure modes are unaddressed? Absent or placeholder-only is a HARD_FAIL for medium/high risk specs.
229
+ 10. Architecture completeness — for medium/high coding specs, does the spec include `## 1.6 Architecture Lock` with resolved fields for Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, and Auth implementation? Missing, blank, `___`, or `TBD` is a HARD_FAIL.
230
+
231
+ Always end your response with:
232
+ ## Verdict
233
+ ```json
234
+ {"passed": <true|false>, "schema_version": "2", "issues": [{"dimension": "<name>", "severity": "<HARD_FAIL|SOFT_FAIL|NOTE>", "detail": "<cited evidence>"}]}
235
+ ```
236
+
237
+ Draft spec: [PASTE DRAFT SPEC HERE]
238
+ ````
239
+
240
+ The host agent parses the `## Verdict` JSON block to drive the review loop:
241
+ - `passed: true` → proceed to implementation
242
+ - `passed: false` → refine the spec using the `issues` list, then re-review
243
+ - Block absent or malformed → escalate to the human
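Here is a minimal sketch of that host-side loop. It assumes the reviewer's reply arrives as one string and that the last `## Verdict` block is authoritative, since a `## Builder Brief` may follow it. The function names are hypothetical. This is not a description of how `nimai_spec_review` or nimai-core actually parses verdicts.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built programmatically so this example's own fence stays intact
# A "## Verdict" heading followed by a json fence; group 1 is the JSON body.
VERDICT_RE = re.compile(rf"^## Verdict\s*\n{FENCE}json\s*\n(.*?)\n{FENCE}", re.MULTILINE | re.DOTALL)


def parse_verdict(response: str) -> Optional[dict[str, Any]]:
    """Return the last verdict object, or None if absent or malformed."""
    matches = VERDICT_RE.findall(response)
    if not matches:
        return None
    try:
        verdict = json.loads(matches[-1])
    except json.JSONDecodeError:
        return None
    if not isinstance(verdict, dict) or not isinstance(verdict.get("passed"), bool):
        return None
    return verdict


def next_step(response: str) -> str:
    verdict = parse_verdict(response)
    if verdict is None:
        return "escalate_to_human"          # block absent or malformed
    if verdict["passed"]:
        return "proceed_to_implementation"
    return "refine_and_rereview"            # refine using verdict["issues"]


reply = "## Verdict\n" + FENCE + 'json\n{"passed": false, "schema_version": "2", "issues": []}\n' + FENCE
print(next_step(reply))  # refine_and_rereview
```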
244
+
245
+ Use `nimai_spec_review` to generate this prompt automatically from a spec file.
246
+
247
+ When the review fails (`passed: false`), the reviewer should append a `## Builder Brief` section after the verdict block, with one actionable fix per issue (the section to edit and a concrete fix direction).
248
+
249
+ ---
250
+
251
+ ### Prompt 2 — Reviewer / Validator Prompt Generator
252
+ *Run this after the Self-Spec draft is approved. Paste it in along with the approved spec.*
253
+
254
+ ```
255
+ You are a Specification Engineering agent.
256
+
257
+ Given the approved spec below, generate a Reviewer Prompt — precise instructions
258
+ for a validator agent or solo reviewer to check the executing agent's output.
259
+
260
+ The Reviewer Prompt must:
261
+ - State exactly what is being checked and why
262
+ - List binary pass/fail criteria from the spec's acceptance criteria
263
+ - Include Adversarial Reflection sequence if risk tier is Medium or High
264
+ - Include uncertainty reporting requirements if domain is non-deterministic
265
+ - Specify what PASS looks like and what FAIL triggers (revise / escalate / abort)
266
+ - Be usable by a solo operator with no additional context
267
+
268
+ Approved spec: [PASTE APPROVED SPEC HERE]
269
+ ```
270
+
271
+ ---
272
+
273
+ ### The Complete Solo Pipeline
274
+
275
+ ```
276
+ Loose request
277
+ ↓
278
+ Self-Spec Agent (Prompt 1) → draft spec
279
+ ↓
280
+ Spec-Quality Reviewer (Prompt 1.5) → passed? → NO → refine spec → loop
281
+ ↓ YES
282
+ Human reviews + resolves [NEEDS HUMAN INPUT] flags
283
+ ↓
284
+ Approved spec → Reviewer Prompt Generator (Prompt 2) → validator prompt
285
+ ↓
286
+ Deploy executing agent with approved spec
287
+ ↓
288
+ Paste output + validator prompt into fresh agent session
289
+ ↓
290
+ PASS → done | FAIL → revise or escalate
291
+ ```
292
+
293
+ ---
294
+
295
+ ## Pre-Deployment Checklist
296
+
297
+ **Specification**
298
+ - [ ] Deliverable precise; scope boundaries explicit
299
+ - [ ] Sub-tasks satisfy all 5 Primitives
300
+ - [ ] For medium/high coding specs, `## 1.6 Architecture Lock` is present and fully resolved (no blank/`___`/`TBD` fields)
301
+
302
+ **Intent**
303
+ - [ ] Each agent has explicit deployment purpose
304
+ - [ ] Trade-off hierarchy ranked; escalation contract complete
305
+
306
+ **Context**
307
+ - [ ] Context curated; MCP for live sources; freshness strategy set
308
+
309
+ **Execution** *(set before decomposition)*
310
+ - [ ] Risk tier assigned
311
+ - [ ] Resource Governance parameters set
312
+ - [ ] Cognitive mode per sub-task assigned
313
+ - [ ] Validation routes by risk tier
314
+ - [ ] Adversarial Reflection for medium/high risk
315
+ - [ ] Uncertainty reporting for non-deterministic domains
316
+
317
+ **Meta**
318
+ - [ ] Spec red-teamed against Failure Mode Taxonomy
319
+ - [ ] Planner execution plan will be saved as artifact
320
+ - [ ] Living Specification log set up (multi-session)
321
+ - [ ] Spec Convergence section filled — `open_questions: 0`, `ambiguities_remaining: 0`, `ready_for_build: yes`
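The convergence gate lends itself to an automated pre-handoff check. This is a minimal sketch that assumes the spec records the three fields as `key: value` lines (the exact layout of the template's Section 9 is not shown here) and that unresolved flags keep the literal `[NEEDS HUMAN INPUT` prefix. `convergence_blockers` is a hypothetical helper, not part of nimai-core.

```python
import re

# Values the checklist requires before builder handoff.
REQUIRED = {"open_questions": "0", "ambiguities_remaining": "0", "ready_for_build": "yes"}


def convergence_blockers(spec_text: str) -> list[str]:
    """Return reasons the spec is not ready; an empty list means it may go to a builder."""
    blockers = []
    if "[NEEDS HUMAN INPUT" in spec_text:
        blockers.append("unresolved [NEEDS HUMAN INPUT] flags")
    for key, expected in REQUIRED.items():
        # Accept lines like "open_questions: 0" or "- ready_for_build: `yes`".
        match = re.search(rf"^\W*{key}\s*:\s*`?(\w+)", spec_text, re.MULTILINE)
        if match is None:
            blockers.append(f"{key} missing")
        elif match.group(1).lower() != expected:
            blockers.append(f"{key} is {match.group(1)}, expected {expected}")
    return blockers


draft = "- open_questions: 0\n- ambiguities_remaining: 1\n- ready_for_build: no\n"
print(convergence_blockers(draft))
# ['ambiguities_remaining is 1, expected 0', 'ready_for_build is no, expected yes']
```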
322
+
323
+ ---
324
+
325
+ *FORGE v1.0 — Framework for Orchestrating Reliable Generative Execution*
326
+ *System docs: Quick Reference (this doc) · Canonical Framework · Spec Template*
@@ -0,0 +1,454 @@
1
+ # FORGE Spec Template
2
+ > **You are here:** This is the Spec Template — the execution tool. Use it to turn a project direction into a complete agent-ready brief.
3
+ >
4
+ > **The other docs in this system:**
5
+ > - **Quick Reference** (`FORGE-quickref.md`) — start here if you don't have a project direction yet. If the request is a new idea or open-ended, go to the Quick Reference and run the Divergence → Convergence Loop or the Self-Spec Agent first. Come back here once you have a chosen direction.
6
+ > - **Canonical Framework** (`FORGE-canonical.md`) — go there if you encounter a concept in this template you don't fully understand. Each section maps to a section in the canonical doc.
7
+ >
8
+ > **When to be in this doc:** You have a specific project or direction. You are ready to define it completely before handing it to an agent. If you can't fill in a field, that is an unresolved decision — resolve it here, not mid-run.
9
+ >
10
+ > **Solo operator shortcut:** Instead of filling this manually, use the Self-Spec Agent prompt in the Quick Reference to generate a draft, then review and approve it here.
11
+
12
+ ---
13
+
14
+ > Fill in every field before handing to an agent. A blank field is an unresolved decision — resolve it here, not mid-run.
15
+ *Based on the FORGE framework*
16
+
17
+ ---
18
+
19
+ ## 0. Pre-Flight Decisions
20
+ > These must be set first. Everything else is built on top of them.
21
+
22
+ **Risk Tier:** [ ] Low [ ] Medium [ ] High
23
+ *(If unsure, go one tier higher)*
24
+
25
+ **Primary Cognitive Mode for this project:**
26
+ [ ] Deterministic [ ] Exploratory [ ] Adversarial [ ] Synthesis [ ] Audit
27
+ [ ] Multi-phase — using Divergence → Convergence Loop (see Section 6)
28
+ *→ Unsure which mode? See Cognitive Mode table in Quick Reference or Canonical doc.*
29
+
30
+ **Resource Governance:**
31
+ - Model tier (Planner): `_______________`
32
+ - Model tier (Workers): `_______________`
33
+ - Max runtime per sub-task: `_______________`
34
+ - Total compute / cost budget: `_______________`
35
+ - Retry limit before escalation: `_______________`
36
+ - Cost threshold that triggers stop-and-report: `_______________`
37
+
38
+ ---
39
+
40
+ ## 1. Specification Layer — The Blueprint
41
+
42
+ ### 1.1 Final Deliverable
43
+ *Describe precisely. Not "a good analysis" — describe the exact artifact, format, and measurable quality.*
44
+
45
+ ```
46
+ Deliverable: _______________________________________________
47
+
48
+ Format: ___________________________________________________
49
+
50
+ Length / Size: _____________________________________________
51
+
52
+ Measurable quality bar: ____________________________________
53
+
54
+ Benchmark dataset (if quality bar requires measurement):
55
+ path: _______________ (e.g. tests/fixtures/receipts/ or N/A)
56
+ schema: _____________ (field names and types of each labeled example, or N/A)
57
+ threshold computation rule: ___ (e.g. "pass rate = correct / total labeled examples", or N/A)
58
+ ```
59
+
60
+ ### 1.2 Scope Boundaries
61
+ *Be explicit. Ambiguity here becomes scope creep mid-run.*
62
+
63
+ **In scope:**
64
+ - `_______________`
65
+ - `_______________`
66
+ - `_______________`
67
+
68
+ **Out of scope (Must-Not):**
69
+ - `_______________`
70
+ - `_______________`
71
+ - `_______________`
72
+
73
+ ### 1.3 Task Decomposition
74
+ *Break the project into sub-tasks. Each must satisfy the 5 Primitives and complete in under 2 hours.*
75
+ *→ Need a reminder of the 5 Primitives? See Quick Reference or Canonical doc Section "5 Primitives."*
76
+
77
+ | # | Sub-task | Cognitive Mode | Risk Tier | Acceptance Criteria | Eval Method |
78
+ |---|---|---|---|---|---|
79
+ | 1 | | | | | |
80
+ | 2 | | | | | |
81
+ | 3 | | | | | |
82
+ | 4 | | | | | |
83
+ | 5 | | | | | |
84
+
85
+ *Add rows as needed. Every sub-task needs all columns filled.*
86
+
87
+ *Note: Sub-task risk tiers may differ from the overall project tier. A research sub-task and a production deployment sub-task in the same project have different risk profiles — assign each independently. The project-level tier sets the floor for escalation; sub-task tiers govern validation routing.*
88
+
89
+ ### 1.4 Acceptance Criteria — Master Definition of Done
90
+ *Binary. Measurable. No subjective criteria.*
91
+
92
+ The project is complete when ALL of the following are true:
93
+ - [ ] `_______________`
94
+ - [ ] `_______________`
95
+ - [ ] `_______________`
96
+
97
+ ---
98
+
99
+ ## 1.5 Mechanism Decision
100
+ *One block per core architectural or algorithmic choice. If no core mechanism applies, write "N/A" and justify.*
101
+ *A builder must be able to start coding without choosing missing architecture. If any mechanism is undecided, resolve it here first.*
102
+
103
+ ---
104
+
105
+ **Decision: [short name, e.g., "Primary parsing method"]**
106
+
107
+ ```
108
+ Chosen approach: _______________________________________________
109
+
110
+ Rejected alternatives: _________________________________________
111
+
112
+ Why rejected: __________________________________________________
113
+
114
+ Impact on ACs: _________________________________________________
115
+ ```
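Dimension 7 of the Spec-Quality Reviewer in the Quick Reference fails any "e.g./or/could/might/such as" in Deliverable, Scope, Constraints, or Tasks, so scanning a draft before review can save a round trip. This sketch assumes the spec has already been split into a dict keyed by those section names. The helper is hypothetical and, like the rule itself, flags every standalone "or".

```python
import re

# Hedge markers that the reviewer treats as an unlocked mechanism.
HEDGE_RE = re.compile(r"\be\.g\.|\bsuch as\b|\b(?:or|could|might)\b", re.IGNORECASE)
LOCKED_SECTIONS = ("Deliverable", "Scope", "Constraints", "Tasks")


def mechanism_lock_hits(sections: dict[str, str]) -> list[tuple[str, str]]:
    """Return (section, hedge) pairs found in the sections the reviewer checks."""
    return [
        (name, match.group(0))
        for name in LOCKED_SECTIONS
        for match in HEDGE_RE.finditer(sections.get(name, ""))
    ]


draft = {
    "Deliverable": "A CLI that parses receipts, e.g. PDF or PNG.",
    "Scope": "In: PDF parsing. Out: OCR.",
    "Tasks": "Parse totals; we might add currency handling.",
}
print(mechanism_lock_hits(draft))
# [('Deliverable', 'e.g.'), ('Deliverable', 'or'), ('Tasks', 'might')]
```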
116
+
117
+ ---
118
+
119
+ ## 1.6 Architecture Lock
120
+ *Required for medium/high coding specs. Advisory for low/unknown/non-coding specs.*
121
+ *All fields must be resolved before builder handoff. Blank placeholders and "TBD" are unresolved.*
122
+
123
+ ```
124
+ Persistence layer: _______________________________________________
125
+
126
+ File/object storage: _____________________________________________
127
+
128
+ External model/service + env var: ________________________________
129
+
130
+ API trigger flow: ________________________________________________
131
+
132
+ Entity status state machine: _____________________________________
133
+
134
+ Auth implementation: _____________________________________________
135
+ ```
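Because blank, `___`, and `TBD` values count as unresolved, the Architecture Lock block can be checked mechanically. This sketch assumes each field sits on its own `Field name: value` line, as in the template above. `unresolved_architecture_fields` is a hypothetical helper.

```python
import re

ARCHITECTURE_FIELDS = (
    "Persistence layer",
    "File/object storage",
    "External model/service + env var",
    "API trigger flow",
    "Entity status state machine",
    "Auth implementation",
)
# A value that is empty, only underscores, or "TBD" is unresolved.
UNRESOLVED_RE = re.compile(r"^\s*(?:_*|TBD)\s*$", re.IGNORECASE)


def unresolved_architecture_fields(block: str) -> list[str]:
    """List Architecture Lock fields that are missing or still placeholders."""
    values = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            values[key.strip()] = value
    return [name for name in ARCHITECTURE_FIELDS
            if name not in values or UNRESOLVED_RE.match(values[name])]


lock = """Persistence layer: PostgreSQL 16
File/object storage: ______
External model/service + env var: TBD
API trigger flow: POST /receipts enqueues a parse job
Entity status state machine: uploaded -> parsing -> parsed
"""
print(unresolved_architecture_fields(lock))
# ['File/object storage', 'External model/service + env var', 'Auth implementation']
```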
136
+
137
+ ---
138
+
139
+ ## 2. Intent Layer — The Compass
140
+
141
+ ### 2.1 Agent Deployment Purpose
142
+ *Tell the agent what it is, what it is not, and who consumes its output. One clear paragraph.*
143
+
144
+ ```
145
+ You are: ___________________________________________________
146
+
147
+ You are NOT responsible for: _______________________________
148
+
149
+ Your output is consumed by: ________________________________
150
+ (human decision-maker / another agent / automated pipeline)
151
+
152
+ Your output feeds into: ____________________________________
153
+ ```
154
+
155
+ ### 2.2 Trade-off Hierarchy
156
+ *Rank these in order of priority. The agent will use this when it hits a fork.*
157
+
158
+ Rank the following from 1 (highest) to n (lowest) for this task:
159
+
160
+ | Priority | Value |
161
+ |---|---|
162
+ | `___` | Accuracy / Correctness |
163
+ | `___` | Speed / Efficiency |
164
+ | `___` | Cost / Token economy |
165
+ | `___` | Safety / Risk avoidance |
166
+ | `___` | Novelty / Creativity |
167
+ | `___` | Completeness |
168
+ | `___` | Simplicity / Readability |
169
+ | `___` | *(add domain-specific value)* |
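Two things here can be checked or automated: that the ranks form a strict order with no ties or gaps, and how that order settles a fork. The lexicographic tie-break below is an assumed interpretation of "the agent will use this when it hits a fork". The option scores are made up, and both helpers are hypothetical.

```python
def ranking_problems(ranks: dict[str, int]) -> list[str]:
    """Ranks must be exactly 1..n: no ties, no gaps."""
    expected = list(range(1, len(ranks) + 1))
    actual = sorted(ranks.values())
    return [] if actual == expected else [f"expected ranks {expected}, got {actual}"]


def choose(options: dict[str, dict[str, float]], ranks: dict[str, int]) -> str:
    """Prefer the option that scores best on the top-ranked value; lower ranks break ties."""
    order = sorted(ranks, key=ranks.get)  # value names, priority 1 first
    return max(options, key=lambda name: tuple(options[name].get(v, 0.0) for v in order))


ranks = {"accuracy": 1, "safety": 2, "speed": 3}
options = {
    "full re-parse": {"accuracy": 0.9, "safety": 0.9, "speed": 0.2},
    "cached result": {"accuracy": 0.9, "safety": 0.9, "speed": 0.8},
}
print(ranking_problems(ranks), choose(options, ranks))  # [] cached result
```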
170
+
171
+ ### 2.3 Constraint Architecture
172
+
173
+ **Must (non-negotiable requirements):**
174
+ - `_______________`
175
+ - `_______________`
176
+
177
+ **Must-Not (hard prohibitions):**
178
+ - `_______________`
179
+ - `_______________`
180
+
181
+ **Prefer (soft preferences when trade-offs arise):**
182
+ - `_______________`
183
+ - `_______________`
184
+
185
+ **Escalate (stop and surface to human when):**
186
+ - `_______________`
187
+ - `_______________`
188
+
189
+ ### 2.4 Forbidden Approaches
190
+ *Specific methods, tools, frameworks, or reasoning patterns to avoid.*
191
+
192
+ - `_______________`
193
+ - `_______________`
194
+
195
+ ---
196
+
197
+ ## 3. Context Layer — The Environment
198
+
199
+ ### 3.1 Provided Context
200
+ *List all documents, artifacts, or data sources being provided. Mark each as AUTHORITATIVE or REFERENCE.*
201
+
202
+ | Source | Type | Authority Level | Notes |
203
+ |---|---|---|---|
204
+ | | | AUTHORITATIVE / REFERENCE | |
205
+ | | | AUTHORITATIVE / REFERENCE | |
206
+ | | | AUTHORITATIVE / REFERENCE | |
207
+
208
+ ### 3.2 Known State
209
+ *What has already been tried? What failed? What is known?*
210
+
211
+ ```
212
+ Prior attempts: ____________________________________________
213
+
214
+ Known failures / dead ends: ________________________________
215
+
216
+ Known constraints: _________________________________________
217
+
218
+ Current blockers: __________________________________________
219
+ ```
220
+
221
+ ### 3.3 Context Freshness
222
+ *Tell the agent when to trust the provided context vs. when to re-fetch or verify.*
223
+
224
+ [ ] All provided context is current — use as ground truth
225
+ [ ] The following sources may be stale and should be verified: `_______________`
226
+ [ ] The agent must re-fetch live data for: `_______________`
227
+ [ ] MCP connections available: `_______________`
228
+
229
+ ### 3.4 Domain Conventions
230
+ *Terminology, style, standards, or norms the agent must match.*
231
+
232
+ ```
233
+ Terminology to use: ________________________________________
234
+
235
+ Terminology to avoid: ______________________________________
236
+
237
+ Style / format standards: __________________________________
238
+
239
+ Domain-specific conventions: _______________________________
240
+ ```
241
+
242
+ ---
243
+
244
+ ## 4. Prompt Layer — The Trigger
245
+
246
+ ### 4.1 Opening System Instruction
247
+ *The master instruction that frames the entire run. Reference Sections 1–3 explicitly.*
248
+
249
+ ```
250
+ You are [deployment purpose from 2.1].
251
+
252
+ Your task is to produce [deliverable from 1.1].
253
+
254
+ You are operating within the following constraints: [from 2.3].
255
+
256
+ Your trade-off priority order is: [from 2.2].
257
+
258
+ The context you have been provided is: [from 3.1].
259
+
260
+ You will complete this task by executing the following sub-tasks in order: [from 1.3].
261
+
262
+ For each sub-task, your completion criteria are: [acceptance criteria from 1.3]. The project is complete when: [from 1.4].
263
+ ```
264
+
265
+ ### 4.2 Per-Sub-Task Prompt Template
266
+ *Use this template for each sub-task trigger.*
267
+
268
+ ```
269
+ Sub-task [#]: [name]
270
+ Cognitive mode: [mode]
271
+ Your input: [what you're working from]
272
+ Your output: [exact format and content required]
273
+ Acceptance criteria: [binary criteria from 1.3]
274
+ Evaluation: [how this will be checked]
275
+ Resource cap: [max runtime / tokens for this sub-task]
276
+ ```
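Filling the per-sub-task template from a structured record keeps every field populated. This sketch uses a hypothetical `SubTaskPrompt` class that refuses to render when any field is empty, mirroring the template's rule that a blank field is an unresolved decision.

```python
from dataclasses import dataclass, fields

SUBTASK_TEMPLATE = (
    "Sub-task {number}: {name}\n"
    "Cognitive mode: {mode}\n"
    "Your input: {input_desc}\n"
    "Your output: {output_desc}\n"
    "Acceptance criteria: {criteria}\n"
    "Evaluation: {evaluation}\n"
    "Resource cap: {cap}"
)


@dataclass
class SubTaskPrompt:
    number: int
    name: str
    mode: str
    input_desc: str
    output_desc: str
    criteria: list[str]
    evaluation: str
    cap: str

    def render(self) -> str:
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        empty = [key for key, value in values.items() if value in ("", [])]
        if empty:
            raise ValueError(f"unresolved fields: {empty}")
        values["criteria"] = "; ".join(self.criteria)
        return SUBTASK_TEMPLATE.format(**values)


p = SubTaskPrompt(1, "Parse receipts", "Deterministic", "tests/fixtures/receipts/",
                  "JSON list of totals", ["pytest exits 0", "no network calls"],
                  "run pytest", "30 min")
print(p.render())
```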
277
+
278
+ ---
279
+
280
+ ## 5. Governance & Validation
281
+
282
+ ### 5.1 Escalation Contract
283
+
284
+ **Escalation triggers** *(from 2.3 — repeated here for executor clarity)*:
285
+ - `_______________`
286
+ - `_______________`
287
+
288
+ **Who reviews escalations:**
289
+ ```
290
+ Name / Role: _______________________________________________
291
+ Contact: __________________________________________________
292
+ Response SLA: ______________________________________________
293
+ ```
294
+
295
+ **If no response arrives within SLA, the agent should:**
296
+ [ ] Hold and wait
297
+ [ ] Attempt alternative path: `_______________`
298
+ [ ] Abort and report
299
+
300
+ ### 5.2 Adversarial Reflection Trigger
301
+ *(Required for Medium and High risk tasks)*
302
+
303
+ [ ] Not required (Low risk)
304
+ [ ] Required after sub-task(s): `_______________`
305
+ [ ] Required before final delivery
306
+
307
+ The agent should critique: `_______________`
308
+ The revision threshold is: `_______________` *(what level of critique warrants a revision?)*
309
+
310
+ ### 5.3 Uncertainty Reporting
311
+ *(Required for non-deterministic domains)*
312
+
313
+ [ ] Not required
314
+ [ ] Required — agent must report:
315
+ - Confidence estimate (0–100%) with justification
316
+ - Primary uncertainty drivers
317
+ - What data would most reduce uncertainty
318
+ - Alternative plausible interpretations
319
+
320
+ ---
321
+
322
+ ## 6. Brainstorming Mode (Divergence → Convergence)
323
+ *Complete this section only if the primary mode is Multi-phase (Divergence → Convergence Loop).*
324
+
325
+ **Phase 1 — DIVERGE (Exploratory Mode)**
326
+ ```
327
+ Generate: _________________________________________________
328
+ Constraint: No filtering or judgment at this stage.
329
+ Output format: ____________________________________________
330
+ Volume target: ____________________________________________
331
+ ```
332
+
333
+ **Phase 2 — CLUSTER (Synthesis Mode)**
334
+ ```
335
+ Organize the output of Phase 1 by: ________________________
336
+ Identify: _________________________________________________
337
+ Output format: ____________________________________________
338
+ ```
339
+
340
+ **Phase 3 — STRESS-TEST (Adversarial Mode)**
341
+ ```
342
+ Attack the top [n] candidates from Phase 2.
343
+ Criteria to stress-test against: __________________________
344
+ Output format: ____________________________________________
345
+ What constitutes a fatal flaw: ____________________________
346
+ ```
347
+
348
+ **Phase 4 — FORMALIZE (Deterministic Mode)**
349
+ ```
350
+ Selected direction: ______________________________________
351
+ Formalize as: ____________________________________________
352
+ Acceptance criteria: _____________________________________
353
+ ```
354
+
355
+ ---
356
+
357
+ ## 7. Domain-Specific Additions
358
+
359
+ ### If Coding:
360
+ - Language / framework / version: `_______________`
361
+ - Performance targets: `_______________`
362
+ - API / interface contracts: `_______________`
363
+ - Security requirements: `_______________`
364
+ - Test coverage expectation: `_______________`
365
+
366
+ ### If Science / Research:
367
+ - Null hypothesis: `_______________`
368
+ - Falsification criteria: `_______________`
369
+ - Authorized data sources: `_______________`
370
+ - Forbidden data sources: `_______________`
371
+ - Statistical significance threshold: `_______________`
372
+ - Correction method: `_______________`
373
+ - Reproducibility requirements: `_______________`
374
+
375
+ ### If Writing / Content:
376
+ - Audience: `_______________`
377
+ - Purpose: `_______________`
378
+ - Structure: `_______________`
379
+ - Word count / length: `_______________`
380
+ - Style reference: `_______________`
381
+ - Mandatory inclusions: `_______________`
382
+ - Mandatory exclusions: `_______________`
383
+
384
+ ### If Business / Strategy:
385
+ - Decision-maker: `_______________`
386
+ - Decision to be made: `_______________`
387
+ - Options to evaluate: `_______________`
388
+ - Recommendation format: `_______________`
389
+ - Regulatory Must-Nots: `_______________`
390
+ - Stakeholder sensitivities: `_______________`
391
+
392
+ ---
393
+
394
+ ## 8. Spec Validation (Red-Team Checklist)
395
+ *Complete before handing to the agent. Every unchecked box is a known risk.*
396
+
397
+ **5 Primitives — does every sub-task have:**
398
+ - [ ] A self-contained problem statement (zero questions needed to start)
399
+ - [ ] A Constraint Architecture (Must / Must-Not / Prefer / Escalate)
400
+ - [ ] A runtime under 2 hours
401
+ - [ ] Binary acceptance criteria
402
+ - [ ] A built-in evaluation method
403
+
404
+ **Failure Mode Taxonomy — does this spec prevent:**
405
+ - [ ] Scope Creep — explicit Must-Not list and scope boundaries
406
+ - [ ] Hallucinated Completion — binary acceptance criteria defined
407
+ - [ ] Intent Drift — ranked trade-offs and deployment purpose explicit
408
+ - [ ] Context Collapse — context curated; noise removed
409
+ - [ ] Runaway Cost — resource caps set before decomposition
410
+ - [ ] Overconfident Output — uncertainty reporting required (if applicable)
411
+
412
+ **Final gate:**
413
+ - [ ] Risk tier and resource governance set *before* decomposition
414
+ - [ ] Every agent has a deployment purpose statement
415
+ - [ ] Escalation contract complete (who, when, SLA, no-response behavior)
416
+ - [ ] Planner execution plan will be saved as artifact
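The rule that every unchecked box is a known risk can be checked mechanically before handoff. The sketch below is a minimal illustration. It assumes the checklist keeps GitHub-style task-list syntax (`- [ ]` / `- [x]`). `countChecklist` is a hypothetical helper and is not part of nimai-core's exported API.

```typescript
// Hypothetical helper: tallies GitHub-style task-list items in a markdown string.
interface ChecklistTally {
  checked: number;
  unchecked: number;
  uncheckedItems: string[]; // text of each unchecked box, i.e. each known risk
}

function countChecklist(markdown: string): ChecklistTally {
  const tally: ChecklistTally = { checked: 0, unchecked: 0, uncheckedItems: [] };
  // Matches "- [ ] text" or "- [x] text" (x in either case), with optional indentation.
  const itemPattern = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/;
  for (const line of markdown.split(/\r?\n/)) {
    const match = itemPattern.exec(line);
    if (!match) continue;
    if (match[1] === " ") {
      tally.unchecked += 1;
      tally.uncheckedItems.push(match[2].trim());
    } else {
      tally.checked += 1;
    }
  }
  return tally;
}

const sample = [
  "**Final gate:**",
  "- [x] Risk tier and resource governance set *before* decomposition",
  "- [ ] Every agent has a deployment purpose statement",
].join("\n");

const result = countChecklist(sample);
console.log(result.unchecked, result.uncheckedItems);
```

A non-zero `unchecked` count does not block anything by itself. It lists the risks you are choosing to accept.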
417
+
418
+ ---
419
+
420
+ ## 8.5 Edge Cases and Failure Modes
421
+ > Required for medium/high risk specs. Red-team the spec before builder handoff.
422
+ > For each scenario, document what the spec says the builder must do. A blank list is a hard stop.
423
+
424
+ **FORGE failure mode coverage:**
425
+ - Scope Creep — which Must-Not explicitly prevents it: `_______________`
426
+ - Hallucinated Completion — which AC is hardest to fake: `_______________`
427
+ - Intent Drift — which decision will the builder most likely get wrong: `_______________`
428
+ - Context Collapse — what context is load-bearing and must not be dropped: `_______________`
429
+ - Runaway Cost — what resource cap prevents unbounded execution: `_______________`
430
+ - Overconfident Output — what requires uncertainty reporting: `_______________`
431
+
432
+ **Domain-specific edge cases:**
433
+ - `_______________`
434
+ - `_______________`
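Section 8.5 treats a blank list as a hard stop, and the reviewer prompt treats placeholder-only content (`___`) as absent. The sketch below shows a rough version of that check. It assumes each entry is a `- Label: value` bullet. `findBlankFields` is hypothetical. Splitting on the first `": "` is a simplifying assumption and would mislabel a label that itself contains a colon.

```typescript
// Hypothetical helper: a value made only of underscores, backticks, and whitespace
// is still the template placeholder, so it counts as blank.
function isPlaceholder(value: string): boolean {
  return /^[\s`_]*$/.test(value);
}

interface BlankReport {
  blankFields: string[]; // labels of bullets still holding a placeholder
  absent: boolean;       // true when no bullet carries real content
}

function findBlankFields(section: string): BlankReport {
  const blankFields: string[] = [];
  let filled = 0;
  for (const line of section.split(/\r?\n/)) {
    const bullet = /^\s*-\s+(.*)$/.exec(line);
    if (!bullet) continue;
    const text = bullet[1];
    // Assumed layout: "Label: value". Bullets without ": " are treated as value-only.
    const sep = text.indexOf(": ");
    const value = sep >= 0 ? text.slice(sep + 2) : text;
    if (isPlaceholder(value)) {
      blankFields.push(sep >= 0 ? text.slice(0, sep) : "(unlabelled entry)");
    } else {
      filled += 1;
    }
  }
  return { blankFields, absent: filled === 0 };
}

const draft = [
  "- Scope Creep: `Must-Not: add endpoints beyond the three listed in scope`",
  "- Runaway Cost: `_______________`",
  "- `_______________`",
].join("\n");

const report = findBlankFields(draft);
console.log(report.blankFields, report.absent);
```

Here `absent` corresponds to the hard stop. A non-empty `blankFields` list with `absent` false means the section exists but is still incomplete.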
435
+
436
+ ---
437
+
438
+ ## 9. Spec Convergence
439
+ > Fill before handing to a builder. A non-zero count or "no" is a hard stop.
440
+ > Every [NEEDS HUMAN INPUT] flag from spec drafting must be resolved before this section is filled.
441
+
442
+ ```
443
+ open_questions: 0
444
+ ambiguities_remaining: 0
445
+ ready_for_build: yes
446
+ convergence_notes: _______________________________________________
447
+ ```
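The gate in this section reduces to three field checks. The sketch below is minimal and assumes the block keeps the `key: value` layout above. `checkConvergence` is a hypothetical name, not a nimai-core export. It treats a missing field the same way it treats a failing one.

```typescript
// Hypothetical gate check for the Spec Convergence block; field names follow the template.
interface ConvergenceResult {
  ready: boolean;
  reasons: string[]; // why the gate is closed; empty when ready
}

function checkConvergence(block: string): ConvergenceResult {
  // Collect "key: value" pairs, one per line.
  const fields = new Map<string, string>();
  for (const line of block.split(/\r?\n/)) {
    const m = /^\s*([a-z_]+):\s*(.*?)\s*$/.exec(line);
    if (m) fields.set(m[1], m[2]);
  }

  const reasons: string[] = [];
  // Both counters must be present and exactly zero.
  for (const key of ["open_questions", "ambiguities_remaining"]) {
    const raw = fields.get(key);
    if (raw === undefined) {
      reasons.push(`${key} missing`);
    } else if (!/^\d+$/.test(raw) || Number(raw) !== 0) {
      reasons.push(`${key} is ${raw}, expected 0`);
    }
  }
  // Anything other than "yes" (including a missing field) closes the gate.
  if (fields.get("ready_for_build")?.toLowerCase() !== "yes") {
    reasons.push('ready_for_build is not "yes"');
  }
  return { ready: reasons.length === 0, reasons };
}

const filled = [
  "open_questions: 0",
  "ambiguities_remaining: 0",
  "ready_for_build: yes",
  "convergence_notes: all reviewer issues resolved",
].join("\n");

console.log(checkConvergence(filled));
```

`convergence_notes` is deliberately left unchecked, because the template does not make it part of the hard stop.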
448
+
449
+ ---
450
+
451
+ *FORGE Spec Template v1.0 — companion to the FORGE system. A blank field is an unresolved decision.*
452
+ *System docs: Quick Reference · Canonical Framework · Spec Template (this doc)*
453
+
454
+ <!-- nimai-spec -->
@@ -0,0 +1,2 @@
1
+ export declare const FORGE_ROOT: string;
2
+ //# sourceMappingURL=forge-root.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"forge-root.d.ts","sourceRoot":"","sources":["../src/forge-root.ts"],"names":[],"mappings":"AAyBA,eAAO,MAAM,UAAU,QAAkB,CAAC"}
@@ -0,0 +1,61 @@
1
+ "use strict";
2
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
3
+ if (k2 === undefined) k2 = k;
4
+ var desc = Object.getOwnPropertyDescriptor(m, k);
5
+ if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
6
+ desc = { enumerable: true, get: function() { return m[k]; } };
7
+ }
8
+ Object.defineProperty(o, k2, desc);
9
+ }) : (function(o, m, k, k2) {
10
+ if (k2 === undefined) k2 = k;
11
+ o[k2] = m[k];
12
+ }));
13
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
14
+ Object.defineProperty(o, "default", { enumerable: true, value: v });
15
+ }) : function(o, v) {
16
+ o["default"] = v;
17
+ });
18
+ var __importStar = (this && this.__importStar) || (function () {
19
+ var ownKeys = function(o) {
20
+ ownKeys = Object.getOwnPropertyNames || function (o) {
21
+ var ar = [];
22
+ for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
23
+ return ar;
24
+ };
25
+ return ownKeys(o);
26
+ };
27
+ return function (mod) {
28
+ if (mod && mod.__esModule) return mod;
29
+ var result = {};
30
+ if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
31
+ __setModuleDefault(result, mod);
32
+ return result;
33
+ };
34
+ })();
35
+ Object.defineProperty(exports, "__esModule", { value: true });
36
+ exports.FORGE_ROOT = void 0;
37
+ const fs = __importStar(require("fs"));
38
+ const path = __importStar(require("path"));
39
+ /**
40
+ * Locate the FORGE docs root: first check the bundled data/ directory next to
41
+ * dist/, then walk up from __dirname (at most 10 levels) looking for FORGE-quickref.md.
42
+ */
43
+ function findForgeRoot() {
44
+ // 1. Check bundled data/ directory (works when installed via npm)
45
+ const bundled = path.join(__dirname, '..', 'data');
46
+ if (fs.existsSync(path.join(bundled, 'FORGE-quickref.md')))
47
+ return bundled;
48
+ // 2. Walk up from __dirname (works in monorepo dev)
49
+ let dir = __dirname;
50
+ for (let i = 0; i < 10; i++) {
51
+ if (fs.existsSync(path.join(dir, 'FORGE-quickref.md')))
52
+ return dir;
53
+ const parent = path.dirname(dir);
54
+ if (parent === dir)
55
+ break;
56
+ dir = parent;
57
+ }
58
+ throw new Error('Cannot locate FORGE-quickref.md. Ensure the FORGE docs are present at the repo root.');
59
+ }
60
+ exports.FORGE_ROOT = findForgeRoot();
61
+ //# sourceMappingURL=forge-root.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"forge-root.js","sourceRoot":"","sources":["../src/forge-root.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,uCAAyB;AACzB,2CAA6B;AAE7B;;;GAGG;AACH,SAAS,aAAa;IACpB,kEAAkE;IAClE,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,CAAC,SAAS,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;IACnD,IAAI,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,OAAO,EAAE,mBAAmB,CAAC,CAAC;QAAE,OAAO,OAAO,CAAC;IAE3E,oDAAoD;IACpD,IAAI,GAAG,GAAG,SAAS,CAAC;IACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,EAAE,EAAE,CAAC,EAAE,EAAE,CAAC;QAC5B,IAAI,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,mBAAmB,CAAC,CAAC;YAAE,OAAO,GAAG,CAAC;QACnE,MAAM,MAAM,GAAG,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC;QACjC,IAAI,MAAM,KAAK,GAAG;YAAE,MAAM;QAC1B,GAAG,GAAG,MAAM,CAAC;IACf,CAAC;IACD,MAAM,IAAI,KAAK,CACb,sFAAsF,CACvF,CAAC;AACJ,CAAC;AAEY,QAAA,UAAU,GAAG,aAAa,EAAE,CAAC"}
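`findForgeRoot` searches in a fixed order. It checks the bundled `data/` directory first, which the `"data/"` entry added to `files` in `package.json` now ships with the package. It then checks up to 10 ancestor directories. The sketch below reproduces that order with the filesystem calls injected, so you can exercise both layouts in memory. `findRootWith`, `PathOps`, and `makeFakeFs` are hypothetical names. The expression `join(dirname(startDir), "data")` stands in for the compiled `path.join(__dirname, '..', 'data')`, and the two resolve to the same directory for a normalized absolute path. In real use you would pass `fs.existsSync`, `path.join`, and `path.dirname`.

```typescript
// Hypothetical injectable version of the lookup in dist/forge-root.js.
interface PathOps {
  exists(p: string): boolean;
  join(...parts: string[]): string;
  dirname(p: string): string;
}

const MARKER = "FORGE-quickref.md";

function findRootWith(startDir: string, ops: PathOps, maxLevels = 10): string {
  // 1. Bundled data/ directory beside dist/ (the npm-installed layout).
  const bundled = ops.join(ops.dirname(startDir), "data");
  if (ops.exists(ops.join(bundled, MARKER))) return bundled;
  // 2. Walk upward, checking each directory, for at most maxLevels steps.
  let dir = startDir;
  for (let i = 0; i < maxLevels; i++) {
    if (ops.exists(ops.join(dir, MARKER))) return dir;
    const parent = ops.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  throw new Error(`Cannot locate ${MARKER}`);
}

// Minimal POSIX-style helpers for the in-memory example (absolute paths only).
function makeFakeFs(files: string[]): PathOps {
  const present = new Set(files);
  return {
    exists: (p) => present.has(p),
    join: (...parts) => parts.join("/").replace(/\/+/g, "/"),
    dirname: (p) => {
      const i = p.lastIndexOf("/");
      return i <= 0 ? "/" : p.slice(0, i);
    },
  };
}

// npm layout: data/ sits beside dist/.
const installed = makeFakeFs(["/app/node_modules/nimai-core/data/FORGE-quickref.md"]);
console.log(findRootWith("/app/node_modules/nimai-core/dist", installed));

// Monorepo layout: the docs live at the repo root, several levels up.
const monorepo = makeFakeFs(["/repo/FORGE-quickref.md"]);
console.log(findRootWith("/repo/packages/core/dist", monorepo));
```

Checking `data/` first means an installed copy always prefers its own bundled docs over any `FORGE-quickref.md` that happens to sit higher up the tree.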
package/dist/index.d.ts CHANGED
@@ -6,4 +6,5 @@ export * from './prompts';
6
6
  export * from './clarification';
7
7
  export * from './discovery';
8
8
  export * from './verdict';
9
+ export * from './forge-root';
9
10
  //# sourceMappingURL=index.d.ts.map
@@ -1 +1 @@
1
- {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,SAAS,CAAC;AACxB,cAAc,YAAY,CAAC;AAC3B,cAAc,QAAQ,CAAC;AACvB,cAAc,WAAW,CAAC;AAC1B,cAAc,WAAW,CAAC;AAC1B,cAAc,iBAAiB,CAAC;AAChC,cAAc,aAAa,CAAC;AAC5B,cAAc,WAAW,CAAC"}
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,SAAS,CAAC;AACxB,cAAc,YAAY,CAAC;AAC3B,cAAc,QAAQ,CAAC;AACvB,cAAc,WAAW,CAAC;AAC1B,cAAc,WAAW,CAAC;AAC1B,cAAc,iBAAiB,CAAC;AAChC,cAAc,aAAa,CAAC;AAC5B,cAAc,WAAW,CAAC;AAC1B,cAAc,cAAc,CAAC"}
package/dist/index.js CHANGED
@@ -22,4 +22,5 @@ __exportStar(require("./prompts"), exports);
22
22
  __exportStar(require("./clarification"), exports);
23
23
  __exportStar(require("./discovery"), exports);
24
24
  __exportStar(require("./verdict"), exports);
25
+ __exportStar(require("./forge-root"), exports);
25
26
  //# sourceMappingURL=index.js.map
package/dist/index.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;AAAA,0CAAwB;AACxB,6CAA2B;AAC3B,yCAAuB;AACvB,4CAA0B;AAC1B,4CAA0B;AAC1B,kDAAgC;AAChC,8CAA4B;AAC5B,4CAA0B"}
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;AAAA,0CAAwB;AACxB,6CAA2B;AAC3B,yCAAuB;AACvB,4CAA0B;AAC1B,4CAA0B;AAC1B,kDAAgC;AAChC,8CAA4B;AAC5B,4CAA0B;AAC1B,+CAA6B"}
@@ -1 +1 @@
1
- {"version":3,"file":"prompts.d.ts","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH;;;GAGG;AACH,wBAAgB,YAAY,CAAC,OAAO,EAAE,MAAM,EAAE,cAAc,EAAE,MAAM,GAAG,MAAM,CAkC5E;AAED;;;;;;;;;;;;GAYG;AACH,wBAAgB,aAAa,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CA6FzD;AAED;;;GAGG;AACH,wBAAgB,YAAY,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAgBxD"}
1
+ {"version":3,"file":"prompts.d.ts","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH;;;GAGG;AACH,wBAAgB,YAAY,CAAC,OAAO,EAAE,MAAM,EAAE,cAAc,EAAE,MAAM,GAAG,MAAM,CAkC5E;AAED;;;;;;;;;;;;GAYG;AACH,wBAAgB,aAAa,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAkGzD;AAED;;;GAGG;AACH,wBAAgB,YAAY,CAAC,WAAW,EAAE,MAAM,GAAG,MAAM,CAgBxD"}
package/dist/prompts.js CHANGED
@@ -74,15 +74,15 @@ A verdict without citation is treated as NO_GO — do not assert PASS without ev
74
74
 
75
75
  **Dimensions:**
76
76
 
77
- 1. **Binary acceptance criteria** — are all sub-task ACs measurable and unambiguous? Are any ACs pre-checked (- [x]) in the draft, which is always invalid?
77
+ 1. **Binary acceptance criteria** — are all sub-task ACs measurable and unambiguous? Are any ACs pre-checked (- [x]) in the draft, which is always invalid? If Section 1.1 defines a quantitative quality bar, Section 1.4 must include at least one AC that directly enforces that bar (same metric family and threshold intent). Otherwise, FAIL.
78
78
  2. **Scope coherence** — are in-scope and out-of-scope boundaries clearly stated and non-contradictory? Check for conflicts between conceptual terminology (e.g., state names, entity names used in descriptions) and persisted/modelled representations (e.g., enums, schemas, data shapes). Any mismatch is a HARD_FAIL.
79
79
  3. **Constraint sufficiency** — do Must / Must-Not / Prefer / Escalate constraints cover the key risks?
80
- 4. **Decomposition realism** — can each sub-task be completed within the stated 2-hour limit by a skilled agent? Check that sub-task dependencies are stated explicitly (if task B requires task A's output, that must be documented).
80
+ 4. **Decomposition realism** — can each sub-task be completed within the stated 2-hour limit by a skilled agent? Check that sub-task dependencies are stated explicitly (if task B requires task A's output, that must be documented). If Section 1.1 defines a benchmark/dataset path, at least one sub-task must explicitly own creating or curating that benchmark artifact.
81
81
  5. **Start-without-clarification viability** — can an agent begin immediately with the context provided, without asking the human for more information?
82
- 6. **Internal consistency** — are terms, names, and concepts used consistently throughout the spec? Flag any case where the same entity is described differently in different sections (e.g., "webhook event" in scope, "push notification" in ACs — are these the same thing?).
83
- 7. **Mechanism lock** — does every core flow (data pipeline, primary algorithm, key architecture choice) commit to exactly ONE implementation approach? Scan for "e.g.", "or", "could", "might", "such as" in Deliverable, Scope, Constraints, and Task decomposition sections. If any of these offer multiple options without making an explicit decision, that is a HARD_FAIL — a spec that leaves the mechanism unresolved forces the builder to choose architecture, creating drift in multi-builder workflows.
82
+ 6. **Internal consistency** — are terms, names, and concepts used consistently throughout the spec? Flag any case where the same entity is described differently in different sections (e.g., "webhook event" in scope, "push notification" in ACs — are these the same thing?). Treat data-type/precision mismatches as consistency defects (e.g., float vs decimal/cents representation). Also FAIL if obvious template artifact text remains (example guidance that should have been replaced by project-specific content).
83
+ 7. **Mechanism lock** — does every core flow (data pipeline, primary algorithm, key architecture choice) commit to exactly ONE implementation approach? Scan for "e.g.", "or", "could", "might", "such as" in Deliverable, Scope, Constraints, and Task decomposition sections. If any of these offer multiple options without making an explicit decision, that is a HARD_FAIL — a spec that leaves the mechanism unresolved forces the builder to choose architecture, creating drift in multi-builder workflows. Additional lock checks: (a) external service lock requires concrete vendor/model (env var names alone are not sufficient), (b) auth lock requires token/session type plus expiry policy, (c) endpoint lists without request/response field shapes are SOFT_FAIL.
84
84
  8. **Convergence declaration** — does the spec include a \`## Spec Convergence\` section with \`open_questions: 0\` and \`ambiguities_remaining: 0\`? If the section is absent, or either value is non-zero, or \`ready_for_build\` is "no", that is a HARD_FAIL. This is the formal GO gate — a spec with declared open questions is not agent-ready regardless of how well the other sections are written.
85
- 9. **Adversarial gap scan** — does the spec include an \`## Edge Cases and Failure Modes\` section that is substantively filled? Steelman the spec as a builder who will follow it exactly: what will you hit mid-execution that the spec does not answer? What implicit assumption is baked in but undeclared? What failure mode from the FORGE taxonomy (scope creep, hallucinated completion, intent drift, context collapse, runaway cost, overconfident output) does the spec not address? An absent or blank section is a HARD_FAIL for medium/high risk specs. Placeholder-only content (\`___\`) is treated as absent.
85
+ 9. **Adversarial gap scan** — does the spec include an \`## Edge Cases and Failure Modes\` section that is substantively filled? Steelman the spec as a builder who will follow it exactly: what will you hit mid-execution that the spec does not answer? What implicit assumption is baked in but undeclared? What failure mode from the FORGE taxonomy (scope creep, hallucinated completion, intent drift, context collapse, runaway cost, overconfident output) does the spec not address? For mobile/capture flows, explicitly check offline/no-network behavior at capture time and retry/sync handling. An absent or blank section is a HARD_FAIL for medium/high risk specs. Placeholder-only content (\`___\`) is treated as absent.
86
86
  10. **Architecture completeness** — for medium/high coding specs, does the spec include \`## 1.6 Architecture Lock\` with all required fields resolved: Persistence layer, File/object storage, External model/service + env var, API trigger flow, Entity status state machine, Auth implementation? Any missing field, blank placeholder, or \`TBD\` is a HARD_FAIL. For low/unknown/non-coding specs, this is NOTE-only advisory.
87
87
 
88
88
  ## Severity classification
@@ -103,6 +103,11 @@ Keep it brief: one sentence + one evidence citation per dimension. No preamble,
103
103
 
104
104
  Then write a consolidated remediation list for all FAIL dimensions.
105
105
 
106
+ After writing the 10 dimensions, run one extra ambiguity sweep:
107
+ - Ask: "Could two competent builders implement this spec in materially different ways while both claiming compliance?"
108
+ - If yes, add those gaps as issues (SOFT_FAIL or HARD_FAIL based on impact) and include exact sections to fix.
109
+ - If no, state one sentence: "No additional ambiguity gaps found in final sweep."
110
+
106
111
  Note: implementation correctness is explicitly out of scope for this review.
107
112
 
108
113
  ## Re-review protocol
@@ -139,7 +144,7 @@ If the spec passes all dimensions with no HARD_FAIL issues, use:
139
144
  {"passed": true, "schema_version": "2", "issues": []}
140
145
  \`\`\`
141
146
 
142
- If and only if \`passed\` is false, immediately append this section after the verdict block:
147
+ If \`issues\` is non-empty (including SOFT_FAIL/NOTE with \`passed: true\`), immediately append this section after the verdict block:
143
148
 
144
149
  ## Paste this to your builder session:
145
150
 
@@ -1 +1 @@
1
- {"version":3,"file":"prompts.js","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":";AAAA;;;;GAIG;;AAMH,oCAkCC;AAeD,sCA6FC;AAMD,oCAgBC;AAxKD;;;GAGG;AACH,SAAgB,YAAY,CAAC,OAAe,EAAE,cAAsB;IAClE,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA6BP,cAAc,IAAI,2FAA2F;;;iBAG9F,OAAO,EAAE,CAAC;AAC3B,CAAC;AAED;;;;;;;;;;;;GAYG;AACH,SAAgB,aAAa,CAAC,WAAmB;IAC/C,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA2FP,WAAW,EAAE,CAAC;AAChB,CAAC;AAED;;;GAGG;AACH,SAAgB,YAAY,CAAC,WAAmB;IAC9C,OAAO;;;;;;;;;;;;;;EAcP,WAAW,EAAE,CAAC;AAChB,CAAC"}
1
+ {"version":3,"file":"prompts.js","sourceRoot":"","sources":["../src/prompts.ts"],"names":[],"mappings":";AAAA;;;;GAIG;;AAMH,oCAkCC;AAeD,sCAkGC;AAMD,oCAgBC;AA7KD;;;GAGG;AACH,SAAgB,YAAY,CAAC,OAAe,EAAE,cAAsB;IAClE,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EA6BP,cAAc,IAAI,2FAA2F;;;iBAG9F,OAAO,EAAE,CAAC;AAC3B,CAAC;AAED;;;;;;;;;;;;GAYG;AACH,SAAgB,aAAa,CAAC,WAAmB;IAC/C,OAAO;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EAgGP,WAAW,EAAE,CAAC;AAChB,CAAC;AAED;;;GAGG;AACH,SAAgB,YAAY,CAAC,WAAmB;IAC9C,OAAO;;;;;;;;;;;;;;EAcP,WAAW,EAAE,CAAC;AAChB,CAAC"}
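The prompt change above widens when the builder hand-off section is emitted. In 0.4.7 it was appended only when `passed` was false. In 0.4.9 it is appended whenever `issues` is non-empty, so a passing verdict that carries SOFT_FAIL or NOTE items now produces remediation text as well. The sketch below contrasts the two rules. The `Issue` field names (`severity`, `dimension`, `fix`) are assumptions, because this diff shows only the empty-issues verdict.

```typescript
// Hypothetical types: only passed / schema_version / issues appear in the prompt's
// example verdict; the Issue fields below are assumed for illustration.
type Severity = "HARD_FAIL" | "SOFT_FAIL" | "NOTE";

interface Issue {
  severity: Severity;
  dimension: number; // review dimension that raised the issue (assumed field)
  fix: string;       // remediation text (assumed field)
}

interface Verdict {
  passed: boolean;
  schema_version: string;
  issues: Issue[];
}

// 0.4.7 rule: append the builder section if and only if passed is false.
function appendSectionV047(v: Verdict): boolean {
  return !v.passed;
}

// 0.4.9 rule: append it whenever issues is non-empty, even when passed is true.
function appendSectionV049(v: Verdict): boolean {
  return v.issues.length > 0;
}

const clean = JSON.parse('{"passed": true, "schema_version": "2", "issues": []}') as Verdict;
const softOnly: Verdict = {
  passed: true,
  schema_version: "2",
  issues: [
    { severity: "SOFT_FAIL", dimension: 7, fix: "Add request/response field shapes to the endpoint list." },
  ],
};

const cases: Array<[string, Verdict]> = [["clean", clean], ["softOnly", softOnly]];
for (const [name, v] of cases) {
  console.log(name, appendSectionV047(v), appendSectionV049(v));
}
```

The two rules agree for a clean pass. They differ only for the `softOnly` case, which is exactly the situation the 0.4.9 wording calls out.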
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "nimai-core",
3
- "version": "0.4.7",
3
+ "version": "0.4.9",
4
4
  "description": "Nimai core library — template loading, lint engine, context extraction. No LLM dependencies.",
5
5
  "keywords": [
6
6
  "nimai",
@@ -19,7 +19,8 @@
19
19
  "directory": "packages/core"
20
20
  },
21
21
  "files": [
22
- "dist/"
22
+ "dist/",
23
+ "data/"
23
24
  ],
24
25
  "main": "dist/index.js",
25
26
  "types": "dist/index.d.ts",