@nestbox-ai/cli 1.0.63 → 1.0.64
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +528 -10
- package/dist/agents/reportGenerator/REPORT_CONFIG_GUIDE.md +1449 -0
- package/dist/agents/reportGenerator/SYSTEM_PROMPT.md +28 -0
- package/dist/agents/reportGenerator/annual_report_10k.yaml +633 -0
- package/dist/agents/reportGenerator/index.d.ts +18 -0
- package/dist/agents/reportGenerator/index.js +210 -0
- package/dist/agents/reportGenerator/index.js.map +1 -0
- package/dist/agents/reportGenerator/report_config.schema.yaml +905 -0
- package/dist/agents/reportGenerator/vc_portfolio_monitoring.yaml +443 -0
- package/dist/commands/generate/reportComposer.d.ts +2 -0
- package/dist/commands/generate/reportComposer.js +99 -0
- package/dist/commands/generate/reportComposer.js.map +1 -0
- package/dist/commands/generate.js +2 -0
- package/dist/commands/generate.js.map +1 -1
- package/package.json +2 -2
@@ -0,0 +1,1449 @@

# Report Configuration Guide — Schema v2.2

> **For AI agents:** This document contains the complete instructions for generating a valid `report_config.yaml`. Read every section before generating. Pay special attention to the [Quick Checklist](#quick-checklist) and [Computations](#5-computations) sections.

---

## Table of Contents

1. [Overview & Pipeline](#overview--pipeline)
2. [Quick Checklist](#quick-checklist)
3. [Top-Level Structure](#top-level-structure)
4. [report](#1-report)
5. [context](#2-context)
6. [docsets](#3-docsets)
7. [llamaindex](#4-llamaindex)
8. [computations](#5-computations)
   - [fields](#fields)
   - [tables](#tables)
   - [DocsetSubtask (agents)](#docsetsubtask-agents)
   - [Autonomous vs. Explicit Search Mode](#autonomous-vs-explicit-search-mode)
   - [output_schema](#output_schema)
   - [depends_on](#depends_on)
9. [template](#6-template)
   - [Placeholder Syntax](#placeholder-syntax)
   - [Pipe Filters](#pipe-filters)
10. [guardrails](#7-guardrails)
11. [execution](#8-execution)
12. [storage & doc_repository](#9-storage--doc_repository)
13. [prompts](#10-prompts)
14. [mcp](#11-mcp)
15. [Environment Variables](#environment-variables)
16. [Complete Minimal Example](#complete-minimal-example)
17. [Full Annotated Example (Finance)](#full-annotated-example-finance)

---

## Overview & Pipeline

The YAML config is a **declarative specification** — you describe *what* to extract, and the framework drives a ReAct agent loop automatically. Here is how every config section maps to a pipeline stage:

```
YAML config
    │
    ├── docsets      → GraphRAG parquet files loaded (entities, relationships, communities, text_units)
    ├── llamaindex   → LLM and agent settings (model, timeouts, prompts)
    ├── computations → Agent spawned per subtask → searches GraphRAG → returns JSON validated against output_schema
    │     ├── fields → Single structured values (ARR snapshot, liquidity metrics, executive summary)
    │     └── tables → Tabular output rendered as markdown table in the report
    ├── template     → Computed values injected into markdown via {{field.id.property}} placeholders
    └── guardrails   → LLM-judge checks run on computed values and/or the rendered report
```

**Execution order:**

1. All docsets pre-loaded from disk or downloaded from `doc_repository`
2. Computations with no `depends_on` run in parallel
3. Computations that declare `depends_on` run only after their dependencies complete
4. Template rendered after all computations finish
5. Guardrails run last on the completed outputs
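
The ordering rules above can be sketched directly in config. A minimal sketch (the field IDs and prompts here are hypothetical, chosen only to illustrate the ordering):

```yaml
computations:
  fields:
    - id: cash_metrics        # no depends_on → runs in parallel with bookings_metrics
      label: "Cash Metrics"
      agents: [...]
      prompt: "Extract cash metrics."

    - id: bookings_metrics    # no depends_on → runs in parallel with cash_metrics
      label: "Bookings Metrics"
      agents: [...]
      prompt: "Extract bookings metrics."

    - id: quarter_summary     # waits for both extractions, then runs
      label: "Quarter Summary"
      depends_on: [cash_metrics, bookings_metrics]
      agents: [...]
      prompt: "Summarize the quarter using the extracted metrics."
```

The template renders only after all three complete, and guardrails run after that.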

---

## Quick Checklist

Before writing the config, confirm you have answered:

- [ ] **What is the source document?** → drives `docsets` and document `locator`
- [ ] **What metrics/sections need to be extracted?** → drives `computations.fields`
- [ ] **Do any metrics need tabular/historical data?** → drives `computations.tables`
- [ ] **What should the final report look like?** → drives `template.content`
- [ ] **Do any extractions depend on prior results?** → drives `depends_on`
- [ ] **Which search type fits each extraction task?**
  - Exact numbers, tables → `basic` or autonomous
  - Entity/relationship lookups → `local` or autonomous
  - High-level themes, summaries → `global` or autonomous
  - Multi-hop graph exploration → `drift`
- [ ] **What quality checks are needed?** → drives `guardrails`

---

## Top-Level Structure

Required fields are marked with `*`.

```yaml
schema_version: "2.2"   # * Always "2.2"
report: ...             # * Report identity
context: ...            # Runtime variables referenced in prompts
prompts: ...            # Named reusable prompt strings
storage: ...            # For local file-based documents
doc_repository: ...     # For API-based document downloads
mcp: []                 # MCP server endpoints (use [] if none)
docsets: [...]          # * Document collections to query
llamaindex: ...         # * LLM agent configuration
computations: ...       # * What to extract
template: ...           # * Output template
guardrails: [...]       # LLM-judge quality checks
execution: ...          # Retry and output settings
```

---

## 1. `report`

**Purpose:** Identifies the report. Used in file names, template placeholders, and logs.

| Field | Required | Type | Notes |
|-------|----------|------|-------|
| `id` | Yes | string | Lowercase, alphanumeric, underscores/hyphens. Used in output filenames. |
| `name` | Yes | string | Human-readable name shown in logs and reports. |
| `description` | No | string | Longer description of purpose and scope. |
| `version` | No | string | Reporting period version (e.g., "2025.Q4"). |

```yaml
report:
  id: acme_cfo_kpi_q4_2025
  name: "Acme CFO KPI Pack — Q4 2025"
  description: |
    Quarterly KPI extraction from the Q4 2025 Board Presentation.
    Covers ARR, retention, liquidity, bookings, and marketing metrics.
  version: "2025.Q4"
```

---

## 2. `context`

**Purpose:** Defines runtime variables that can be referenced anywhere in prompts and templates as `{{context.variable_name}}`. Also holds policies that guide agent behavior.

- All fields are **optional** at the schema level
- You can add **any custom key** — `context` allows extra properties
- Use context variables to avoid hardcoding company names, periods, or policies into every prompt

### Standard fields

| Field | Description |
|-------|-------------|
| `company_name` | Injected into prompts to identify the company |
| `currency` | Default "USD" |
| `units_policy` | Instructions for how to handle number units ($M vs full dollars, etc.) |
| `answer_quality_policy` | Numeric precision and citation requirements |
| `value_types` | Allowed value type labels for classification |

### Custom fields

Any key you add becomes available in prompts and the template as `{{context.your_key}}`.

```yaml
context:
  company_name: "Acme Corp, Inc."
  currency: "USD"
  as_of_period: "CY2025 Q4"
  meeting_date: "2026-01-15"
  source_title: "Q4 2025 Board Meeting Presentation"
  units_policy: |
    Return raw numeric values — no $, commas, or M/K suffixes.
    Convert abbreviated values: "$2.5M" → 2500000, "$350K" → 350000.
    Percentages as whole numbers: 92.3 not 0.923.
  answer_quality_policy:
    numeric_requirements:
      - "Every numeric value must have a citation (c_xxxxxxxx format)"
      - "Return null if the value is not found — never guess"
    labeling_requirements:
      - "Include period label (e.g., 2025Q4) with every metric"
  value_types:
    - "money_usd"
    - "percent"
    - "count"
    - "multiple"
    - "string"
    - "date"
  kpi_namespace: "acme.board.25q4"
```

---

## 3. `docsets`

**Purpose:** Declares the GraphRAG-indexed document collections the agent will search. Each docset groups one or more documents and is referenced by `docset_id` in computations.

| Field | Required | Type | Notes |
|-------|----------|------|-------|
| `id` | Yes | string | Unique identifier referenced in computations |
| `docs` | Yes | array | At least one document |
| `description` | No | string | What this docset represents |
| `api_key` | No | string | OpenAI API key for GraphRAG searches. Falls back to the `OPENAI_API_KEY` env var. |

### Document fields

| Field | Required | Notes |
|-------|----------|-------|
| `id` | Yes | Unique doc identifier |
| `locator` | Recommended | See below for the three forms |
| `description` | No | Human-readable description |
| `alias` | No | Short name for use in prompts |

### Three ways to specify a document location

**1. Local filesystem path** — points directly to the GraphRAG output directory:

```yaml
locator: /data/graphrag/board_deck/output
locator: ./documents/doc-abc123/graphrag/output   # relative to config file
```

**2. `repo:doc-ID`** — download from the doc_repository API (requires `doc_repository` to be configured):

```yaml
locator: "repo:doc-eeb76651"
```

**3. Bare doc-ID** — check the local cache first, then download from the repo:

```yaml
locator: doc-a850ad6f
```

> **Legacy:** `document_id` is deprecated. Use `locator` instead.

### Example — single docset, one document

```yaml
docsets:
  - id: acme_board_deck_q4
    description: "Acme Q4 2025 Board Presentation — GraphRAG indexed"
    api_key: ${OPENAI_API_KEY}
    docs:
      - id: board_deck_pdf
        locator: "repo:doc-eeb76651"
        description: "Q4 2025 Board Meeting Presentation"
        alias: q4_deck
```

### Example — two docsets (board deck + financial statements)

```yaml
docsets:
  - id: board_deck
    description: "Board presentation slides"
    api_key: ${OPENAI_API_KEY}
    docs:
      - id: deck_pdf
        locator: "repo:doc-abc12345"
        description: "Q4 2025 Board Deck"

  - id: financials
    description: "Audited financial statements"
    api_key: ${OPENAI_API_KEY}
    docs:
      - id: income_statement
        locator: ./documents/income_stmt/graphrag/output
      - id: balance_sheet
        locator: ./documents/balance_sheet/graphrag/output
```

---

## 4. `llamaindex`

**Purpose:** Configures the LLM and ReAct agent that drives all extraction. One block controls all agents in the report.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `model` | Yes | — | LLM model ID: `gpt-4o`, `gpt-4.1-mini`, `claude-3-5-sonnet`, etc. |
| `api_key` | Yes | — | Supports `${ENV_VAR}` |
| `base_url` | No | OpenAI default | Custom endpoint (Azure, vLLM, Ollama) |
| `max_tool_calls` | No | 20 | Per-subtask tool call limit (1–100) |
| `tool_timeout_seconds` | No | 120 | Per-tool-call timeout (1–600) |
| `max_agent_iterations` | No | 30 | ReAct loop iterations before stopping (1–100) |
| `max_repair_attempts` | No | 2 | Schema validation repair loops (1–10) |
| `system_prompt` | No | Built-in | Main system prompt for all agents |
| `autonomous_search_guidance` | No | Built-in | Guidance for choosing between search tools in autonomous mode |
| `synthesis_prompt` | No | Built-in | Template combining multiple subtask outputs. Placeholders: `{{subtask_results}}`, `{{output_schema}}` |
| `validation_repair_prompt` | No | Built-in | Prompt for fixing schema validation failures. Placeholders: `{{raw_response}}`, `{{validation_errors}}`, `{{output_schema}}` |
| `guardrail_system_prompt` | No | Built-in | Default system prompt for guardrail LLM-judge calls |
| `json_extraction_prompt` | No | Built-in | Fallback for extracting JSON from non-JSON agent responses |
| `mcp_system_prompt` | No | Built-in | System prompt for MCP integration agents |

### Recommended model choices

| Use Case | Recommended Model |
|----------|------------------|
| High-accuracy financial extraction | `gpt-4o` |
| Cost-efficient extraction | `gpt-4.1-mini` |
| Claude-based extraction | `claude-3-5-sonnet` |
| Guardrail LLM-judge | `gpt-4.1-mini` (fast, cheap) |
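
For a cheaper run, the cost-efficient model can be combined with tighter agent limits. A minimal sketch (the specific limit values here are illustrative assumptions, not recommendations from this guide):

```yaml
llamaindex:
  model: gpt-4.1-mini          # cost-efficient extraction model
  api_key: ${OPENAI_API_KEY}
  max_tool_calls: 10           # fewer searches per subtask than the default 20
  max_agent_iterations: 15     # stop the ReAct loop earlier than the default 30
```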

### Minimal example

```yaml
llamaindex:
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
```

### Full example with custom prompts

```yaml
llamaindex:
  model: gpt-4o
  base_url: ${OPENAI_BASE_URL:-https://api.openai.com/v1}
  api_key: ${OPENAI_API_KEY}
  max_tool_calls: 50
  tool_timeout_seconds: 180
  max_agent_iterations: 40
  max_repair_attempts: 3
  system_prompt: |
    You are a CFO-grade KPI extraction analyst with expertise in financial document analysis.

    ## ANTI-FABRICATION RULE — READ FIRST
    NEVER return values from this prompt or any instruction text.
    ALL values MUST come from your search tool results.
    If a value is not found in search results, return null.

    ## Critical Rules
    1. NEVER FABRICATE: If a value isn't found, return null — never guess
    2. EXACT VALUES: Use values exactly as found in documents
    3. CITE EVERYTHING: Every numeric value needs a citation
    4. RAW NUMBERS: Return 9697083 not "$9.7M" — no formatting
    5. LABEL PERIODS: Always identify the time period (e.g., 2025Q4)
```

> **Tip for agents:** The built-in system prompt is already well-tuned for financial extraction. Only override `system_prompt` if you need domain-specific behavior or stricter rules. When you do override it, always include an anti-fabrication rule.

---

## 5. `computations`

**Purpose:** Declares what to extract from the documents. Computations have two subtypes:

- **`fields`** — produce a single structured value (a snapshot, a set of metrics)
- **`tables`** — produce tabular data rendered as a markdown table

Each computation contains one or more **agent subtasks** (`agents`). Each subtask spawns an independent ReAct agent that queries GraphRAG and returns a JSON output validated against an `output_schema`.

---

### Fields

A field computation produces a single structured JSON value. Use fields for KPI snapshots, summaries, and any structured metric sets.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `id` | Yes | — | Lowercase alphanumeric + underscores. Referenced in template as `{{field.id}}` |
| `label` | Yes | — | Human-readable name shown in logs |
| `prompt` | Yes | — | Synthesis prompt combining all subtask outputs into the final result |
| `agents` | No | — | Array of DocsetSubtask (see below). At least one of `agents` or `mcp_scope` is required. |
| `mcp_scope` | No | — | MCP subtasks to run alongside agent subtasks |
| `docset_id` | No | — | Default docset for all subtasks (can be overridden per subtask) |
| `type` | No | `object` | Output type hint: `object`, `number`, `string` |
| `description` | No | — | Detailed description |
| `priority` | No | 0 | Execution order hint — lower runs first |
| `depends_on` | No | — | Array of field/table IDs that must complete first |
| `output_schema` | No | — | Final JSON schema (only needed for multi-agent synthesis where schemas differ) |

```yaml
computations:
  fields:
    - id: arr_snapshot
      label: "ARR & Retention Snapshot (Q4 2025)"
      type: object
      priority: 1
      docset_id: acme_board_deck_q4
      agents:
        - id: arr_extract
          prompt: |
            Extract the Q4 2025 ARR waterfall and retention metrics.
            ...
          output_schema:
            type: object
            properties:
              arr_end: { type: number }
              nrr: { type: number }
              citations: { type: array, items: { type: string } }
      mcp_scope: []
      prompt: "Return the Q4 2025 ARR snapshot with citations."
```
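
The examples in this guide all use `type: object`, but the `type` hint also accepts `number` and `string` for scalar outputs. A minimal sketch of a scalar field (the `headcount_total` id, subtask, and prompts are hypothetical):

```yaml
computations:
  fields:
    - id: headcount_total
      label: "Total Headcount"
      type: number                    # scalar output hint instead of an object
      docset_id: acme_board_deck_q4
      agents:
        - id: headcount_extract
          prompt: "Extract the total company headcount as of Q4 2025."
          output_schema:
            type: object
            properties:
              headcount: { type: number }
              citations: { type: array, items: { type: string } }
      prompt: "Return the total headcount as a single number."
```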

---

### Tables

A table computation produces an array of rows rendered as a markdown table in the report. Use tables for historical data, multi-period comparisons, and any row-per-item data.

| Field | Required | Default | Notes |
|-------|----------|---------|-------|
| `id` | Yes | — | Referenced in template as `{{table.id}}` |
| `title` | Yes | — | Table title (used in the rendered table header) |
| `prompt` | Yes | — | Synthesis prompt |
| `agents` | No | — | Same as fields |
| `docset_id` | No | — | Default docset |
| `priority` | No | 0 | Execution order |
| `depends_on` | No | — | Dependencies |
| `output_schema` | No | — | JSON schema for the final table |

> **Output format:** The agent's `output_schema` for a table should always use `{"type": "object", "properties": {"rows": {"type": "array", "items": {...}}}}`. The framework extracts the `rows` array and renders it as a markdown table.

```yaml
computations:
  tables:
    - id: arr_quarterly_history
      title: "ARR Waterfall — Quarterly History"
      priority: 2
      docset_id: acme_board_deck_q4
      agents:
        - id: arr_history_extract
          prompt: |
            Extract the full quarterly ARR waterfall table.
            For EACH quarter shown: period, beg_arr, new_logos_arr, net_upsell_arr,
            churn_arr, end_arr, nrr, grr.
            Return {"rows": [...]} with one object per quarter.
          output_schema:
            type: object
            properties:
              rows:
                type: array
                items:
                  type: object
                  properties:
                    period: { type: string }
                    beg_arr: { type: number }
                    end_arr: { type: number }
                    nrr: { type: number }
                    citations: { type: array, items: { type: string } }
      mcp_scope: []
      prompt: "Return quarterly ARR history with citations per row."
```

---

### DocsetSubtask (`agents`)

Each entry in `agents` is a **DocsetSubtask** — an independent agent run that queries one docset and returns an intermediate JSON result.

| Field | Required | Notes |
|-------|----------|-------|
| `id` | Yes | Unique subtask ID (lowercase alphanumeric + underscores) |
| `prompt` | Yes | What to extract. Be specific and include search strategy guidance. |
| `output_schema` | Yes | JSON Schema the agent output must conform to |
| `docset_id` | No | Overrides the computation-level `docset_id` |
| `search_type` | No | `basic`, `local`, `global`, `drift`. **Omit for autonomous mode.** |
| `options` | No | Search options (only used when `search_type` is set — see below) |

```yaml
agents:
  - id: liquidity_extract
    docset_id: acme_board_deck_q4   # optional override
    search_type: basic              # explicit, or omit for autonomous
    options:
      basic_search:
        k: 15                       # retrieve 15 chunks instead of default 10
    prompt: |
      Extract cash and runway metrics from the CFO report section.
      ...
    output_schema:
      type: object
      properties:
        cash_current: { type: number }
        runway_months: { type: number }
        citations: { type: array, items: { type: string } }
```

---

### Autonomous vs. Explicit Search Mode

**Autonomous mode** (recommended for most use cases) — omit `search_type`. The agent has access to all four search tools and decides which to use based on the task. The `autonomous_search_guidance` (built-in or custom) teaches it when to pick each tool.

```yaml
agents:
  - id: arr_extract
    # No search_type — agent chooses
    prompt: |
      Extract ARR waterfall metrics for Q4 2025.
      For exact table values: try basic_search first.
      For entity relationships: use local_search.
    output_schema: ...
```

**Explicit mode** — set `search_type` when you have a strong reason to restrict the agent to one search method. This prevents the agent from trying other tools.

| `search_type` | Best For |
|---------------|----------|
| `basic` | Exact values, tables, specific numbers, slide content |
| `local` | Entity/relationship lookups (who owns what, person-metric connections) |
| `global` | High-level summaries, themes, cross-document synthesis |
| `drift` | Multi-hop exploration, deep graph traversal across entities |

```yaml
agents:
  - id: themes_extract
    search_type: global   # force global search for high-level themes
    prompt: |
      Identify the major strategic themes discussed across the entire deck.
      DO NOT use basic_search — you need community-level summaries.
    output_schema: ...
```

### Search Options (when using explicit mode)

Options are nested **under the search type key**:

```yaml
agents:
  - id: arr_extract
    search_type: basic
    options:
      basic_search:
        k: 15                   # number of chunks to retrieve (default 10)
        chat_model_id: gpt-4o   # override model for this search
        embedding_model_id: text-embedding-3-large

  - id: entity_map
    search_type: local
    options:
      local_search:
        top_k_entities: 20      # retrieve more entities (default 10)
        top_k_relationships: 20
        max_context_tokens: 16000

  - id: themes_extract
    search_type: global
    options:
      global_search:
        max_context_tokens: 12000
        data_max_tokens: 6000
        map_max_length: 2000
        reduce_max_length: 1800

  - id: deep_explore
    search_type: drift
    options:
      drift_search:
        n_depth: 3              # hop depth (default 2)
        drift_k_followups: 6    # follow-ups per hop
        concurrency: 4
```

---

### `output_schema`

The `output_schema` is a JSON Schema object that the agent output must satisfy. The framework validates the agent response and runs a repair loop (up to `max_repair_attempts`) if validation fails.

**Rules for writing output_schema:**

1. Always include `type: object` at the root
2. Include a `citations` array in every schema — this is how you enforce traceability
3. Add a `notes` string so the agent can record caveats about missing data
4. Use `minimum`/`maximum` constraints to catch unit errors (e.g., retention percentages must be 80–120, not 0.80–1.20)
5. Use `type: number` for all numeric values — never `integer` unless the value genuinely cannot be a decimal
6. Keep schemas specific — the tighter the schema, the better the agent's output

```yaml
output_schema:
  type: object
  properties:
    period_label:
      type: string                    # "2025Q4" — the reporting period
    arr:
      type: object
      properties:
        beg: { type: number }         # beginning ARR (full dollars, e.g. 9697083)
        new_logos: { type: number }   # ARR from new customers
        net_upsell: { type: number }  # net expansion ARR
        churn: { type: number }       # lost ARR (negative number, e.g. -81579)
        end: { type: number }         # ending ARR
    retention:
      type: object
      properties:
        nrr: { type: number, minimum: 50, maximum: 150 }  # whole-number %, e.g. 112.3
        grr: { type: number, minimum: 50, maximum: 100 }  # e.g. 88.5
    citations:
      type: array
      items: { type: string }         # ["c_926fdeb6", "c_ab12cd34"]
    notes:
      type: string                    # explain missing values
```

> **For agents:** The `output_schema` IS the spec. Match it exactly — do not add properties the schema does not define, and do not omit ones it requires.

---

### `depends_on`

Computations can depend on other computations. A computation listed in `depends_on` must complete successfully before this one starts. Use this when:

- A synthesis computation needs the results of prior extractions
- One computation's output is referenced in another computation's prompt

```yaml
computations:
  fields:
    # arr_snapshot runs first (no depends_on)
    - id: arr_snapshot
      label: "ARR Snapshot"
      priority: 1
      agents: [...]
      prompt: "Extract Q4 ARR metrics."

    # executive_summary runs AFTER arr_snapshot completes
    # It can reference arr_snapshot results in its prompt
    - id: executive_summary
      label: "Executive Summary"
      priority: 2
      depends_on: [arr_snapshot]   # wait for arr_snapshot
      agents:
        - id: exec_extract
          prompt: |
            Generate the executive summary.
            Note: ARR snapshot results are available in your context.
          output_schema: ...
      prompt: "Synthesize the executive summary."
```

---

## 6. `template`

**Purpose:** Defines the markdown report structure. Computed values are injected using placeholder syntax.

| Field | Required | Notes |
|-------|----------|-------|
| `content` | Yes | Main markdown template with placeholders |
| `format` | No | Only `"markdown"` supported |
| `sections` | No | Reusable named blocks referenced as `{{sections.name}}` |

### Placeholder Syntax

| Syntax | Resolves To |
|--------|-------------|
| `{{field.id}}` | The full computed value object (JSON) |
| `{{field.id.property}}` | A single property of a field's output |
| `{{field.id.nested.property}}` | Nested property access |
| `{{table.id}}` | The table rendered as a markdown table |
| `{{context.key}}` | A context variable |
| `{{report.id}}` / `{{report.name}}` | Report metadata |
| `{{sections.name}}` | A reusable section block |

```yaml
template:
  format: markdown
  sections:
    report_header: |
      # {{report.name}}
      > **Source:** {{context.source_title}}
      > **Period:** {{context.as_of_period}}
      > **Generated by:** Nestbox AI Report Generator

  content: |
    {{sections.report_header}}

    ---

    ## ARR & Retention (Q4 2025)

    | Metric | Value |
    |--------|-------|
    | Beg ARR | {{field.arr_snapshot.arr.beg}} |
    | End ARR | {{field.arr_snapshot.arr.end}} |
    | NRR | {{field.arr_snapshot.retention.nrr}}% |

    ### Quarterly History
    {{table.arr_quarterly_history}}

    ---

    ## Liquidity
    Cash on hand: {{field.liquidity.cash_current}}
```

### Pipe Filters

Placeholders support pipe chains to format values. Filters are applied left to right.

| Filter | Syntax | Effect | Example |
|--------|--------|--------|---------|
| `currency` | `\| currency("$", 0)` | Format as currency with N decimal places | `9697083` → `$9,697,083` |
| `number` | `\| number(1)` | Format as decimal with N places | `92.3456` → `92.3` |
| `default` | `\| default("—")` | Fallback if value is null/missing | `null` → `—` |
|
|
674
|
+
|
|
675
|
+
```yaml
|
|
676
|
+
# Examples of pipe usage
|
|
677
|
+
{{ field.arr_snapshot.arr.end | currency("$", 0) | default("—") }}
|
|
678
|
+
# → "$10,144,379" or "—" if null
|
|
679
|
+
|
|
680
|
+
{{ field.arr_snapshot.retention.nrr | number(1) | default("N/A") }}
|
|
681
|
+
# → "92.3" or "N/A" if null
|
|
682
|
+
|
|
683
|
+
{{ field.arr_snapshot.arr.churn | currency("$", 0) | default("—") }}
|
|
684
|
+
# → "-$81,579" (negative values shown with minus sign)
|
|
685
|
+
```
|
|
686
|
+
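The three filters behave like ordinary formatting functions chained left to right, each feeding its result to the next. A minimal Python sketch of equivalent logic (the real filter implementations live inside the report composer and may differ in detail):

```python
def currency(value, symbol="$", decimals=0):
    """Format a number as currency; negative values keep their minus sign."""
    if value is None:
        return None
    sign = "-" if value < 0 else ""
    return f"{sign}{symbol}{abs(value):,.{decimals}f}"

def number(value, decimals=1):
    """Format a number with a fixed number of decimal places."""
    if value is None:
        return None
    return f"{value:.{decimals}f}"

def default(value, fallback="—"):
    """Fallback for null/missing values; passes real values through."""
    return fallback if value is None else value

# A pipe chain applies left to right, so `x | currency("$", 0) | default("—")`
# is equivalent to default(currency(x, "$", 0), "—"):
print(default(currency(9697083)))   # → $9,697,083
print(default(currency(-81579)))    # → -$81,579
print(default(number(92.3456, 1)))  # → 92.3
print(default(currency(None)))      # → —
```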

> **Note:** Always add `| default("—")` for financial values that may be null. This prevents the template from rendering "None" or breaking.

### Full template example

```yaml
template:
  format: markdown
  sections:
    header: |
      # {{report.name}}
      > **Period:** {{context.as_of_period}} | **Company:** {{context.company_name}}

  content: |
    {{sections.header}}

    ---

    ## ARR & Retention

    **Period:** {{ field.arr_snapshot.period_label | default("N/A") }}

    ### ARR Waterfall
    | Metric | Value |
    |--------|-------|
    | Beg ARR | {{ field.arr_snapshot.arr.beg | currency("$", 0) | default("—") }} |
    | + New Logos | {{ field.arr_snapshot.arr.new_logos | currency("$", 0) | default("—") }} |
    | + Net Upsell | {{ field.arr_snapshot.arr.net_upsell | currency("$", 0) | default("—") }} |
    | − Churn | {{ field.arr_snapshot.arr.churn | currency("$", 0) | default("—") }} |
    | **End ARR** | **{{ field.arr_snapshot.arr.end | currency("$", 0) | default("—") }}** |

    **NRR:** {{ field.arr_snapshot.retention.nrr | number(1) | default("—") }}%

    **Citations:** {{ field.arr_snapshot.citations | default("*None*") }}

    ### Quarterly History
    {{table.arr_quarterly_history}}

    ---

    ## Liquidity & Runway

    | Metric | Value |
    |--------|-------|
    | Cash on Hand | {{ field.liquidity.cash_current | currency("$", 0) | default("—") }} |
    | Monthly Burn | {{ field.liquidity.monthly_burn | currency("$", 0) | default("—") }}/mo |
    | Runway | {{ field.liquidity.runway_months | number(1) | default("—") }} months |

    **Citations:** {{ field.liquidity.citations | default("*None*") }}
```

---

## 7. `guardrails`

**Purpose:** LLM-judge checks that run after all computations complete. Used to enforce data quality (citation completeness, arithmetic correctness, no fabricated values, unit sanity).

| Field | Required | Notes |
|-------|----------|-------|
| `id` | Yes | Lowercase alphanumeric + underscores. Starts with `gr_` by convention. |
| `target` | Yes | What to validate — see target types below |
| `on_fail` | Yes | `"error"` — blocks output and raises error. `"warn"` — reports issue but continues. |
| `model` | Yes | LLM model for judge (e.g., `gpt-4.1-mini` is cost-effective) |
| `api_key` | Yes | Supports `${ENV_VAR}` |
| `prompt` | Yes | Validation prompt. Use `{{content}}` placeholder for the value being checked. |
| `base_url` | No | Custom endpoint for guardrail LLM |
| `system_prompt` | No | Overrides `llamaindex.guardrail_system_prompt` for this guardrail |
| `description` | No | Human-readable description |

### Target types

| Target | Validates |
|--------|-----------|
| `computations` | All computed fields and tables as a single JSON blob |
| `field.{id}` | A single field computation (e.g., `field.arr_snapshot`) |
| `table.{id}` | A single table computation (e.g., `table.arr_quarterly_history`) |
| `final_report` | The rendered markdown report |

### Guardrail prompt guidelines

1. Use `{{content}}` where the value to check should appear
2. Always specify the expected output format: `Return JSON: {"pass": boolean, "issues": [...]}`
3. Define clear, unambiguous pass/fail rules
4. Set `on_fail: warn` during development; promote to `on_fail: error` for production

```yaml
guardrails:
  # Check 1: All numeric values have citations
  - id: gr_citation_check
    target: computations
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Verify all numeric values have citations"
    prompt: |
      Review the computed JSON and check that every numeric value
      has at least one citation in c_xxxxxxxx format.
      Empty citations arrays [] are failures.

      Return JSON: {"pass": boolean, "issues": ["list of fields missing citations"]}

      Content to check:
      {{content}}

  # Check 2: ARR waterfall arithmetic
  - id: gr_arr_math
    target: field.arr_snapshot
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Verify ARR bridge arithmetic"
    prompt: |
      Validate: beg + new_logos + net_upsell + churn = end
      Note: churn is already negative — ADD it, do not subtract.
      Allow 1% tolerance for rounding.

      Return JSON: {
        "pass": boolean,
        "computed_end": number,
        "stated_end": number,
        "difference": number,
        "issues": []
      }

      Content: {{content}}

  # Check 3: Fabrication detection
  - id: gr_no_fabrication
    target: computations
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Detect placeholder or fabricated values"
    prompt: |
      FAIL if you find ANY:
      1. Citations like "Source A", "Source B" (real ones use c_xxxxxxxx hex format)
      2. All ARR values suspiciously round (100000, 200000, 300000) — indicates fabrication
      3. Retention values below 1.0 (should be whole-number percent: 92.3 not 0.923)

      Return JSON: {"pass": boolean, "issues": []}

      Content: {{content}}

  # Check 4: Final report completeness
  - id: gr_report_populated
    target: final_report
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Verify final report has real content"
    prompt: |
      Check: Does the report contain actual numeric data (not just "—" placeholders)?
      Are the main sections populated?

      Return JSON: {
        "pass": boolean,
        "sections_populated": [],
        "sections_empty": [],
        "issues": []
      }

      Content: {{content}}
```
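Because Check 2's rule is pure arithmetic, it can also be verified deterministically before (or instead of) invoking the LLM judge. A sketch of that check, using illustrative figures rather than values from any real deck:

```python
def check_arr_bridge(arr: dict, tolerance: float = 0.01) -> dict:
    """Deterministic version of the gr_arr_math rule: churn is stored as a
    negative number, so every component of the bridge is simply added."""
    computed_end = arr["beg"] + arr["new_logos"] + arr["net_upsell"] + arr["churn"]
    stated_end = arr["end"]
    difference = computed_end - stated_end
    # Allow 1% tolerance for rounding, as the guardrail prompt specifies.
    ok = abs(difference) <= tolerance * abs(stated_end)
    return {"pass": ok, "computed_end": computed_end,
            "stated_end": stated_end, "difference": difference}

result = check_arr_bridge({
    "beg": 9697083, "new_logos": 250000,
    "net_upsell": 278875, "churn": -81579, "end": 10144379,
})
print(result["pass"])  # → True
```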

---

## 8. `execution`

**Purpose:** Controls retries, output directory, and which output files are generated.

```yaml
execution:
  retries:
    max_attempts: 3        # per-computation retry attempts (default 3)
    backoff_seconds: 2.0   # seconds between retries (default 1.0)
  output:
    directory: ./output            # output directory (default "./output")
    timestamp_suffix: false        # add timestamp to dir name (default false)
    include_final_report: true     # generate final_report.md (default true)
    include_computed_json: true    # generate computed.json (default true)
    include_evidence: true         # generate evidence.json (default true)
    include_guardrails: true       # run guardrails and generate guardrails.json (default true)
```

**Output files generated:**

| File | Contents |
|------|----------|
| `final_report.md` | Rendered markdown report from template |
| `computed.json` | Raw extracted values with metadata and validation status |
| `evidence.json` | Full source traceability — all search results, agent traces, tool calls |
| `guardrails.json` | Guardrail results — pass/fail per check, issues found |

---

## 9. `storage` & `doc_repository`

### `storage` — for local filesystem documents

Use when documents are stored locally on disk under `document_id` folder names (legacy pattern).

```yaml
storage:
  base_path: ./documents              # root directory containing doc-XXXXXXXX folders
  graphrag_subpath: graphrag/output   # subfolder within each document folder
```

With this config, `document_id: doc-abc123` resolves to `./documents/doc-abc123/graphrag/output`.
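The resolution rule is a plain path join of `base_path`, the document id, and `graphrag_subpath`. A sketch of the equivalent logic (illustrative; the actual resolver is internal to the CLI):

```python
from pathlib import Path

def resolve_graphrag_dir(base_path: str, document_id: str,
                         graphrag_subpath: str = "graphrag/output") -> Path:
    """Join storage.base_path / document_id / storage.graphrag_subpath."""
    return Path(base_path) / document_id / graphrag_subpath

print(resolve_graphrag_dir("./documents", "doc-abc123"))
# → documents/doc-abc123/graphrag/output  (Path normalizes away the leading "./")
```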

> **Prefer `locator` in docs** — it's more explicit and supports all three resolution strategies.

### `doc_repository` — for API-based document downloads

Use when documents are stored in a Nestbox document repository and should be downloaded on demand.

```yaml
doc_repository:
  api_base_url: https://your-doc-repo.example.com
  api_key: ${DOC_REPO_API_KEY}
  rotation: 200   # max cached documents before FIFO eviction (default 200)
```

With this config, `locator: "repo:doc-eeb76651"` downloads the document from the API and caches it locally.
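`rotation` caps the local cache with first-in-first-out eviction: once more than `rotation` documents are cached, the oldest download is dropped first. A sketch of that policy, assuming the cache is keyed by document id (class and method names here are illustrative, not the CLI's internals):

```python
from collections import OrderedDict

class DocCache:
    """FIFO document cache: evicts the oldest entry once `rotation` is exceeded."""
    def __init__(self, rotation: int = 200):
        self.rotation = rotation
        self._docs = OrderedDict()  # doc_id -> local path, in insertion order

    def put(self, doc_id: str, local_path: str) -> None:
        self._docs[doc_id] = local_path
        while len(self._docs) > self.rotation:
            self._docs.popitem(last=False)  # evict oldest first

    def get(self, doc_id: str):
        return self._docs.get(doc_id)

cache = DocCache(rotation=2)
cache.put("doc-a", "/tmp/a")
cache.put("doc-b", "/tmp/b")
cache.put("doc-c", "/tmp/c")   # exceeds rotation=2, evicts doc-a
print(cache.get("doc-a"))      # → None
```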

---

## 10. `prompts`

**Purpose:** Define reusable named prompt strings. Reference them by name in `llamaindex` settings or computation prompts.

```yaml
prompts:
  strict_citation_rules: |
    CITATION RULES — ENFORCE STRICTLY:
    Every numeric value MUST have a citation in c_xxxxxxxx format.
    Never return an empty citations array. If you found it in a search result,
    it has a [CITE AS: "c_xxxxxxxx"] marker — use it.

  units_reminder: |
    UNIT RULES:
    Return raw full numbers: 9697083 not "$9.7M"
    Convert: "$2.5M" → 2500000, "$350K" → 350000
    Percentages as whole numbers: 92.3 not 0.923
```

Then reference in prompts:

```yaml
agents:
  - id: arr_extract
    prompt: |
      Extract ARR metrics.
      {{prompts.strict_citation_rules}}
      {{prompts.units_reminder}}
      ...
```
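The conversions in `units_reminder` are mechanical, so extracted values can be spot-checked deterministically. A sketch of the money-conversion rule (illustrative only; the extraction itself is performed by the agent):

```python
import re

def parse_money(text: str) -> int:
    """Convert "$2.5M" / "$350K" / "$9,697,083" style strings to raw dollars."""
    m = re.fullmatch(r"\$?([\d,.]+)\s*([MmKk]?)", text.strip())
    if not m:
        raise ValueError(f"unrecognized amount: {text!r}")
    value = float(m.group(1).replace(",", ""))
    scale = {"M": 1_000_000, "K": 1_000}.get(m.group(2).upper(), 1)
    return int(round(value * scale))

print(parse_money("$2.5M"))       # → 2500000
print(parse_money("$350K"))       # → 350000
print(parse_money("$9,697,083"))  # → 9697083
```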

---

## 11. `mcp`

**Purpose:** Model Context Protocol endpoints for writing extracted data to external systems (e.g., a KPI database, CRM, or data warehouse). Leave as `mcp: []` if not using external integrations.

```yaml
mcp:
  - id: system_of_record
    type: streamable-http
    url: ${SOR_MCP_URL}
    headers:
      Authorization: "Bearer ${SOR_MCP_TOKEN}"
    timeout_seconds: 60
    description: "KPI upsert endpoint for board metrics database"
```

MCP subtasks are then added to computations via `mcp_scope`:

```yaml
computations:
  fields:
    - id: arr_snapshot
      agents: [...]
      mcp_scope:
        - id: system_of_record   # must match an id in the mcp array
          prompt: |
            Upsert the extracted ARR metrics to the system of record.
            Use namespace: {{context.kpi_namespace}}
      prompt: "Return ARR snapshot and upsert to system of record."
```

---

## Environment Variables

All string fields support env var substitution:

| Syntax | Behavior |
|--------|----------|
| `${VAR_NAME}` | Required — error if missing |
| `${VAR_NAME:-default_value}` | Optional — uses default if missing |

```yaml
llamaindex:
  api_key: ${OPENAI_API_KEY}                               # required
  base_url: ${OPENAI_BASE_URL:-https://api.openai.com/v1}  # optional with default

doc_repository:
  api_key: ${DOC_REPO_API_KEY}                             # required
  api_base_url: ${DOC_REPO_URL:-http://localhost:8080}     # optional with default
```
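Both forms mirror shell parameter expansion (`${VAR}`, `${VAR:-default}`). A sketch of the resolution logic, assuming a missing required variable is a hard error (the CLI's actual expander may differ in detail):

```python
import os
import re

_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text: str, env=os.environ) -> str:
    """Expand ${VAR} (required) and ${VAR:-default} (optional) in a string."""
    def repl(m):
        name, default = m.group(1), m.group(2)
        if name in env:
            return env[name]
        if default is not None:      # ${VAR:-default} form
            return default
        raise KeyError(f"required environment variable {name} is not set")
    return _PATTERN.sub(repl, text)

print(substitute_env("${BASE_URL:-http://localhost:8080}", env={}))
# → http://localhost:8080
```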

---

## Complete Minimal Example

The smallest valid configuration — one computation, one subtask, minimal template:

```yaml
schema_version: "2.2"

report:
  id: minimal_arr_report
  name: "Minimal ARR Report"

context:
  company_name: "Acme Corp"
  as_of_period: "Q4 2025"

docsets:
  - id: main_deck
    api_key: ${OPENAI_API_KEY}
    docs:
      - id: board_deck
        locator: "repo:doc-abc12345"
        description: "Q4 2025 Board Deck"

llamaindex:
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}

computations:
  fields:
    - id: arr_summary
      label: "ARR Summary"
      docset_id: main_deck
      agents:
        - id: arr_extract
          prompt: |
            Extract ending ARR and NRR for the most recent quarter.
            Use basic_search: "End ARR NRR quarterly"
            Return null for any value not found.
          output_schema:
            type: object
            properties:
              end_arr: { type: number }
              nrr: { type: number }
              period: { type: string }
              citations: { type: array, items: { type: string } }
      mcp_scope: []
      prompt: "Return ARR summary with citations."

template:
  format: markdown
  content: |
    # {{report.name}} — {{context.as_of_period}}

    | Metric | Value |
    |--------|-------|
    | End ARR | {{ field.arr_summary.end_arr | currency("$", 0) | default("—") }} |
    | NRR | {{ field.arr_summary.nrr | number(1) | default("—") }}% |
    | Period | {{ field.arr_summary.period | default("—") }} |
```

---

## Full Annotated Example (Finance)

A production-quality configuration for a SaaS company CFO board pack. Covers all major patterns:

```yaml
schema_version: "2.2"

# ─── Report Identity ──────────────────────────────────────────────────────────
report:
  id: acme_cfo_kpi_q4_2025
  name: "Acme CFO KPI Pack — Q4 2025"
  description: |
    CFO-ready KPI extraction from the Q4 2025 Board Presentation.
    Covers ARR, retention, liquidity, bookings, and customer success.
  version: "2025.Q4"

# ─── Context Variables ────────────────────────────────────────────────────────
# These are injected into prompts as {{context.variable_name}}
context:
  company_name: "Acme Corp, Inc."
  currency: "USD"
  as_of_period: "CY2025 Q4"
  source_title: "Q4 2025 Acme Board Meeting Presentation"
  meeting_date: "2026-01-15"
  units_policy: |
    Return raw numeric values. No $, commas, or M/K suffixes.
    Convert: "$2.5M" → 2500000, "$350K" → 350000.
    Percentages as whole numbers (92.3, not 0.923).
  value_types:
    - "money_usd"
    - "percent"
    - "count"
    - "string"

# ─── Document Repository ──────────────────────────────────────────────────────
doc_repository:
  api_base_url: ${DOC_REPO_URL:-http://localhost:8080}
  api_key: ${DOC_REPO_API_KEY}
  rotation: 200

mcp: []   # No external system integrations in this example

# ─── Document Collections ─────────────────────────────────────────────────────
docsets:
  - id: acme_board_deck_q4
    description: "Acme Q4 2025 Board Deck — single source of truth"
    api_key: ${OPENAI_API_KEY}
    docs:
      - id: board_deck_pdf
        locator: "repo:doc-abc12345"
        description: "Q4 2025 Acme Board Meeting Presentation"

# ─── LLM & Agent Configuration ───────────────────────────────────────────────
llamaindex:
  model: gpt-4o
  base_url: ${OPENAI_BASE_URL:-https://api.openai.com/v1}
  api_key: ${OPENAI_API_KEY}
  max_tool_calls: 50
  tool_timeout_seconds: 180
  max_agent_iterations: 40
  system_prompt: |
    You are a CFO-grade KPI extraction analyst.

    ## ANTI-FABRICATION RULE
    ALL values MUST come from search tool results.
    Never use values from this prompt. Never guess. Return null if not found.

    ## Rules
    1. EXACT VALUES: Use values as found (raw numbers, not formatted)
    2. CITE EVERYTHING: Every numeric value needs a c_xxxxxxxx citation
    3. LABEL PERIODS: Include the time period with every metric
    4. NULL IS CORRECT: Return null for any value not explicitly in the source

# ─── Computations ─────────────────────────────────────────────────────────────
computations:
  fields:

    # ── ARR & Retention ───────────────────────────────────────────────────────
    - id: arr_snapshot
      label: "ARR & Retention Snapshot (Q4 2025)"
      type: object
      priority: 1
      docset_id: acme_board_deck_q4
      agents:
        - id: arr_extract
          # Autonomous mode — agent picks basic_search for tables
          prompt: |
            Extract the Q4 2025 ARR waterfall and retention metrics.

            ## What to Find
            From the "ARR & Logo Waterfall" table, Q4 2025 column:
            - beg: Beginning ARR for the quarter
            - new_logos: ARR added from new customers
            - net_upsell: Net expansion ARR (positive)
            - churn: ARR lost to churn (NEGATIVE number)
            - end: Ending ARR

            Retention metrics (bottom of same table or separate section):
            - nrr: Net Revenue Retention % (whole number, e.g. 92.3)
            - grr: Gross Revenue Retention % (whole number, e.g. 88.5)

            ## Search Strategy
            1. basic_search: "ARR waterfall Q4 2025 Beg ARR End ARR new logos churn"
            2. basic_search: "NRR GRR retention quarterly"
            3. Try multiple queries if first returns nothing

            ## CRITICAL
            Citations use [CITE AS: "c_xxxxxxxx"] format in search results.
            Retention must be whole-number scale (92.3 not 0.923).
            Churn must be negative (e.g. -81579).
          output_schema:
            type: object
            properties:
              period_label: { type: string }
              arr:
                type: object
                properties:
                  beg: { type: number }
                  new_logos: { type: number }
                  net_upsell: { type: number }
                  churn: { type: number }
                  end: { type: number }
              retention:
                type: object
                properties:
                  nrr: { type: number, minimum: 50, maximum: 150 }
                  grr: { type: number, minimum: 50, maximum: 100 }
              citations: { type: array, items: { type: string } }
              notes: { type: string }
      mcp_scope: []
      prompt: "Return Q4 2025 ARR waterfall and retention metrics with citations."

    # ── Liquidity & Runway ─────────────────────────────────────────────────────
    - id: liquidity_snapshot
      label: "Liquidity & Runway Snapshot"
      type: object
      priority: 1
      docset_id: acme_board_deck_q4
      agents:
        - id: liquidity_extract
          prompt: |
            Extract liquidity and runway metrics from the CFO report and cash forecast.

            ## Values to Find
            - cash_current: Current cash balance
            - monthly_burn: Monthly net cash burn
            - runway_months: Months of runway at current burn
            - gap_to_breakeven: Additional revenue needed for breakeven

            ## Unit Conversion — CRITICAL
            The deck uses "$X.XM" and "$XXXk" notation. Convert to full dollars:
            "$2.5M" → 2500000 | "$350K" → 350000 | "$7.3M" → 7300000

            ## Search Strategy
            1. basic_search: "cash balance runway breakeven monthly burn"
            2. basic_search: "cash forecast projected expense revenue run rate"
          output_schema:
            type: object
            properties:
              cash_current: { type: number }
              monthly_burn: { type: number }
              runway_months: { type: number }
              gap_to_breakeven: { type: number }
              citations: { type: array, items: { type: string } }
              notes: { type: string }
      mcp_scope: []
      prompt: "Return liquidity and runway snapshot with citations."

    # ── Executive Summary (depends on ARR + Liquidity) ─────────────────────────
    - id: executive_summary
      label: "Executive Summary (CFO-Ready)"
      type: object
      priority: 2
      depends_on: [arr_snapshot, liquidity_snapshot]   # runs AFTER these complete
      docset_id: acme_board_deck_q4
      agents:
        - id: exec_extract
          prompt: |
            Extract a CFO-ready executive summary from the board deck.

            ## What to Extract
            - highlights: 3-5 key performance bullets (wins, metrics vs plan)
            - risks: 2-4 operational or financial risks mentioned
            - asks: 1-3 board decisions or resource requests

            ## Search Strategy
            1. global_search: "strategic themes highlights performance"
            2. basic_search: "risks challenges board asks decisions"
            3. Every bullet MUST have citations from [CITE AS] markers

          output_schema:
            type: object
            properties:
              highlights:
                type: array
                items:
                  type: object
                  properties:
                    text: { type: string }
                    citations: { type: array, items: { type: string } }
              risks:
                type: array
                items:
                  type: object
                  properties:
                    text: { type: string }
                    citations: { type: array, items: { type: string } }
              asks:
                type: array
                items:
                  type: object
                  properties:
                    text: { type: string }
                    citations: { type: array, items: { type: string } }
      mcp_scope: []
      prompt: "Synthesize executive summary with highlights, risks, and asks."

  tables:
    # ── ARR Quarterly History ──────────────────────────────────────────────────
    - id: arr_quarterly_history
      title: "ARR & Logo Waterfall — Quarterly History"
      priority: 2
      docset_id: acme_board_deck_q4
      agents:
        - id: arr_history_extract
          prompt: |
            Extract the full quarterly ARR waterfall table (all periods shown).

            ## For EACH quarter in the table
            - period: Quarter label (e.g., "2024Q1", "2024Q2", ..., "2025Q4")
            - beg_arr, new_logos_arr, net_upsell_arr, churn_arr, end_arr (full dollar amounts)
            - nrr, grr (whole-number percentages, e.g. 92.3 not 0.923)

            ## MUST try multiple searches — the table is wide
            1. basic_search: "ARR waterfall quarterly Beg ARR End ARR" (k=10)
            2. basic_search: "2024Q1 2024Q2 ARR new logos churn" (k=10)
            3. Combine results to build the complete table

            ## VERIFY before returning
            - At least 3 quarters of data
            - End ARR values > $5,000,000 for recent quarters
            - NRR values between 80–120 (not 0.80–1.20)
            - Each row has at least one citation

            Return {"rows": [...]}
          output_schema:
            type: object
            properties:
              rows:
                type: array
                items:
                  type: object
                  properties:
                    period: { type: string }
                    beg_arr: { type: number }
                    new_logos_arr: { type: number }
                    net_upsell_arr: { type: number }
                    churn_arr: { type: number }
                    end_arr: { type: number }
                    nrr: { type: number, minimum: 50, maximum: 150 }
                    grr: { type: number, minimum: 50, maximum: 100 }
                    citations: { type: array, items: { type: string } }
      mcp_scope: []
      prompt: "Return quarterly ARR history with citations per row."

# ─── Output Template ──────────────────────────────────────────────────────────
template:
  format: markdown
  sections:
    report_header: |
      # {{report.name}}
      > **Source:** {{context.source_title}} | **Period:** {{context.as_of_period}}

  content: |
    {{sections.report_header}}

    ---

    ## Executive Summary

    ### Highlights
    {{ field.executive_summary.highlights | default("*No data extracted*") }}

    ### Risks
    {{ field.executive_summary.risks | default("*No data extracted*") }}

    ### Board Asks
    {{ field.executive_summary.asks | default("*No data extracted*") }}

    ---

    ## ARR & Retention (Q4 2025)

    **Period:** {{ field.arr_snapshot.period_label | default("N/A") }}

    ### ARR Waterfall
    | Metric | Value |
    |--------|-------|
    | Beg ARR | {{ field.arr_snapshot.arr.beg | currency("$", 0) | default("—") }} |
    | + New Logos | {{ field.arr_snapshot.arr.new_logos | currency("$", 0) | default("—") }} |
    | + Net Upsell | {{ field.arr_snapshot.arr.net_upsell | currency("$", 0) | default("—") }} |
    | − Churn | {{ field.arr_snapshot.arr.churn | currency("$", 0) | default("—") }} |
    | **End ARR** | **{{ field.arr_snapshot.arr.end | currency("$", 0) | default("—") }}** |

    ### Retention
    | Metric | Value |
    |--------|-------|
    | NRR | {{ field.arr_snapshot.retention.nrr | number(1) | default("—") }}% |
    | GRR | {{ field.arr_snapshot.retention.grr | number(1) | default("—") }}% |

    **Citations:** {{ field.arr_snapshot.citations | default("*None*") }}

    ### Quarterly ARR History
    {{table.arr_quarterly_history}}

    ---

    ## Liquidity & Runway

    | Metric | Value |
    |--------|-------|
    | Cash on Hand | {{ field.liquidity_snapshot.cash_current | currency("$", 0) | default("—") }} |
    | Monthly Burn | {{ field.liquidity_snapshot.monthly_burn | currency("$", 0) | default("—") }}/mo |
    | Runway | {{ field.liquidity_snapshot.runway_months | number(1) | default("—") }} months |
    | Gap to Breakeven | {{ field.liquidity_snapshot.gap_to_breakeven | currency("$", 0) | default("—") }} |

    **Citations:** {{ field.liquidity_snapshot.citations | default("*None*") }}

# ─── Guardrails ───────────────────────────────────────────────────────────────
guardrails:
  - id: gr_citation_completeness
    target: computations
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "All numeric values must have citations"
    prompt: |
      Check that every numeric value has at least one citation (c_xxxxxxxx format).
      Empty citation arrays [] are failures.
      Return JSON: {"pass": boolean, "issues": ["list of fields missing citations"]}
      Content: {{content}}

  - id: gr_arr_waterfall_math
    target: field.arr_snapshot
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "ARR bridge must add up correctly"
    prompt: |
      Validate: beg + new_logos + net_upsell + churn = end
      Churn is already negative — ADD it, don't subtract.
      Allow 1% tolerance for rounding.
      Return JSON: {"pass": boolean, "computed_end": number, "stated_end": number, "issues": []}
      Content: {{content}}

  - id: gr_fabrication_check
    target: computations
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Detect fabricated placeholder values"
    prompt: |
      FAIL if you find: citations like "Source A"/"Source B" (not c_xxxxxxxx),
      ALL dollar values as round multiples of 10000, or retention values below 1.0.
      Return JSON: {"pass": boolean, "issues": []}
      Content: {{content}}

  - id: gr_report_completeness
    target: final_report
    on_fail: warn
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
    description: "Final report must contain real data"
    prompt: |
      Check the report has actual numeric data (not just "—" everywhere).
      Return JSON: {"pass": boolean, "sections_populated": [], "sections_empty": [], "issues": []}
      Content: {{content}}

# ─── Execution Settings ────────────────────────────────────────────────────────
execution:
  retries:
    max_attempts: 3
    backoff_seconds: 2.0
  output:
    directory: ./output
    timestamp_suffix: false
    include_final_report: true
    include_computed_json: true
    include_evidence: true
    include_guardrails: true
```