@talonic/docs 0.20.10 → 0.20.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/dist/content.js +1202 -6
package/package.json +1 -1
package/dist/tailwind-preset.d.cts +0 -45
package/dist/tailwind-preset.d.ts +0 -45

package/dist/content.js CHANGED

@@ -542,6 +542,14 @@ var sections = [
           }
         ]
       },
+      {
+        type: "paragraph",
+        text: 'Understanding the relationship between these concepts is key to getting the most from the platform. When you upload documents, the extraction pipeline discovers every data point and feeds them into the **Field Registry**. The registry uses AI embeddings to cluster semantically similar fields \u2014 so "Vendor Name", "Supplier Name", and "Company Name" are recognized as the same concept. Over time, frequently occurring fields are promoted to higher tiers, and the platform synthesizes master extraction instructions that encode the best way to extract each field.'
+      },
+      {
+        type: "paragraph",
+        text: "The **Schema** layer sits on top of the registry and defines what output you need. You can use auto-generated schemas that the platform creates for each document type, or build custom template schemas by selecting specific fields from the registry. When a schema is applied to documents in a **Job**, the 4-phase pipeline fills every cell \u2014 starting with free graph lookups and falling back to AI agents for the remainder. The result is a structured grid where each row is a document and each column is a field."
+      },
       {
         type: "callout",
         variant: "info",
@@ -617,6 +625,14 @@ var sections = [
         type: "paragraph",
         text: "The pipeline is designed to be **progressive** \u2014 results appear as each phase completes rather than waiting for the entire job to finish. Phase 1 (graph resolve) fills ~30% of cells instantly and for free. Phase 2 (AI extraction) fills the remaining gaps. Phases 3 and 4 handle re-resolution and transformation. You can start reviewing early results while later phases are still running."
       },
+      {
+        type: "paragraph",
+        text: "Use the platform flow as a mental model when planning your workflow. For small, ad-hoc extractions you can go from upload to results in minutes \u2014 upload a few documents, pick an auto-generated schema, and run a job. For production workloads, invest time in the **Define schema** step: map fields to the registry, add reference tables for code lookups, and set format constraints. The upfront effort pays off because every subsequent job reuses the same schema and benefits from the growing knowledge graph."
+      },
+      {
+        type: "paragraph",
+        text: "After results are delivered, the feedback loop closes automatically. Corrections you make during the **Review** stage feed back into the Field Registry, improving future extractions. The platform tracks telemetry across runs \u2014 strategy distribution, capture hit rate, and resolve rate \u2014 so you can monitor how extraction quality improves over time as the knowledge graph accumulates more data."
+      },
       {
         type: "callout",
         variant: "info",
@@ -679,6 +695,14 @@ var sections = [
         title: "Sidebar Navigation",
         caption: "The sidebar provides access to all sections. Click the collapse button to save space. Press Cmd+K for global search."
       },
+      {
+        type: "paragraph",
+        text: "For teams processing documents at scale, the recommended approach is to start with a small representative sample. Upload 5-10 documents of the same type, let the platform extract and classify them, then review the auto-generated schema. This lets you validate the output structure before committing to a large batch. Once the schema looks right, you can upload hundreds or thousands of documents and the knowledge graph will handle an increasing share of cells through instant graph matches."
+      },
+      {
+        type: "paragraph",
+        text: "The platform includes keyboard shortcuts for fast navigation. Press `Cmd+K` (or `Ctrl+K` on Windows) to open **Omnisearch**, which lets you find documents, schemas, jobs, and fields from anywhere. Press `Cmd+I` (or `Ctrl+I` on Windows) to open the **AI Agent** for natural language queries about your workspace. The sidebar can be collapsed to free up screen space when reviewing extraction results."
+      },
       {
         type: "callout",
         text: "The fastest path to results: upload documents in **Sources**, then go to **Structuring &rarr; Runs &rarr; New** to create your first extraction job."
@@ -735,6 +759,18 @@ var sections2 = [
         type: "paragraph",
         text: "Talonic includes an embedded AI agent accessible from every page via `Cmd+I` (`Ctrl+I` on Windows). The agent understands your workspace context and can inspect schemas, search documents, analyze extraction quality, explore cases, and build schemas \u2014 all through natural language."
       },
+      {
+        type: "paragraph",
+        text: "The agent is context-aware, meaning it automatically knows which page you are on and what data is visible. If you open the agent from a document detail page, it already has that document in scope and can answer questions about its extracted fields, processing status, or classification without you needing to specify which document you mean."
+      },
+      {
+        type: "paragraph",
+        text: "The agent classifies every user message as either a **question** (answered with information) or a **command** (triggers an action). Questions are handled instantly with read-only access, while commands go through the impact-level system to ensure safety. The agent streams its responses in real time, so you can see reasoning unfold as it queries your workspace data."
+      },
+      {
+        type: "paragraph",
+        text: "There are important limitations to be aware of. The agent cannot access external systems or the internet \u2014 it only works with data already in your Talonic workspace. It cannot bypass permission boundaries, so team members with read-only access cannot use the agent to make changes. Long-running operations like full batch extractions cannot be triggered through the agent; those must be initiated from the relevant UI page."
+      },
       { type: "heading", level: 3, id: "agent-capabilities", text: "What the Agent Can Do" },
       {
         type: "paragraph",
@@ -799,6 +835,14 @@ var sections2 = [
       {
         question: "Can the AI agent modify my data?",
         answer: "The agent operates workshop-first: schema changes create drafts, not live versions. Higher-impact operations require progressively more explicit confirmation."
+      },
+      {
+        question: "Is the AI agent context-aware?",
+        answer: "Yes. The agent automatically knows which page you are on and what data is visible. If you open it from a document detail page, it already has that document in scope and can answer questions about its fields, processing status, or classification."
+      },
+      {
+        question: "Can the AI agent access external systems or the internet?",
+        answer: "No. The agent only works with data already in your Talonic workspace. It cannot browse the internet, call external APIs, or access systems outside the platform."
       }
     ],
     mentions: [
@@ -846,6 +890,18 @@ var sections2 = [
           }
         ]
       },
+      {
+        type: "paragraph",
+        text: "The `read` impact level covers the vast majority of agent interactions. Searching documents, inspecting extraction results, browsing the field registry, and checking job status all execute instantly with no side effects. These read operations give you a fast way to explore your workspace without navigating through multiple pages."
+      },
+      {
+        type: "paragraph",
+        text: "The `draft_mutation` level is used when the agent creates or modifies schemas. Because all schema changes go through the workshop system, the agent can freely draft schemas without risk \u2014 nothing goes live until you explicitly review and publish. This makes the agent especially useful for rapid schema prototyping: describe the fields you need in plain language, and the agent creates a draft you can refine."
+      },
+      {
+        type: "paragraph",
+        text: 'The `live_mutation` and `irreversible` levels provide escalating safety gates for operations that affect production data. A `live_mutation` \u2014 such as triggering a job run or publishing a schema \u2014 presents a confirmation dialog that you must accept. An `irreversible` action \u2014 such as deleting a source or purging documents \u2014 requires you to type a confirmation keyword (e.g., "DELETE") to proceed, preventing accidental data loss.'
+      },
       {
         type: "callout",
         text: "The agent always operates workshop-first: schema changes create drafts, not live versions. You review and publish when ready."
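
Reviewer sketch (not part of the package): the four impact levels described in this hunk can be read as a small confirmation gate. The snippet below is a minimal illustration under stated assumptions. The `CONFIRMATION` table and the `mayProceed` function are hypothetical names, and the real agent may enforce these gates differently.

```javascript
// Confirmation required per impact level, following the prose in this hunk.
// The table and function names are hypothetical, for illustration only.
const CONFIRMATION = {
  read: "none",                  // no side effects, executes instantly
  draft_mutation: "none",        // creates drafts only; nothing goes live until published
  live_mutation: "dialog",       // user must accept a confirmation dialog
  irreversible: "typed_keyword", // user must type a keyword such as "DELETE"
};

function mayProceed(level, { accepted = false, typed = "", keyword = "DELETE" } = {}) {
  switch (CONFIRMATION[level]) {
    case "none":
      return true;
    case "dialog":
      return accepted;
    case "typed_keyword":
      return typed === keyword; // exact, case-sensitive match (an assumption)
    default:
      throw new Error(`Unknown impact level: ${level}`);
  }
}
```

For example, `mayProceed("irreversible", { typed: "delete" })` is rejected in this sketch because the comparison is case-sensitive; whether the product is that strict is not stated in the source.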
@@ -863,6 +919,10 @@ var sections2 = [
       {
         question: "Does the AI agent make changes directly to live data?",
         answer: "No. The agent operates workshop-first. Schema changes create drafts, and live mutations require explicit user confirmation before executing."
+      },
+      {
+        question: "What happens when I ask the agent to delete something?",
+        answer: 'Deletion is classified as an irreversible action. The agent will ask you to type a confirmation keyword (e.g., "DELETE") before proceeding. This prevents accidental data loss from casual or ambiguous requests.'
       }
     ],
     mentions: ["impact levels", "draft mutation", "live mutation", "workshop-first"]
@@ -877,6 +937,18 @@ var sections2 = [
       {
         type: "paragraph",
         text: "The home page (click the Talonic logo) shows smart suggested prompts based on your workspace state. Prompts adapt to what is happening: active runs, schema creation opportunities, document types waiting for extraction. The agent input field lets you type any question directly from the dashboard."
+      },
+      {
+        type: "paragraph",
+        text: "The dashboard provides a workspace-level overview that helps you understand the health of your data pipeline at a glance. You can see document processing statistics, recent activity across sources, and the current state of your field registry. Key metrics like **capture rate**, **resolve rate**, and **synthesize rate** from the telemetry system are surfaced so you can spot trends without drilling into individual jobs."
+      },
+      {
+        type: "paragraph",
+        text: "Suggested prompts are dynamically generated based on what the platform detects in your workspace. If you have new document types that lack schemas, the dashboard suggests creating one. If a job run recently completed, it suggests reviewing the results. If field registry confirmations are pending, it prompts you to review them. This makes the dashboard a natural starting point for your workflow each session."
+      },
+      {
+        type: "paragraph",
+        text: "Every conversation with the agent is preserved in your session history, accessible from the dashboard. You can revisit previous questions and their answers, which is useful for auditing decisions or recalling how you configured a particular schema. The conversation history also provides continuity \u2014 if you asked the agent to analyze extraction quality last week, you can pick up where you left off."
       }
     ],
     related: [
@@ -891,6 +963,10 @@ var sections2 = [
       {
         question: "Do the suggested prompts change based on workspace state?",
         answer: "Yes. Prompts adapt dynamically based on active runs, schema creation opportunities, document types waiting for extraction, and other workspace activity."
+      },
+      {
+        question: "Can I revisit previous conversations with the agent?",
+        answer: "Yes. Every conversation is preserved in your session history, accessible from the dashboard. You can revisit previous questions, recall how you configured a schema, or pick up where you left off in a previous analysis."
       }
     ],
     mentions: ["dashboard", "suggested prompts", "workspace state", "agent input"]
@@ -923,6 +999,10 @@ var sections3 = [
       {
         type: "paragraph",
         text: "Files are deduplicated via SHA-256 hashing \u2014 uploading the same file twice won't create duplicates. Processing runs asynchronously so you can continue working."
+      },
+      {
+        type: "paragraph",
+        text: "When uploading folders or ZIP archives, the original directory structure is preserved as a `source_file_path` metadata field on each document (e.g., `contracts/2026/lease.pdf`). This field is available for filtering, export, and schema mapping \u2014 just like any AI-extracted field. It provides a natural way to organize and trace documents back to their original location in your file system."
       }
     ],
     related: [
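
Reviewer sketch (not part of the package): the SHA-256 deduplication mentioned in this hunk can be illustrated with Node's built-in `crypto` module. This is a minimal sketch assuming an in-memory `Map` keyed by digest; `sha256Hex` and `addIfNew` are hypothetical helper names, and the platform's actual storage and lookup are not described in the source.

```javascript
const { createHash } = require("crypto");

// Hex-encoded SHA-256 digest of a file's bytes.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory store: digest -> first filename seen with that content.
function addIfNew(store, filename, bytes) {
  const digest = sha256Hex(bytes);
  if (store.has(digest)) {
    // Same content was uploaded before, so no new document is created.
    return { added: false, digest, duplicateOf: store.get(digest) };
  }
  store.set(digest, filename);
  return { added: true, digest };
}

const store = new Map();
const first = addIfNew(store, "lease.pdf", Buffer.from("same bytes"));
const second = addIfNew(store, "lease-copy.pdf", Buffer.from("same bytes"));
```

Because the key is derived from file content rather than the filename, a renamed copy is still recognized as a duplicate in this sketch.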
@@ -938,6 +1018,10 @@ var sections3 = [
       {
         question: "Does Talonic detect duplicate uploads?",
         answer: "Yes. Files are deduplicated via SHA-256 hashing. Uploading the same file twice will not create duplicates."
+      },
+      {
+        question: "What happens when I upload a folder or ZIP archive?",
+        answer: "ZIP archives are unpacked recursively and each file is processed individually. For both folders and ZIP archives, the original directory structure is preserved as a source_file_path metadata field on each document, available for filtering, export, and schema mapping."
       }
     ],
     mentions: [
@@ -956,6 +1040,10 @@ var sections3 = [
     seoTitle: "Supported File Formats \u2014 Talonic Docs",
     description: "Talonic supports 25+ file types across four processing paths: text fast-path, AI vision, OCR, and recursive archive unpacking. From PDF to XLSX to images.",
     content: [
+      {
+        type: "paragraph",
+        text: "Talonic supports 25+ file types across four distinct processing paths. Each path is optimized for its file category \u2014 text files are read directly with no external API call, while complex document formats go through OCR to produce high-quality Markdown. The processing path is selected automatically based on the file extension."
+      },
       {
         type: "param-table",
         title: "File processing paths",
@@ -981,6 +1069,23 @@ var sections3 = [
             description: "ZIP \u2014 unpacked and each file processed individually."
           }
         ]
+      },
+      {
+        type: "paragraph",
+        text: "The **OCR path** uses Mistral Document AI as the primary engine, with a Talonic API fallback if the primary service is unavailable. OCR converts documents to structured Markdown, preserving tables, headings, and layout information. For PDF files that exceed the configured chunk size (default 25 pages), the system automatically splits the document into page chunks, processes them in parallel, and merges the results \u2014 so even large documents are handled efficiently."
+      },
+      {
+        type: "paragraph",
+        text: `Image files follow the **AI Vision** path, where they are sent directly to the AI model for multimodal extraction. This means the AI "sees" the image and extracts data visually \u2014 useful for photos of receipts, scanned handwritten notes, or diagrams. If an image was previously OCR'd and produced meaningful Markdown (more than 100 characters), the system uses the Markdown extraction path instead, which enables richer quality metrics.`
+      },
+      {
+        type: "paragraph",
+        text: "The **text fast-path** is the most efficient route: files like CSV, JSON, and plain text are read directly into memory with no external API call. This means they process almost instantly and incur no OCR cost. Email files (EML, MSG) are parsed to extract both the message body and any attachments, with each attachment processed as a separate document."
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "The processing path is selected automatically based on the file extension \u2014 you do not need to configure anything. If a file type is not recognized, the platform will attempt OCR as a fallback before marking it as unsupported."
       }
     ],
     related: [
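
Reviewer sketch (not part of the package): the large-PDF handling described in this hunk (split into page chunks, process in parallel, merge) can be outlined as below. This is a sketch under assumptions: `pageChunks` and `ocrInChunks` are hypothetical names, `ocrChunk` stands in for whatever OCR call the platform makes, and merging by joining Markdown with blank lines is a guess at the merge step.

```javascript
// Split a page count into inclusive, 1-indexed page ranges of at most chunkSize pages.
function pageChunks(totalPages, chunkSize = 25) {
  const chunks = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    chunks.push({ start, end: Math.min(start + chunkSize - 1, totalPages) });
  }
  return chunks;
}

// Run a hypothetical async `ocrChunk({ start, end })` on every chunk in parallel.
// Promise.all keeps results in input order, so the merged Markdown stays in page order.
async function ocrInChunks(totalPages, ocrChunk, chunkSize = 25) {
  const chunks = pageChunks(totalPages, chunkSize);
  const parts = await Promise.all(chunks.map((chunk) => ocrChunk(chunk)));
  return parts.join("\n\n");
}
```

With the default size, a 60-page PDF yields the ranges 1-25, 26-50, and 51-60, while a document of 25 pages or fewer stays in a single chunk.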
@@ -995,6 +1100,10 @@ var sections3 = [
       {
         question: "How does Talonic handle image files?",
         answer: "Image files (PNG, JPG, JPEG, GIF, WEBP) are sent to AI for multimodal visual extraction."
+      },
+      {
+        question: "How does Talonic handle large PDF files?",
+        answer: "PDF files that exceed the configured chunk size (default 25 pages) are automatically split into page chunks, processed in parallel, and merged. This ensures even large documents are handled efficiently without timeouts."
       }
     ],
     mentions: ["OCR", "AI vision", "text fast-path", "file formats", "PDF", "DOCX", "ZIP"]
@@ -1061,6 +1170,10 @@ var sections3 = [
       {
         question: "When is a document ready to use in jobs?",
         answer: "Documents are marked complete after AI extraction finishes. You can start using them in jobs immediately without waiting for further processing."
+      },
+      {
+        question: "What happens if OCR or extraction fails on a document?",
+        answer: "The platform automatically retries failed extractions (configurable, default 1 retry). If all retries fail, the document is marked as extraction_failed with a terminal status. OCR failures follow a separate retry path with fallback from Document AI to Talonic API to local parsers."
       }
     ],
     mentions: [
@@ -1087,6 +1200,19 @@ var sections3 = [
       {
         type: "paragraph",
         text: 'Documents sharing the same ontology type are automatically merged into one document type. When a new canonical type appears, it is auto-created with ontology metadata. Unresolvable documents are assigned "Unclassified Document".'
+      },
+      {
+        type: "paragraph",
+        text: `Classification is verified in a two-step process. First, **Document AI OCR** produces an annotation with a free-text type label during the OCR pass. Then, a **type resolution** step verifies that label against the actual document content. If the label and content disagree \u2014 for example, a German *Arbeitsvertrag* incorrectly labelled as "Service Agreement" \u2014 the system trusts the content and resolves the correct canonical type. This ensures accurate classification regardless of the OCR engine's labelling bias.`
+      },
+      {
+        type: "paragraph",
+        text: "Document types drive several downstream features. The platform auto-generates a **schema** for each document type, pre-populated with fields discovered from documents of that type. **Routing rules** can be configured per document type to automatically assign schemas or trigger jobs when new documents arrive. The **Field Registry** tracks which fields appear in which document types, building a cross-type knowledge graph over time."
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "You never need to create document types manually. The ontology is built into the platform and types are assigned automatically during classification. If you disagree with a classification, the AI agent can help you understand why a type was chosen and how the content signals were interpreted."
       }
     ],
     related: [
@@ -1102,6 +1228,10 @@ var sections3 = [
       {
         question: "Does document classification work in non-English languages?",
         answer: "Yes. The classifier works across all languages. For example, a German Arbeitsvertrag and an English Employment Contract map to the same canonical type."
+      },
+      {
+        question: "What happens if a document cannot be classified?",
+        answer: 'Unresolvable documents are assigned the "Unclassified Document" type. They can still be processed and extracted \u2014 the platform simply cannot map them to a specific canonical type in the 529-type ontology.'
       }
     ],
     mentions: [
@@ -1147,6 +1277,23 @@ var sections3 = [
             description: "View or download the source document."
           }
         ]
+      },
+      {
+        type: "paragraph",
+        text: "The **Raw Extraction** tab is the most detailed view, showing every field the AI discovered along with its confidence score and the source text that the value was extracted from. Each field displays a tier badge (Tier 1 green, Tier 2 amber, Tier 3 gray) indicating how well-established that field is across your document corpus. Synthetic metadata fields like `filename` and `source_file_path` appear here too, with full confidence (1.0)."
+      },
+      {
+        type: "paragraph",
+        text: "The **Resolved Data** tab shows how raw extracted fields map to your canonical field registry. Fields that matched automatically (similarity >= 0.80) display their canonical name and cluster. Fields in the confirm band (0.50-0.79) are flagged for review. This view helps you understand how the platform is normalizing field names across different document types and formats."
+      },
+      {
+        type: "paragraph",
+        text: "The **Processing Log** tab provides a stage-by-stage timeline of how the document was processed, including per-stage timing. You can see exactly how long OCR, classification, and extraction took, which is useful for diagnosing slow processing or understanding why a document was classified a particular way. The **Original File** tab lets you view or download the source file, so you can always compare the AI's extraction against the original document."
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "You can open the **AI Agent** (`Cmd+I`, or `Ctrl+I` on Windows) from any document detail page. The agent automatically has the current document in scope and can answer questions about its fields, classification, or processing status without you needing to specify which document you mean."
       }
     ],
     related: [
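
Reviewer sketch (not part of the package): the similarity bands quoted for the Resolved Data tab (automatic match at >= 0.80, confirm band at 0.50-0.79) map onto a simple threshold function. The name `similarityBand` and the band labels are hypothetical. Treating values between 0.79 and 0.80 as part of the confirm band, and labelling everything below 0.50 as `no_match`, are assumptions, since this hunk does not say what happens below the confirm band.

```javascript
// Classify a similarity score into the bands described in the text.
// Labels are illustrative; only the 0.80 and 0.50 cut-offs come from the source.
function similarityBand(similarity) {
  if (similarity >= 0.8) return "auto_match"; // shown with canonical name and cluster
  if (similarity >= 0.5) return "confirm";    // flagged for manual review
  return "no_match";                          // assumption: not matched to an existing field
}
```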
@@ -1162,6 +1309,10 @@ var sections3 = [
       {
         question: "How can I see the confidence score of an extracted field?",
         answer: "Open the document detail page and navigate to the Raw Extraction tab. Each field displays its confidence score alongside the extracted value and source text."
+      },
+      {
+        question: "What do the tier badges on fields mean?",
+        answer: "Tier badges indicate how well-established a field is across your document corpus. Tier 1 (green) are universal core fields, Tier 2 (amber) are established promoted fields, and Tier 3 (gray) are newly discovered emerging fields."
       }
     ],
     mentions: [
@@ -1182,6 +1333,23 @@ var sections3 = [
       {
         type: "paragraph",
         text: "Routing rules automatically assign actions to documents based on their type. Configure rules to auto-assign schemas, trigger jobs, or route documents to specific workflows. Manage rules from **Documents &rarr; Routing**."
+      },
+      {
+        type: "paragraph",
+        text: "Each routing rule specifies a **document type** as the trigger condition and one or more **actions** to execute when a document of that type is processed. Actions include assigning a specific user schema, automatically creating a job run, or tagging the document for a particular workflow. Rules are evaluated in priority order, so you can layer general rules with more specific overrides."
+      },
+      {
+        type: "paragraph",
+        text: 'Routing rules are especially useful for high-volume ingestion pipelines. If you connect a Google Drive folder that receives hundreds of invoices per week, a routing rule can automatically assign your "Invoice" schema and trigger extraction \u2014 turning what would be manual work into a fully automated pipeline. Combined with **delivery bindings**, this creates an end-to-end flow from document upload to structured output with zero manual intervention.'
+      },
+      {
+        type: "paragraph",
+        text: "You can review rule execution history from the routing page to see which rules fired, which documents they matched, and what actions were taken. This audit trail helps you verify that your routing configuration is working as expected and diagnose cases where documents were not routed correctly."
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "Start with a simple routing rule for your most common document type. Once you verify it works correctly, expand to additional types. Rules are evaluated in priority order, so you can add specific overrides without disrupting existing rules."
       }
     ],
     related: [
@@ -1197,6 +1365,10 @@ var sections3 = [
       {
         question: "Where do I manage routing rules?",
         answer: "Navigate to Documents > Routing to create and manage routing rules for your workspace."
+      },
+      {
+        question: "Can routing rules fully automate my document processing pipeline?",
+        answer: "Yes. By combining routing rules with source connectors and delivery bindings, you can create a fully automated pipeline: documents arrive from a connected source, routing rules assign schemas and trigger extraction jobs, and delivery bindings push approved results to downstream systems."
       }
     ],
     mentions: ["routing rules", "auto-assign", "schema assignment", "document workflows"]
@@ -1272,6 +1444,14 @@ var sections3 = [
         type: "paragraph",
         text: "Google and Microsoft connectors share a single OAuth client each. OAuth tokens are encrypted at rest using `aes-256-gcm`. Each source card includes a **Batch Processing** toggle to defer extraction at 50% cost."
       },
+      {
+        type: "paragraph",
+        text: "OAuth-based connectors (Google Drive, Gmail, SharePoint, OneDrive, Outlook, Teams, Notion) use a consent-based flow where you authorize Talonic to access specific resources. For Microsoft connectors, Teams requires extended scopes that need tenant-admin consent. If a connector's OAuth credentials are revoked or expire, the source enters a disconnected state \u2014 reconnecting via the source settings page automatically refreshes the credentials without losing your existing documents."
+      },
+      {
+        type: "paragraph",
+        text: "Credential-based connectors (SQL, Amazon S3, Azure Blob) authenticate with access keys or connection strings rather than OAuth. SQL connections support PostgreSQL, MySQL, and MSSQL, with a built-in read-only safety layer that prevents accidental writes. S3-compatible storage like MinIO and Cloudflare R2 also works through the S3 connector. All credentials are encrypted at rest before being stored."
+      },
       {
         type: "callout",
         text: "Connectors are feature-gated on their OAuth client ID/secret. Without credentials configured, the connector dropdown entry is disabled."
@@ -1290,6 +1470,10 @@ var sections3 = [
       {
         question: "How are OAuth tokens stored?",
         answer: "OAuth access and refresh tokens are encrypted at rest using AES-256-GCM. The encryption key is SOURCE_ENCRYPTION_KEY (falls back to JWT_SECRET)."
+      },
+      {
+        question: "What happens if a connector loses its credentials or authorization?",
+        answer: "If OAuth credentials are revoked or expire, the source enters a disconnected state. Reconnecting via the source settings page automatically refreshes the credentials without losing your existing documents or configuration."
       }
     ],
     mentions: [
@@ -1331,6 +1515,18 @@ var sections4 = [
         id: "field-registry-table",
         title: "Field Registry \u2014 Registry Table",
         caption: "Fields are organized by tier with occurrence counts, data types, and master instruction status."
+      },
+      {
+        type: "paragraph",
+        text: "The registry grows automatically as documents are processed. During extraction, AI discovers fields from each document and resolves them against existing registry entries using **three-band matching** (exact name match, cluster member match, then semantic embedding similarity). New fields that don't match anything create a Tier 3 entry. Frequently occurring fields are promoted to higher tiers, so the registry naturally converges on a stable set of canonical fields over time."
+      },
+      {
+        type: "paragraph",
+        text: "Each registry entry tracks its **occurrence count** (how many documents contain this field), **data type** (string, number, date, etc.), **synonyms** (alternate names discovered across documents), and **master instruction** (an AI-synthesized extraction directive). The registry also maintains two embedding vectors per field: one for resolution matching and one for graph visualization, ensuring that each concern uses the most appropriate representation."
+      },
+      {
+        type: "paragraph",
+        text: "The registry is the foundation for several downstream features. **Jobs** use registry fields to pre-fill schema values via lookup cascades before resorting to LLM extraction. **Semantic clusters** group related registry fields together. **Generated schemas** are auto-built from registry fields that appear in a given document type. Understanding the registry is key to understanding how Talonic reduces extraction cost and improves accuracy over time."
       }
     ],
     related: [
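
Reviewer sketch (not part of the package): the three-step resolution order named in this hunk (exact name match, cluster member match, then semantic embedding similarity) can be sketched as a cascade. Everything concrete here is an assumption: the entry shape `{ name, synonyms, embedding }`, the use of cosine similarity, the 0.80 cut-off borrowed from the similarity bands elsewhere in this diff, and the names `cosine` and `resolveField`.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Resolve a discovered field against registry entries, cheapest check first.
function resolveField(field, entries, threshold = 0.8) {
  const exact = entries.find((e) => e.name === field.name);
  if (exact) return { entry: exact.name, via: "exact_name" };

  const member = entries.find((e) => e.synonyms.includes(field.name));
  if (member) return { entry: member.name, via: "cluster_member" };

  let best = null;
  for (const e of entries) {
    const score = cosine(field.embedding, e.embedding);
    if (best === null || score > best.score) best = { entry: e.name, score };
  }
  if (best !== null && best.score >= threshold) {
    return { entry: best.entry, via: "embedding" };
  }
  return { entry: null, via: "new_tier_3" }; // unmatched fields start as Tier 3
}

const entries = [
  { name: "vendor_name", synonyms: ["supplier_name"], embedding: [1, 0] },
  { name: "total_amount", synonyms: [], embedding: [0, 1] },
];
```

Ordering the checks this way means the embedding comparison, the most expensive step, only runs when the two string lookups fail.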
@@ -1346,6 +1542,10 @@ var sections4 = [
       {
         question: "How does the Field Registry grow?",
         answer: "As documents are processed, AI discovers new fields and resolves them against existing registry entries. New fields create Tier 3 entries; frequently occurring fields are promoted to higher tiers."
+      },
+      {
+        question: "How does the Field Registry reduce extraction cost?",
+        answer: "The registry enables lookup-based resolution during job runs. When a field already exists in the registry with sufficient data, its value can be resolved via graph lookup instead of an AI call. Approximately 30% of cells are filled this way \u2014 instantly and at no cost."
       }
     ],
     mentions: [
@@ -1387,6 +1587,18 @@ var sections4 = [
           }
         ]
       },
+      {
+        type: "paragraph",
+        text: "**Tier 1** fields are the most reliable and cost-efficient. During job runs, Tier 1 fields can often be resolved via lookup tables or registry transfer without any AI call, meaning they cost nothing to extract. These are fields like `invoice_number`, `date`, or `total_amount` that appear universally across document types and have well-established extraction patterns."
+      },
+      {
+        type: "paragraph",
+        text: "**Tier 2** fields are promoted from Tier 3 after meeting frequency thresholds \u2014 specifically, 5 occurrences or a 10% occurrence rate across your documents. Once promoted, these fields gain a synthesized master instruction and become candidates for lookup-based resolution. Promotion is evaluated automatically after every batch resolution run, so fields graduate without manual intervention as your document corpus grows."
+      },
+      {
+        type: "paragraph",
+        text: "**Tier 3** fields are newly discovered and may require a full Claude API call to extract during job runs, making them the most expensive tier. As more documents are processed and a Tier 3 field appears consistently, it is automatically promoted. You can also manually adjust a field's tier from the registry detail page if you know a field is stable enough to promote early."
+      },
       {
         type: "callout",
         text: "Tier badges appear throughout the platform as the primary quality signal. Tier 1 = green, Tier 2 = amber, Tier 3 = gray."
@@ -1404,6 +1616,10 @@ var sections4 = [
       {
         question: "How are fields promoted between tiers?",
         answer: "Fields are promoted automatically based on frequency thresholds. As more documents are processed and a field appears consistently, it moves from Tier 3 to Tier 2 and eventually to Tier 1."
+      },
+      {
+        question: "Can I manually change a field's tier?",
+        answer: "Yes. You can manually adjust a field's tier from the registry detail page. This is useful when you know a field is stable enough to promote early, or when you want to demote a field that was promoted prematurely."
       }
     ],
     mentions: ["tier system", "Tier 1", "Tier 2", "Tier 3", "field promotion", "quality signal"]
@@ -1418,6 +1634,23 @@ var sections4 = [
       {
         type: "paragraph",
         text: 'Fields with similar meanings are automatically grouped using AI embeddings. For example, "Vendor Name", "Supplier Name", and "Company Name" cluster together. You can manually merge or split clusters from the Field Map view.'
+      },
+      {
+        type: "paragraph",
+        text: "Clustering uses the same three-band similarity model as field resolution. Fields with similarity >= 0.80 are automatically grouped into the same cluster. Fields in the 0.50-0.79 range are flagged as potential cluster candidates for manual confirmation. Fields below 0.50 similarity are kept separate. This graduated approach prevents false merges while still surfacing useful grouping suggestions."
+      },
+      {
+        type: "paragraph",
+        text: 'From the **Field Map** view, you can manually **merge** two clusters when you know they represent the same concept (e.g., merging a "Ship To Address" cluster with a "Delivery Address" cluster). You can also **split** a field out of a cluster if it was incorrectly grouped. These manual adjustments are permanent and improve the resolution model for all future documents \u2014 the system learns from your corrections.'
+      },
+      {
+        type: "paragraph",
+        text: 'Semantic clusters serve a practical purpose beyond organization. When a job runs, the resolution engine uses clusters to transfer values between fields that belong to the same cluster. If a document has a field called "Supplier Name" and your schema expects "Vendor Name", the cluster linkage allows the value to transfer automatically without an AI call. This is one of the key mechanisms that reduces extraction cost as your registry matures.'
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "Manual cluster adjustments are permanent and improve the model for all future documents. If you notice the platform grouping unrelated fields together, split them early \u2014 this prevents incorrect value transfers during job runs."
       }
     ],
     related: [
@@ -1433,6 +1666,10 @@ var sections4 = [
       {
         question: "Can I manually adjust semantic clusters?",
         answer: "Yes. You can manually merge or split clusters from the Field Map view in the Field Registry."
+      },
+      {
+        question: "How do semantic clusters reduce extraction cost?",
+        answer: 'When a job runs, the resolution engine uses clusters to transfer values between fields that belong to the same cluster. If a document has "Supplier Name" and your schema expects "Vendor Name", the cluster linkage allows the value to transfer automatically without an AI call.'
       }
     ],
     mentions: [
@@ -1478,6 +1715,10 @@ var sections4 = [
         type: "paragraph",
         text: "Resolution runs concurrently across documents. Each document's fields are resolved in an isolated transaction to prevent lock contention. Occurrence rates are updated after each transaction commits, keeping the registry eventually consistent without blocking concurrent ingestion."
       },
+      {
+        type: "paragraph",
+        text: "After resolution completes, the platform evaluates tier promotions and regenerates affected schemas in a fixed chain: resolve, then promote, then regenerate. This chain ensures that newly promoted fields immediately appear in auto-generated schemas. The resolution process also feeds into the **job pipeline** \u2014 during Phase 1 of a job run, the system uses a 3-tier lookup cascade (string normalization, token fuzzy matching, then AI fallback) to fill 60-80% of cells without a full LLM call, dramatically reducing cost."
+      },
       {
         type: "callout",
         text: "Pending confirmations from the confirm band appear in **Resolution &rarr; Pending Confirmations**. Accept to merge into an existing cluster, or reject to create a new field."
@@ -1496,6 +1737,10 @@ var sections4 = [
       {
         question: "Where can I review pending field confirmations?",
         answer: "Navigate to Resolution > Pending Confirmations to review fields in the confirm band. Accept to merge into an existing cluster, or reject to create a new field."
+      },
+      {
+        question: "What happens after resolution completes?",
+        answer: "After resolution, the platform evaluates tier promotions and regenerates affected schemas in a fixed chain: resolve, then promote, then regenerate. This ensures that newly promoted fields immediately appear in auto-generated schemas."
       }
     ],
     mentions: [
@@ -1517,6 +1762,18 @@ var sections4 = [
         type: "paragraph",
         text: "As the same field is extracted from many documents, AI synthesizes a **master instruction** \u2014 a reusable directive that captures the best way to extract that field. Master instructions improve accuracy over time and are automatically used when running jobs."
       },
+      {
+        type: "paragraph",
+        text: 'Master instructions are synthesized by analyzing the extraction patterns across all documents where a field appears. The AI examines how the field was successfully extracted \u2014 including the source text, confidence scores, and document context \u2014 and distills a concise directive that captures the best extraction approach. For example, a master instruction for "invoice_date" might specify: "Look for the date near the invoice number, typically in the header area. Prefer the issue date over due date. Format as ISO 8601."'
+      },
+      {
+        type: "paragraph",
+        text: "Master instructions fire automatically during **Phase 2** of job runs, when the AI agent extracts values for fields that could not be resolved via lookup. The instruction is injected into the AI prompt alongside the document content, giving the model specific guidance for that field. This is why master instructions improve accuracy: they encode domain-specific knowledge that the base model would otherwise lack."
+      },
+      {
+        type: "paragraph",
+        text: `You can view and edit master instructions from the field detail page in the registry. Editing an instruction overrides the AI-synthesized version, which is useful when you have domain expertise the AI hasn't captured. The **"Synthesize All"** button in the Field Registry triggers the full pipeline \u2014 embedding, resolution, and synthesis \u2014 for all qualifying fields in a single operation.`
+      },
       {
         type: "callout",
         text: 'Click **"Synthesize All"** in the Field Registry to generate instructions for all qualifying fields. This runs the combined pipeline: embed &rarr; resolve &rarr; synthesize.'
@@ -1535,6 +1792,10 @@ var sections4 = [
       {
         question: "How do I generate master instructions?",
         answer: 'Click "Synthesize All" in the Field Registry. This runs the combined pipeline: embed, resolve, and synthesize instructions for all qualifying fields.'
+      },
+      {
+        question: "Can I manually edit a master instruction?",
+        answer: "Yes. You can view and edit master instructions from the field detail page in the registry. Editing overrides the AI-synthesized version, which is useful when you have domain expertise the AI has not captured."
       }
     ],
     mentions: [
@@ -1562,6 +1823,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "For each document type, Talonic generates a schema containing all Tier 1 and Tier 2 fields with occurrences in that type. Generated schemas are versioned \u2014 new versions are created when the registry changes. You can diff any two versions to see what changed."
+      },
+      {
+        type: "paragraph",
+        text: "Behind the scenes, the generation engine scans the **Field Registry** for every field that has been promoted to Tier 1 (core) or Tier 2 (established) within a given document type. It assembles these fields into a schema definition, assigns data types based on observed extraction patterns, and attaches the AI-synthesized **master instruction** for each field. The entire process is automatic \u2014 no manual curation is required."
+      },
+      {
+        type: "paragraph",
+        text: "Generated schemas are most useful as a starting point for understanding what Talonic has discovered about your documents. Review the generated schema for a document type to see which fields the system has identified, then use that knowledge to build a **User Template** containing only the fields you actually need. You can also use the diff view to monitor how your field landscape evolves over time as new documents are processed and new fields are promoted."
+      },
+      {
+        type: "callout",
+        text: "Generated schemas are read-only and cannot be used directly for job execution. To run an extraction job, create a **User Template** and map its fields to the registry."
       }
     ],
     related: [
@@ -1577,6 +1850,10 @@ var sections5 = [
       {
         question: "How are generated schemas updated?",
         answer: "New versions are created automatically when the Field Registry changes (new fields promoted, clusters merged). You can diff any two versions to see what changed."
+      },
+      {
+        question: "Can I run an extraction job using a generated schema?",
+        answer: "No. Generated schemas are read-only references. To run a job, create a User Template, select the fields you need, map them to the registry, and publish a version."
       }
     ],
     mentions: ["generated schemas", "AI-generated", "versioning", "schema diff"]
@@ -1606,6 +1883,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "Most teams start by importing an existing spreadsheet or CSV as a template baseline, then refine field types and add extraction instructions. Once you publish a version, it becomes immutable and available for job execution \u2014 any further changes happen in a new **Workshop** draft, keeping your production schema stable while you iterate."
+      },
+      {
+        type: "paragraph",
+        text: "When adding fields, take advantage of the automatic registry matching system. Fields with names that match existing registry entries are linked instantly, inheriting the AI-synthesized extraction instruction. For fields that do not match, write a clear **manual instruction** describing exactly what the AI should extract from the document. Well-written instructions are the single biggest lever for extraction accuracy."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, keep templates focused on a single document type or closely related group of types. A template with 10-20 well-defined fields will produce higher accuracy than one with 50+ fields spanning unrelated domains. If you need different field sets for different document types, create separate templates and run targeted jobs for each."
+      },
+      {
+        type: "callout",
+        text: "You can import templates from Excel, CSV, or JSON files using the **Import from file** option. Column headers become field names, and data types are inferred automatically. This is the fastest way to bootstrap a template from an existing spreadsheet."
       }
     ],
     related: [
@@ -1621,6 +1910,10 @@ var sections5 = [
       {
         question: "What is the difference between generated schemas and user templates?",
         answer: "Generated schemas are AI-created per document type with all Tier 1/2 fields. User templates are custom-defined output structures where you choose exactly which fields to include and how to map them."
+      },
+      {
+        question: "Can I update a published template?",
+        answer: "Published versions are immutable. To make changes, open the Workshop draft, edit your fields, and publish a new version. The previous version remains available in Version History for reference and diffing."
       }
     ],
     mentions: ["user templates", "schema creation", "field mapping", "reference tables", "publish"]
@@ -1686,6 +1979,14 @@ var sections5 = [
         type: "paragraph",
         text: "When configuring a field, start with the basics \u2014 name, type, and registry mapping \u2014 then layer on advanced features as needed. For example, add a **format constraint** to enforce a date pattern, attach a **reference table** for code lookups, or define **capture submoves** to control the exact extraction sequence. Features compose independently, so you can mix and match without conflicts."
       },
+      {
+        type: "paragraph",
+        text: "The **modifier pipeline** runs in a fixed order during Phase 4 of the extraction pipeline: format transforms first (converting dates or numbers to your target format), then alias mapping (replacing values using a lookup), and finally max_length truncation. Constraint evaluation happens after all modifiers have been applied, so constraints validate the final transformed value, not the raw extraction."
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, use **manual instructions** sparingly and only for fields that the registry cannot match. A well-written instruction should describe the field in plain language, specify where in the document to look, and note any formatting expectations. Avoid vague instructions like "extract the value" \u2014 instead, write something like "Extract the net payment amount from the invoice summary section, excluding VAT."'
+      },
       {
         type: "callout",
         text: "For the complete JSON Schema specification with all features, see the [Full Schema Reference](/docs/platform/schema-features) in the Platform Guide."
@@ -1704,6 +2005,10 @@ var sections5 = [
       {
         question: "Can I override AI extraction instructions with my own?",
         answer: "Yes. Use the Manual instruction feature on a schema field. User-written instructions override the AI-synthesized master instruction from the field registry."
+      },
+      {
+        question: "In what order are modifiers applied to extracted values?",
+        answer: "Modifiers run in a fixed order: format (date/number conversion) first, then alias (value mapping), then max_length (truncation). Constraints are evaluated after all modifiers complete."
       }
     ],
     mentions: [
@@ -1752,6 +2057,22 @@ var sections5 = [
       {
         type: "paragraph",
         text: "When you add a field to a template, the system automatically attempts to match it against the **Field Registry**. Exact name matches are applied instantly, while semantic and composite matches appear as suggestions for your confirmation. If no match is found, the field is marked **Unmapped** and you should provide a manual extraction instruction so the AI knows how to extract that value from your documents."
+      },
+      {
+        type: "paragraph",
+        text: "The matching engine uses a three-band resolution process under the hood. First, it checks for an exact name match against canonical registry field names and their synonyms. If no exact match is found, it computes embedding similarity between your field name and every registry field, surfacing semantic matches with a similarity of 0.50 or higher. Matches at 0.80 or above are auto-accepted; those from 0.50 to 0.79 require your confirmation."
+      },
+      {
+        type: "paragraph",
+        text: "Matched fields inherit the registry's AI-synthesized **master instruction**, which tells the extraction pipeline exactly how to locate and extract that value from documents. This is why matching matters \u2014 a well-matched field leverages all the intelligence the system has built up from processing your document corpus. Unmapped fields rely solely on your manual instruction, so they may need a few correction cycles before reaching the same accuracy."
+      },
+      {
+        type: "paragraph",
+        text: "You can trigger a **Rematch** on all fields at any time from the template editor. This is useful after the registry has grown \u2014 fields that were previously unmapped may now find matches as new extractions contribute to the registry. For best results, use descriptive field names that reflect the actual data (e.g., `contract_start_date` rather than `field_1`)."
+      },
+      {
+        type: "callout",
+        text: "Field matching is read-only against the registry \u2014 it never creates new registry entries. If no match exists, the field stays unmapped until you provide a manual instruction or new documents introduce the field into the registry."
       }
     ],
     related: [
@@ -1767,6 +2088,10 @@ var sections5 = [
       {
         question: "What happens when a field is unmapped?",
         answer: "Unmapped fields have no registry match. They require manual extraction instructions to guide the AI on how to extract the value from documents."
+      },
+      {
+        question: "Can I re-run field matching after adding more documents?",
+        answer: "Yes. Use the Rematch button in the template editor to re-run matching against the current registry. Fields that were previously unmapped may find new matches as your registry grows."
       }
     ],
     mentions: ["field matching", "exact match", "semantic match", "composite", "unmapped"]
@@ -1807,6 +2132,14 @@ var sections5 = [
         type: "paragraph",
         text: "To set up a reference table, upload a CSV or manually enter key-value pairs where the **key** is the code you want in your output and the **value** is the human-readable label found in documents. During extraction, the system tries each tier in order \u2014 most values resolve instantly at Tier 1, so keeping your labels clean and consistent dramatically improves both speed and accuracy."
       },
+      {
+        type: "paragraph",
+        text: "Reference tables are used in two pipeline stages. In **Phase 1**, the lookup cascade runs as part of the resolve step, mapping extracted labels to codes using only its non-AI tiers (Tier 1 exact normalization and Tier 2 fuzzy matching). In **Phase 3**, the cascade runs again on values produced by Phase 2's AI extraction, normalizing free-text AI output to your canonical codes. This two-pass approach ensures maximum code coverage across the entire pipeline."
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, include common variations and abbreviations as separate value entries all pointing to the same key. For example, if your code is `US`, add values for "United States", "USA", "U.S.A.", and "United States of America". The more variations you cover, the more values resolve at Tier 1 (highest confidence) without falling through to fuzzy or AI matching.'
+      },
       {
         type: "callout",
         text: "Reference table quality directly determines lookup accuracy. A properly loaded table produces 90-100% accurate results within a single run."
@@ -1825,6 +2158,10 @@ var sections5 = [
       {
         question: "How accurate are reference table lookups?",
         answer: "A properly loaded reference table produces 90-100% accurate results within a single run. The cascade provides confidence scores: 0.95 for exact normalization, ~0.70 for fuzzy, and 0.50 for AI fallback."
+      },
+      {
+        question: "How should I format my reference table CSV?",
+        answer: "Use two columns: the first column is the key (output code) and the second is the value (human-readable label). Include common variations and abbreviations as separate rows pointing to the same key for maximum Tier 1 hit rate."
       }
     ],
     mentions: [
@@ -1849,6 +2186,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "Start by editing fields in the **Workshop** draft, then use **Test Extraction** to compare draft results against the live version before publishing. The **Version History** timeline lets you review diff summaries between any two versions, making it easy to trace when a field was added, renamed, or removed and understand the impact on downstream jobs."
+      },
+      {
+        type: "paragraph",
+        text: "The versioning system is append-only \u2014 every time you publish a draft, it creates a new immutable version and the previous version is preserved in the timeline. This means you can always go back and review the exact schema that was used for any historical job. The diff view highlights added fields, removed fields, type changes, and updated instructions, giving you a clear picture of how your schema evolved."
+      },
+      {
+        type: "paragraph",
+        text: "Use the workshop system to iterate safely on your schema without disrupting production jobs. A common workflow is to add a new field in the Workshop, run a **Test Extraction** on a few documents to verify it produces correct values, then publish when satisfied. If a downstream integration depends on a specific field, the breaking change detection will warn you before you accidentally remove or rename it."
+      },
+      {
+        type: "callout",
+        text: "Breaking changes include field removals and type changes. The system surfaces these warnings at publish time so you can assess the impact on active delivery bindings and downstream systems before committing."
       }
     ],
     related: [
@@ -1864,6 +2213,10 @@ var sections5 = [
       {
         question: "What are breaking changes in a schema?",
         answer: "Breaking changes include field removals and type changes. The system detects and warns about these when promoting a draft to live, helping you avoid unintended downstream impacts."
+      },
+      {
+        question: "Can I revert to a previous schema version?",
+        answer: "Version history is append-only, so you cannot revert directly. However, you can review any previous version in the timeline, compare it with the current live version using the diff view, and manually re-add fields or settings that were changed."
       }
     ],
     mentions: ["versioning", "drafts", "workshop", "live version", "breaking changes"]
@@ -1882,6 +2235,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "After running a test, you will see a comparison grid highlighting cells that changed between the draft and live versions. Focus on fields you modified \u2014 new fields, updated instructions, or changed reference tables \u2014 to verify they produce the expected values. This workflow catches regressions before they reach production, so you can iterate on your schema with confidence."
+      },
+      {
+        type: "paragraph",
+        text: "Test extractions apply the same schema features as production jobs, so the results closely match what a full job would produce. Under the hood, the test uses a simplified single-call extraction mode, which is faster but still applies all schema features including reference table lookups, format constraints, and modifiers. This gives you a reliable preview without the cost of a full pipeline run."
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, select 3-5 representative documents that cover the variety in your corpus \u2014 include at least one "clean" document and one with unusual formatting or missing fields. This gives you confidence that your schema handles both typical and edge-case documents correctly. Run the test after every significant change to a field instruction, reference table, or format constraint.'
+      },
+      {
+        type: "callout",
+        text: "Test extractions do not affect your live data, and they consume credits the same way production jobs do. They are designed for rapid iteration \u2014 run as many tests as you need before publishing."
       }
     ],
     related: [
@@ -1897,6 +2262,10 @@ var sections5 = [
       {
         question: "Do I need to publish a draft before testing it?",
         answer: "No. Test extraction runs against the unpublished draft, comparing its output to the current live version so you can verify changes before publishing."
+      },
+      {
+        question: "How many documents should I use for a test extraction?",
+        answer: "Select 3-5 representative documents that cover the variety in your corpus. Include documents with different layouts, data completeness levels, and edge cases to get a reliable preview of how your schema changes perform."
       }
     ],
     mentions: ["test extraction", "draft comparison", "side-by-side", "preview"]
@@ -1951,6 +2320,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "When working with international data, configure the dialect to match your downstream system requirements. For example, set **number_locale** to `fr-FR` for European comma-decimal formatting, switch the **delimiter** to semicolon for CSV compatibility, and choose **UTF-8-BOM** encoding if your data will be opened in Excel. Creating a shared dialect and reusing it across schemas ensures consistent formatting across all your exports."
+      },
+      {
+        type: "paragraph",
+        text: "Dialect settings are applied during Phase 4 of the extraction pipeline and during CSV/XLSX export. The dialect does not affect how values are stored internally \u2014 it only controls the serialization format when data leaves the platform. This means you can change a dialect at any time without re-running extractions; the new format applies to all future exports and deliveries."
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, create a shared dialect for each downstream system or regional office you deliver to, and name it descriptively (e.g., "SAP Europe" or "US Accounting"). Avoid defining dialects inline on individual schemas unless you have a one-off formatting requirement. Shared dialects reduce maintenance burden and ensure consistency when you add new schemas later.'
+      },
+      {
+        type: "callout",
+        text: "If your CSV files show garbled special characters (accents, umlauts, CJK text), switch the encoding to **UTF-8-BOM**. The BOM (byte order mark) tells Excel to interpret the file as UTF-8 instead of the system default encoding."
       }
     ],
     related: [
@@ -1966,6 +2347,10 @@ var sections5 = [
       {
         question: "Can I share a dialect across multiple schemas?",
         answer: "Yes. A dialect can be shared across schemas or defined inline for a specific schema. Configure them in the Schema > Delivery tab."
+      },
+      {
+        question: "Do I need to re-run extractions when I change a dialect?",
+        answer: "No. Dialects only affect output serialization (exports and deliveries), not how values are stored internally. Changing a dialect takes effect immediately on future exports without re-processing."
       }
     ],
     mentions: [
@@ -2018,6 +2403,14 @@ var sections5 = [
         type: "paragraph",
         text: 'Use bypass strategies for fields whose values are known ahead of time or can be derived without reading the document. For example, set a **constant** of `"USD"` for a currency field that is always the same, or use a **generator** to produce a deterministic ID for each row. Fields with bypass strategies skip the AI extraction phase entirely, reducing processing time and credit usage.'
       },
+      {
+        type: "paragraph",
+        text: "The **reference** bypass strategy is particularly powerful for enrichment fields. Define a `key_expression` that references another field in the schema (e.g., the supplier name), and the system will automatically look up the corresponding code from your reference table without any AI involvement. This is ideal for mapping extracted entity names to internal system identifiers, ERP codes, or classification labels."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, audit your schema for fields that never vary across documents \u2014 these are prime candidates for the **constant** strategy. Fields like currency, data source, or processing batch can be set once and never require AI extraction. This reduces per-document processing cost and improves job completion time, especially on large runs with hundreds of documents."
+      },
       {
         type: "callout",
         text: "When a `generator` strategy fails to produce a value, the field falls through to LLM extraction as a safety net. Strategy values are normalized via generator mappings in Phase 4 of the pipeline."
@@ -2036,6 +2429,10 @@ var sections5 = [
       {
         question: "What happens when a generator bypass fails?",
         answer: "When a generator strategy fails to produce a value, the field falls through to LLM extraction as a safety net, ensuring the cell is still filled."
+      },
+      {
+        question: "Do bypass strategies reduce extraction costs?",
+        answer: "Yes. Fields with bypass strategies skip the AI extraction phase entirely, which reduces both processing time and credit usage. Use constant or reference strategies for fields that do not require document reading."
       }
     ],
     mentions: [
@@ -2081,6 +2478,18 @@ var sections5 = [
       {
         type: "paragraph",
         text: "Define format constraints in the schema field editor. The pattern uses standard regex syntax. The editor provides a live test input so you can verify the pattern before saving."
+      },
+      {
+        type: "paragraph",
+        text: "Format constraints are especially useful for fields with strict formatting requirements in downstream systems. For example, a purchase order number that must follow the pattern `PO-\\d{6}` or a date that must match `\\d{4}-\\d{2}-\\d{2}`. By catching format violations at extraction time, you avoid importing malformed data into your ERP, accounting, or analytics systems."
+      },
+      {
+        type: "paragraph",
+        text: 'Choose the mismatch behavior based on your data quality requirements. Use **empty** (the default) when you prefer no data over bad data \u2014 the downstream system will see a blank cell. Use **flag** when you want to review mismatches manually before deciding \u2014 flagged cells appear with an amber dot in the results grid. Use **constant** when your downstream system needs a specific sentinel value like `"N/A"` or `"INVALID"` to trigger its own error handling.'
+      },
+      {
+        type: "callout",
+        text: "The regex evaluator includes ReDoS protection: nested quantifiers are rejected and input is capped at 1,000 characters. Use the `(?i)` inline flag for case-insensitive matching."
       }
     ],
     related: [
@@ -2096,6 +2505,10 @@ var sections5 = [
       {
         question: "Are original values preserved when format constraints clear a cell?",
         answer: "Yes. Original values are always preserved for audit in the original_extractions table, regardless of the mismatch behavior applied."
+      },
+      {
+        question: "Can I use case-insensitive regex patterns?",
+        answer: "Yes. Use the (?i) inline flag at the start of your pattern for case-insensitive matching. The evaluator supports standard JavaScript regex syntax with inline flags."
       }
     ],
     mentions: [
@@ -2124,6 +2537,18 @@ var sections6 = [
       {
         type: "paragraph",
         text: "Navigate to **Structuring &rarr; Runs &rarr; New**. Select your template and documents, then click Start. Results appear progressively as each phase completes."
+      },
+      {
+        type: "paragraph",
+        text: "When you start a job, the platform runs a pre-flight check to ensure all selected documents have completed their field resolution step. If any document was uploaded recently and has not yet been resolved against the Field Registry, the system automatically resolves it before entering Phase 1. This lazy resolution gate prevents silent data loss: without it, registry-based lookups would return empty results for unresolved documents."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, select documents of the same type or closely related types for a single job. The schema you choose should match the document content \u2014 using an invoice schema on contract documents will produce poor results. Start with a small batch of 5-10 documents to validate your schema, review the output, apply corrections, and then scale up to larger runs once you are confident in the extraction quality."
+      },
+      {
+        type: "callout",
+        text: "Results appear progressively as each pipeline phase completes. You do not need to wait for the entire job to finish \u2014 you can begin reviewing Phase 1 results while Phase 2 is still running."
       }
     ],
     related: [
@@ -2139,6 +2564,10 @@ var sections6 = [
       {
         question: "What does an extraction job produce?",
         answer: "A job produces a structured grid where rows represent documents and columns represent schema fields. Each cell contains an extracted value with confidence and provenance metadata."
+      },
+      {
+        question: "How many documents can I include in a single job?",
+        answer: "Phase 2 supports up to 2,000 documents per job, and Phase 4 supports up to 1,000. For best results, start with smaller batches to validate your schema before scaling up."
       }
     ],
     mentions: ["extraction job", "structured grid", "progressive results", "template selection"]
@@ -2158,11 +2587,23 @@ var sections6 = [
         type: "paragraph",
         text: "Each phase builds on the previous one, progressively filling the output grid. **Phase 1** resolves ~30% of cells instantly using graph matches and lookups. **Phase 2** deploys an AI agent to fill remaining gaps. **Phase 3** runs cross-field validation checks, and **Phase 4** performs targeted re-reads for empty or low-confidence cells. You can monitor fill rate in real time as each phase completes."
       },
+      {
+        type: "paragraph",
+        text: "The pipeline is designed around a key principle: use the cheapest, fastest method first and escalate to AI only when necessary. Phase 1 fills cells using deterministic lookups at zero AI cost. Phase 2 uses AI only for cells that Phase 1 could not resolve. Phase 3 re-runs lookups on Phase 2 output to normalize AI-generated values to canonical codes. Phase 4 performs targeted re-reads with full grid context for the remaining gaps. This cascading approach minimizes both cost and latency."
+      },
+      {
+        type: "paragraph",
+        text: "The grid is flushed to the database after each phase, enabling progressive rendering in the UI. You can watch cells fill in real time and begin reviewing results before the job finishes. The phase timeline on the job detail page shows which phase is currently active, how long each phase took, and the cumulative fill rate at each stage."
+      },
       {
         type: "ui-excerpt",
         id: "job-detail-phase-timeline",
         title: "Job Detail \u2014 Phase Timeline",
         caption: "The phase timeline shows progress through the pipeline. Each dot represents a stage, highlighted when active."
+      },
+      {
+        type: "callout",
+        text: "Phase order is fixed: Phase 1 &rarr; 2 &rarr; 3 &rarr; 4. Phases are never skipped or reordered. This guarantees that high-confidence deterministic values from Phase 1 are always protected by the confidence gate before AI extraction runs."
       }
     ],
     related: [
@@ -2178,6 +2619,10 @@ var sections6 = [
       {
         question: "Can I see results before all phases complete?",
         answer: "Yes. Results are visible as each phase completes. The fill rate increases progressively through the pipeline."
+      },
+      {
+        question: "Why does the pipeline use multiple phases instead of a single AI call?",
+        answer: "The cascading design minimizes cost and latency. Phase 1 fills cells with deterministic lookups at zero AI cost. Only remaining gaps go to the AI agent in Phase 2, and Phase 4 targets specific empty cells with full context. This is significantly cheaper and faster than sending everything to AI."
       }
     ],
     mentions: ["4-phase pipeline", "fill rate", "progressive rendering", "phase timeline"]
@@ -2227,6 +2672,18 @@ var sections6 = [
       {
         type: "paragraph",
         text: "Values are normalized during transfer: dates &rarr; `YYYY/MM/DD`, numbers &rarr; 2 decimal places, strings &rarr; trim + collapse spaces."
+      },
+      {
+        type: "paragraph",
+        text: "Phase 1 is the workhorse of cost efficiency. Because it relies entirely on pre-computed graph matches and deterministic lookups, it fills a large portion of the grid at near-zero cost. The confidence scores assigned during this phase are typically high (0.7-0.95) because they are derived from verified registry matches rather than AI inference. These high-confidence cells are then protected by the confidence gate, meaning later phases cannot overwrite them."
+      },
+      {
+        type: "paragraph",
+        text: "The resolution strategies execute in a fixed order: registry transfer first, then raw extraction mapping, then the 3-tier lookup cascade, and finally deterministic compute (formulas like `Total = Unit Price x Quantity`). Each strategy only attempts to fill cells that are still empty after the previous strategy ran. This ordering ensures that the highest-confidence method always gets priority."
+      },
+      {
+        type: "callout",
+        text: "Phase 1 fill rates improve over time as your Field Registry grows. The more documents you process, the richer the registry becomes, and the more cells Phase 1 can resolve without AI \u2014 reducing both cost and latency for every subsequent job."
       }
     ],
     related: [
@@ -2242,6 +2699,10 @@ var sections6 = [
       {
         question: "What percentage of cells does Phase 1 fill?",
         answer: "Phase 1 typically fills approximately 30% of cells in seconds, using graph matches and lookups without any AI calls."
+      },
+      {
+        question: "Does Phase 1 performance improve over time?",
+        answer: "Yes. As your Field Registry grows from processing more documents, Phase 1 can resolve a higher percentage of cells through graph matches. Mature registries often see Phase 1 fill rates of 60-80%."
       }
     ],
     mentions: [
@@ -2300,6 +2761,14 @@ var sections6 = [
           }
         ]
       },
+      {
+        type: "paragraph",
+        text: "Phase 2 processes documents with grouped extraction calls \u2014 schema fields are divided into batches of up to 10 fields per call to balance extraction quality with throughput. For each document, the agent sends the document text along with the schema field definitions and any already-resolved values from Phase 1 as context. This context-aware approach means the AI can use related values (like a contract start date) to more accurately extract dependent values (like the end date)."
+      },
+      {
+        type: "paragraph",
+        text: "For fields backed by a **reference table**, Phase 2 includes the table's codes and labels directly in the extraction prompt so the AI picks canonical codes rather than free-text labels. This tight integration between reference tables and AI extraction produces cleaner output that requires fewer corrections. When a field's reference table has fewer than 50 entries, the full table is included in the prompt; larger tables are handled by the Phase 3 lookup cascade instead."
+      },
       {
         type: "callout",
         variant: "warning",
@@ -2319,6 +2788,10 @@ var sections6 = [
       {
         question: "Can the agent skip a field with manual instructions?",
         answer: "No. Fields with manual instructions always use the extract strategy. Human-written instructions are treated as authoritative and never skipped."
+      },
+      {
+        question: "How many fields does the agent process per AI call?",
+        answer: "Schema fields are grouped into batches of up to 10 fields per extraction call. This balances extraction quality with throughput \u2014 smaller groups help the AI focus on each field without losing recall."
       }
     ],
     mentions: [
@@ -2375,6 +2848,18 @@ var sections6 = [
             description: "Field with >80% registry occurrence rate is empty in this document."
           }
         ]
+      },
+      {
+        type: "paragraph",
+        text: 'Phase 3 also re-runs the lookup cascade (reference table resolution) on values that Phase 2 produced. This is important because AI-extracted values often use natural language labels (e.g., "Frame Agreement") rather than the canonical codes your reference table expects (e.g., `std_master`). The Phase 3 lookup normalizes these labels to codes, improving consistency across your output without requiring manual corrections.'
+      },
+      {
+        type: "paragraph",
+        text: "Validation flags are designed to surface the most impactful issues first. The **low_confidence_outlier** flag is particularly useful \u2014 it highlights cells where the system is uncertain in an otherwise high-confidence row, pointing you to the exact cells most likely to contain errors. For large runs with hundreds of documents, filtering by flags and reviewing those cells first can reduce your review time by 80% or more."
+      },
+      {
+        type: "callout",
+        text: "Validation flags never modify cell values. They are purely informational annotations that help you prioritize review. The actual cell value and confidence score remain unchanged by Phase 3 flagging."
       }
     ],
     related: [
@@ -2390,6 +2875,10 @@ var sections6 = [
       {
         question: "What types of validation flags exist?",
         answer: "Five types: date_sanity (date inconsistencies), amount_mismatch (total discrepancies), lookup_failed (no reference match), low_confidence_outlier (low confidence cells), and unexpected_empty (missing high-frequency fields)."
+      },
+      {
+        question: "Does Phase 3 modify any cell values?",
+        answer: "Yes, but only through lookup normalization. Phase 3 re-runs the reference table lookup cascade, which can replace an AI-extracted label with its canonical code. The validation flags themselves are purely informational and do not modify values."
       }
     ],
     mentions: [
@@ -2415,6 +2904,14 @@ var sections6 = [
         type: "paragraph",
         text: "Because Phase 4 has access to the full grid context \u2014 all values already resolved in earlier phases \u2014 it can use surrounding data as clues. For example, if a contract start date was resolved in Phase 1 but the end date is still empty, Phase 4 re-reads the document knowing the start date, which helps the AI locate the corresponding end date more accurately."
       },
+      {
+        type: "paragraph",
+        text: "Phase 4 also applies deterministic transforms to all cell values: ISO code normalization, date format standardization, and unit conversion. Format constraints (regex patterns defined on schema fields) are evaluated at this stage. If a value fails its format constraint, the configured mismatch behavior kicks in \u2014 the cell is either cleared, flagged with an amber dot, or replaced with a constant. Original values are always preserved in the `original_extractions` table for audit purposes."
+      },
+      {
+        type: "paragraph",
+        text: "Expect Phase 4 to fill 5-15% of remaining empty cells, depending on document complexity and schema coverage. The phase is most effective for fields that require cross-referencing multiple sections of a document or interpreting values in the context of other extracted data. It is less effective for fields that are genuinely absent from the source document \u2014 those will remain empty with an `unresolved` provenance type."
+      },
       {
         type: "callout",
         text: "Phase 4 respects the **confidence gate**: it can only fill empty cells or upgrade cells below the confidence threshold. High-confidence values from Phase 1 are permanently protected."
@@ -2433,6 +2930,10 @@ var sections6 = [
       {
         question: "Can Phase 4 overwrite high-confidence values?",
         answer: "No. Phase 4 respects the confidence gate \u2014 it can only fill empty cells or upgrade cells below the confidence threshold. High-confidence values from earlier phases are permanently protected."
+      },
+      {
+        question: "What else happens in Phase 4 besides gap filling?",
+        answer: "Phase 4 also applies deterministic transforms (ISO codes, dates, units), evaluates format constraints (regex validation), and runs the modifier pipeline (format, alias, max_length). Original values are preserved for audit."
       }
     ],
     mentions: ["Phase 4", "re-read", "gap filling", "confidence gate", "targeted extraction"]
@@ -2457,6 +2958,18 @@ var sections6 = [
       {
         type: "paragraph",
         text: "Start your review by switching to the **Flagged** filter to focus on cells that need attention \u2014 these are values with validation warnings, low confidence, or format mismatches. Click any cell to see its full provenance, including which phase produced it and the reasoning trace. Once you are satisfied, export via **CSV** \u2014 choose the clean export for downstream systems or the full export with metadata for auditing."
+      },
+      {
+        type: "paragraph",
+        text: "The colored dots on each cell are your quickest visual indicator of data quality. Blue dots indicate graph matches from Phase 1 (highest reliability), purple dots indicate computed values, teal dots indicate agent transfers, indigo dots indicate AI extractions, and amber dots indicate lookup results or format flags. A grid dominated by blue and purple dots typically requires minimal review, while one with many indigo and amber dots may need more attention."
+      },
+      {
+        type: "paragraph",
+        text: "For large jobs with hundreds of documents, use a systematic review workflow: first address all **Flagged** rows, then spot-check a random sample of **Clean** rows to build confidence in the overall quality. If you find recurring errors in a specific field, consider updating the schema field's instruction or reference table, then run a new job \u2014 corrections you apply also feed back as training signals for future runs."
+      },
+      {
+        type: "callout",
+        text: "The full CSV export includes metadata columns for each field: confidence score, resolution type, phase number, and reasoning trace. Use this export for audit trails or to analyze extraction performance across your document corpus."
       }
     ],
     related: [
@@ -2472,6 +2985,10 @@ var sections6 = [
       {
         question: "Can I export extraction results?",
         answer: "Yes. Use CSV export from the job detail page. You can export clean data only or full data with metadata including confidence scores and resolution types."
+      },
+      {
+        question: "What is the most efficient way to review a large extraction run?",
+        answer: "Start with the Flagged filter to address cells with validation warnings, low confidence, or format mismatches. Then spot-check a random sample of Clean rows. Focus corrections on recurring field-level patterns rather than individual cells."
       }
     ],
     mentions: [
@@ -2528,6 +3045,14 @@ var sections6 = [
           }
         ]
       },
+      {
+        type: "paragraph",
+        text: "Confidence scores follow predictable patterns by resolution type. Graph matches from Phase 1 typically score 0.7-0.95 because they are derived from verified registry data. Reference table lookups score 0.95 for exact normalization matches, ~0.70 for fuzzy matches, and 0.50 for AI fallback. Agent-derived values from Phase 2 generally score 0.5-0.9 depending on the clarity of the source document and the specificity of the extraction instruction."
+      },
+      {
+        type: "paragraph",
+        text: "Use confidence scores to set your review threshold. Cells above 0.8 are generally reliable and can be trusted without manual verification for most use cases. Cells between 0.5 and 0.8 warrant a quick check. Cells below 0.5 should always be reviewed manually. You can use the full CSV export to filter and sort by confidence, making it easy to batch-review low-confidence cells efficiently."
+      },
       {
         type: "callout",
         variant: "warning",
@@ -2547,6 +3072,10 @@ var sections6 = [
       {
         question: "What is the confidence gate?",
         answer: "The confidence gate prevents any later pipeline phase from overwriting a cell that was filled with confidence >= 0.7. This protects high-quality lookup results from lower-confidence agent extractions."
+      },
+      {
+        question: "What confidence threshold should I use for manual review?",
+        answer: "Cells above 0.8 are generally reliable. Cells between 0.5 and 0.8 warrant a quick check. Cells below 0.5 should always be reviewed manually. Use the CSV export to filter by confidence for efficient batch review."
       }
     ],
     mentions: [
@@ -2571,6 +3100,18 @@ var sections6 = [
       {
         type: "paragraph",
         text: "When correcting a value, consider using **all_similar** propagation if the same mistake appears across multiple documents \u2014 for example, a reference table code that was consistently mapped to the wrong label. This applies your fix to every document in the run that matched the same way, saving you from correcting each cell individually. The system learns from these corrections, so the same error is less likely to recur in future jobs."
+      },
+      {
+        type: "paragraph",
+        text: "Corrections create a full audit trail: the original extracted value, the corrected value, who made the change, and when. This audit log is preserved even after subsequent jobs are run, giving you a complete history of manual interventions. When you export results with the full metadata option, correction history is included so downstream systems can distinguish between AI-extracted and human-corrected values."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, correct the root cause rather than individual symptoms. If a field consistently produces wrong values, update the schema field's **manual instruction** or **reference table** rather than correcting cells one by one. If a reference table code is missing, add it to the table \u2014 future runs will pick it up automatically at Tier 1 confidence (0.95). Corrections are most valuable as a feedback mechanism when they inform schema improvements."
+      },
+      {
+        type: "callout",
+        text: "Corrections with **all_similar** propagation apply instantly across all documents in the run. Use this for systematic errors like wrong reference table mappings, but verify the preview count before confirming \u2014 the system shows how many cells will be affected."
       }
     ],
     related: [
@@ -2586,6 +3127,10 @@ var sections6 = [
       {
         question: "Do corrections improve future extractions?",
         answer: "Yes. Corrections feed back as training signals for future runs, helping the system learn from your corrections and improve accuracy over time."
+      },
+      {
+        question: "Is there an audit trail for corrections?",
+        answer: "Yes. Every correction logs the original value, the corrected value, the user who made the change, and the timestamp. This audit history is preserved and included in full metadata CSV exports."
       }
     ],
     mentions: [
@@ -2639,6 +3184,18 @@ var sections7 = [
       {
         type: "paragraph",
         text: "Most link keys are auto-classified by name patterns. Remaining ambiguous fields are classified by AI. High-frequency entities (>30% of documents) are automatically excluded from case formation."
+      },
+      {
+        type: "paragraph",
+        text: "Behind the scenes, the classification engine applies rule-based heuristics first \u2014 field names like `company_name` or `invoice_number` are recognized instantly. When heuristics are inconclusive, an AI classifier examines the field's extracted values and schema context to determine the correct category. This two-tier approach keeps classification fast for the common case while handling ambiguous fields gracefully."
+      },
+      {
+        type: "paragraph",
+        text: "Use link keys whenever your documents share identifying information that should connect them. For best results, ensure your field names follow clear naming conventions \u2014 this maximizes the hit rate of the automatic classifier and minimizes the need for manual overrides."
+      },
+      {
+        type: "callout",
+        text: "Link key classification runs automatically when new fields appear in the registry. You do not need to trigger it manually \u2014 just upload documents and the system handles the rest."
       }
     ],
     related: [
@@ -2654,6 +3211,10 @@ var sections7 = [
       {
         question: "Why are high-frequency entities excluded from case formation?",
         answer: "Entities appearing in more than 30% of documents are too common to be meaningful connections. They are automatically excluded to prevent overly large, uninformative cases."
+      },
+      {
+        question: "Can I manually classify a field as a link key?",
+        answer: "Yes. Navigate to the Field Registry and change any field's link key category. Manual classifications take precedence over automatic ones and persist across future jobs."
       }
     ],
     mentions: [
@@ -2674,6 +3235,22 @@ var sections7 = [
       {
         type: "paragraph",
         text: 'After extraction, the linking pipeline runs automatically: extracts link key values, normalizes them (lowercasing, stripping suffixes like "Ltd", "Inc"), and builds a bipartite graph of documents &harr; entities.'
+      },
+      {
+        type: "paragraph",
+        text: 'The normalization step is critical for accurate linking. Values like "ACME Corp.", "Acme Corporation", and "acme corp" are all reduced to the same canonical form so they resolve to a single entity node. This prevents duplicate entities from fragmenting your cases and ensures documents that reference the same real-world entity are correctly connected.'
+      },
+      {
+        type: "paragraph",
+        text: "The resulting bipartite graph has two node types: documents and entities. An edge connects a document to an entity whenever the document contains that entity's value in a link key field. Connected components in this graph become the foundation for case formation \u2014 documents that share entities end up in the same case."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, ensure your source documents contain consistent identifiers. The pipeline handles minor variations automatically, but wildly inconsistent naming (e.g., abbreviations vs. full legal names) may require manual link key tuning in the Field Registry."
+      },
+      {
+        type: "callout",
+        text: "Entity linking is incremental \u2014 when new documents arrive, the pipeline extends the existing graph rather than rebuilding it from scratch. Existing cases grow as new connections are discovered."
       }
     ],
     related: [
@@ -2689,6 +3266,10 @@ var sections7 = [
       {
         question: "When does entity linking run?",
         answer: "Entity linking runs automatically after document extraction. It processes link key values and builds connections without manual intervention."
+      },
+      {
+        question: "What normalization does entity linking apply?",
+        answer: "Values are lowercased, common suffixes (Ltd, Inc, Corp, etc.) are stripped, and whitespace is normalized. This ensures minor naming variations resolve to the same entity."
       }
     ],
     mentions: [
@@ -2780,6 +3361,22 @@ var sections7 = [
       {
         type: "paragraph",
         text: "The Document Graph provides a visual D3-force layout of the bipartite graph. Toggle between graph and list views from the Cases page. Case templates are auto-discovered after 3+ cases form \u2014 they identify recurring document type patterns."
+      },
+      {
+        type: "paragraph",
+        text: "In the graph view, document nodes and entity nodes are rendered with distinct visual styles. Edges represent link key connections, and tightly connected clusters naturally pull together through force simulation. Hovering over a node highlights its connections, making it easy to trace how documents relate through shared entities."
+      },
+      {
+        type: "paragraph",
+        text: 'Case templates capture recurring patterns \u2014 for example, "Invoice + Purchase Order + Contract" might emerge as a common template after enough cases form. Templates include a **match threshold** that controls how closely a case must match the expected document type set. Use templates to monitor completeness: if a case is missing a document type that the template expects, an anomaly is raised.'
+      },
+      {
+        type: "paragraph",
+        text: "Most teams use the graph view during initial workspace setup to verify that linking is producing sensible clusters. Once you are confident in your link key configuration, the list view is more practical for day-to-day case review and triage."
+      },
+      {
+        type: "callout",
+        text: "Templates are auto-discovered \u2014 you do not need to define them manually. The system analyzes existing cases and proposes templates when it detects at least 3 cases sharing the same document type pattern."
       }
     ],
     related: [
@@ -2794,6 +3391,10 @@ var sections7 = [
       {
         question: "What are case templates?",
         answer: "Case templates are auto-discovered after 3 or more cases form. They identify recurring document type patterns, helping you understand common document relationships in your workspace."
+      },
+      {
+        question: "Can I switch between graph and list views?",
+        answer: "Yes. Toggle between the visual D3-force graph and a traditional list view from the Cases page. Both views show the same underlying data \u2014 choose whichever suits your workflow."
       }
     ],
     mentions: ["document graph", "D3-force layout", "bipartite graph", "case templates"]
@@ -2843,6 +3444,18 @@ var sections7 = [
       {
         type: "paragraph",
         text: "Anomalies appear in the **Anomalies** tab of the case detail page (Advanced mode). Each anomaly card shows severity, affected fields, and a dismiss button. Dismissed anomalies are hidden by default but visible via the **show dismissed** toggle."
+      },
+      {
+        type: "paragraph",
+        text: "The detection engine runs automatically after case formation and whenever case membership changes (documents added or removed, or cases merged). Each detector operates independently \u2014 a single case can trigger multiple anomaly types simultaneously. Anomaly counts are displayed as badges in the case header for quick triage."
+      },
+      {
+        type: "paragraph",
+        text: "Use anomaly detection to surface data quality issues that would otherwise require manual comparison across documents. For best results, configure case templates so the **Missing Document Type** detector (D4) can flag incomplete cases. Most teams find that D2 (Field Conflict) and D3 (Duplicate Key Divergence) catch the highest-value issues in procurement and financial workflows."
+      },
+      {
+        type: "callout",
+        text: "Viewing anomalies requires **Advanced mode**. In Simple mode, anomalies are still computed but are not displayed on the case detail page."
       }
     ],
     related: [
@@ -2854,6 +3467,14 @@ var sections7 = [
       {
         question: "What anomalies does Talonic detect?",
         answer: "Five structural patterns: validation clusters, field conflicts, duplicate key divergence, missing document types, and value reuse. Each is surfaced as a dismissable card on the case detail page."
+      },
+      {
+        question: "Do anomalies update automatically when cases change?",
+        answer: "Yes. The detection engine re-runs whenever case membership changes \u2014 documents added or removed, cases merged or split. Anomaly badges in the case header update in real time."
+      },
+      {
+        question: "Can I dismiss anomalies?",
+        answer: "Yes. Each anomaly card includes a dismiss button. Dismissed anomalies are hidden by default but can be revealed using the show dismissed toggle on the Anomalies tab."
       }
     ],
     mentions: ["anomaly detection", "validation cluster", "field conflict", "duplicate key divergence", "value reuse"]
@@ -2886,6 +3507,14 @@ var sections7 = [
         type: "paragraph",
         text: "**Domain packs** extend validation with industry-specific rules. The freight domain pack includes DOT number state detection and MC number validation. Additional packs can be added to `domain-packs/` without modifying the core engine."
       },
+      {
+        type: "paragraph",
+        text: "Validation runs automatically after extraction and linking complete. Each field value is checked against every applicable validator \u2014 a single field can trigger multiple rules. Results are displayed as colored badges in the **Evidence** tab: green for pass, red for fail, and amber for warnings. You can filter by status, document, category, or free-text search."
+      },
+      {
+        type: "paragraph",
+        text: "The checksum validator (S7) uses a parameterized factory pattern \u2014 it accepts a checksum algorithm name and applies the corresponding verification logic. Supported algorithms include Luhn (credit card numbers), ABA (bank routing numbers), IBAN (international bank accounts), and ISBN (book identifiers). For best results, ensure your schema fields are typed correctly so the engine knows which checksum to apply."
+      },
       {
         type: "callout",
         text: "Evidence validation results are stored in a separate `evidence_validation_results` table keyed by (document_id, entity_id, field_key) \u2014 not in the extraction or linking tables."
@@ -2904,6 +3533,10 @@ var sections7 = [
       {
         question: "What are domain packs?",
         answer: "Domain packs add industry-specific validation rules. For example, the freight domain pack validates DOT numbers and MC numbers. New packs can be added without modifying the core engine."
+      },
+      {
+        question: "How are evidence validation results displayed?",
+        answer: "Results appear as colored badges in the Evidence tab of the case detail page. Green indicates pass, red indicates fail, and amber indicates a warning. Use the filter bar to narrow results by status, document, or category."
       }
     ],
     mentions: ["evidence validation", "structural validators", "checksum", "Luhn", "IBAN", "domain packs", "freight"]
@@ -2930,6 +3563,18 @@ var sections8 = [
       {
         type: "paragraph",
         text: "Navigate to **Data Products &rarr; Dataset Templates** to manage templates. Each template is linked to a user schema and can be versioned independently. When creating a new job, select a template instead of configuring the output from scratch."
+      },
+      {
+        type: "paragraph",
+        text: "Templates support column mappings that rename, reorder, or exclude fields from the output. Default transforms \u2014 such as date formatting, currency normalization, or unit conversion \u2014 are applied automatically during assembly. This means every data product built from the same template produces structurally identical output regardless of who runs it or when."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, create one template per downstream consumer. If your finance team and operations team need different column subsets from the same schema, define two templates rather than manually reconfiguring each export. Most teams version their templates alongside schema changes to maintain backward compatibility with existing integrations."
+      },
+      {
+        type: "callout",
+        text: "Dataset templates are workspace-scoped. Any team member can create, edit, or use a template \u2014 there is no per-user ownership restriction."
       }
     ],
     related: [
@@ -2945,6 +3590,10 @@ var sections8 = [
       {
         question: "How do dataset templates relate to schemas?",
         answer: "Each dataset template is linked to a user schema and can be versioned independently. When creating a new job, you can select a template instead of configuring output from scratch."
+      },
+      {
+        question: "Can I version dataset templates?",
+        answer: "Yes. Each template is versioned independently from the schema it references. This lets you evolve your output format over time without affecting existing data products built from earlier versions."
       }
     ],
     mentions: [
@@ -2970,6 +3619,14 @@ var sections8 = [
         type: "paragraph",
         text: "Navigate to **Data Products &rarr; Assemblies** to view and create assemblies. Each assembly shows its document count, linked schema, processing status, and the date it was created."
       },
+      {
+        type: "paragraph",
+        text: "When you create an assembly, you select a dataset template and one or more document sources. The system pulls all matching documents, applies the template's column mappings and transforms, and produces a single structured output. The assembly tracks which documents contributed to each row, giving you full traceability from output back to source."
+      },
+      {
+        type: "paragraph",
+        text: "Use assemblies whenever you need a repeatable, auditable output for downstream systems or stakeholders. Most teams create one assembly per reporting period or delivery cycle. Because assemblies reference a template, you can regenerate the same output shape from different document sets without reconfiguring columns or transforms each time."
+      },
       {
         type: "callout",
         text: "Assemblies are the recommended way to produce production datasets. They provide a single audit trail from source documents through extraction, resolution, and validation to the final output."
@@ -2988,6 +3645,10 @@ var sections8 = [
       {
         question: "Why should I use assemblies for production data?",
         answer: "Assemblies provide a single audit trail from source documents through extraction, resolution, and validation to the final output, making them the recommended approach for production datasets."
+      },
+      {
+        question: "Can an assembly pull from multiple sources?",
+        answer: "Yes. An assembly can combine documents from any number of sources \u2014 uploaded files, connected drives, email attachments, and more \u2014 into a single structured dataset."
       }
     ],
     mentions: [
@@ -3033,6 +3694,18 @@ var sections8 = [
       {
         type: "paragraph",
         text: "ID rules are persisted before generating IDs. Navigate to a data product detail page and use **Apply ID Rules** to generate or **Regenerate IDs** to refresh."
+      },
+      {
+        type: "paragraph",
+        text: 'Resolution maps normalize field values before they become part of the ID. For example, a resolution map can collapse "ACME Corp", "ACME Corporation", and "Acme" into a single canonical value "ACME". This prevents duplicate IDs for rows that refer to the same real-world entity under different names.'
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, choose source fields with high uniqueness \u2014 contract numbers or invoice IDs work well, while generic fields like "status" do not. When your documents contain multiple candidate identifiers, configure a fallback chain so the dispenser always has a value to work with. Most teams use the primary reference number as the source field and the document name as the first fallback.'
+      },
+      {
+        type: "callout",
+        text: "ID generation is deterministic \u2014 running **Regenerate IDs** with the same rules and data always produces the same output. This makes ID dispensers safe to re-run without breaking downstream references."
       }
     ],
     related: [
@@ -3044,6 +3717,14 @@ var sections8 = [
       {
         question: "How do ID dispensers handle missing field values?",
         answer: "When the source field is empty, the dispenser tries each field in the fallback chain in order. If all are empty, it generates a prefix-less sequential ID."
+      },
+      {
+        question: "What is a resolution map?",
+        answer: 'A resolution map is a key-value lookup that normalizes field values before ID generation. For example, it can collapse "ACME Corp" and "ACME Corporation" into "ACME" to prevent duplicate IDs for the same entity.'
+      },
+      {
+        question: "Can I regenerate IDs without losing data?",
+        answer: "Yes. Regenerating IDs only updates the ID column \u2014 all other data product values remain unchanged. The operation is deterministic, so the same rules and data always produce the same IDs."
       }
     ],
     mentions: ["ID dispenser", "unique identifiers", "fallback chain", "resolution map"]
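Reviewer sketch of how the fallback chain and resolution map described in this hunk could combine. Every name here (`resolveIdSource`, the row field names, the map shape) is hypothetical, and the sketch assumes the map is applied to whichever field is picked. The actual ID format is not shown in the diff, so the sketch stops at choosing the normalized source value.

```javascript
// Hypothetical names throughout; this only illustrates the lookup order.
function resolveIdSource(row, sourceField, fallbackChain, resolutionMap) {
  for (const field of [sourceField, ...fallbackChain]) {
    const raw = row[field];
    if (raw === undefined || raw === null) continue;
    const value = String(raw).trim();
    if (value === "") continue; // empty value: try the next field in the chain
    // Normalize through the resolution map when an entry exists.
    const hasEntry = Object.prototype.hasOwnProperty.call(resolutionMap, value);
    return { field, value: hasEntry ? resolutionMap[value] : value };
  }
  // All candidates are empty. Per the FAQ above, the dispenser then
  // generates a prefix-less sequential ID (not modeled here).
  return null;
}
```

Because the function depends only on its inputs, calling it again with the same row, rules, and map yields the same result, which is the property the determinism callout relies on.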
@@ -3102,6 +3783,10 @@ var sections8 = [
       {
         question: "Does CSV export preserve leading zeros?",
         answer: "Yes. All CSV exports preserve leading zeros and long numbers \u2014 values are never coerced to numeric types."
+      },
+      {
+        question: "What is auto-resolve singles?",
+        answer: "Auto-resolve singles automatically accepts fields that have only one candidate value, removing them from the manual review queue. Combined with auto-review, this significantly reduces the volume of items requiring human attention."
       }
     ],
     mentions: ["share token", "delivery website", "CSV export", "auto-review", "auto-resolve"]
@@ -3124,6 +3809,22 @@ var sections9 = [
       {
         type: "paragraph",
         text: "Schema-level quality rules run during Phase 3 of every job. Rule types: field format, value range, cross-field consistency, and AI-proposed coherence rules. Rules can be AI-proposed after a job completes, then reviewed and approved before activation."
+      },
+      {
+        type: "paragraph",
+        text: "**Field format** checks verify that values match an expected pattern (e.g., dates in ISO format, phone numbers with country codes). **Value range** checks ensure numeric or date values fall within acceptable bounds. **Cross-field consistency** checks compare two or more fields on the same record \u2014 for example, verifying that a start date precedes an end date."
+      },
+      {
+        type: "paragraph",
+        text: "AI-proposed coherence rules are generated by analyzing patterns in completed job results. The system identifies relationships that hold across most records and proposes them as candidate rules. You review each proposal in the validation settings before it becomes active \u2014 no AI-generated rule runs without explicit approval."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, start with a small set of high-confidence rules and expand over time. Most teams begin with field format checks for critical identifiers (invoice numbers, dates, amounts) and add cross-field consistency rules as they learn their data patterns. Validation failures do not block extraction \u2014 they flag records for review."
+      },
+      {
+        type: "callout",
+        text: "Validation checks are schema-scoped. Rules defined on one schema do not affect other schemas in the same workspace. This lets you tailor quality rules to each document type independently."
       }
     ],
     related: [
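Reviewer sketch of the three rule types described in this hunk (field format, value range, cross-field consistency). The rule shape, field names, and bounds are assumptions made for illustration; the package's real rule schema is not shown in the diff.

```javascript
// Hypothetical rule shapes and field names.
const isoDate = /^\d{4}-\d{2}-\d{2}$/;

const sampleRules = [
  {
    name: "invoice_date is ISO",
    type: "field_format",
    check: (rec) => isoDate.test(String(rec.invoice_date || "")),
  },
  {
    name: "amount within bounds",
    type: "value_range",
    check: (rec) =>
      typeof rec.amount === "number" && rec.amount >= 0 && rec.amount <= 1000000,
  },
  {
    name: "start precedes end",
    type: "cross_field",
    // ISO dates compare correctly as plain strings.
    check: (rec) =>
      isoDate.test(String(rec.start_date || "")) &&
      isoDate.test(String(rec.end_date || "")) &&
      rec.start_date < rec.end_date,
  },
];

// Failures flag the record for review; they do not stop processing.
function runRules(record, rules) {
  const failures = rules
    .filter((rule) => !rule.check(record))
    .map((rule) => rule.name);
  return { flagged: failures.length > 0, failures };
}
```

The function returns a flag plus the list of failed rule names rather than throwing, which mirrors the statement that validation failures flag records for review instead of blocking extraction.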
@@ -3139,6 +3840,10 @@ var sections9 = [
       {
         question: "Can AI suggest validation rules?",
         answer: "Yes. After a job completes, AI can propose coherence rules based on the data. You review and approve these rules before they are activated."
+      },
+      {
+        question: "Do validation failures block extraction?",
+        answer: "No. Validation checks flag records for review but do not prevent extraction from completing. Failed records appear in the Approval Queue for manual inspection."
       }
     ],
     mentions: [
@@ -3158,6 +3863,22 @@ var sections9 = [
       {
         type: "paragraph",
         text: "Manually-created reference datasets with known-correct values. Create from **Validation &rarr; Golden Samples**. Benchmark runs compare extraction results against golden samples for per-field accuracy scoring with AI judge verdicts."
+      },
+      {
+        type: "paragraph",
+        text: "To create a golden sample, select a document and manually enter the correct value for each field. The system stores these known-correct values as the ground truth baseline. When you run a benchmark, the extraction pipeline processes the same document independently, and the results are compared field by field against your golden sample."
+      },
+      {
+        type: "paragraph",
+        text: 'Benchmark scoring uses an AI judge to evaluate each field comparison. The judge accounts for semantic equivalence \u2014 for example, "United States" and "US" may be scored as a match depending on the field type. Per-field accuracy scores let you identify exactly which fields are underperforming and need schema or instruction tuning.'
+      },
+      {
+        type: "paragraph",
+        text: "For best results, create golden samples from a representative mix of document types and complexity levels. Most teams maintain 5-10 golden samples per schema and re-run benchmarks after schema changes, instruction updates, or model upgrades to track quality trends over time."
+      },
+      {
+        type: "callout",
+        text: "Golden samples are not used during normal extraction \u2014 they exist solely for benchmarking. Changing a golden sample does not affect how documents are processed."
       }
     ],
     related: [
@@ -3173,6 +3894,10 @@ var sections9 = [
       {
         question: "How do benchmark runs work?",
         answer: "Benchmark runs compare extraction results against golden samples, producing per-field accuracy scores with AI judge verdicts to measure extraction quality."
+      },
+      {
+        question: "How many golden samples should I create?",
+        answer: "Most teams maintain 5-10 golden samples per schema, covering a representative mix of document types and complexity levels. Re-run benchmarks after schema changes or model upgrades to track quality trends."
       }
     ],
     mentions: ["golden samples", "ground truth", "benchmark runs", "accuracy scoring", "AI judge"]
@@ -3188,6 +3913,18 @@ var sections9 = [
         type: "paragraph",
         text: "Threshold-based rules for auto-approving or flagging results. Configure per schema with criteria: minimum confidence, validation pass rate, field coverage. Results meeting all thresholds are auto-approved; others go to the manual review queue."
       },
+      {
+        type: "paragraph",
+        text: "Each criterion acts as an independent gate. **Minimum confidence** sets the lowest acceptable extraction confidence score. **Validation pass rate** requires a minimum percentage of validation checks to pass. **Field coverage** ensures that a minimum percentage of schema fields have non-empty values. A result must clear all three gates to be auto-approved."
+      },
+      {
+        type: "paragraph",
+        text: "Start with conservative thresholds \u2014 high confidence, high pass rate, high coverage \u2014 and loosen them as you gain trust in your extraction pipeline. Most teams begin with 90% confidence, 95% validation pass rate, and 80% field coverage, then adjust based on the volume of false positives in the approval queue."
+      },
+      {
+        type: "paragraph",
+        text: "Approval gates integrate directly with the delivery pipeline. When a result passes all gates, a `result.approved` signal is emitted automatically. Bind this signal to a destination to create a fully automated flow from document upload through extraction, validation, approval, and delivery \u2014 no manual steps required for high-confidence results."
+      },
       {
         type: "callout",
         text: "Approval gates feed the delivery pipeline \u2014 bind a `result.approved` signal to a destination to only ship approved rows to your downstream systems."
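Reviewer sketch of the three-gate check described in this hunk. The property names and the 0 to 1 scale are assumptions. The thresholds are the starting values quoted in the text (90% confidence, 95% validation pass rate, 80% field coverage).

```javascript
// Hypothetical shapes; scores are assumed to be fractions between 0 and 1.
const startingThresholds = {
  minConfidence: 0.9,
  minValidationPassRate: 0.95,
  minFieldCoverage: 0.8,
};

// Each gate is evaluated independently; all must pass for auto-approval.
function evaluateGates(result, thresholds) {
  const gates = {
    confidence: result.confidence >= thresholds.minConfidence,
    validationPassRate: result.validationPassRate >= thresholds.minValidationPassRate,
    fieldCoverage: result.fieldCoverage >= thresholds.minFieldCoverage,
  };
  const approved = Object.values(gates).every(Boolean);
  // Results that are not auto-approved go to the manual review queue.
  return { approved, gates, route: approved ? "auto-approved" : "manual-review" };
}
```

Returning the per-gate booleans alongside the overall decision makes it easy to see which threshold sent a result to manual review, which helps when loosening thresholds over time.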
@@ -3206,6 +3943,10 @@ var sections9 = [
       {
         question: "How do approval gates connect to delivery?",
         answer: "Bind a result.approved signal to a delivery destination to only ship approved rows to your downstream systems. This ensures only quality-checked data is delivered."
+      },
+      {
+        question: "What thresholds should I start with?",
+        answer: "Most teams start with 90% confidence, 95% validation pass rate, and 80% field coverage. Adjust based on the volume of false positives in the approval queue \u2014 loosen thresholds as you gain trust in your pipeline."
       }
     ],
     mentions: [
@@ -3230,6 +3971,22 @@ var sections9 = [
       {
         type: "paragraph",
         text: 'Filter the queue by status (pending, flagged), schema, or confidence range. Click "Review" on any row to inspect the extracted values, provenance trails, and validation check results before approving or rejecting.'
+      },
+      {
+        type: "paragraph",
+        text: "The review detail view shows the extracted values alongside the source document, with provenance trails tracing each value back to its origin in the text. Validation check results are displayed inline \u2014 you can see exactly which rules passed and which failed before making your decision. Batch actions are available for approving or rejecting multiple items at once."
+      },
+      {
+        type: "paragraph",
+        text: "When you approve a result, a `result.approved` signal is emitted to the delivery pipeline. When you reject a result, a `result.rejected` signal fires instead. This event-driven design lets you build automated workflows that respond to review decisions \u2014 for example, routing approved records to a webhook and rejected records to a notification channel."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, review flagged items first \u2014 these are records where at least one validation check failed, making them the most likely to contain errors. Most teams assign a daily review cadence and use confidence range filters to prioritize low-confidence items that need the most attention."
+      },
+      {
+        type: "callout",
+        text: "LLM auto-review is available to accelerate the approval process. When enabled, AI proposes approve or reject decisions for pending items, which you can accept or override with a single click."
       }
     ],
     related: [
@@ -3245,6 +4002,10 @@ var sections9 = [
       {
         question: "How do I review items in the Approval Queue?",
         answer: 'Filter by status (pending, flagged), schema, or confidence range. Click "Review" on any row to inspect extracted values, provenance trails, and validation check results before approving or rejecting.'
+      },
+      {
+        question: "Can I batch approve or reject items?",
+        answer: "Yes. Select multiple items in the queue and use the batch action buttons to approve or reject them all at once. Each item emits the appropriate delivery signal individually."
       }
     ],
     mentions: [
@@ -3309,6 +4070,18 @@ var sections10 = [
       {
         type: "paragraph",
         text: "Every attempt is logged in `delivery_items`. Terminal failures (retry exhausted or permanent 4xx) write a `delivery_dead_letter` row, which is replayable. The outbox, history, DLQ, and catalog are all accessible via the [`/v1/delivery/*` API](/docs)."
+      },
+      {
+        type: "paragraph",
+        text: "The four registries \u2014 signals, deliverables, serializers, and connectors \u2014 are fully orthogonal. Adding a new destination type does not require changes to the signal or serializer code. This composable design means you can mix any supported signal with any compatible serializer and connector without custom integration work."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, start with a webhook destination to verify your binding configuration end-to-end. Once the payload shape and delivery cadence match your expectations, expand to file-based destinations (S3, SFTP) or spreadsheet destinations (Google Sheets). Most teams create separate bindings for different downstream consumers rather than routing all events to a single destination."
+      },
+      {
+        type: "callout",
+        text: "Delivery is at-least-once with deterministic idempotency keys. Receivers should use the `X-Talonic-Idempotency-Key` header (or equivalent metadata for file-based connectors) to deduplicate on their end."
       }
     ],
     related: [
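Reviewer sketch of receiver-side deduplication for at-least-once delivery, keyed on the `X-Talonic-Idempotency-Key` header named in the callout above. The handler shape and the in-memory `Set` are assumptions for illustration; a real receiver would persist seen keys in durable storage.

```javascript
// Hypothetical receiver; headers are assumed to arrive with lowercased names,
// as Node's HTTP server provides them.
function makeDedupingReceiver(processPayload) {
  const seenKeys = new Set(); // in-memory only, for illustration
  return function receive(headers, payload) {
    const key = headers["x-talonic-idempotency-key"];
    if (!key) return "rejected"; // no key: cannot deduplicate safely
    if (seenKeys.has(key)) return "duplicate"; // already handled, skip
    processPayload(payload);
    seenKeys.add(key); // record only after processing succeeds
    return "processed";
  };
}
```

Recording the key only after `processPayload` returns means a crash mid-processing leads to a retry being processed again rather than silently dropped, which is the safer side of the at-least-once trade-off.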
@@ -3324,6 +4097,10 @@ var sections10 = [
       {
         question: "What happens when a delivery fails?",
         answer: "Failed deliveries retry with a backoff ladder. Terminal failures (retry exhausted or permanent 4xx) are written to the dead-letter queue (DLQ), which is fully replayable."
+      },
+      {
+        question: "What serialization formats are supported?",
+        answer: "Ten formats: json, ndjson, csv, csv_file, xlsx, rows, graph, raw, md, and txt. Each serializer declares which deliverable shapes it supports, and the compatibility triangle validates the combination at binding creation time."
       }
     ],
     mentions: [
@@ -3382,6 +4159,22 @@ var sections10 = [
             description: "Slice 2+. Structured data as email attachment."
           }
         ]
+      },
+      {
+        type: "paragraph",
+        text: "Each destination stores its connector type, configuration (URL, bucket, folder path), and optional authentication credentials. Webhook destinations support HMAC-SHA256 signing via a **signing secret** \u2014 every payload includes a signature header so your receiver can verify authenticity. File-based destinations (S3, SFTP, Google Drive) support configurable filename templates with token substitution for binding ID, timestamp, and idempotency key."
+      },
+      {
+        type: "paragraph",
+        text: "A single destination can back multiple bindings. For example, one S3 bucket destination can receive both `document.extracted` and `result.approved` events through separate bindings, each with its own serializer and field map. This keeps your destination inventory small while supporting diverse routing requirements."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, always run a live-ping test after creating a destination. The test exercises the full transport envelope \u2014 SSRF validation, payload cap, and authentication \u2014 with a tiny test payload, so you catch configuration errors before real events start flowing. OAuth-based destinations (Google Drive, Google Sheets) require connecting your account first via the OAuth flow in the dashboard."
+      },
+      {
+        type: "callout",
+        text: "Destinations can be disabled without deleting them. Set **is_active** to false and no bindings will route events to the destination until you re-enable it."
       }
     ],
     related: [
@@ -3397,6 +4190,10 @@ var sections10 = [
       {
         question: "How do I test a destination?",
         answer: "Every destination supports a live-ping test via POST /v1/delivery/destinations/:id/test that exercises the full transport envelope with a tiny test payload."
+      },
+      {
+        question: "Can one destination serve multiple bindings?",
+        answer: "Yes. A single destination can back any number of bindings, each with its own signal filter, serializer, and field map. This lets you route different event types to the same endpoint with different payload shapes."
       }
     ],
     mentions: [
@@ -3423,6 +4220,22 @@ var sections10 = [
       {
         type: "paragraph",
         text: "Optional `field_map` (rename/drop/static rules) lets you reshape the payload without custom code. Optional `delivery_policy` overrides the default retry ladder (6 attempts at `5s, 30s, 2min, 10min, 1h`) and timeout."
+      },
+      {
+        type: "paragraph",
+        text: "The compatibility triangle is enforced on every create and update. The backend checks that your chosen serializer supports the deliverable resolver's output shape, and that the connector accepts the serializer's format. If any predicate fails, the binding is rejected with a descriptive error \u2014 you never end up with a binding that cannot deliver."
+      },
+      {
+        type: "paragraph",
+        text: 'Use `field_map` to tailor the payload for each downstream consumer. **Rename** rules map internal field names to the receiver\'s expected names. **Drop** rules exclude fields the receiver does not need. **Static** rules inject constant values (e.g., a `source: "talonic"` tag) into every payload. These three operations compose in order: drop first, then rename, then static injection.'
+      },
+      {
+        type: "paragraph",
+        text: "For best results, create one binding per downstream consumer per event type. This gives you independent control over payload shape, retry policy, and serialization format for each integration point. Most teams start with a `document.extracted` binding to a webhook and expand to run-level and approval signals as their pipeline matures."
+      },
+      {
+        type: "callout",
+        text: "The binding editor in the dashboard walks you through the compatibility triangle step by step \u2014 only showing serializers and deliverables that are compatible with your chosen signal and destination."
       }
     ],
     related: [
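Reviewer sketch of the `field_map` composition order stated in this hunk: drop first, then rename, then static injection. The `{ drop, rename, static }` object shape is an assumption, since the diff describes the three rule kinds but not their serialized form.

```javascript
// Hypothetical field_map shape: { drop: string[], rename: {from: to}, static: {key: value} }.
function applyFieldMap(payload, fieldMap) {
  const { drop = [], rename = {}, static: staticFields = {} } = fieldMap;
  const out = {};
  for (const [key, value] of Object.entries(payload)) {
    if (drop.includes(key)) continue; // 1. drop
    const hasRename = Object.prototype.hasOwnProperty.call(rename, key);
    out[hasRename ? rename[key] : key] = value; // 2. rename
  }
  return { ...out, ...staticFields }; // 3. static injection
}
```

For example, applying `{ drop: ["internal_id"], rename: { vendor_name: "supplier" }, static: { source: "talonic" } }` to a payload removes `internal_id`, emits `vendor_name` under the key `supplier`, and adds the constant `source` tag.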
@@ -3438,6 +4251,10 @@ var sections10 = [
       {
         question: "Can I customize the delivery payload?",
         answer: "Yes. Use field_map to rename, drop, or add static fields without custom code. Use delivery_policy to override the default retry ladder and timeout."
+      },
+      {
+        question: "What is the compatibility triangle?",
+        answer: "The compatibility triangle validates that the signal, deliverable resolver, serializer, and connector all form a compatible combination. The backend enforces this on every binding create and update to prevent misconfigured delivery routes."
       }
     ],
     mentions: [
@@ -3520,6 +4337,22 @@ var sections10 = [
             description: "Fired after a terminal delivery failure."
           }
         ]
+      },
+      {
+        type: "paragraph",
+        text: "Signals are typed events emitted by the platform when meaningful state changes occur. Document-level signals fire on extraction success or failure. Run-level signals fire when a job completes across dataspace, structuring, resolution, or extraction runs. Result-level signals fire when a reviewer approves, rejects, or flags a record."
+      },
+      {
+        type: "paragraph",
+        text: "The two `delivery.item.*` entries are **meta-signals** \u2014 they fire when a delivery itself succeeds or fails. Use them for self-monitoring: bind `delivery.item.failed` to a notification webhook to receive alerts when deliveries break. The poller includes built-in loop prevention so a failed meta-signal delivery does not emit another meta-signal."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, use the catalog API to populate dropdown menus and configuration forms rather than hardcoding signal or deliverable lists. The catalog always reflects the running registry contents, so new signal types and deliverables appear automatically as the platform evolves."
+      },
+      {
+        type: "callout",
+        text: "The catalog API exposes four endpoints: `/v1/delivery/catalog/signals`, `/v1/delivery/catalog/deliverables`, `/v1/delivery/catalog/serializers`, and `/v1/delivery/catalog/connectors`. Each returns the full registry for that category."
       }
     ],
     related: [
@@ -3535,6 +4368,10 @@ var sections10 = [
       {
         question: "How do I discover available signals and deliverables?",
         answer: "Use the catalog API at /v1/delivery/catalog/* which exposes the four registries (signals, deliverables, serializers, connectors) that drive the binding picker."
+      },
+      {
+        question: "What are meta-signals?",
+        answer: "Meta-signals (delivery.item.completed and delivery.item.failed) fire when a delivery attempt itself succeeds or fails. Use them for self-monitoring \u2014 for example, binding delivery.item.failed to a notification webhook for delivery failure alerts."
       }
     ],
     mentions: [
@@ -3555,6 +4392,22 @@ var sections10 = [
       {
         type: "paragraph",
         text: "Every delivery attempt writes a row to `/v1/delivery/items` with its status, HTTP code, error code, and request/response bodies. Terminal failures (retry ladder exhausted or permanent 4xx) escalate to `/v1/delivery/dlq`. Both are fully replayable \u2014 replay enqueues a new attempt with a fresh idempotency key. Nothing in history is ever mutated; the log is strictly append-only."
+      },
+      {
+        type: "paragraph",
+        text: "The delivery items log captures the full lifecycle of each attempt: in-flight, succeeded, or failed. Each row includes the attempt number, duration in milliseconds, and truncated request/response bodies (up to 10 KB each). Use the items endpoint with filters for `binding_id`, `destination_id`, or `status` to narrow results when debugging a specific integration."
+      },
+      {
+        type: "paragraph",
+        text: "The dead letter queue (DLQ) is your safety net for terminal failures. When the retry ladder is exhausted or the destination returns a permanent error (e.g., 401 Unauthorized, 403 Forbidden), the failed delivery moves to the DLQ. From there you can inspect the error, fix the destination configuration, and replay the delivery with a single click or API call."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, monitor the DLQ regularly and set up a `delivery.item.failed` meta-signal binding to receive alerts when deliveries fail terminally. Most teams configure a notification webhook for this signal so they are notified immediately rather than discovering failures during a manual review. Request and response bodies older than the configured retention period are automatically cleaned up, but row metadata (status, error code, duration) is retained indefinitely."
+      },
+      {
+        type: "callout",
+        text: "Replay is safe to run multiple times. The idempotency key is deterministic \u2014 receivers that deduplicate on the key will not process the same delivery twice, even after multiple replays."
       }
     ],
     related: [
@@ -3570,6 +4423,10 @@ var sections10 = [
       {
         question: "What is the dead letter queue (DLQ)?",
         answer: "Terminal failures (retry ladder exhausted or permanent 4xx) escalate to /v1/delivery/dlq. DLQ entries are fully replayable \u2014 replay enqueues a fresh attempt with a new idempotency key."
+      },
+      {
+        question: "How long are request and response bodies retained?",
+        answer: "Request and response bodies are cleaned up after the configured retention period (default 30 days). Row metadata \u2014 status, HTTP code, error code, and duration \u2014 is retained indefinitely for audit purposes."
       }
     ],
     mentions: [
@@ -3604,6 +4461,10 @@ var sections11 = [
         type: "paragraph",
         text: "Dialects ensure consistency across all your structured output. When your downstream systems expect dates in `YYYY-MM-DD` format, numbers with `.` as the decimal separator, and CSVs delimited by `;`, you configure this once in the shared dialect rather than repeating it in every schema."
       },
+      {
+        type: "paragraph",
+        text: "Most teams configure their shared dialect during initial workspace setup and rarely change it afterward. If your organization operates across regions with different formatting conventions, create separate workspaces with region-specific dialects rather than overriding at the schema level. This keeps the configuration clean and avoids inconsistencies in delivered data."
+      },
       {
         type: "list",
         ordered: false,
@@ -3674,6 +4535,10 @@ var sections11 = [
         type: "paragraph",
         text: "The lookup convention follows a `key` / `value` structure where the `key` is the output code and the `value` is the human-readable label. During extraction, the platform maps FROM labels found in documents TO the canonical codes defined in the reference primitive. This ensures consistent, machine-readable output regardless of how values appear in source documents."
       },
+      {
+        type: "paragraph",
+        text: "For best results, keep reference primitives focused on a single domain \u2014 for example, one primitive for country codes, another for currency codes, and another for product categories. This makes each primitive reusable across multiple schemas and simplifies maintenance. When updating a primitive, test the new version against a few sample documents before updating the version reference in production schemas."
+      },
       {
         type: "callout",
         variant: "info",
@@ -3741,6 +4606,10 @@ var sections11 = [
         type: "paragraph",
         text: "Change review is particularly important for workspaces that feed downstream systems through delivery bindings. A small change to a schema field mapping or a reference primitive value can ripple through to every document processed after that point. The review process creates a checkpoint where a second pair of eyes can verify the change before it goes live."
       },
+      {
+        type: "paragraph",
+        text: "Most teams enable change review as soon as their workspace transitions from development to production. During the initial setup phase, you can leave it disabled for faster iteration. Once your schemas, dialects, and reference primitives are stable and data is flowing to downstream systems, enable change review to protect against accidental modifications that could disrupt live pipelines."
+      },
       {
         type: "list",
         ordered: false,
@@ -3807,6 +4676,14 @@ var sections12 = [
         type: "paragraph",
         text: "Omnisearch is designed to be the single entry point for finding anything in the platform. Rather than navigating to specific pages to search within them, Omnisearch queries a **materialized values index** that aggregates data across all your content. Results are grouped by category so you can quickly distinguish between a document match and a field name match."
       },
+      {
+        type: "paragraph",
+        text: "The materialized values index is rebuilt automatically whenever documents are processed or schemas change, so search results are always current. There is no manual reindex step \u2014 new documents become searchable as soon as extraction completes. This makes Omnisearch reliable even during high-volume ingestion periods."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, use Omnisearch as your primary navigation tool. Instead of browsing through document lists or clicking through the sidebar, press `Cmd+K` and type what you are looking for \u2014 whether it is a specific invoice number, a field name, or a schema title. Most users find that Omnisearch is faster than manual navigation for any task beyond browsing the most recent documents."
+      },
       {
         type: "callout",
         variant: "info",
@@ -3883,6 +4760,10 @@ var sections12 = [
       {
         type: "paragraph",
         text: "Filter state is encoded in the URL query string, and the backend translates it into SQL dynamically. Because the full filter state lives in the URL, you can bookmark filtered views, share them with teammates via a link, or save them as **presets** for one-click access to commonly used queries."
+      },
+      {
+        type: "paragraph",
+        text: 'For best results, save your most common filter combinations as presets. Most teams create presets for categories like "high-value invoices this quarter," "documents missing key fields," or "recently failed extractions." Presets appear as one-click buttons on the Documents page, eliminating the need to rebuild complex filter conditions from scratch each time.'
       }
     ],
     related: [
@@ -3937,6 +4818,19 @@ var sections13 = [
         type: "paragraph",
         text: "Manage API keys from **Settings \u2192 API Keys**. Keys are prefixed with `tlnc_` and passed via `Authorization: Bearer`. Keys are stored as SHA-256 hashes, so the full key is shown only once, at creation."
       },
+      {
+        type: "paragraph",
+        text: "Each API key is assigned one or more scopes that control what operations it can perform. Scopes follow the principle of least privilege \u2014 create a key with only the scopes your integration needs. For example, a read-only dashboard integration only needs the `read` scope, while an automated ingestion pipeline needs `extract` and `read`."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, create separate API keys for each integration or service that connects to your Talonic workspace. This makes it easy to rotate or revoke a single key without disrupting other integrations. Most teams maintain one key for their ingestion pipeline, one for their BI dashboard, and one for webhook-based automations."
+      },
+      {
+        type: "callout",
+        variant: "warning",
+        text: "Copy the full API key immediately after creation \u2014 it is only displayed once. If you lose the key, you must delete it and create a new one. Integrations still using the deleted key will stop working until they are updated with the new key."
+      },
       {
         type: "param-table",
         title: "API key scopes",
@@ -3972,6 +4866,10 @@ var sections13 = [
       {
         question: "What scopes are available for API keys?",
         answer: "Three scopes: extract (use extraction API), read (read documents, extractions, schemas, jobs), and write (create and modify resources)."
+      },
+      {
+        question: "Can I have multiple API keys?",
+        answer: "Yes. You can create as many API keys as needed. Best practice is to create separate keys for each integration so you can rotate or revoke them independently without disrupting other services."
       }
     ],
     mentions: ["API keys", "tlnc_", "SHA-256", "Bearer token", "scopes"]
@@ -3983,6 +4881,27 @@ var sections13 = [
     seoTitle: "Public REST API Overview \u2014 Talonic Docs",
     description: "Full REST API with 20+ namespaces: extract, documents, extractions, schemas, jobs, sources, delivery, linking, matching, batches, cases, quality, and more. Cursor pagination.",
     content: [
+      {
+        type: "paragraph",
+        text: "Talonic exposes a comprehensive REST API with 20+ namespaces covering every aspect of the platform \u2014 from document extraction and schema management to delivery, matching, and quality benchmarking. All endpoints use JSON request and response bodies with cursor-based pagination for list operations."
+      },
+      {
+        type: "paragraph",
+        text: "The API follows standard REST conventions. Authenticate with a `tlnc_` API key via the `Authorization: Bearer` header. Most resources support full CRUD operations, and long-running tasks like matching runs and batch inference are handled asynchronously with polling endpoints for status and progress."
+      },
+      {
+        type: "paragraph",
+        text: "Use the public API to build automated ingestion pipelines, integrate extraction results into downstream systems, or orchestrate complex workflows that combine multiple platform features. The API mirrors every action available in the web interface, so anything you can do manually can be fully automated."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, start with the `/v1/extract` endpoint for document ingestion, then use `/v1/documents` and `/v1/extractions` to retrieve results. As your integration matures, explore delivery bindings, matching configurations, and batch processing to build a fully automated data pipeline."
+      },
+      {
+        type: "callout",
+        variant: "info",
+        text: "See the full [API Documentation](/docs) for detailed endpoint specifications, request/response examples, and authentication guides. The API reference is organized by namespace and includes every parameter, status code, and error response."
+      },
       {
         type: "param-table",
         title: "API namespaces",
@@ -4103,6 +5022,10 @@ var sections13 = [
       {
         question: "Where can I find detailed API documentation?",
         answer: "See the full API Documentation at /docs for complete endpoint documentation with request/response examples, parameter descriptions, and authentication details."
+      },
+      {
+        question: "How does pagination work in the API?",
+        answer: "List endpoints use cursor-based pagination. Each response includes a next_cursor token that you pass as the cursor query parameter to fetch the next page, until has_more is false. This approach is more reliable than offset-based pagination when documents are being added or removed concurrently."
       }
     ],
     mentions: [
@@ -4128,6 +5051,14 @@ var sections13 = [
         type: "paragraph",
         text: "The webhook connector is configured as a **delivery destination**. Bind any of the signal types below to a webhook destination to receive real-time notifications. See `/v1/delivery/catalog/signals` for the exhaustive list."
       },
+      {
+        type: "paragraph",
+        text: "When a webhook fires, the platform constructs the payload from the signal data, signs it with HMAC-SHA256 using your destination's signing secret, and delivers it via HTTPS POST. Each delivery includes an idempotency key in the headers so your receiver can safely deduplicate retries. Failed deliveries are retried with exponential backoff, and terminal failures are routed to the dead-letter queue for manual replay."
+      },
+      {
+        type: "paragraph",
+        text: "Use webhooks when your downstream system needs to react immediately to platform events \u2014 for example, triggering an ERP import when a document is extracted, or notifying a Slack channel when a reviewer rejects a record. For bulk or periodic data transfers, consider using the SFTP, S3, or cloud storage delivery connectors instead."
+      },
       {
         type: "param-table",
         title: "Delivery signal types (webhook-compatible)",
@@ -4203,6 +5134,10 @@ var sections13 = [
       {
         question: "What happens when a webhook delivery fails?",
         answer: "Failed webhook deliveries retry with exponential backoff. Terminal failures (retry exhausted or permanent 4xx) escalate to the dead-letter queue for manual replay."
+      },
+      {
+        question: "How do I verify webhook signatures?",
+        answer: "Each webhook payload is signed with HMAC-SHA256 using the signing secret from your delivery destination configuration. Compute the HMAC of the raw request body and compare it to the signature header, ideally with a constant-time comparison, to verify authenticity. This ensures the payload was sent by Talonic and was not tampered with in transit."
       }
     ],
     mentions: [
@@ -4262,6 +5197,10 @@ var sections14 = [
         type: "paragraph",
         text: "New members are added via domain matching: users whose email address uses your company domain are automatically matched to your org with **pending** status, which requires admin approval. Manage members from the Team page."
       },
+      {
+        type: "paragraph",
+        text: "When a team member is removed, their access is revoked immediately but their past actions \u2014 edits, uploads, approvals, and review decisions \u2014 remain in the audit trail. This preserves data integrity and compliance history. Removed users can be re-added later through the same domain matching process if needed."
+      },
       {
         type: "callout",
         variant: "info",
@@ -4329,6 +5268,14 @@ var sections14 = [
         type: "paragraph",
         text: "Understanding your usage patterns helps optimize costs. For example, if extraction dominates your spend, consider using **batch mode** for non-urgent documents to cut that cost in half. The daily cost chart makes it easy to spot usage spikes and correlate them with specific ingestion events."
       },
+      {
+        type: "paragraph",
+        text: "Behind the scenes, every LLM and OCR call is logged with full detail \u2014 the model used, input and output token counts, latency, and computed cost. This data powers both the per-feature breakdown and the individual call log. The system tracks costs across extraction, OCR, batch inference, matching AI resolution, and quality passes so you always know where your spend is going."
+      },
+      {
+        type: "paragraph",
+        text: "Most teams review the daily cost chart weekly to establish a usage baseline. Unexpected spikes usually correlate with large document uploads or batch completions. For organizations managing multiple workspaces, the **Master view** provides a single pane of glass showing per-customer breakdowns and platform-wide aggregates \u2014 accessible only to platform administrators."
+      },
       {
         type: "param-table",
         title: "Usage views",
@@ -4404,6 +5351,14 @@ var sections14 = [
         type: "paragraph",
         text: "The Admin Panel is the central hub for platform-wide operations. **Customer management** lets you create, view, and delete organizations. **User management** provides a cross-tenant view of all platform users with the ability to remove accounts. The **data clear & rebuild** function wipes all data for a specific customer and reprocesses from scratch \u2014 useful during onboarding or after significant schema changes."
       },
+      {
+        type: "paragraph",
+        text: "The Admin Panel operates across tenant boundaries, giving administrators visibility into all organizations on the platform. The **usage statistics** view aggregates cost and volume data across all customers, making it straightforward to identify high-usage tenants, track platform growth, and forecast infrastructure needs."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, limit Admin Panel access to a small group of trusted platform operators. Use the **master registry** view to audit field definitions and schemas across tenants \u2014 this is particularly useful when standardizing extraction configurations or troubleshooting cross-tenant data quality issues."
+      },
       {
         type: "list",
         ordered: false,
@@ -4463,6 +5418,18 @@ var sections14 = [
         type: "paragraph",
         text: "Talonic provides global keyboard shortcuts that work from any page in the platform. These shortcuts let you access common actions without leaving your current context, significantly speeding up daily workflows."
       },
+      {
+        type: "paragraph",
+        text: "Shortcuts are registered at the application level, meaning they respond regardless of which page or panel is currently active. The platform handles the key combination before the browser applies its default action, so these shortcuts take priority over default browser bindings when the Talonic window is focused."
+      },
+      {
+        type: "paragraph",
+        text: "The most frequently used shortcut is **Omnisearch** (`Cmd+K` / `Ctrl+K`), which opens a global search overlay that queries documents, extracted values, field names, schemas, and sources simultaneously. Power users rely on it to navigate the platform faster than clicking through the sidebar."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, build muscle memory around the three core shortcuts. Use `Cmd+K` to find anything, `Cmd+J` to upload a document on the fly, and `Escape` to dismiss any overlay or modal. These three actions cover the most common interruptions during a review or configuration session."
+      },
       {
         type: "param-table",
         title: "Shortcuts",
@@ -4533,6 +5500,14 @@ var sections15 = [
         type: "paragraph",
         text: "Under the hood, batch inference leverages the provider's native batch API (Anthropic Message Batches or AWS Bedrock invocation jobs). Documents accumulate in a queue and are submitted together, allowing the provider to schedule processing during off-peak capacity. This is why the cost reduction is possible without any loss in extraction quality."
       },
+      {
+        type: "paragraph",
+        text: "Batch mode is best suited for backlog ingestion, periodic bulk uploads, and any scenario where results are not needed in real time. Most teams use batch mode for overnight processing of large document volumes and reserve real-time processing for time-sensitive documents that need immediate attention."
+      },
+      {
+        type: "paragraph",
+        text: "When batch results arrive, they pass through the same post-processing pipeline as real-time extractions \u2014 including markdown pre-processing, field parsing, quality metrics, and extraction metadata computation. The only difference is that LLM-based quality passes (field estimation, verification, cross-reference enrichment) are skipped in batch mode to preserve the cost savings."
+      },
       {
         type: "list",
         ordered: false,
@@ -4609,6 +5584,10 @@ var sections15 = [
       {
         type: "paragraph",
         text: "While waiting for batch results, documents show a status of `batch_queued`. Once the provider returns results, the platform applies them through the same post-processing pipeline as real-time extraction \u2014 including markdown pre-processing, field parsing, quality metrics, and extraction metadata computation."
+      },
+      {
+        type: "paragraph",
+        text: "You can also enable batch mode on a per-source basis. When a source connection has the batch processing toggle enabled, all documents ingested through that source are automatically routed to the batch queue. This is ideal for source connections that handle non-urgent, high-volume ingestion \u2014 such as a shared drive that collects documents overnight."
       }
     ],
     related: [
@@ -4658,6 +5637,14 @@ var sections15 = [
         type: "paragraph",
         text: "Batches are submitted automatically when the accumulation timer fires (every 15 minutes by default) or when the item count threshold is reached. Once submitted, the platform polls the provider hourly to check for completion. When results arrive, they are applied to the corresponding documents and the batch transitions to **completed** status."
       },
+      {
+        type: "paragraph",
+        text: "The batch detail view shows individual items within a batch, including which documents are included, their current processing state, and any errors that occurred. Use this view to verify that a specific document was included in the expected batch and to troubleshoot items that failed to parse."
+      },
+      {
+        type: "paragraph",
+        text: "The platform includes built-in crash recovery for batch processing. If the application restarts while a batch is in a transient `processing` state, the recovery logic automatically reverts it to `submitted` so the next polling cycle can retry. This means batch jobs are resilient to infrastructure disruptions without requiring manual intervention."
+      },
       {
         type: "param-table",
         title: "Batch statuses",
@@ -4737,6 +5724,14 @@ var sections16 = [
         type: "paragraph",
         text: 'Reference data is the foundation of the matching system. It represents your "ground truth" \u2014 the known records you want to match extracted document data against. Common examples include customer lists, product catalogs, vendor registries, and contract databases.'
       },
+      {
+        type: "paragraph",
+        text: "When you upload a reference dataset, the platform indexes all columns and rows for fast lookup during matching runs. Each dataset is versioned independently, so you can update your reference data without affecting in-progress matching configurations. A single dataset can be shared across multiple schemas and matching configurations."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, ensure your reference data is clean and deduplicated before uploading. Include all columns that you plan to match against \u2014 such as names, identifiers, dates, and amounts. Most teams refresh their reference data periodically by re-uploading from their source system or by using the SQL import option to pull directly from a connected database."
+      },
       {
         type: "callout",
         variant: "info",
@@ -4830,6 +5825,10 @@ var sections16 = [
         type: "paragraph",
         text: "Each field comparison carries a **weight** that determines how much it contributes to the overall confidence score. Set high weights on fields that are strong identifiers (like reference numbers or unique IDs) and lower weights on fields that are common or prone to variation (like names or descriptions). The weighted aggregate produces a final score between 0% and 100%."
       },
+      {
+        type: "paragraph",
+        text: "Most teams start with AI strategy generation and then fine-tune weights based on initial results. A common pattern is to set a high weight on a unique identifier field (like a PO number) with `exact` strategy, combined with lower-weighted `fuzzy` matches on name and description fields as supporting evidence. Review the first batch of results to calibrate thresholds before running at scale."
+      },
       {
         type: "callout",
         variant: "info",
@@ -4884,6 +5883,14 @@ var sections16 = [
         type: "paragraph",
         text: "There are two types of runs: **manual runs** use only the deterministic matching strategies (exact, fuzzy, date_range, numeric_range) and complete quickly. **Smart runs** add an AI resolution pass \u2014 after the initial matching, an embedding-based search with a Haiku LLM resolver attempts to improve low-confidence results."
       },
+      {
+        type: "paragraph",
+        text: "Matching runs are processed asynchronously via a dedicated job queue, so they do not block your workflow. You can continue working in the platform while a run executes in the background. The matching page shows real-time progress with the number of documents processed and estimated time remaining."
+      },
+      {
+        type: "paragraph",
+        text: "For best results, start with a manual run to establish a baseline, then use a smart run if many documents have low-confidence matches. Smart runs take longer because the AI resolver evaluates each ambiguous candidate, but they can significantly improve match quality for data with inconsistent formatting, abbreviations, or multilingual content."
+      },
       {
         type: "list",
         ordered: true,
@@ -4941,6 +5948,14 @@ var sections16 = [
         type: "paragraph",
         text: "The evidence view is designed to make match decisions transparent. For each candidate, you can see exactly which fields matched, what strategy was used, the individual field score, and the actual values that were compared. This makes it straightforward to verify correct matches and investigate false positives."
       },
+      {
+        type: "paragraph",
+        text: "Approved matches flow downstream into delivery pipelines, where they can be included in structured exports alongside extraction data. Rejected matches are excluded from future consideration for that document, which helps the system learn from your decisions when running subsequent matching passes."
+      },
+      {
+        type: "paragraph",
+        text: "When reviewing results, focus on documents where the top candidate has a confidence score between 50% and 85% \u2014 these are the borderline cases that benefit most from human judgment. High-confidence matches (above 85%) are usually correct, while very low scores (below 30%) typically indicate no valid match exists in the reference data."
+      },
       {
         type: "param-table",
         title: "Result fields",
@@ -5013,7 +6028,11 @@ var sections17 = [
     description: "Overview of the Talonic API for extracting structured, schema-validated data from any document with a single API call using HTTPS and JSON.",
     content: [
       { type: "paragraph", text: "Extract any document into schema-validated data with a single API call." },
-      { type: "paragraph", text: "**Base URL:** `https://api.talonic.com` | **Protocol:** HTTPS + JSON | **Auth:** `Bearer tlnc_...`" }
+      { type: "paragraph", text: "**Base URL:** `https://api.talonic.com` | **Protocol:** HTTPS + JSON | **Auth:** `Bearer tlnc_...`" },
+      { type: "paragraph", text: "Most integrations start with `POST /v1/extract` to submit a document and receive structured fields back. A typical workflow is: create an API key, upload a file with an optional schema, and consume the JSON response with per-field confidence scores and cost headers." },
+      { type: "paragraph", text: "The API supports three extraction modes: **auto-detect** (no schema, discovers all fields), **schema-driven** (returns exactly the fields you define), and **query** (filter previously extracted data without re-processing). Every response includes a `request_id` for tracing and support." },
+      { type: "paragraph", text: "Pair the extract endpoint with `GET /v1/documents` and `GET /v1/extractions` to manage your document library and retrieve results later. Webhook callbacks via `extraction.complete` events eliminate the need for polling on async extractions." },
+      { type: "callout", text: "All API keys use the `tlnc_` prefix. Create and rotate keys from **Settings \u2192 API Keys** in the dashboard. Keys carry scopes (`extract`, `read`, `write`, `billing`) that control endpoint access." }
     ],
     related: [
       { label: "Authentication", slug: "authentication" },
@@ -5069,7 +6088,11 @@ var sections17 = [
     description: "The base URL for all Talonic API endpoints. All requests must use HTTPS and are relative to the v1 base path.",
     content: [
       { type: "paragraph", text: "All endpoints are relative to the base URL below. All requests must use HTTPS." },
-      { type: "code", language: "bash", code: "https://api.talonic.com/v1" }
+      { type: "code", language: "bash", code: "https://api.talonic.com/v1" },
+      { type: "paragraph", text: "Most integrations set this as a constant in their HTTP client configuration. A typical request URL looks like `https://api.talonic.com/v1/extract` or `https://api.talonic.com/v1/documents`. All paths in this reference are relative to the `/v1` prefix." },
+      { type: "paragraph", text: "The API uses standard JSON request and response bodies with `Content-Type: application/json`, except for file uploads which use `multipart/form-data`. Responses include standard HTTP status codes and rate limit headers on every call." },
+      { type: "paragraph", text: "The only version component in the URL is the `/v1` prefix. Breaking changes will be communicated in advance and introduced under a new version prefix. Non-breaking additions (new fields, new endpoints) are shipped continuously." },
+      { type: "callout", text: "Plain HTTP requests are rejected. Always use `https://` in your base URL configuration to ensure encrypted transport." }
     ],
     related: [
       { label: "Authentication", slug: "authentication" }
@@ -5189,6 +6212,9 @@ X-Talonic-Cells-Resolved-AI: 5` },
     description: "All list endpoints use cursor-based pagination with cursor, limit, and order parameters. Responses include next_cursor and has_more for iteration.",
     content: [
       { type: "paragraph", text: "All list endpoints use cursor-based pagination. Pass a `cursor` token from the previous response to fetch the next page." },
+      { type: "paragraph", text: "Most integrations call list endpoints after bulk ingestion to iterate through results. A typical workflow is to fetch the first page with a `limit`, then loop using `pagination.next_cursor` until `has_more` is `false`." },
+      { type: "paragraph", text: "The response always includes a `pagination` object with `total`, `limit`, `has_more`, and `next_cursor`. The `total` field reflects the full count of matching items, not just the current page. Use `order` to control sort direction by `created_at`." },
+      { type: "paragraph", text: "Pair pagination with query filters (e.g. `status`, `after`, `before`, `search`) on endpoints like `GET /v1/documents` and `GET /v1/extractions` to narrow results before paginating. Note that cursors are opaque and short-lived \u2014 do not persist or parse them." },
       {
         type: "param-table",
         title: "Request parameters",
@@ -5276,6 +6302,9 @@ print(f"Fetched {len(all_documents)} documents")`
     description: "Use the Idempotency-Key header to safely retry POST requests without creating duplicate extractions. Keys are valid for 24 hours.",
     content: [
       { type: "paragraph", text: "Pass an `Idempotency-Key` header on POST requests to safely retry without creating duplicate work. If a request with the same key has already been processed, the API returns the cached response." },
+      { type: "paragraph", text: "Most integrations use idempotency keys when calling `POST /v1/extract` to guard against network timeouts or duplicate submissions. A typical workflow is to generate a UUID per logical operation, attach it as the `Idempotency-Key` header, and retry the same request on failure without risk of double-processing." },
+      { type: "paragraph", text: "The cached response is stored for **24 hours** and is scoped to your API key. A duplicate request within that window returns the original response body and HTTP status immediately, with no additional credit cost. After 24 hours the key expires and can be reused for a new request." },
+      { type: "paragraph", text: "Pair idempotency with webhook callbacks (`webhook_url` option) for robust async workflows. Note that reusing a key with different request parameters will still return the first request's cached result \u2014 always generate a fresh key for each distinct operation." },
       {
         type: "param-table",
         title: "Idempotency details",
@@ -5759,6 +6788,7 @@ X-Talonic-Cells-Resolved-AI: 5`
     seoTitle: "Extract Options \u2014 Talonic Docs",
     description: "Configure extraction options including output format, strict mode, async processing, webhook callbacks, raw text inclusion, page ranges, and language hints.",
     content: [
+      { type: "paragraph", text: "Pass these options as fields in the `options` JSON object on `POST /v1/extract` to control extraction behavior. Options let you switch between sync and async mode, include raw text, restrict page ranges, and configure webhook delivery." },
       {
         type: "param-table",
         params: [
@@ -5770,7 +6800,11 @@ X-Talonic-Cells-Resolved-AI: 5`
           { name: "page_range", type: "string", description: 'Pages to extract from. E.g. "1-5", "1,3,7-10". PDF only.' },
           { name: "language_hint", type: "string", description: "ISO 639-1 language code hint. Improves extraction for non-English documents." }
         ]
-      }
+      },
+      { type: "paragraph", text: "Most integrations keep the default `strict: true` to receive only the schema-defined fields. Set `strict: false` when you also want the AI to return additional fields it discovers beyond your schema. The `async` and `webhook_url` options complement each other: with `webhook_url` set, the platform notifies you when the extraction finishes, so you can avoid polling entirely." },
+      { type: "paragraph", text: 'The `page_range` option accepts comma-separated page numbers and ranges (e.g. `"1-5"`, `"1,3,7-10"`) and applies only to PDF files. Use `language_hint` with an ISO 639-1 code (e.g. `"de"`, `"ja"`) to improve extraction accuracy for non-English documents, especially when the OCR needs guidance on character sets.' },
+      { type: "paragraph", text: "Pair `include_raw_text: true` with schema-driven extraction when your downstream system needs both structured data and the original text for audit or display purposes. Note that setting `webhook_url` implicitly enables async behavior \u2014 the response will be `202 Accepted` regardless of the `async` flag." },
+      { type: "callout", text: 'The `format` option controls the output shape of the `data` field. Use `"json"` (default) for programmatic consumption. CSV format is available on the `GET /v1/extractions/:id/data` endpoint instead.' }
     ],
     related: [
       { label: "POST /v1/extract", slug: "post-extract" },
@@ -6064,6 +7098,10 @@ var sections19 = [
   }
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint after receiving an `extraction.complete` webhook or after polling a document's status until it reaches `completed`. A typical workflow is to extract a document via `POST /v1/extract`, store the returned `document.id`, then fetch full metadata here when needed." },
+      { type: "paragraph", text: "The response includes the current `status` field, which is `completed` when extraction has finished, `processing` while it is in progress, or `error` if something went wrong. Use `latest_extraction_id` to navigate directly to the extraction result via `GET /v1/extractions/:id`." },
+      { type: "paragraph", text: "Pair this with `GET /v1/documents/:id/markdown` to retrieve the raw OCR text, or with `GET /v1/extractions/:id/data` for just the structured field values. Note that the `triage` object is only populated after ingestion completes and may be `null` for documents still in processing." },
+      { type: "callout", variant: "info", text: "The `links.dashboard` URL opens the document directly in the Talonic platform UI, which is useful for sharing with team members who need to review or correct extractions." },
       { type: "heading", level: 2, id: "get-document-errors", text: "Errors" },
       {
         type: "param-table",
@@ -6119,6 +7157,9 @@ var sections19 = [
   "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint as part of a cleanup workflow after data has been exported or when a document was uploaded in error. A typical pattern is to list documents with `GET /v1/documents`, identify candidates for deletion, then call this endpoint for each one." },
+      { type: "paragraph", text: "The response includes a `deleted` field set to `true` and the `id` of the removed document. There is no soft-delete mechanism \u2014 the original file, OCR markdown, and all extraction results are permanently purged from storage." },
+      { type: "paragraph", text: "Pair this with `GET /v1/documents/:id` beforehand to verify you are deleting the correct resource. Note that if the document participated in entity linking or cases, those links are removed and affected cases may be recomputed during the next backfill cycle." },
       { type: "heading", level: 2, id: "delete-document-errors", text: "Errors" },
       {
         type: "param-table",
@@ -6378,6 +7419,9 @@ var sections20 = [
   "due_date": "2024-03-15"
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint to feed extraction output into downstream systems (CRMs, ERPs, data warehouses) that only need the raw key-value data. A typical workflow is to extract a document, then call this endpoint with the `extraction_id` from the response to get a clean data payload without metadata overhead." },
+      { type: "paragraph", text: "The response is a flat JSON object where each key is a field name and each value is the extracted value, typed according to the schema (strings, numbers, dates, arrays). Use `?format=csv` to download the same data as a CSV file with field names as headers \u2014 the `Content-Disposition` header provides a suggested filename." },
+      { type: "paragraph", text: "Pair this with `GET /v1/extractions/:id` when you also need confidence scores, locked field status, or processing metadata. Note that the response shape matches the schema used during extraction \u2014 if no schema was provided, auto-discovered field names are used as keys." },
       { type: "heading", level: 2, id: "get-extraction-fields-errors", text: "Errors" },
       {
         type: "param-table",
@@ -6701,6 +7745,9 @@ var sections21 = [
   }
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint before running an extraction to verify the schema definition is correct, or after an update to confirm the new version was applied. A typical workflow is to create a schema with `POST /v1/schemas`, store the returned `id`, then fetch it here whenever you need the current definition." },
+      { type: "paragraph", text: "The response includes the full `definition` object in normalized JSON Schema format, along with the `version` number and `field_count`. Use the `links.extractions` URL to list all extractions that used this schema, and `links.dashboard` to open it in the platform UI." },
+      { type: "paragraph", text: "Pair this with `PUT /v1/schemas/:id` to update the definition, or pass the `id` as `schema_id` on `POST /v1/extract` to run schema-driven extraction. Note that both UUID and `SCH-` prefixed short IDs are accepted as the `:id` parameter." },
       { type: "heading", level: 2, id: "get-schema-errors", text: "Errors" },
       {
         type: "param-table",
@@ -6898,6 +7945,10 @@ var sections21 = [
   }
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint when extraction requirements evolve \u2014 for example, adding a new field to an invoice schema or renaming an existing one. A typical workflow is to fetch the current schema with `GET /v1/schemas/:id`, modify the `definition`, then send the updated payload here." },
+      { type: "paragraph", text: "The response includes the updated `definition`, `field_count`, and `version` number. The `updated_at` timestamp reflects when the change was applied. All body parameters are optional \u2014 send only `name`, `definition`, or `description` to update that field without touching the others." },
+      { type: "paragraph", text: "Pair this with `GET /v1/extractions?schema_id=:id` to review historical extractions that used previous versions. Note that schema versioning is append-only internally, so you can always compare before-and-after definitions through the dashboard." },
+      { type: "callout", variant: "info", text: "Schema updates do not retroactively change existing extractions. If you need to re-extract documents with the new schema, call `POST /v1/extract` with `document_id` and the updated `schema_id`." },
       { type: "heading", level: 2, id: "update-schema-errors", text: "Errors" },
       {
         type: "param-table",
@@ -6953,6 +8004,9 @@ var sections21 = [
   "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
 }`
       },
+      { type: "paragraph", text: "Most integrations call this endpoint during cleanup when a schema is no longer needed, or when consolidating duplicate schemas. A typical workflow is to list schemas with `GET /v1/schemas`, identify obsolete ones, then delete them individually by `id`." },
+      { type: "paragraph", text: "The response confirms deletion with `deleted: true` and the `id` of the removed schema. All extraction results that used this schema remain intact and queryable via `GET /v1/extractions` \u2014 only the schema definition itself is removed from the system." },
+      { type: "paragraph", text: "Pair this with `GET /v1/schemas/:id` beforehand to review the schema before removing it. Note that deletion is permanent with no undo \u2014 if you need the same structure later, you must recreate it with `POST /v1/schemas`." },
       { type: "heading", level: 2, id: "delete-schema-errors", text: "Errors" },
       {
         type: "param-table",
@@ -7530,6 +8584,9 @@ var sections23 = [
     description: "Create a new input source and receive a source-scoped API key. The key is only shown once in the creation response \u2014 store it securely.",
     content: [
       { type: "paragraph", text: "Create a new source to start ingesting documents. The response includes a **source-scoped API key** (`tlnc_sk_*`) that authenticates uploads to this source's endpoint. This key is shown only once \u2014 store it securely immediately after creation." },
+      { type: "paragraph", text: "The typical workflow is: create a source, store the returned `api_key` securely, then use it to authenticate document uploads to the source's `endpoint` URL. Optionally pass a `default_schema_id` to automatically apply an extraction schema to all documents ingested through this source." },
+      { type: "paragraph", text: "The response returns the source with `status: active`, `document_count: 0`, and the one-time `api_key` field. The `endpoint` URL is the path for `POST` document uploads. The `links` object includes URLs for the source detail, document list, and dashboard view." },
+      { type: "paragraph", text: "Store the `api_key` immediately \u2014 it cannot be retrieved again. If lost, delete the source and create a new one. The source type defaults to `api` (programmatic ingestion); use `upload` for manual file uploads or `connector` for third-party integrations like Google Drive or SharePoint." },
       { type: "callout", variant: "warning", text: "The `api_key` is only returned in the creation response. It cannot be retrieved later. If you lose it, delete the source and create a new one." },
       {
         type: "endpoint",
@@ -7615,6 +8672,10 @@ var sections23 = [
     description: "Get source details, update a source name, or delete a source. Documents are retained but unlinked when a source is deleted.",
     content: [
       { type: "paragraph", text: "Manage an individual source with GET, PATCH, and DELETE operations on the same path. Retrieve source details, update its name, or permanently delete it. When a source is deleted, its documents are **retained** but unlinked from the source." },
+      { type: "paragraph", text: "Use `GET` to inspect a source's current status, document count, and default schema assignment. Use `PATCH` to rename a source. Use `DELETE` when a source is no longer needed \u2014 this immediately invalidates the source-scoped API key, so any integration using it will start receiving `401` errors." },
+      { type: "paragraph", text: "The `GET` response includes `document_count`, `default_schema` (with its `id` if set), and the `endpoint` URL for document ingestion. The `status` field shows the current state \u2014 `active` for API sources, or sync status values for connector-based sources (Google Drive, SharePoint, etc.)." },
+      { type: "paragraph", text: "Deleting a source retains all its documents in your workspace \u2014 they remain accessible via the documents API and any existing extractions are preserved. Only the source-to-document link is removed. Pair `GET /v1/sources/:id` with `GET /v1/sources/:id/documents` to see documents belonging to a specific source." },
+      { type: "callout", variant: "info", text: "Deleting a source immediately invalidates its API key. Any integration using that key will receive `401` errors. Documents are retained but unlinked from the source." },
       {
         type: "endpoint",
         method: "GET",
@@ -8522,7 +9583,11 @@ var sections25 = [
         "`extraction.complete` \u2014 Extraction finished successfully. Payload includes the full extraction result.",
         "`extraction.failed` \u2014 Extraction failed. Payload includes the error details.",
         "`document.ingested` \u2014 A new document has been processed and is ready for extraction."
-      ] }
+      ] },
+      { type: "paragraph", text: "Most integrations subscribe to `extraction.complete` to trigger downstream processing (e.g. writing structured data to a database or notifying a user). A typical workflow is to pass `webhook_url` on `POST /v1/extract`, then handle the callback payload in your server without polling." },
+      { type: "paragraph", text: "The `extraction.complete` payload includes the `extraction_id`, `document_id`, `schema_id`, `status`, and `confidence` score. Use the `extraction_id` to fetch the full result via `GET /v1/extractions/:id` if the payload does not contain all the fields you need." },
+      { type: "paragraph", text: "Pair event handling with [Signature Verification](webhook-security) to ensure payloads are authentic. Note that `extraction.failed` events include an `error` field with a machine-readable code and human-readable message \u2014 use this to decide whether to retry via `POST /v1/extract` with `document_id`." },
+      { type: "callout", text: "Webhook URLs must be HTTPS endpoints. HTTP URLs are rejected at configuration time to ensure payload confidentiality in transit." }
     ],
     related: [
       { label: "Signature Verification", slug: "webhook-security" },
@@ -8640,7 +9705,11 @@ echo -n '{"event":"extraction.complete","delivery_id":"dlv_test123","timestamp":
         "3rd retry \u2014 30 minutes",
         "4th retry (final) \u2014 4 hours"
       ] },
-      { type: "paragraph", text: "After 4 failed attempts, the delivery is marked as failed. You can check delivery status and replay events from the dashboard." }
+      { type: "paragraph", text: "After 4 failed attempts, the delivery is marked as failed. You can check delivery status and replay events from the dashboard." },
+      { type: "paragraph", text: "Most integrations rely on the default retry schedule and only intervene when a delivery reaches the failed state. A typical debugging workflow is to check the delivery history in the dashboard, identify the HTTP status or timeout that caused the failure, then fix the endpoint and replay the event." },
+      { type: "paragraph", text: "Your endpoint must return a `2xx` status code within **30 seconds** to be considered successful. Non-`2xx` responses (including `3xx` redirects) and timeouts trigger retries. The `X-Talonic-Delivery-Id` header remains the same across retries, so use it for idempotent processing on your end." },
+      { type: "paragraph", text: "Pair retry awareness with [Signature Verification](webhook-security) to reject spoofed payloads early. Note that the total retry window spans approximately **4.5 hours** from the initial attempt \u2014 if your endpoint is down longer than that, use the dashboard replay feature to re-send missed events." },
+      { type: "callout", text: "If your endpoint consistently fails, check for firewall rules blocking Talonic IPs, TLS certificate issues, or response timeouts exceeding 30 seconds. The dashboard delivery log shows the HTTP status and error for each attempt." }
     ],
     related: [
       { label: "Webhook Events", slug: "webhook-events" },
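Because the added text states that `X-Talonic-Delivery-Id` stays the same across retries, a receiver can use it as an idempotency key. The sketch below illustrates that idea only: `handleDelivery` and the in-memory `Set` are hypothetical, and it assumes header names arrive lower-cased (as Node's `http` module provides them).

```javascript
// In-memory record of delivery IDs already handled. This is an illustration
// only; a real service would likely use a shared store with expiry.
const seenDeliveryIds = new Set();

// Hypothetical handler: returns true when the delivery was new and processed,
// false when it is a retry of a delivery that was already handled.
// Assumes `headers` uses lower-cased header names.
function handleDelivery(headers, payload, processEvent) {
  const deliveryId = headers["x-talonic-delivery-id"];
  if (!deliveryId) throw new Error("Missing X-Talonic-Delivery-Id header");
  if (seenDeliveryIds.has(deliveryId)) return false;
  processEvent(headers["x-talonic-event"], payload);
  // Record the ID only after processing succeeds, so that a thrown error
  // leaves the delivery eligible for reprocessing on the next retry.
  seenDeliveryIds.add(deliveryId);
  return true;
}
```

Recording the ID after processing, rather than before, favors at-least-once handling; swap the order if duplicate side effects are worse than a missed event.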
@@ -8658,7 +9727,11 @@ echo -n '{"event":"extraction.complete","delivery_id":"dlv_test123","timestamp":
     seoTitle: "Webhook Delivery Format \u2014 Talonic Docs",
     description: "Webhook delivery format details including POST request structure, JSON body format, and standard headers for event type, signature, delivery ID, and timestamp.",
     content: [
-      { type: "paragraph", text: "Webhooks are delivered as `POST` requests with a JSON body. Configure webhook URLs per-source or per-extraction via the `webhook_url` option on the extract endpoint." }
+      { type: "paragraph", text: "Webhooks are delivered as `POST` requests with a JSON body. Configure webhook URLs per-source or per-extraction via the `webhook_url` option on the extract endpoint." },
+      { type: "paragraph", text: "Most integrations configure a single webhook endpoint that handles all event types, using the `X-Talonic-Event` header to route internally. A typical setup is to pass `webhook_url` on `POST /v1/extract` calls, or configure a default URL in the dashboard for all extractions from a specific source." },
+      { type: "paragraph", text: "Each delivery includes four standard headers: `X-Talonic-Event` (event type), `X-Talonic-Signature` (HMAC-SHA256 for verification), `X-Talonic-Delivery-Id` (unique ID for idempotency), and `X-Talonic-Timestamp` (Unix timestamp). Your endpoint must return a `2xx` status within **30 seconds** or the delivery is considered failed." },
+      { type: "paragraph", text: "Pair webhook delivery with the [Signature Verification](webhook-security) guide to authenticate incoming payloads. Note that failed deliveries are retried with exponential backoff up to 4 times \u2014 see [Retry Policy](webhook-retry) for the schedule." },
+      { type: "callout", text: "Use the `X-Talonic-Delivery-Id` header to deduplicate webhook deliveries on your end. Retries reuse the same delivery ID, so you can safely discard duplicates." }
     ],
     related: [
       { label: "Webhook Events", slug: "webhook-events" },
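The added text names `X-Talonic-Signature` as an HMAC-SHA256 value but this hunk does not show the exact signing scheme. The sketch below therefore rests on assumptions: that the signature is a hex-encoded HMAC-SHA256 of the raw request body keyed with a shared secret. `isValidSignature` is a hypothetical helper; consult the Signature Verification page for the authoritative scheme.

```javascript
const crypto = require("node:crypto");

// Hypothetical verifier. Assumes the header carries a hex-encoded
// HMAC-SHA256 of the raw request body; the real scheme may differ.
function isValidSignature(rawBody, signatureHeader, secret) {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.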
@@ -9214,6 +10287,9 @@ var sections27 = [
     description: "Classify link keys into categories (identity, transaction, reference) using AI. Runs asynchronously on ambiguous fields.",
     content: [
       { type: "paragraph", text: "When new fields are extracted, some may not be automatically classified as link keys. The classify endpoint runs AI-powered classification on ambiguous fields to determine whether they are **identity**, **transaction**, or **reference** link keys. This is useful after onboarding new document types or when the field registry grows." },
+      { type: "paragraph", text: "Call this endpoint after uploading a new batch of documents or after adding a new document type to your workspace. The endpoint returns immediately with the count of fields that were classified \u2014 any graph rebuilding happens asynchronously via a triggered backfill." },
+      { type: "paragraph", text: "The response includes a `classified` count (number of fields newly assigned a category) and a `backfillTriggered` boolean. When `backfillTriggered` is `true`, entity links across all documents are being rebuilt in the background. Poll the **Backfill** progress endpoint to monitor completion." },
+      { type: "paragraph", text: "Only fields with a `null` category are evaluated \u2014 already-classified link keys are not re-assessed. To verify which fields were classified, call the **Link Keys** endpoint before and after. If no ambiguous fields remain, `classified` returns `0` and no backfill is triggered." },
       { type: "callout", variant: "info", text: "Classification uses a two-pass approach: rule-based heuristics handle obvious cases (e.g. fields named `invoice_number`), then an LLM call classifies the remaining ambiguous fields. A backfill is automatically triggered when new link keys are identified." },
       {
         type: "endpoint",
@@ -9268,6 +10344,9 @@ var sections27 = [
     description: "Get all entity links for a specific document showing entity values, types, link keys, and linked document IDs.",
     content: [
       { type: "paragraph", text: "Retrieve all entity links discovered for a specific document. Each link represents a shared field value \u2014 such as a customer ID or PO number \u2014 that connects this document to others in the workspace. Use this endpoint to understand how a document relates to the rest of your corpus." },
+      { type: "paragraph", text: "Call this endpoint when building a document detail view or when you need to trace the relationships of a single document before exploring the broader graph. Pass the document UUID as a path parameter \u2014 the endpoint returns all entity links regardless of link key category." },
+      { type: "paragraph", text: "Each entry in the response includes the **entity_value** (the raw shared value), the **field_key** (which field it was extracted from), and the **link_key_category** (`identity`, `transaction`, or `reference`). Documents with no extracted field values matching other documents return an empty `data` array." },
+      { type: "paragraph", text: "Use this alongside the **Full Graph** subgraph endpoint to progressively explore the linking graph. Start here for a flat list of connections, then call the subgraph endpoint with `depth=2` to expand outward from the document and discover second-degree relationships." },
       { type: "callout", variant: "info", text: "The `document_count` field on each entity indicates how many documents share that value. A high count on an identity entity (e.g. a vendor ID appearing in 50+ documents) is expected, while a high count on a transaction entity may indicate a data quality issue." },
       {
         type: "endpoint",
@@ -9582,6 +10661,10 @@ var sections27 = [
     description: "List and retrieve cases \u2014 automatically created groups of 2+ related documents linked through shared field values with narrative summaries.",
     content: [
       { type: "paragraph", text: "Cases are automatically created groups of two or more documents that are connected through shared **transaction** or **reference** entity values. For example, an invoice, a purchase order, and a delivery note sharing the same PO number form a case. Cases provide a high-level view of document relationships without needing to navigate the full graph." },
+      { type: "paragraph", text: "Use this endpoint to retrieve all cases in your workspace for building case lists, dashboards, or approval queues. Cases are returned newest first, ordered by the earliest document timestamp in each case. Each case includes a `document_count` and a stable `case_key` that you can use for subsequent detail lookups." },
+      { type: "paragraph", text: "The response includes a `links.self` URL for each case that points to the case detail endpoint. The `label` field contains an auto-generated human-readable name when available, or `null` for cases that have not yet been labelled. The `created_at` field reflects the timestamp of the earliest document in the group." },
+      { type: "callout", variant: "info", text: "Each document belongs to at most one case. Documents linked only through identity entities (e.g. shared vendor ID) appear as entity groups in the full graph but are not returned by this endpoint." },
+      { type: "paragraph", text: "Pair this endpoint with **Case Graph** to visualize individual cases, or with **Document-Case Map** for a flat document-to-case lookup. Cases are rebuilt automatically during backfill \u2014 if you have recently reclassified link keys, trigger a backfill first to ensure case assignments are up to date." },
       { type: "list", ordered: false, items: [
         "Each case has a deterministic **case key** (hex hash of its document IDs)",
         "Cases are created by the linking pipeline during backfill or real-time processing",
@@ -9656,6 +10739,10 @@ var sections27 = [
     description: "Retrieve the D3-compatible graph visualization for a single case, showing document nodes and entity edges within the case boundary.",
     content: [
       { type: "paragraph", text: "Retrieve the graph structure for a single case, formatted for **D3.js** or similar graph visualization libraries. The response contains only the nodes and edges within the case boundary, making it suitable for rendering focused relationship diagrams." },
+      { type: "paragraph", text: "The typical workflow is to first list cases via the **Cases** endpoint, then call this endpoint with a specific `case_key` to fetch the renderable graph. This is the primary endpoint for building case-level visualizations in custom UIs or embedded dashboards." },
+      { type: "paragraph", text: "The response includes both **document nodes** (with filename and inferred document type) and **entity nodes** (with the shared value and link key category). Edges always connect a document to an entity \u2014 never document-to-document directly. Node IDs are stable across requests, so you can preserve force-layout positions between refreshes." },
+      { type: "callout", variant: "info", text: "The case graph is a strict subset of the full workspace graph. Only entities that contributed to forming the case are included \u2014 high-frequency entities excluded from BFS do not appear." },
+      { type: "paragraph", text: "Pair this endpoint with **Document Links** to enrich each node with additional entity metadata, or with **Full Graph** when you need cross-case visibility. The graph structure mirrors the full graph format, so the same rendering code works for both." },
       {
         type: "endpoint",
         method: "GET",
@@ -9723,6 +10810,9 @@ var sections27 = [
     description: "Get the mapping of documents to their resolved cases. Returns a mapping of document IDs to assigned case keys.",
     content: [
       { type: "paragraph", text: "The document-case map provides a flat lookup from document ID to case assignment. Use it to quickly determine which case a document belongs to, or to identify documents that are not part of any case. Documents in **entity groups** (linked only through identity entities) are included with `is_case: false`." },
+      { type: "paragraph", text: "Call this endpoint when you need to enrich a document list with case membership \u2014 for example, to display a case badge next to each document in a table view. The response is a flat object keyed by document UUID, so lookups are O(1) without client-side joins." },
+      { type: "paragraph", text: "Each entry includes a `case_key` (the deterministic hex hash identifying the case), a `document_count` (total documents in that case or entity group), and an `is_case` boolean. When `is_case` is `false`, the `case_key` is an empty string \u2014 the document is linked via identity entities only." },
+      { type: "paragraph", text: "This endpoint pairs well with the **Cases** list endpoint. Use the map for bulk lookups across your document set, and the Cases endpoint when you need case-level metadata like labels or timestamps. Documents with no entity links at all are omitted from the map entirely." },
       { type: "callout", variant: "info", text: "Documents with `is_case: false` are linked to other documents only through identity entities (e.g. same vendor). They appear in the map but do not form a case. Documents with no links at all are not included in the map." },
       {
         type: "endpoint",
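The case-badge use case mentioned above can be sketched as a simple lookup against the map. This assumes the caller already holds the flat object keyed by document UUID (any response envelope around it is not shown in this hunk), and `attachCaseBadges` along with the output property names are hypothetical.

```javascript
// Hypothetical helper: enriches a document list with case membership using
// the document-case map (a flat object keyed by document UUID).
function attachCaseBadges(documents, caseMap) {
  return documents.map((doc) => {
    // Documents with no entity links are omitted from the map entirely.
    const entry = caseMap[doc.id];
    return {
      ...doc,
      // Only entries with is_case: true belong to an actual case.
      caseKey: entry && entry.is_case ? entry.case_key : null,
      linkedDocumentCount: entry ? entry.document_count : 0,
    };
  });
}
```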
@@ -12058,6 +13148,9 @@ var sections31 = [
     description: "Get metric trends over time for a schema. Returns time-series telemetry data across recent runs for tracking quality changes.",
     content: [
       { type: "paragraph", text: "Track how structuring metrics evolve over successive runs for a schema. This endpoint returns a **time-series** of telemetry snapshots, allowing you to detect quality improvements, regressions, or shifts in strategy distribution as your field registry matures." },
+      { type: "paragraph", text: "Call this endpoint after several extraction runs to build trend charts or to detect regressions. The default window returns the 10 most recent runs \u2014 use the `window` query parameter to expand up to 50 runs for longer-term analysis." },
+      { type: "paragraph", text: "Each snapshot in the `data` array contains the same metrics as the **Schema Summary** \u2014 `capture_hit_rate`, `synthesize_rate`, `strategy_distribution`, and `tier_funnel` \u2014 plus a `created_at` timestamp and `run_id`. The array is ordered by most recent run first." },
+      { type: "paragraph", text: "Compare the trend data with the **Schema Fields** endpoint to pinpoint which specific fields are driving changes. A sudden spike in `synthesize_rate` across runs may indicate a new document type that the field registry has not yet learned, while a steady decrease signals healthy registry maturation." },
       { type: "callout", variant: "info", text: "A rising `capture_hit_rate` over time indicates the field registry is learning from extractions and resolving more fields deterministically, reducing LLM costs." },
       {
         type: "endpoint",
@@ -12162,6 +13255,9 @@ var sections31 = [
     description: "Get per-field structuring metrics for a schema including field-level state distribution, capture rates, and strategy breakdown.",
     content: [
       { type: "paragraph", text: "Drill down to **individual field performance** within a schema. This endpoint returns per-field capture rates, synthesis rates, the most common strategy used, and the distribution of cell states (filled, empty, skipped). Use it to identify underperforming fields that may need instruction tuning or manual review." },
+      { type: "paragraph", text: "Call this endpoint after reviewing the **Schema Summary** to investigate which fields are driving low capture rates or high synthesis costs. The field-level breakdown reveals whether issues are concentrated in a few problematic fields or spread evenly across the schema." },
+      { type: "paragraph", text: "Each entry in the `data` array includes the `field_name`, `capture_rate` and `synthesize_rate` (both 0-1 fractions), the dominant `strategy` (one of `transfer`, `extract`, `compute`, `skip`), and a `state_distribution` object with `filled`, `empty`, and `skipped` counts. Fields with a `strategy` of `extract` are LLM-dependent and contribute most to cost." },
+      { type: "paragraph", text: "Pair this with the **Schema Trend** endpoint to track how individual field performance changes across runs. Fields that remain stuck on `extract` strategy after multiple runs are strong candidates for adding explicit instructions or seeding the field registry with example values." },
       { type: "callout", variant: "info", text: "Fields with a high `synthesize_rate` and low `capture_rate` are candidates for field registry enrichment or instruction refinement to reduce LLM dependency." },
       {
         type: "endpoint",
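To act on the field-level breakdown described above, one option is to filter the `data` array for fields whose dominant strategy is `extract` and whose capture rate is low. This is a rough sketch: `findLlmDependentFields` is hypothetical, and the `0.5` threshold is an arbitrary assumption, not a documented cutoff.

```javascript
// Hypothetical helper: lists field names that rely on the `extract` strategy
// and have a low capture rate, ordered by highest synthesize_rate first.
// The default threshold is an arbitrary illustration value.
function findLlmDependentFields(fields, maxCaptureRate = 0.5) {
  return fields
    .filter((f) => f.strategy === "extract" && f.capture_rate <= maxCaptureRate)
    .sort((a, b) => b.synthesize_rate - a.synthesize_rate)
    .map((f) => f.field_name);
}
```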
@@ -12243,6 +13339,9 @@ var sections31 = [
     description: "Get aggregate structuring metrics for a single job run including strategy distribution, tier funnel, and capture hit rate.",
     content: [
       { type: "paragraph", text: "Retrieve structuring telemetry for a **specific job run** rather than the latest run for a schema. Use this when you need to inspect the performance of a particular execution, compare two runs side by side, or debug a run that produced unexpected results." },
+      { type: "paragraph", text: "The typical workflow is to list runs from your jobs pipeline, then call this endpoint with the run UUID to inspect its metrics. This is especially useful when a run produces unexpected accuracy \u2014 the telemetry reveals whether the issue is in capture (registry gaps), synthesis (LLM errors), or strategy selection." },
+      { type: "paragraph", text: "The response includes `capture_hit_rate`, `synthesize_rate`, `strategy_distribution`, and `tier_funnel` \u2014 identical in shape to the **Schema Summary**. The `schema_id` field identifies which schema was used, allowing you to cross-reference with field-level telemetry. Runs that are still `pending` or `running` return a `404` until they complete." },
+      { type: "paragraph", text: "To compare two runs, call this endpoint twice with different run IDs and diff the `strategy_distribution` and `tier_funnel` values. Pair with the **Schema Trend** endpoint when you need the full historical view rather than a point-in-time comparison." },
       { type: "callout", variant: "info", text: "The response shape is identical to the Schema Summary endpoint. The only difference is that this endpoint targets a specific run by ID instead of returning the latest run for a schema." },
       {
         type: "endpoint",
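The run comparison described above (calling the endpoint twice and diffing `strategy_distribution`) could look like the sketch below. The shape of `strategy_distribution` is not shown in this hunk, so this assumes it is an object mapping strategy names to numeric values; `diffDistribution` is a hypothetical helper.

```javascript
// Hypothetical helper: computes per-key deltas between two distribution
// objects (assumed to map names to numbers). Missing keys count as 0.
function diffDistribution(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  const delta = {};
  for (const key of keys) {
    delta[key] = (after[key] ?? 0) - (before[key] ?? 0);
  }
  return delta;
}
```

A positive delta means the later run used that strategy more often; the same function would work for `tier_funnel` if it follows the same name-to-number shape.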
@@ -12392,6 +13491,9 @@ var sections32 = [
     description: "Get detail with expected values or delete a ground-truth dataset. Supports GET (read scope) and DELETE (write scope) on the same path.",
     content: [
       { type: "paragraph", text: "Retrieve the full details of a ground-truth dataset including all expected value entries, or permanently delete the dataset. The GET response includes every document-field pair with the expected value, which you can use to audit the benchmark data before running a validation." },
+      { type: "paragraph", text: "Call GET before starting a validation run to verify that expected values are correct and complete. The `values` array contains every document-field pair with its `expected_value`, `document_id`, and `field_name` \u2014 review these to ensure the benchmark data reflects your current extraction requirements." },
+      { type: "paragraph", text: "The response includes `entry_count` for a quick size check and `user_schema_id` to confirm schema scope. The `values` array entries each have their own UUID (`id`) and `created_at` timestamp. If the dataset is unscoped (`user_schema_id: null`), it can validate fields across any schema." },
+      { type: "paragraph", text: "Use DELETE only when the dataset is no longer relevant. Existing validation runs that referenced this dataset are retained with their results intact, but you cannot create new runs against a deleted dataset. To update individual entries, delete and recreate the dataset with corrected values." },
       { type: "callout", variant: "warning", text: "Deleting a ground-truth dataset also removes all associated expected value entries. Existing validation runs that used this dataset are retained but can no longer be re-run." },
       {
         type: "endpoint",
@@ -12653,6 +13755,10 @@ var sections32 = [
     description: "Get validation run detail with accuracy summary or delete a run. Supports GET (read scope) and DELETE (write scope) on the same path.",
     content: [
       { type: "paragraph", text: "Retrieve the full details of a validation run including its status, accuracy score, and total comparisons. Or permanently delete a run and its associated results. Use GET to poll a run's status until it reaches `completed`, then fetch the detailed results." },
+      { type: "paragraph", text: "After creating a validation run, poll this endpoint until the `status` field transitions from `pending` or `running` to `completed` or `failed`. Once completed, the `accuracy` field contains the overall score (0-1) and `total_comparisons` shows how many field-level comparisons were made." },
+      { type: "paragraph", text: "The response includes `links.results` which points directly to the per-field results endpoint. Once the run reaches `completed` status, follow this link to retrieve the granular comparison data including match types, similarity scores, and LLM judge verdicts." },
+      { type: "callout", variant: "warning", text: "Deleting a validation run permanently removes all per-field results. The ground-truth dataset and the original job run are not affected. Use DELETE only when you want to clean up outdated or erroneous runs." },
+      { type: "paragraph", text: "Pair this endpoint with **Create Validation Run** for the create-then-poll workflow, or with **List Validation Runs** to find specific runs by recency. Comparing the `accuracy` values of multiple runs against the same ground-truth dataset is the primary way to track extraction quality over time." },
       {
         type: "endpoint",
         method: "GET",
@@ -12904,6 +14010,9 @@ var sections33 = [
     description: "Get credit transaction history including purchases, deductions, and adjustments with page-based pagination.",
     content: [
       { type: "paragraph", text: "Retrieve a chronological log of every credit transaction on your account. Transactions include **purchases** (positive amounts), **consumption deductions** (negative amounts), **bonuses**, and **manual adjustments**. Use this to audit spending and reconcile usage." },
+      { type: "paragraph", text: "Call this endpoint to build a transaction ledger view or to reconcile credit changes over a billing period. The response uses page-based pagination \u2014 pass `page` and `limit` query parameters to navigate through large transaction histories. The default page size is 20 with a maximum of 100." },
+      { type: "paragraph", text: "Each transaction includes an `amount` (negative for deductions, positive for purchases), a `type` field (`consumption`, `purchase`, `bonus`, or `adjustment`), and an `operation_type` that identifies the pipeline operation responsible. The `total` field in the response gives the total number of transactions, which you can divide by `limit` to compute the number of pages." },
+      { type: "paragraph", text: "Use this alongside the **Balance** endpoint to understand how your balance arrived at its current value. For aggregate cost analysis by operation type and model, the **Usage Summary** endpoint provides a more efficient grouped view without per-transaction detail." },
       { type: "callout", variant: "info", text: "Transactions are ordered by most recent first. Each entry includes the `operation_type` that triggered it (e.g. `extraction`, `manual`), making it easy to trace costs back to specific pipeline operations." },
       {
         type: "endpoint",
@@ -12986,6 +14095,9 @@ var sections33 = [
     description: "Get aggregate credit usage summary broken down by operation type and model for a configurable time period.",
     content: [
       { type: "paragraph", text: "Get a high-level view of your API usage grouped by **operation type** and **model**. This endpoint aggregates call counts, token consumption, and estimated costs over a configurable lookback period. Use it to understand which operations drive your spending." },
+      { type: "paragraph", text: "Call this endpoint to build cost dashboards or to identify which pipeline operations consume the most credits. The default lookback is 30 days \u2014 pass the `days` query parameter to adjust. Each row in the `stats` array represents a unique combination of `operation_type` and `model`." },
+      { type: "paragraph", text: "The response includes `call_count`, `total_input_tokens`, `total_output_tokens`, `total_cache_read_tokens`, and `total_cost_usd` per grouping. Token-based operations (e.g. `extraction` via Claude) report full token breakdowns, while page-based operations (e.g. `document_ai_ocr`) report zero tokens because their cost is calculated from pages processed." },
+      { type: "paragraph", text: "Pair with **Daily Usage** for time-series analysis of the same period, or with **Usage Log** to drill into individual requests behind a high-cost grouping. The `period_days` field in the response confirms the actual lookback window applied." },
       { type: "callout", variant: "info", text: "Cost estimates include all token classes: input tokens, output tokens, cache creation tokens, and cache read tokens. Each is priced at the model-specific rate." },
       {
         type: "endpoint",
@@ -13074,6 +14186,10 @@ var sections33 = [
     description: "Get per-day credit usage breakdown for the specified period (default last 30 days) with call counts and token totals per day.",
     content: [
       { type: "paragraph", text: "Get a per-day breakdown of API usage over a configurable period. Each entry includes the total number of API calls, input/output token counts, and estimated cost for that calendar date. Use this for usage trend analysis and daily cost monitoring." },
+      { type: "paragraph", text: "Call this endpoint to populate daily usage charts or to set up alerting on cost spikes. The default lookback is 30 days \u2014 use the `days` query parameter to widen or narrow the window. Days with zero API calls are omitted from the response array." },
+      { type: "paragraph", text: "Each entry contains a `date` (YYYY-MM-DD in UTC), `calls` (total API calls), `input_tokens`, `output_tokens`, and `cost_usd`. All timestamps are UTC \u2014 a call made at 23:59 UTC on a given date appears under that UTC date, not the caller's local date." },
+      { type: "callout", variant: "info", text: "Daily usage is ordered by date ascending, making it ready for time-series charting without client-side sorting. Pair with the **Usage Summary** endpoint for operation-level breakdowns within the same period." },
+      { type: "paragraph", text: "Combine this endpoint with **Balance** to correlate daily burn against remaining runway. If you notice a cost spike on a specific date, drill into the **Usage Log** to identify the individual requests responsible." },
       {
         type: "endpoint",
         method: "GET",
@@ -13343,6 +14459,9 @@ var sections34 = [
     description: "List all tools available to the embedded agent including their impact level (read/write) and descriptions for discovering agent capabilities.",
     content: [
       { type: "paragraph", text: "Discover all tools available to the embedded AI agent. Each tool declares its **impact level** \u2014 whether it performs a read-only operation or a mutation \u2014 so you can build permission-aware integrations. Use this endpoint to dynamically generate tool descriptions for external AI agents or to audit available capabilities." },
+      { type: "paragraph", text: "Call this endpoint at startup to populate your integration's tool registry, or periodically to detect newly added capabilities. The response includes every tool the agent can invoke, with a stable `name` identifier, a human-readable `description`, and the `impact` classification." },
+      { type: "paragraph", text: "The `totalCount` field gives the total number of tools available. Each tool's `impact` field follows a four-level severity scale: `read`, `draft_mutation`, `live_mutation`, and `irreversible`. Use these levels to build confirmation gates \u2014 for example, auto-approve `read` tools but require user confirmation for `live_mutation` and above." },
+      { type: "paragraph", text: "Pair this with the **Workspace Context** endpoint to give your external AI agent both situational awareness (context) and available actions (tools). The tool names returned here are stable identifiers that can be referenced in custom orchestration logic or permission policies." },
       { type: "callout", variant: "info", text: "Impact levels follow a severity scale: `read` (no side effects), `draft_mutation` (creates drafts only), `live_mutation` (modifies live data), and `irreversible` (permanent changes like deletion). Use these to implement confirmation gates in your integration." },
       {
         type: "endpoint",
@@ -13521,6 +14640,9 @@ var sections35 = [
     description: "Create a matching configuration with field mappings, comparison strategies (exact, fuzzy, date_range, numeric_range), and per-field weights that sum to 1.0.",
     content: [
       { type: "paragraph", text: "Create a matching configuration that defines how documents are compared against a reference dataset. Each field mapping specifies a source field (from extracted documents), a target column (in the reference data), a comparison strategy, and a relative weight." },
+      { type: "paragraph", text: "The typical workflow is: upload reference data via `POST /v1/matching/reference-data`, create a config with field mappings, then trigger a run via `POST /v1/matching/configs/:id/run`. For complex datasets, use `POST /v1/matching/strategies/generate` first to get AI-recommended mappings and weights." },
+      { type: "paragraph", text: "The response returns the config with the saved `field_mappings`, `threshold` (defaults to 0.85), and `links.runs` URL for triggering runs. The `reference_data_id` is fixed at creation \u2014 to match against a different dataset, create a new config." },
+      { type: "paragraph", text: "Choose strategies carefully: use `exact` for standardized codes and IDs, `fuzzy` for names with potential typos, `date_range` for dates with tolerance, and `numeric_range` for amounts with rounding differences. Weights must sum to 1.0 \u2014 fields with higher weights have more influence on the overall confidence score." },
       { type: "callout", variant: "info", text: "Field weights must sum to 1.0. The overall confidence score for a match is the weighted sum of per-field scores. Use the **generate strategy** endpoint to get AI-recommended mappings if you are unsure which fields and weights to use." },
       {
         type: "list",
@@ -13642,6 +14764,10 @@ var sections35 = [
     description: "Get matching configuration details, update field mappings and weights, or delete a configuration. Deleting a config does not remove past run results.",
     content: [
       { type: "paragraph", text: "Retrieve, update, or delete a matching configuration. Updates to field mappings and thresholds take effect on the next run \u2014 they do not retroactively change past results. Deleting a config removes the configuration but preserves all historical run results for audit purposes." },
+      { type: "paragraph", text: "Use `GET` to inspect the current field mappings, threshold, and targeting mode before running a match. Use `PUT` to adjust weights, swap strategies, or change the threshold \u2014 a common pattern is to lower the threshold after reviewing low-confidence results, then re-run to capture more matches." },
+      { type: "paragraph", text: "The `PUT` response returns the full updated config. The `reference_data_id` cannot be changed after creation \u2014 to match against a different dataset, create a new config. The `links.runs` URL provides a convenient shortcut to trigger a new run with the updated config." },
+      { type: "paragraph", text: "Deleting a config does not affect the audit trail: all historical run results, including per-document evidence and confidence scores, are preserved. Pair config updates with the generate strategy endpoint to get AI-recommended adjustments based on your reference dataset." },
+      { type: "callout", variant: "info", text: "Past run results are immutable. Updating field mappings or thresholds only affects future runs \u2014 re-run matching after config changes to see the updated results." },
       {
         type: "endpoint",
         method: "GET",
@@ -13920,6 +15046,9 @@ var sections35 = [
     description: "Get the status, progress, and summary of a matching run. Status progresses from queued to running to completed or failed.",
     content: [
       { type: "paragraph", text: "Retrieve the current state of a matching run. Poll this endpoint while `status` is `queued` or `running` to track progress. Once `completed`, the response includes the top 50 results by confidence. Use the results endpoint for full paginated access." },
+      { type: "paragraph", text: "Poll this endpoint after triggering a run via `POST /v1/matching/configs/:id/run`. A typical polling pattern is to check every 5-10 seconds while `status` is `queued` or `running`. Use `GET /v1/matching/runs/:id/progress` for lighter-weight progress updates during long runs." },
+      { type: "paragraph", text: "Once completed, the response includes `rows_processed`, `rows_matched`, and `avg_confidence` at the run level, plus a `results` array with the top 50 matches by confidence. Each result includes `document_id`, `matched_reference_row_id`, `confidence` score, review `status` (`pending`, `approved`, `rejected`), and per-field `evidence` breakdown." },
+      { type: "paragraph", text: "For the full result set beyond the top 50, use `GET /v1/matching/runs/:id/results` with pagination. Use `POST /v1/matching/runs/:runId/results/:resultId/review` to approve or reject individual matches. If `status` is `ai_resolving`, the run is using Claude Haiku to disambiguate borderline matches \u2014 this phase adds latency but can significantly improve accuracy on ambiguous rows." },
       { type: "callout", variant: "info", text: "The `ai_resolving` status indicates that the run has finished standard matching and is now running an AI resolution pass on low-confidence rows. This pass uses Claude Haiku to disambiguate borderline matches." },
       {
         type: "endpoint",
@@ -14022,6 +15151,9 @@ var sections35 = [
     description: "Retrieve matching results for a completed run. Returns the top 5 candidates per document with weighted confidence scores and per-field evidence breakdowns.",
     content: [
       { type: "paragraph", text: "Retrieve the full paginated results for a completed matching run. Each result represents a document matched (or unmatched) against the reference dataset, with a weighted confidence score and per-field evidence breakdown showing how each field contributed to the overall score." },
+      { type: "paragraph", text: "Use this endpoint after a run completes to review all matches. Filter by `status=pending` to see matches awaiting review, or `status=approved` to see confirmed matches. Paginate with `page` and `limit` \u2014 the run detail endpoint only shows the top 50 results, while this endpoint provides full access." },
+      { type: "paragraph", text: "Each result includes a per-field `evidence` object showing the strategy used and individual score for each field mapping. A `null` `matched_reference_row_id` means no reference row scored above the configured threshold for that document. The `confidence` score is the weighted sum of per-field scores using the weights from the matching config." },
+      { type: "paragraph", text: "Use `POST /v1/matching/runs/:runId/results/:resultId/review` to approve or reject individual matches programmatically. Pair with the config detail endpoint to understand which field mappings and thresholds produced these results. Re-run matching with adjusted weights or a lower threshold to capture more matches." },
       { type: "callout", variant: "info", text: "Results with `status: pending` have not been reviewed. Use `POST /v1/matching/runs/:runId/results/:resultId/review` to approve or reject individual matches. Approved matches can be used downstream for data enrichment and reconciliation workflows." },
       {
         type: "endpoint",
@@ -14316,6 +15448,9 @@ var sections36 = [
     description: "Create a delivery destination with connector type, transport config, and authentication. Supported types: webhook, sftp, s3, azure_blob, google_drive, onedrive.",
     content: [
       { type: "paragraph", text: "Create a new delivery destination by specifying the connector type, transport configuration, and optional authentication. The `config` and `auth_config` schemas vary by destination type \u2014 see the catalog endpoint for connector capabilities." },
+      { type: "paragraph", text: "The typical workflow is: create a destination first, then create one or more bindings that route signals to it. Call `GET /v1/delivery/catalog/connectors` to see which connector types are available and what `config` and `auth_config` schemas each expects." },
+      { type: "paragraph", text: "The response returns the created destination with `is_active: true` and `last_delivery_at: null`. Auth credentials are never echoed back \u2014 use the `has_auth_config` and `has_signing_secret` booleans to confirm they were stored. After creation, use `POST /v1/delivery/destinations/:id/test` to verify connectivity before setting up bindings." },
+      { type: "paragraph", text: "For webhook destinations, include a `signing_secret` in `auth_config` to enable HMAC-SHA256 request signing. For file-drop destinations (S3, SFTP, Azure Blob), set `payload_cap_bytes` if you need to override the global 5 MiB cap. OAuth destinations (Google Drive, OneDrive) require completing the OAuth flow first." },
       { type: "callout", variant: "info", text: "OAuth-based destinations (google_drive, onedrive) require completing an OAuth flow before creating the destination. Use the OAuth start endpoint to initiate the flow and obtain tokens." },
       {
         type: "endpoint",
@@ -14418,6 +15553,9 @@ var sections36 = [
     description: "Get destination details, update config, delete a destination, or send a test payload to verify connectivity. Auth credentials are always redacted in responses.",
     content: [
       { type: "paragraph", text: "Manage a single destination: retrieve its current config, update transport settings or credentials, delete it, or test connectivity. The **test** endpoint probes the destination without delivering real data \u2014 file-drop connectors (S3, SFTP, Azure Blob) verify bucket/container reachability without writing any objects." },
+      { type: "paragraph", text: "Use `GET` to inspect current config and delivery status. Use `PUT` to rotate credentials or change the target URL/bucket. Use `POST /test` after updating credentials to verify the new config works before live traffic flows through it. Use `DELETE` only when permanently removing a destination." },
+      { type: "paragraph", text: "The `GET` response includes `last_delivery_at` and `last_delivery_status` to show the most recent delivery attempt. The `is_active` flag indicates whether the destination is enabled \u2014 destinations are automatically disabled on `auth_failed` or `ssrf_blocked` errors. The test endpoint returns `success`, `durationMs`, and an optional `message` describing what was probed." },
+      { type: "paragraph", text: "If a destination becomes inactive due to auth failure, fix the credentials via `PUT`, then call the test endpoint to verify. The destination will be re-enabled automatically on a successful update. Prefer disabling (`is_active: false` via `PUT`) over deleting when you want to pause delivery but keep the history." },
       { type: "callout", variant: "warning", text: "Deleting a destination cascades to all its bindings, delivery items, and DLQ entries. This is irreversible. Disable the destination (`is_active: false`) instead if you want to preserve history." },
       {
         type: "endpoint",
@@ -14706,6 +15844,9 @@ var sections36 = [
     description: "Create a delivery binding that routes domain signals through a deliverable resolver and serializer to a destination. Includes field mapping and retry policy configuration.",
     content: [
       { type: "paragraph", text: "Create a binding that wires a domain event to a destination. The **compatibility triangle** is validated on creation: the signal event type must be compatible with the deliverable resolver, the serializer must support the deliverable shape, and the connector must support the serializer format." },
+      { type: "paragraph", text: "The typical workflow is: query the catalog endpoints top-down (signals, then deliverables, then serializers, then connectors), pick compatible values, and create the binding. A single event can fan out to multiple bindings \u2014 create separate bindings for each destination or output format you need." },
+      { type: "paragraph", text: "The response returns the binding with `is_active: true` and `last_status: null`. The `field_map` controls payload projection: use `static` to inject fixed values, `drop` to remove fields, and key-value pairs to rename fields. The `delivery_policy` defaults to 7 attempts with exponential backoff over ~10 hours if omitted." },
+      { type: "paragraph", text: "After creation, the binding is immediately live \u2014 the next matching signal will trigger delivery. Use `POST /v1/delivery/bindings/:id/preview` (internal) to dry-run the resolve-project-serialize pipeline. Monitor delivery health via the history and DLQ endpoints." },
       { type: "callout", variant: "info", text: "Use the catalog endpoints (`/v1/delivery/catalog/*`) to discover valid combinations before creating a binding. The catalog lists all available signals, deliverables, serializers, and connectors with their compatibility constraints." },
       {
         type: "endpoint",
@@ -14808,6 +15949,10 @@ var sections36 = [
     description: "Get binding details, update signal filters or field maps, delete a binding, or preview the resolved payload for a binding without sending it.",
     content: [
       { type: "paragraph", text: "Manage a single delivery binding: retrieve its configuration, update the signal filter or field map, delete it, or preview the payload it would produce. Updates re-validate the compatibility triangle. Deleting a binding stops future routing but allows in-flight deliveries to complete." },
+      { type: "paragraph", text: "Use `GET` to inspect the current binding config and `last_status`. Use `PUT` to adjust the signal filter, field map, or retry policy \u2014 changes take effect on the next matching event. Use `DELETE` when the binding is no longer needed; in-flight deliveries already in the job queue will still complete." },
+      { type: "paragraph", text: "The `PUT` response returns the full updated binding. The compatibility triangle is re-validated on every update \u2014 if you change the `signal_filter.event_type` or `serializer_format`, the system verifies the new combination is still valid. The preview endpoint (`POST /preview`) walks the resolve-project-serialize pipeline with a synthetic signal and returns the wire output without delivering." },
+      { type: "paragraph", text: "Pair updates with the delivery history endpoint to verify the binding is producing expected results. If `last_status` shows `failed`, check the DLQ for error details before adjusting the binding config." },
+      { type: "callout", variant: "info", text: "The public API preview endpoint currently returns a stub response. The internal preview endpoint is fully functional and walks the full resolve, project, and serialize pipeline with structural fallback." },
       {
         type: "endpoint",
         method: "GET",
@@ -15016,6 +16161,9 @@ var sections36 = [
     description: "View delivery attempt history with status, HTTP codes, and timing. Get detail for a single item or replay a failed delivery attempt.",
     content: [
       { type: "paragraph", text: "The delivery history tracks every attempt to deliver a payload to a destination. Each attempt is recorded as a **delivery item** with status, timing, HTTP response code, and optional request/response bodies. Use this endpoint to audit delivery performance and debug failures." },
+      { type: "paragraph", text: "Query items by `binding_id` or `destination_id` to narrow results to a specific delivery path. Filter by `status` to find failures (`failed`) or in-progress attempts (`in_flight`). Use `GET /v1/delivery/items/:id` to inspect the full request and response bodies for a single attempt." },
+      { type: "paragraph", text: "Each item includes an `idempotency_key` (deterministic SHA-256 of binding ID and event ID) that is sent on the wire so receivers can deduplicate. The `attempt` field is 1-indexed \u2014 multiple items with the same `event_id` and `binding_id` represent retries of the same delivery. Status values are `in_flight`, `succeeded`, or `failed`." },
+      { type: "paragraph", text: "Use `POST /v1/delivery/items/:id/replay` to re-enqueue a specific attempt with a fresh attempt number but the same idempotency key. For terminal failures, check the DLQ endpoint instead \u2014 items that exhausted all retries are moved there automatically. Pair history inspection with binding and destination detail to diagnose delivery issues end-to-end." },
       { type: "callout", variant: "info", text: "Request and response bodies are truncated to 10 KB and retained for a configurable period (default 30 days). After the retention period, bodies are nulled but metadata (status, HTTP code, duration, error code) is preserved indefinitely." },
       {
         type: "endpoint",
@@ -15674,6 +16822,9 @@ var sections37 = [
     description: "Get detailed information for a single extraction batch including item counts, provider, status, and timing. Shows per-item breakdown when the batch is completed.",
     content: [
       { type: "paragraph", text: "Retrieve the full batch record including per-item status. Poll this endpoint while `status` is `submitted` to track progress. Once `completed`, each item shows its individual outcome and processing timestamp." },
+      { type: "paragraph", text: "Use this endpoint to monitor a batch after submission. Poll periodically while `status` is `submitted`; results typically arrive within 24 hours. Once `status` changes to `completed`, `failed`, or `cancelled`, polling can stop. Use the sync endpoint to force an immediate provider check instead of waiting for the hourly poll." },
+      { type: "paragraph", text: "The response includes `items` \u2014 an array of per-document results. Each item has a `status` (`pending`, `processing`, `completed`, or `failed`), the associated `document_id` and `document_filename`, and a `processed_at` timestamp. The `custom_id` field shows the provider-assigned identifier used when submitting to Anthropic or Bedrock." },
+      { type: "paragraph", text: "Failed items are automatically retried via **realtime** extraction, never re-batched, to preserve the 48-hour SLA. Check the `errored_count` and `expired_count` fields at the batch level, and individual `items[].error_message` for per-document failure details. Pair with `GET /v1/documents/:id` to check the final extraction status of any document in the batch." },
       { type: "callout", variant: "info", text: "Items that fail extraction in the batch are retried via **realtime** extraction (never re-batched) to preserve the original 48-hour SLA. Check `items[].status` for per-document outcomes." },
       {
         type: "endpoint",
@@ -15772,6 +16923,9 @@ var sections37 = [
     description: "Force a sync with the provider to check for batch results. Useful when you do not want to wait for the hourly automatic poll.",
     content: [
       { type: "paragraph", text: "Force an immediate check with the batch provider (Anthropic or Bedrock) for results. By default, batches are polled automatically every hour. Use this endpoint when you need results sooner or want to verify the current provider-side status." },
+      { type: "paragraph", text: "Call sync when you need results before the next hourly poll. A typical pattern is to submit documents in batch mode, wait a few hours, then call sync to check if results are ready. If the batch is still processing, the response reflects the current provider-side status without changing anything." },
+      { type: "paragraph", text: "The response returns the full batch object with updated counts. If results are ready, `status` transitions to `completed` and `succeeded_count`, `errored_count`, and `expired_count` are populated. If the batch is still processing on the provider side, `status` remains `submitted` and counts stay at zero." },
+      { type: "paragraph", text: "Syncing an `accumulating` batch has no effect because it has not been submitted to the provider yet. Syncing a `completed` or `cancelled` batch is safe and returns the existing data unchanged. Pair with `GET /v1/batches/:id` to inspect per-item results after the sync completes." },
       {
         type: "endpoint",
         method: "POST",
@@ -15849,6 +17003,9 @@ var sections37 = [
     description: "Cancel an in-progress extraction batch. Only batches in accumulating or submitted status can be cancelled. Completed batches cannot be rolled back.",
     content: [
       { type: "paragraph", text: "Cancel a batch that is still `accumulating` or `submitted`. Cancellation sends a stop request to the provider if the batch was already submitted. Documents in the cancelled batch revert to `batch_queued` status and can be resubmitted or processed via realtime extraction." },
+      { type: "paragraph", text: "Use cancellation when you need to abort a batch \u2014 for example, if documents were submitted with an incorrect schema or you need results faster via realtime extraction. Cancel as early as possible; items already processed by the provider before the cancellation lands may still have their results applied." },
+      { type: "paragraph", text: "The response returns the batch with `status: cancelled`. The `succeeded_count` may be non-zero if some items were processed before cancellation took effect. Documents revert to `batch_queued` status and can be re-processed by updating their `processing_mode` to `realtime` or by including them in a new batch." },
+      { type: "paragraph", text: "Only batches in `accumulating` or `submitted` status can be cancelled \u2014 calling cancel on a `completed`, `failed`, or already `cancelled` batch returns `400`. Pair with `GET /v1/batches/:id` after cancellation to inspect which items were processed before the stop request landed." },
       {
         type: "endpoint",
         method: "POST",
@@ -16017,6 +17174,9 @@ var sections38 = [
     description: "Retrieve a case by its key (e.g. CASE-001) including linked documents, shared entities, AI-generated narration, label, and anomaly count.",
     content: [
       { type: "paragraph", text: "Retrieve the full detail of a case including its documents, AI-generated narrative summary, and anomaly count. The narrative is generated by Claude and summarizes the relationships between documents in the case." },
+      { type: "paragraph", text: "Call this endpoint after listing cases to drill into a specific case. The typical workflow is to list cases with filters, then fetch detail for cases that need review. The response includes the full document list and anomaly count, so you can assess case health in a single call." },
+      { type: "paragraph", text: "The response includes `documents` (array of document objects with `id`, `filename`, `document_type`, and `created_at`), a `narrative` string (or `null` if narration has not been triggered), and `anomaly_count`. The `links` object provides convenience URLs for the case itself and its documents list." },
+      { type: "paragraph", text: "Pair with `POST /v1/cases/:key/narrate` to generate narratives, and `GET /v1/cases/:key/evidence` to inspect the field-level linking data. If `anomaly_count` is non-zero, fetch the anomalies endpoint to see which structural issues were detected." },
       { type: "callout", variant: "info", text: "The `narrative` field is generated on demand via `POST /v1/cases/:key/narrate`. It will be `null` until narration is triggered for this case." },
       {
         type: "endpoint",
@@ -16222,6 +17382,9 @@ var sections38 = [
     description: "List evidence items within a case. Filter by validation status, source document, category, or free-text search across evidence fields.",
     content: [
       { type: "paragraph", text: "Evidence items are the extracted field values from documents in a case, annotated with validation status and confidence scores. Use evidence to audit the data quality within a case and understand which fields link documents together." },
+      { type: "paragraph", text: "Use this endpoint after fetching case detail to inspect the field-level data that forms the case. A typical workflow is to filter by `status=invalid` to surface extraction issues, or by `document_id` to audit a specific document's contribution to the case." },
+      { type: "paragraph", text: "Each evidence item includes a `field_key`, extracted `value`, validation `status` (`valid`, `invalid`, or `pending`), the source `document_id`, an optional `category` (e.g. `identity`, `financial`), and a `confidence` score between 0 and 1. The confidence score reflects extraction certainty and is independent of the validation outcome." },
+      { type: "paragraph", text: "Combine evidence with the anomalies endpoint to get a complete quality picture. Evidence shows individual field values; anomalies show structural patterns across multiple evidence items (e.g. conflicting values for the same field). Use the `search` parameter for free-text queries across all evidence fields." },
       { type: "callout", variant: "info", text: "Evidence is produced by the evidence validation engine, which runs rule-based validators (structural checks, checksum validation, domain packs) against extracted values. Each evidence item records the validation outcome for a specific field on a specific document." },
       {
         type: "endpoint",
@@ -16485,6 +17648,9 @@ var sections38 = [
     description: "Pin or remove documents within a case. Pinned documents are highlighted in the case view and preserved during case operations.",
     content: [
       { type: "paragraph", text: "Manage document membership within a case. **Pin** a document to mark it as important \u2014 pinned documents are highlighted in the UI and preserved during split operations. **Remove** a document to detach it from the case entirely." },
+      { type: "paragraph", text: "Use pinning to flag key documents during case review \u2014 for example, pin the primary invoice in a multi-document case so it stays visible. Use removal when a document was incorrectly linked and should not belong to this case. Both operations are immediate and do not require a recompute." },
+      { type: "paragraph", text: 'Pin and remove each return `{ "success": true }` on success, and both return `404` if the case or document is not found. The pin status is reflected in the case detail response from `GET /v1/cases/:key`.' },
+      { type: "paragraph", text: "Pinned documents are preserved in the original partition during split operations \u2014 they always stay with the case they are pinned to. If you plan to split a case, pin the anchor documents first. Removed documents may reappear in the case after a recompute if linking edges still connect them." },
       { type: "callout", variant: "info", text: "Removing a document from a case does not delete the document itself. The document remains in your workspace and may be re-linked into a case during the next recompute cycle if linking edges still exist." },
       {
         type: "endpoint",
@@ -17147,6 +18313,9 @@ var sections40 = [
     description: "List all ground truth datasets used for benchmarking extraction accuracy. Each dataset contains manually verified entries that serve as the gold standard.",
     content: [
       { type: "paragraph", text: "Ground truth datasets contain manually verified data entries that serve as the gold standard for measuring extraction accuracy. Create datasets, add entries, then run benchmarks against extraction results." },
+      { type: "paragraph", text: "Use this endpoint to see all available datasets before creating a benchmark run. A typical workflow is to list datasets, select the one covering the document type you want to evaluate, then pass its `id` to `POST /v1/quality/benchmarks` to start a run." },
+      { type: "paragraph", text: "Each dataset includes a `name`, optional `description`, `user_schema_id` (if scoped to a schema), `document_count` (number of verified entries), and a `links.self` URL for the detail endpoint. Datasets are returned in descending creation order with cursor-based pagination." },
+      { type: "paragraph", text: "Create separate datasets for different document types or schema versions to track accuracy independently. Pair with the benchmark endpoints to measure extraction quality over time \u2014 run benchmarks after schema changes or pipeline updates to detect regressions." },
       { type: "list", ordered: false, items: [
         "Each dataset contains verified entries mapping documents to expected field values",
         "Datasets can be scoped to a specific user schema via `user_schema_id`",
@@ -17241,6 +18410,10 @@ var sections40 = [
     description: "Create a new ground truth dataset linked to a schema. The dataset defines the expected extraction output used for accuracy benchmarking.",
     content: [
       { type: "paragraph", text: "Create an empty ground truth dataset that you can populate with verified entries. Datasets serve as the baseline for benchmark runs that measure extraction accuracy. After creating a dataset, add entries individually or import them in bulk via CSV." },
+      { type: "paragraph", text: "The typical workflow is: create the dataset, then populate it using `POST /v1/quality/ground-truth/:id/entries` for individual entries or `POST /v1/quality/ground-truth/:id/entries/import-csv` for bulk import. Once populated, create a benchmark run with `POST /v1/quality/benchmarks`." },
+      { type: "paragraph", text: "The response returns the dataset with `document_count: 0` since it is initially empty. The `user_schema_id` is `null` unless you associate it with a schema. The `links.self` URL points to the detail endpoint where you can retrieve entries or delete the dataset." },
+      { type: "paragraph", text: "For best results, aim for at least 30-50 entries per dataset. Linking a dataset to a `user_schema_id` ensures ground truth field names align with your extraction schema, producing more meaningful benchmark comparisons." },
+      { type: "callout", variant: "info", text: "Field keys in `expected_data` entries should match the field names used in your extraction schema. Unmatched fields are stored but ignored during benchmark comparison." },
       {
         type: "endpoint",
         method: "POST",
@@ -17315,6 +18488,9 @@ var sections40 = [
     description: "Retrieve a ground truth dataset by ID with metadata and entry count, or delete it permanently. Deleting a dataset does not remove associated benchmark results.",
     content: [
       { type: "paragraph", text: "Retrieve a dataset with its metadata and sample entries, or delete it permanently. The GET response includes a `samples` array with the actual ground truth entries, allowing you to inspect the expected values for each document." },
+      { type: "paragraph", text: "Use `GET` to inspect the dataset contents before running a benchmark. The `samples` array contains all ground truth entries with their `document_id`, `expected_data` (key-value map of verified field values), and optional `notes`. This lets you verify the dataset is correctly populated." },
+      { type: "paragraph", text: "The `document_count` field shows how many entries exist. For large datasets, the `samples` array may produce a sizable response. The `user_schema_id` indicates whether the dataset is scoped to a specific extraction schema; scoping keeps ground truth field names aligned with the schema, which makes benchmark comparisons more meaningful." },
+      { type: "paragraph", text: "Use `DELETE` when a dataset is outdated or no longer needed. Benchmark results that referenced this dataset are preserved for historical tracking \u2014 the benchmark retains the `dataset_id` even after the dataset itself is removed. Create a new dataset with updated entries rather than modifying existing ones." },
       { type: "callout", variant: "warning", text: "Deleting a dataset is permanent. However, benchmark results that used this dataset are retained for historical reference. The benchmark will show the `dataset_id`, but the dataset itself will no longer be retrievable." },
       {
         type: "endpoint",
@@ -17580,6 +18756,9 @@ var sections40 = [
     description: "List benchmark runs that compare extraction results against ground truth datasets. Each run produces per-field accuracy metrics.",
     content: [
       { type: "paragraph", text: "Benchmark runs compare your extraction output against ground truth datasets to produce per-field accuracy scores. Each run evaluates every document in the dataset and produces an `accuracy_overall` score along with per-field breakdowns. Use benchmarks to track extraction quality over time and measure the impact of schema or pipeline changes." },
+      { type: "paragraph", text: "Use this endpoint to see all benchmark runs and their accuracy scores. A typical workflow is to list benchmarks after making schema or pipeline changes, then compare the latest run against previous ones using `GET /v1/quality/benchmarks/compare` to measure improvement or detect regressions." },
+      { type: "paragraph", text: "Each benchmark includes `status` (`queued`, `running`, `completed`, or `failed`), `accuracy_overall` (a score from 0 to 1, `null` until the run completes), `accuracy_by_field` (per-field breakdown), and `documents_processed`/`documents_total` for progress tracking. The `accuracy_delta` and `compared_to_run_id` fields support cross-run comparisons." },
+      { type: "paragraph", text: "Run benchmarks regularly after extraction pipeline changes. Pair with `GET /v1/quality/benchmarks/:id/results` for per-document drill-down showing which fields matched and which diverged. Use the compare endpoint to track accuracy trends across multiple runs." },
       {
         type: "endpoint",
         method: "GET",
@@ -17689,6 +18868,9 @@ var sections40 = [
     description: "Start a benchmark run that compares a job run output against a ground truth dataset. Produces per-field accuracy scores and overall metrics.",
     content: [
       { type: "paragraph", text: "Start a new benchmark run that evaluates your current extraction output against a ground truth dataset. The benchmark compares each document in the dataset entry-by-entry and field-by-field, producing an overall accuracy score and per-field breakdowns." },
+      { type: "paragraph", text: "The typical workflow is: create a benchmark after making extraction pipeline changes, poll `GET /v1/quality/benchmarks/:id` until `status` is `completed`, then inspect results. Run multiple benchmarks against the same dataset over time to track accuracy trends." },
+      { type: "paragraph", text: "The response returns the benchmark with `status: queued`, `accuracy_overall: null`, and `documents_processed: 0`. The `documents_total` field reflects how many entries are in the dataset. Poll the detail endpoint to check `status` and `documents_processed` for progress. Once completed, `accuracy_overall` and `accuracy_by_field` are populated." },
+      { type: "paragraph", text: "Multiple benchmarks can run in parallel against different datasets. Use `GET /v1/quality/benchmarks/compare` after completion to compare two runs side by side. The `dataset_id` is fixed at creation \u2014 to benchmark against a different dataset, create a new run." },
       { type: "callout", variant: "info", text: "Benchmark runs are asynchronous. The endpoint returns immediately with status `queued`. Poll the benchmark detail endpoint or list benchmarks to check when the run completes." },
       {
         type: "endpoint",
@@ -18018,6 +19200,9 @@ var sections41 = [
     description: "Create a new routing rule with conditions on document properties and actions to apply when matched. Conditions can match document type, source, and other metadata.",
     content: [
       { type: "paragraph", text: 'Create a rule that automatically applies actions to incoming documents based on their metadata. Conditions define what to match (e.g. document type equals "invoice"), and actions define what to do (e.g. assign the finance schema). Rules are evaluated on every `document_classified` event.' },
+      { type: "paragraph", text: 'A typical workflow is to create rules in order of specificity: put narrow, high-priority rules first (e.g. "contracts from vendor X") and broader catch-all rules last. New rules are active immediately upon creation, so the next classified document will be evaluated against them.' },
+      { type: "paragraph", text: "The response returns the rule with `is_active: true`, a `trigger_type` of `document_classified`, and the assigned `priority` (defaults to 100 if omitted). The `action_type` is resolved from the `actions` object. Use the reorder endpoint after creation to adjust the priority relative to existing rules." },
+      { type: "paragraph", text: "Pair with `GET /v1/routing-rules` to verify the full priority chain after creating a rule. Use `source_connection_id` to scope rules to documents from a specific source \u2014 documents from other sources will skip the rule entirely. To test a rule before going live, create it and immediately disable it via `PATCH` with `is_active: false`." },
       { type: "callout", variant: "info", text: "New rules are created with `is_active: true` by default. If you want to test a rule before activating it, create it, then immediately disable it via `PATCH /v1/routing-rules/:id` with `is_active: false`." },
       {
         type: "endpoint",
@@ -18120,6 +19305,10 @@ var sections41 = [
     description: "Retrieve, update, or delete a routing rule by ID. Update conditions, actions, priority, or enabled state. Deleting a rule does not affect previously routed documents.",
     content: [
       { type: "paragraph", text: "Retrieve, update, or delete a single routing rule. Updates take effect immediately \u2014 the next `document_classified` event will use the updated rule. Deleting a rule does not retroactively affect documents that were already routed by it." },
+      { type: "paragraph", text: "Use `GET` to inspect a rule's conditions, actions, and priority. Use `PATCH` to adjust conditions, change the schema assignment, toggle `is_active`, or update the priority. Use `DELETE` when a rule is no longer needed \u2014 previously routed documents are not affected." },
+      { type: "paragraph", text: "The `PATCH` response returns the full updated rule including the new `updated_at` timestamp. All fields are optional \u2014 only include fields you want to change. The `is_active` toggle lets you temporarily disable a rule without deleting it, which is useful for testing or during maintenance windows." },
+      { type: "paragraph", text: "After updating priority via `PATCH`, use `GET /v1/routing-rules` to verify the full evaluation order. For bulk priority changes, prefer the `POST /v1/routing-rules/reorder` endpoint instead of patching individual rules. To replace a rule, pair deletion of the old rule with creation of the new one." },
+      { type: "callout", variant: "info", text: "Rule changes only affect future `document_classified` events. Documents already routed by a previous version of the rule retain their assigned schema and routing actions." },
       {
         type: "endpoint",
         method: "GET",
@@ -18301,6 +19490,9 @@ var sections41 = [
     description: "Reorder routing rules by providing an ordered array of rule IDs. Priority values are reassigned sequentially based on the new order.",
     content: [
       { type: "paragraph", text: "Reassign priority values for all routing rules at once. Pass an ordered array of rule IDs \u2014 the first ID receives priority 1, the second receives priority 2, and so on. This is the recommended way to change evaluation order after initial creation." },
+      { type: "paragraph", text: "Use this endpoint when you need to rearrange the evaluation order of multiple rules at once \u2014 for example, when promoting a new rule to the top of the chain or inserting a rule between two existing ones. This is more reliable than patching individual rule priorities, which can create gaps or collisions." },
+      { type: "paragraph", text: "The response returns a `reordered` array with each rule's `id` and new `priority` value. Priority 1 is evaluated first. The reorder takes effect immediately \u2014 the next `document_classified` event uses the new priority sequence." },
+      { type: "paragraph", text: "List all rules first via `GET /v1/routing-rules` to get the current IDs and order, then construct the reordered array. Include both active and inactive rules in the array to maintain a consistent priority sequence. Omitting any rule ID results in a validation error." },
       { type: "callout", variant: "warning", text: "All active rule IDs must be included in the `rule_ids` array. Omitting any rule returns a validation error. Inactive rules should also be included to maintain a consistent priority sequence." },
       {
         type: "endpoint",
@@ -18806,6 +19998,10 @@ var sections44 = [
     description: "All Talonic API errors return a consistent JSON envelope with a machine-readable code, human-readable message, HTTP status, retryable flag, request ID, and timestamp.",
     content: [
       { type: "paragraph", text: "All errors return a consistent JSON envelope. The `retryable` field tells you whether the request can be retried with the same parameters." },
+      { type: "paragraph", text: "Most integrations parse the `code` field for programmatic error handling and display the `message` field to users. A typical error handler checks `retryable` first \u2014 if `true`, queue the request for retry with exponential backoff; if `false`, surface the `message` to the caller and stop." },
+      { type: "paragraph", text: "The `request_id` field (prefixed with `req_`) uniquely identifies the failed request and is essential for debugging with Talonic support. The `path` field confirms which endpoint produced the error, and `timestamp` records when it occurred in ISO 8601 format." },
+      { type: "paragraph", text: "Pair error handling with the [Error Codes](error-codes) reference to map each `code` value to the correct remediation action. Note that `statusCode` always matches the HTTP response status, so you can use either for branching logic in your client." },
+      { type: "callout", text: "Always log the `request_id` from error responses. When contacting support, include it for faster resolution \u2014 it links directly to the server-side request trace." },
       {
         type: "code",
         title: "Error response envelope",