@talonic/docs 0.20.4 → 0.20.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/dist/content.js CHANGED
@@ -1590,7 +1590,7 @@ var sections5 = [
  },
  {
  type: "callout",
- text: "For the complete JSON Schema specification with all features, see the [Full Schema Reference](/docs#schema-reference) in the API docs."
+ text: "For the complete JSON Schema specification with all features, see the [Full Schema Reference](/docs/platform/schema-features) in the Platform Guide."
  }
  ],
  related: [
package/dist/index.d.ts CHANGED
@@ -156,7 +156,7 @@ declare const PLATFORM_FAQ: {
  question: string;
  answer: string;
  }[];
- declare const LLMS_TXT = "# Talonic\n\n> AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.\n\nTalonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.\n\n## How It Works\n\n1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.\n2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).\n3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.\n4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \u00D7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.\n5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.\n\n## Sources & Documents\n\nSupported formats across three processing paths:\n- **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV\n- **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP\n- **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP\n- **Archives**: ZIP (recursive unpack)\n\nUpload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). 
Batch mode available at 50% cost with 48-hour delivery window.\n\nEvery document is classified into a canonical type from the 529-type ontology (e.g., \"Employment Contract\", \"Invoice\", \"Bill of Lading\"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get \"Unclassified Document\".\n\nDocument detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.\n\n## Field Registry\n\nThe unified knowledge graph of all canonical fields, growing smarter with every document processed.\n\n- **Tier 1 (Core)**: Universal fields across many document types. Most reliable.\n- **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.\n- **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.\n\nFields with similar meanings cluster automatically via AI embeddings (e.g., \"Vendor Name\", \"Supplier Name\", \"Company Name\" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.\n\n## Schemas\n\nTwo types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).\n\nTemplate workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.\n\nField features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.\n\nVersioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). 
Test extraction compares draft vs. live results before publishing.\n\n## Extraction Jobs (Runs)\n\nA job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.\n\n**4-phase pipeline:**\n1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.\n2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.\n3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.\n4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.\n\nPer-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).\n\n## Cases & Document Linking\n\nRegistry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.\n\nA **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. 
High-frequency entities (>30% of documents) are auto-excluded from case formation.\n\nCase detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.\n\n## Smart Matching\n\nUpload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.\n\nResults: top 5 candidates per document with confidence scores and per-field evidence breakdown.\n\n## Validation & Quality\n\n- **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.\n- **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.\n- **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.\n\n## Delivery\n\nDestinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).\n\nField mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.\n\nDialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).\n\n## Search & Navigation\n\n**Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.\n\nDocument filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. 
URL-serializable and saveable as presets.\n\nKeyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).\n\n## Team & Settings\n\n4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.\n\nUsage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.\n\n## API\n\nBase URL: `https://api.talonic.com`. Auth: `Authorization: Bearer tlnc_...` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.\n\nKey endpoints:\n- POST /v1/extract \u2014 Synchronous/async document extraction (`include_markdown=true` returns OCR text, `processing_mode=batch` for 50% cost)\n- GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text\n- GET /v1/extractions \u2014 Query results and field corrections\n- POST /v1/schemas \u2014 Create/manage extraction schemas\n- GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions\n- POST /v1/sources \u2014 Manage API sources and document ingest\n- POST /v1/webhooks \u2014 Configure webhook endpoints\n- /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results\n- /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map\n- /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize\n- /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger\n- /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns\n- /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)\n- /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log\n- /v1/cases 
\u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents\n- /v1/batches \u2014 Sync with provider, cancel\n- /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review\n- /v1/review \u2014 Assign, stats\n- /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison\n- /v1/reference-data \u2014 JSON upload (POST create)\n\nWebhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.\n\n## Agent\n\nThe embedded AI assistant accessible from any page. Two modes:\n- **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.\n- **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.\n\nDocument upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.\n\n## Documentation\n\n- [API Documentation](https://talonic.com/docs): Complete REST API reference\n- [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide\n- [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification\n";
+ declare const LLMS_TXT = "# Talonic\n\n> The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.\n\n## For Agents\n\n- [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol\n- [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration\n- [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response\n- [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls\n- [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests\n- [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve\n- [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url\n\n## Three Modes\n\n- [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document\n- [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define\n- [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents\n\n## API Quickstart\n\n- [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation\n- [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes\n- [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown\n- 
[Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections\n- [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas\n- [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results\n- [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries\n- [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete\n- [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification\n\n## Concepts\n\n- [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions\n- [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces\n- [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains\n- [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read\n- [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories\n- [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. 
user templates, field matching, versioning\n\n## SDKs\n\n- [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs\n- [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema\n- [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage\n- [Error Codes](https://talonic.com/docs/errors): Error response format and code reference\n- [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers\n\n## Platform Guide\n\n- [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats\n- [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions\n- [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects\n- [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance\n- [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection\n- [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks\n- [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers\n\n## Document Types\n\n- [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more\n- [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more\n- [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more\n- [Legal & 
Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more\n- [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more\n- [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more\n- [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more\n- [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more\n- [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more\n- [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more\n\n## Company\n\n- [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation\n- [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)\n- [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview\n- [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data\n- [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency\n- [Contact](https://talonic.com/contact): Book a demo or request a schema audit\n\n## Optional\n\n- [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance\n- [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison\n- [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison\n";
  declare const LLMS_FULL_TXT_HEADER = "# Talonic \u2014 Full Documentation\n\n> This file contains the complete Talonic documentation for LLM consumption.\n> For a summary, see llms.txt.\n\n";
 
  export { API_FAQ, API_NAV_SECTIONS, API_SECTION_META, ApiReference, Callout, CellDot, CodeBlock, DownloadMarkdown, EndpointBlock, type HttpMethod, InlineCode, LLMS_FULL_TXT_HEADER, LLMS_TXT, type LinkComponent, MCP_FAQ, MCP_NAV_SECTIONS, MCP_SECTION_META, MethodBadge, MockNavItem, type NavSection, P, PLATFORM_FAQ, PLATFORM_NAV_SECTIONS, PLATFORM_SECTION_META, type Param, ParamTable, PipelineConnector, PipelineStage, PlatformGuide, SDK_FAQ, SDK_NAV_SECTIONS, SDK_SECTION_META, SectionHeading, type SectionMeta, Sidebar, SubHeading, TierBadge, UiExcerpt, highlightJson };
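The new `LLMS_TXT` above states a sync/async contract for `POST /v1/extract` (≤5 pages returns a 200 sync response; larger documents return 202 with a `poll_url`), an `Idempotency-Key` header for safe retries, and two cost headers, `X-Talonic-Cost-Credits` and `X-Talonic-Balance-Credits`. Below is a minimal TypeScript sketch of how a client might build request headers and classify a response under that contract. All helper and type names (`buildExtractHeaders`, `readCredits`, `classifyExtractResponse`, `ExtractOutcome`, `CreditInfo`) are hypothetical, and it assumes that a 202 body is JSON with a top-level `poll_url` string and that the cost headers carry plain numeric strings; neither detail is confirmed by this package.

```typescript
// Hypothetical client-side helpers for POST /v1/extract, based only on the
// contract stated in the new LLMS_TXT. Assumptions (not confirmed by the
// package): a 202 body is JSON with a top-level `poll_url` string, and the
// cost headers carry plain numeric strings.

interface CreditInfo {
  costCredits?: number;
  balanceCredits?: number;
}

type ExtractOutcome =
  | { kind: "complete"; body: unknown; cost?: CreditInfo }
  | { kind: "pending"; pollUrl: string; cost?: CreditInfo }
  | { kind: "error"; status: number; body: unknown };

// Bearer auth and Idempotency-Key are both named in the llms.txt text.
// Reusing the same key when retrying a request lets the server deduplicate it.
function buildExtractHeaders(apiKey: string, idempotencyKey: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Idempotency-Key": idempotencyKey,
  };
}

// Reads the two cost headers. HTTP header names are case-insensitive,
// so keys are lowercased before lookup.
function readCredits(headers: Record<string, string>): CreditInfo | undefined {
  const lower = new Map(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v] as const),
  );
  const parse = (name: string): number | undefined => {
    const raw = lower.get(name);
    if (raw === undefined || raw.trim() === "") return undefined;
    const n = Number(raw);
    return Number.isFinite(n) ? n : undefined;
  };
  const costCredits = parse("x-talonic-cost-credits");
  const balanceCredits = parse("x-talonic-balance-credits");
  if (costCredits === undefined && balanceCredits === undefined) return undefined;
  return { costCredits, balanceCredits };
}

// Applies the documented sync/async split: 200 carries the result inline,
// 202 means "poll later". Anything else (including a 202 without a
// poll_url) is reported as an error for the caller to handle.
function classifyExtractResponse(
  status: number,
  headers: Record<string, string>,
  body: unknown,
): ExtractOutcome {
  const cost = readCredits(headers);
  if (status === 200) return { kind: "complete", body, cost };
  if (status === 202) {
    const pollUrl = (body as { poll_url?: unknown } | null | undefined)?.poll_url;
    if (typeof pollUrl === "string") return { kind: "pending", pollUrl, cost };
  }
  return { kind: "error", status, body };
}
```

In a real client the three inputs would come from a `fetch` response (`res.status`, the response headers converted to a plain object, and the parsed JSON body); a `pending` outcome would then be followed up at `pollUrl`. Returning a discriminated union keeps the 200/202 branching in one place, so retry and polling logic can switch on `kind` without re-inspecting status codes.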
package/dist/index.js CHANGED
@@ -5615,8 +5615,8 @@ function PlatformGuide({ LinkComponent }) {
  /* @__PURE__ */ jsxs6(Callout, { children: [
  "For the complete JSON Schema specification with all features, see the",
  " ",
- /* @__PURE__ */ jsx6(LinkComp, { href: "/docs#schema-reference", children: "Full Schema Reference" }),
- " in the API docs. A downloadable template is available in the",
+ /* @__PURE__ */ jsx6(LinkComp, { href: "/docs/platform/schema-features", children: "Full Schema Reference" }),
+ " in the Platform Guide. A downloadable template is available in the",
  " ",
  /* @__PURE__ */ jsx6(LinkComp, { href: "https://github.com/talonicdev/platform/blob/main/packages/docs/src/schema-template.json", children: "repository" }),
  "."
@@ -6843,147 +6843,90 @@ var PLATFORM_FAQ = [
  ];
  var LLMS_TXT = `# Talonic
 
- > AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.
+ > The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.
 
- Talonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.
+ ## For Agents
 
- ## How It Works
+ - [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol
+ - [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration
+ - [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response
+ - [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls
+ - [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests
+ - [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve
+ - [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url
 
- 1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.
- 2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).
- 3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.
- 4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \xD7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.
- 5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.
+ ## Three Modes
 
- ## Sources & Documents
+ - [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document
+ - [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define
+ - [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents
 
- Supported formats across three processing paths:
- - **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV
- - **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP
- - **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP
- - **Archives**: ZIP (recursive unpack)
+ ## API Quickstart
 
- Upload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). Batch mode available at 50% cost with 48-hour delivery window.
+ - [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation
+ - [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes
+ - [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown
+ - [Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections
+ - [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas
+ - [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results
+ - [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries
+ - [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete
+ - [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification
 
- Every document is classified into a canonical type from the 529-type ontology (e.g., "Employment Contract", "Invoice", "Bill of Lading"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get "Unclassified Document".
+ ## Concepts
 
- Document detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.
+ - [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions
+ - [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces
+ - [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains
+ - [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read
+ - [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories
+ - [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. user templates, field matching, versioning
 
- ## Field Registry
+ ## SDKs
 
- The unified knowledge graph of all canonical fields, growing smarter with every document processed.
+ - [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs
+ - [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema
+ - [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage
+ - [Error Codes](https://talonic.com/docs/errors): Error response format and code reference
+ - [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers
 
- - **Tier 1 (Core)**: Universal fields across many document types. Most reliable.
- - **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.
- - **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.
+ ## Platform Guide
 
- Fields with similar meanings cluster automatically via AI embeddings (e.g., "Vendor Name", "Supplier Name", "Company Name" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.
+ - [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats
+ - [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions
+ - [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects
+ - [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance
+ - [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection
+ - [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks
+ - [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers
 
- ## Schemas
+ ## Document Types
 
- Two types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).
+ - [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more
+ - [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more
+ - [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more
+ - [Legal & Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more
+ - [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more
+ - [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more
+ - [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more
+ - [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more
+ - [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more
+ - [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more
 
- Template workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.
+ ## Company
 
- Field features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.
+ - [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation
6919
+ - [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)
6920
+ - [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview
6921
+ - [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data
6922
+ - [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency
6923
+ - [Contact](https://talonic.com/contact): Book a demo or request a schema audit
6889
6924
 
6890
- Versioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). Test extraction compares draft vs. live results before publishing.
6925
+ ## Optional
6891
6926
 
6892
- ## Extraction Jobs (Runs)
-
- A job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.
-
- **4-phase pipeline:**
- 1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.
- 2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.
- 3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.
- 4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.
-
- Per-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).
-
- ## Cases & Document Linking
-
- Registry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.
-
- A **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. High-frequency entities (>30% of documents) are auto-excluded from case formation.
-
- Case detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.
-
- ## Smart Matching
-
- Upload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.
-
- Results: top 5 candidates per document with confidence scores and per-field evidence breakdown.
-
- ## Validation & Quality
-
- - **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.
- - **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.
- - **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.
-
- ## Delivery
-
- Destinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).
-
- Field mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.
-
- Dialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).
-
- ## Search & Navigation
-
- **Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.
-
- Document filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. URL-serializable and saveable as presets.
-
- Keyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).
-
- ## Team & Settings
-
- 4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.
-
- Usage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.
-
- ## API
-
- Base URL: \`https://api.talonic.com\`. Auth: \`Authorization: Bearer tlnc_...\` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.
-
- Key endpoints:
- - POST /v1/extract \u2014 Synchronous/async document extraction (\`include_markdown=true\` returns OCR text, \`processing_mode=batch\` for 50% cost)
- - GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text
- - GET /v1/extractions \u2014 Query results and field corrections
- - POST /v1/schemas \u2014 Create/manage extraction schemas
- - GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions
- - POST /v1/sources \u2014 Manage API sources and document ingest
- - POST /v1/webhooks \u2014 Configure webhook endpoints
- - /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results
- - /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map
- - /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize
- - /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger
- - /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns
- - /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)
- - /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log
- - /v1/cases \u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents
- - /v1/batches \u2014 Sync with provider, cancel
- - /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review
- - /v1/review \u2014 Assign, stats
- - /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison
- - /v1/reference-data \u2014 JSON upload (POST create)
-
- Webhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.
-
- ## Agent
-
- The embedded AI assistant accessible from any page. Two modes:
- - **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.
- - **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.
-
- Document upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.
-
- ## Documentation
-
- - [API Documentation](https://talonic.com/docs): Complete REST API reference
- - [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide
- - [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification
+ - [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance
+ - [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison
+ - [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison
  `;
  var LLMS_FULL_TXT_HEADER = `# Talonic \u2014 Full Documentation