npm - @talonic/docs - Versions diffs - 0.20.5 → 0.20.6 - Mend

@talonic/docs 0.20.5 → 0.20.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/dist/index.d.ts +1 -1
package/dist/index.js +66 -123
package/dist/index.js.map +1 -1
package/dist/seo.d.ts +1 -1
package/dist/seo.js +66 -123
package/package.json +1 -1
package/dist/tailwind-preset.d.cts +0 -45
package/dist/tailwind-preset.d.ts +0 -45

package/dist/index.d.ts CHANGED

@@ -156,7 +156,7 @@ declare const PLATFORM_FAQ: {
     question: string;
     answer: string;
 }[];
-declare const LLMS_TXT = "# Talonic\n\n> AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.\n\nTalonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.\n\n## How It Works\n\n1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.\n2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).\n3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.\n4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \u00D7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.\n5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.\n\n## Sources & Documents\n\nSupported formats across three processing paths:\n- **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV\n- **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP\n- **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP\n- **Archives**: ZIP (recursive unpack)\n\nUpload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). 
Batch mode available at 50% cost with 48-hour delivery window.\n\nEvery document is classified into a canonical type from the 529-type ontology (e.g., \"Employment Contract\", \"Invoice\", \"Bill of Lading\"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get \"Unclassified Document\".\n\nDocument detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.\n\n## Field Registry\n\nThe unified knowledge graph of all canonical fields, growing smarter with every document processed.\n\n- **Tier 1 (Core)**: Universal fields across many document types. Most reliable.\n- **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.\n- **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.\n\nFields with similar meanings cluster automatically via AI embeddings (e.g., \"Vendor Name\", \"Supplier Name\", \"Company Name\" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.\n\n## Schemas\n\nTwo types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).\n\nTemplate workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.\n\nField features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.\n\nVersioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). 
Test extraction compares draft vs. live results before publishing.\n\n## Extraction Jobs (Runs)\n\nA job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.\n\n**4-phase pipeline:**\n1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.\n2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.\n3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.\n4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.\n\nPer-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).\n\n## Cases & Document Linking\n\nRegistry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.\n\nA **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. 
High-frequency entities (>30% of documents) are auto-excluded from case formation.\n\nCase detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.\n\n## Smart Matching\n\nUpload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.\n\nResults: top 5 candidates per document with confidence scores and per-field evidence breakdown.\n\n## Validation & Quality\n\n- **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.\n- **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.\n- **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.\n\n## Delivery\n\nDestinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).\n\nField mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.\n\nDialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).\n\n## Search & Navigation\n\n**Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.\n\nDocument filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. 
URL-serializable and saveable as presets.\n\nKeyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).\n\n## Team & Settings\n\n4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.\n\nUsage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.\n\n## API\n\nBase URL: `https://api.talonic.com`. Auth: `Authorization: Bearer tlnc_...` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.\n\nKey endpoints:\n- POST /v1/extract \u2014 Synchronous/async document extraction (`include_markdown=true` returns OCR text, `processing_mode=batch` for 50% cost)\n- GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text\n- GET /v1/extractions \u2014 Query results and field corrections\n- POST /v1/schemas \u2014 Create/manage extraction schemas\n- GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions\n- POST /v1/sources \u2014 Manage API sources and document ingest\n- POST /v1/webhooks \u2014 Configure webhook endpoints\n- /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results\n- /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map\n- /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize\n- /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger\n- /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns\n- /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)\n- /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log\n- /v1/cases 
\u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents\n- /v1/batches \u2014 Sync with provider, cancel\n- /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review\n- /v1/review \u2014 Assign, stats\n- /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison\n- /v1/reference-data \u2014 JSON upload (POST create)\n\nWebhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.\n\n## Agent\n\nThe embedded AI assistant accessible from any page. Two modes:\n- **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.\n- **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.\n\nDocument upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.\n\n## Documentation\n\n- [API Documentation](https://talonic.com/docs): Complete REST API reference\n- [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide\n- [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification\n";
+declare const LLMS_TXT = "# Talonic\n\n> The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.\n\n## For Agents\n\n- [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol\n- [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration\n- [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response\n- [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls\n- [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests\n- [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve\n- [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url\n\n## Three Modes\n\n- [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document\n- [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define\n- [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents\n\n## API Quickstart\n\n- [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation\n- [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes\n- [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown\n- 
[Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections\n- [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas\n- [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results\n- [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries\n- [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete\n- [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification\n\n## Concepts\n\n- [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions\n- [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces\n- [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains\n- [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read\n- [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories\n- [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. 
user templates, field matching, versioning\n\n## SDKs\n\n- [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs\n- [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema\n- [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage\n- [Error Codes](https://talonic.com/docs/errors): Error response format and code reference\n- [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers\n\n## Platform Guide\n\n- [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats\n- [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions\n- [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects\n- [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance\n- [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection\n- [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks\n- [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers\n\n## Document Types\n\n- [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more\n- [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more\n- [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more\n- [Legal & 
Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more\n- [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more\n- [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more\n- [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more\n- [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more\n- [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more\n- [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more\n\n## Company\n\n- [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation\n- [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)\n- [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview\n- [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data\n- [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency\n- [Contact](https://talonic.com/contact): Book a demo or request a schema audit\n\n## Optional\n\n- [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance\n- [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison\n- [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison\n";
 declare const LLMS_FULL_TXT_HEADER = "# Talonic \u2014 Full Documentation\n\n> This file contains the complete Talonic documentation for LLM consumption.\n> For a summary, see llms.txt.\n\n";
 export { API_FAQ, API_NAV_SECTIONS, API_SECTION_META, ApiReference, Callout, CellDot, CodeBlock, DownloadMarkdown, EndpointBlock, type HttpMethod, InlineCode, LLMS_FULL_TXT_HEADER, LLMS_TXT, type LinkComponent, MCP_FAQ, MCP_NAV_SECTIONS, MCP_SECTION_META, MethodBadge, MockNavItem, type NavSection, P, PLATFORM_FAQ, PLATFORM_NAV_SECTIONS, PLATFORM_SECTION_META, type Param, ParamTable, PipelineConnector, PipelineStage, PlatformGuide, SDK_FAQ, SDK_NAV_SECTIONS, SDK_SECTION_META, SectionHeading, type SectionMeta, Sidebar, SubHeading, TierBadge, UiExcerpt, highlightJson };
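
Reviewer note on the new "Sync/Async Contract" and "Cost Headers" entries in `LLMS_TXT`: the text says documents of up to 5 pages return a 200 sync response, larger ones return 202 with a `poll_url`, and every response carries `X-Talonic-Cost-Credits`. A client could branch on that contract roughly as sketched below. This is a minimal sketch, not the package's or the SDK's code: the `ExtractOutcome` type, the function names, and the assumption that `poll_url` is a top-level field of the JSON body are all hypothetical; only the status codes, the `poll_url` name, and the header name come from the diff.

```typescript
// Hypothetical shapes for illustration. Only the 200/202 status codes, the
// `poll_url` name, and the X-Talonic-Cost-Credits header appear in the diff.
type ExtractOutcome =
  | { kind: "done"; result: unknown; costCredits: number | null }
  | { kind: "pending"; pollUrl: string; costCredits: number | null };

// Look up a numeric header value; header names are case-insensitive.
function readCredits(headers: Record<string, string>, name: string): number | null {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      const parsed = Number(headers[key]);
      return Number.isFinite(parsed) ? parsed : null;
    }
  }
  return null;
}

// Map an already-received response (status, headers, parsed JSON body)
// onto the sync/async contract described in LLMS_TXT.
function interpretExtractResponse(
  status: number,
  headers: Record<string, string>,
  body: unknown,
): ExtractOutcome {
  const costCredits = readCredits(headers, "X-Talonic-Cost-Credits");
  if (status === 200) {
    // Sync path: the body is treated as the extraction result.
    return { kind: "done", result: body, costCredits };
  }
  if (status === 202) {
    // Async path: assumes poll_url sits at the top level of the body.
    const pollUrl = (body as { poll_url?: unknown } | null)?.poll_url;
    if (typeof pollUrl !== "string") {
      throw new Error("202 response without a poll_url string");
    }
    return { kind: "pending", pollUrl, costCredits };
  }
  throw new Error(`Unexpected status ${status}`);
}
```

Keeping the function free of network calls means the branching can be checked without an API key; a caller would pass in the status, headers, and parsed body from whatever HTTP client it uses.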

package/dist/index.js CHANGED

@@ -6843,147 +6843,90 @@ var PLATFORM_FAQ = [
 ];
 var LLMS_TXT = `# Talonic
-> AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.
+> The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.
-Talonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.
+## For Agents
-## How It Works
+- [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol
+- [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration
+- [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response
+- [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls
+- [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests
+- [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve
+- [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url
-1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.
-2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).
-3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.
-4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \xD7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.
-5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.
+## Three Modes
-## Sources & Documents
+- [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document
+- [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define
+- [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents
-Supported formats across three processing paths:
-- **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV
-- **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP
-- **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP
-- **Archives**: ZIP (recursive unpack)
+## API Quickstart
-Upload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). Batch mode available at 50% cost with 48-hour delivery window.
+- [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation
+- [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes
+- [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown
+- [Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections
+- [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas
+- [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results
+- [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries
+- [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete
+- [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification
-Every document is classified into a canonical type from the 529-type ontology (e.g., "Employment Contract", "Invoice", "Bill of Lading"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get "Unclassified Document".
+## Concepts
-Document detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.
+- [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions
+- [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces
+- [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains
+- [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read
+- [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories
+- [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. user templates, field matching, versioning
-## Field Registry
+## SDKs
-The unified knowledge graph of all canonical fields, growing smarter with every document processed.
+- [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs
+- [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema
+- [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage
+- [Error Codes](https://talonic.com/docs/errors): Error response format and code reference
+- [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers
-- **Tier 1 (Core)**: Universal fields across many document types. Most reliable.
-- **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.
-- **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.
+## Platform Guide
-Fields with similar meanings cluster automatically via AI embeddings (e.g., "Vendor Name", "Supplier Name", "Company Name" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.
+- [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats
+- [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions
+- [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects
+- [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance
+- [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection
+- [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks
+- [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers
-## Schemas
+## Document Types
-Two types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).
+- [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more
+- [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more
+- [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more
+- [Legal & Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more
+- [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more
+- [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more
+- [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more
+- [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more
+- [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more
+- [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more
-Template workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.
+## Company
-Field features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.
+- [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation
+- [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)
+- [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview
+- [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data
+- [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency
+- [Contact](https://talonic.com/contact): Book a demo or request a schema audit
-Versioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). Test extraction compares draft vs. live results before publishing.
+## Optional
-## Extraction Jobs (Runs)
-A job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.
-**4-phase pipeline:**
-1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.
-2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.
-3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.
-4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.
-Per-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).
-## Cases & Document Linking
-Registry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.
-A **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. High-frequency entities (>30% of documents) are auto-excluded from case formation.
-Case detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.
-## Smart Matching
-Upload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.
-Results: top 5 candidates per document with confidence scores and per-field evidence breakdown.
-## Validation & Quality
-- **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.
-- **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.
-- **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.
-## Delivery
-Destinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).
-Field mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.
-Dialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).
-## Search & Navigation
-**Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.
-Document filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. URL-serializable and saveable as presets.
-Keyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).
-## Team & Settings
-4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.
-Usage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.
-## API
-Base URL: \`https://api.talonic.com\`. Auth: \`Authorization: Bearer tlnc_...\` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.
-Key endpoints:
-- POST /v1/extract \u2014 Synchronous/async document extraction (\`include_markdown=true\` returns OCR text, \`processing_mode=batch\` for 50% cost)
-- GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text
-- GET /v1/extractions \u2014 Query results and field corrections
-- POST /v1/schemas \u2014 Create/manage extraction schemas
-- GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions
-- POST /v1/sources \u2014 Manage API sources and document ingest
-- POST /v1/webhooks \u2014 Configure webhook endpoints
-- /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results
-- /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map
-- /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize
-- /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger
-- /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns
-- /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)
-- /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log
-- /v1/cases \u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents
-- /v1/batches \u2014 Sync with provider, cancel
-- /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review
-- /v1/review \u2014 Assign, stats
-- /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison
-- /v1/reference-data \u2014 JSON upload (POST create)
-Webhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.
-## Agent
-The embedded AI assistant accessible from any page. Two modes:
-- **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.
-- **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.
-Document upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.
-## Documentation
-- [API Documentation](https://talonic.com/docs): Complete REST API reference
-- [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide
-- [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification
+- [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance
+- [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison
+- [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison
 `;
 var LLMS_FULL_TXT_HEADER = `# Talonic \u2014 Full Documentation