@talonic/docs 0.20.4 → 0.20.6

package/dist/seo.d.ts CHANGED
@@ -42,7 +42,7 @@ declare const PLATFORM_FAQ: {
  question: string;
  answer: string;
  }[];
- declare const LLMS_TXT = "# Talonic\n\n> AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.\n\nTalonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.\n\n## How It Works\n\n1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.\n2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).\n3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.\n4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \u00D7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.\n5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.\n\n## Sources & Documents\n\nSupported formats across three processing paths:\n- **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV\n- **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP\n- **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP\n- **Archives**: ZIP (recursive unpack)\n\nUpload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). 
Batch mode available at 50% cost with 48-hour delivery window.\n\nEvery document is classified into a canonical type from the 529-type ontology (e.g., \"Employment Contract\", \"Invoice\", \"Bill of Lading\"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get \"Unclassified Document\".\n\nDocument detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.\n\n## Field Registry\n\nThe unified knowledge graph of all canonical fields, growing smarter with every document processed.\n\n- **Tier 1 (Core)**: Universal fields across many document types. Most reliable.\n- **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.\n- **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.\n\nFields with similar meanings cluster automatically via AI embeddings (e.g., \"Vendor Name\", \"Supplier Name\", \"Company Name\" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.\n\n## Schemas\n\nTwo types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).\n\nTemplate workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.\n\nField features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.\n\nVersioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). 
Test extraction compares draft vs. live results before publishing.\n\n## Extraction Jobs (Runs)\n\nA job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.\n\n**4-phase pipeline:**\n1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.\n2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.\n3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.\n4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.\n\nPer-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).\n\n## Cases & Document Linking\n\nRegistry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.\n\nA **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. 
High-frequency entities (>30% of documents) are auto-excluded from case formation.\n\nCase detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.\n\n## Smart Matching\n\nUpload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.\n\nResults: top 5 candidates per document with confidence scores and per-field evidence breakdown.\n\n## Validation & Quality\n\n- **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.\n- **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.\n- **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.\n\n## Delivery\n\nDestinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).\n\nField mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.\n\nDialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).\n\n## Search & Navigation\n\n**Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.\n\nDocument filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. 
URL-serializable and saveable as presets.\n\nKeyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).\n\n## Team & Settings\n\n4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.\n\nUsage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.\n\n## API\n\nBase URL: `https://api.talonic.com`. Auth: `Authorization: Bearer tlnc_...` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.\n\nKey endpoints:\n- POST /v1/extract \u2014 Synchronous/async document extraction (`include_markdown=true` returns OCR text, `processing_mode=batch` for 50% cost)\n- GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text\n- GET /v1/extractions \u2014 Query results and field corrections\n- POST /v1/schemas \u2014 Create/manage extraction schemas\n- GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions\n- POST /v1/sources \u2014 Manage API sources and document ingest\n- POST /v1/webhooks \u2014 Configure webhook endpoints\n- /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results\n- /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map\n- /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize\n- /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger\n- /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns\n- /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)\n- /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log\n- /v1/cases 
\u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents\n- /v1/batches \u2014 Sync with provider, cancel\n- /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review\n- /v1/review \u2014 Assign, stats\n- /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison\n- /v1/reference-data \u2014 JSON upload (POST create)\n\nWebhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.\n\n## Agent\n\nThe embedded AI assistant accessible from any page. Two modes:\n- **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.\n- **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.\n\nDocument upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.\n\n## Documentation\n\n- [API Documentation](https://talonic.com/docs): Complete REST API reference\n- [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide\n- [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification\n";
+ declare const LLMS_TXT = "# Talonic\n\n> The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.\n\n## For Agents\n\n- [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol\n- [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration\n- [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response\n- [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls\n- [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests\n- [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve\n- [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url\n\n## Three Modes\n\n- [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document\n- [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define\n- [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents\n\n## API Quickstart\n\n- [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation\n- [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes\n- [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown\n- 
[Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections\n- [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas\n- [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results\n- [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries\n- [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete\n- [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification\n\n## Concepts\n\n- [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions\n- [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces\n- [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains\n- [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read\n- [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories\n- [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. 
user templates, field matching, versioning\n\n## SDKs\n\n- [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs\n- [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema\n- [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage\n- [Error Codes](https://talonic.com/docs/errors): Error response format and code reference\n- [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers\n\n## Platform Guide\n\n- [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats\n- [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions\n- [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects\n- [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance\n- [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection\n- [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks\n- [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers\n\n## Document Types\n\n- [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more\n- [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more\n- [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more\n- [Legal & 
Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more\n- [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more\n- [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more\n- [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more\n- [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more\n- [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more\n- [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more\n\n## Company\n\n- [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation\n- [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)\n- [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview\n- [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data\n- [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency\n- [Contact](https://talonic.com/contact): Book a demo or request a schema audit\n\n## Optional\n\n- [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance\n- [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison\n- [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison\n";
  declare const LLMS_FULL_TXT_HEADER = "# Talonic \u2014 Full Documentation\n\n> This file contains the complete Talonic documentation for LLM consumption.\n> For a summary, see llms.txt.\n\n";
 
  export { API_FAQ, API_NAV_SECTIONS, API_SECTION_META, LLMS_FULL_TXT_HEADER, LLMS_TXT, MCP_FAQ, MCP_NAV_SECTIONS, MCP_SECTION_META, OPENAPI_SPEC, PLATFORM_FAQ, PLATFORM_NAV_SECTIONS, PLATFORM_SECTION_META, SDK_FAQ, SDK_NAV_SECTIONS, SDK_SECTION_META, type SectionMeta };
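The new `LLMS_TXT` value states a sync/async contract for `POST /v1/extract`: documents of up to 5 pages return `200` with the result, and larger ones return `202` with a `poll_url`. A minimal TypeScript sketch of how a client might branch on that contract is shown here. Only the two status codes and the `poll_url` field name come from the text in this diff; the `ExtractResponse` shape, the `ExtractOutcome` type, and the `classifyExtractResponse` helper are hypothetical names for illustration and are not taken from the Talonic SDK.

```typescript
// Hypothetical response shape: only `poll_url` is named in the diffed text.
interface ExtractResponse {
  status: number;
  body: { poll_url?: string; [key: string]: unknown };
}

// Hypothetical result type for the two documented outcomes.
type ExtractOutcome =
  | { kind: "done"; result: Record<string, unknown> }
  | { kind: "pending"; pollUrl: string };

// Branch on the stated contract: 200 carries the result, 202 carries a poll_url.
function classifyExtractResponse(res: ExtractResponse): ExtractOutcome {
  if (res.status === 200) {
    return { kind: "done", result: res.body };
  }
  if (res.status === 202 && typeof res.body.poll_url === "string") {
    return { kind: "pending", pollUrl: res.body.poll_url };
  }
  // Anything else is outside the contract described in LLMS_TXT.
  throw new Error(`Unexpected extract response: HTTP ${res.status}`);
}
```

A caller would typically loop on the `pending` outcome, fetching `pollUrl` until a final result arrives; the polling interval and the shape of the poll response are not specified in this diff.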
package/dist/seo.js CHANGED
@@ -17245,147 +17245,90 @@ var PLATFORM_FAQ = [
17245
17245
  ];
17246
17246
  var LLMS_TXT = `# Talonic
17247
17247
 
17248
- > AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.
17248
+ > The data registry for unstructured documents. Agents extract once, query forever. 529 document types, 25+ file formats, per-cell provenance. Co-author of DIN SPEC 91491.
17249
17249
 
17250
- Talonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.
17250
+ ## For Agents
17251
17251
 
17252
- ## How It Works
17252
+ - [MCP Server](https://mcp.talonic.com/mcp): Hosted MCP endpoint \u2014 zero install, native Model Context Protocol
17253
+ - [MCP Setup Guide](https://talonic.com/docs/mcp): Claude Desktop, Cursor, Cline, Continue, Cowork configuration
17254
+ - [Cost Headers](https://talonic.com/developers#billing): X-Talonic-Cost-Credits and X-Talonic-Balance-Credits on every response
17255
+ - [Credit Balance & Runway](https://talonic.com/docs/api/credits): GET /v1/credits/balance \u2014 check remaining budget before calls
17256
+ - [Idempotency-Key](https://talonic.com/docs/api#idempotency): Safe retries \u2014 pass Idempotency-Key header to deduplicate requests
17257
+ - [Auto Top-Up](https://talonic.com/developers#billing): Human-gated credit replenishment \u2014 agents can check, humans approve
17258
+ - [Sync/Async Contract](https://talonic.com/docs/api/extract): \u22645 pages \u2192 200 sync response; larger \u2192 202 with poll_url
17253
17259
 
17254
- 1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.
17255
- 2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).
17256
- 3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.
17257
- 4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \xD7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.
17258
- 5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.
17260
+ ## Three Modes
17259
17261
 
17260
- ## Sources & Documents
17262
+ - [Mode 1 \u2014 Extract Everything](https://talonic.com/developers#trio): POST /v1/extract with no schema \u2014 discover all fields in any document
17263
+ - [Mode 2 \u2014 Extract a Shape](https://talonic.com/developers#trio): POST /v1/extract with a schema \u2014 get exactly the fields you define
17264
+ - [Mode 3 \u2014 Query Without Re-extracting](https://talonic.com/developers#trio): POST /v1/documents/filter \u2014 query the Field Registry across previously extracted documents
17261
17265
 
17262
- Supported formats across three processing paths:
17263
- - **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV
17264
- - **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP
17265
- - **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP
17266
- - **Archives**: ZIP (recursive unpack)
17266
+ ## API Quickstart
17267
17267
 
17268
- Upload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). Batch mode available at 50% cost with 48-hour delivery window.
17268
+ - [Getting Started](https://talonic.com/docs/getting-started): Authentication, first extraction, schema creation
17269
+ - [POST /v1/extract](https://talonic.com/docs/api/extract): Primary extraction endpoint \u2014 sync, async, and batch modes
17270
+ - [Documents](https://talonic.com/docs/api/documents): List, retrieve, delete documents and get markdown
17271
+ - [Extractions](https://talonic.com/docs/api/extractions): Query extraction results and submit corrections
17272
+ - [Schemas](https://talonic.com/docs/api/schemas): Create, update, list, and delete extraction schemas
17273
+ - [Jobs](https://talonic.com/docs/api/jobs): Track async extraction jobs and results
17274
+ - [Webhooks](https://talonic.com/docs/api/webhooks): Event-driven flow \u2014 HMAC-SHA256 signed, exponential backoff retries
17275
+ - [Search & Filter](https://talonic.com/docs/api/search): Omnisearch, document filtering, field autocomplete
17276
+ - [OpenAPI 3.1 Spec](https://talonic.com/openapi.json): Machine-readable API specification
17269
17277
 
17270
- Every document is classified into a canonical type from the 529-type ontology (e.g., "Employment Contract", "Invoice", "Bill of Lading"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get "Unclassified Document".
17278
+ ## Concepts
17271
17279
 
17272
- Document detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.
17280
+ - [Field Registry](https://talonic.com/docs/concepts/field-registry): Unified knowledge graph of all extracted fields \u2014 tiers, clusters, master instructions
17281
+ - [Provenance & Confidence](https://talonic.com/docs/concepts/provenance): Per-cell confidence scores (0.0\u20131.0), resolution types, reasoning traces
17282
+ - [Cases & Document Linking](https://talonic.com/docs/concepts/cases): Entity matching, case formation, evidence chains
17283
+ - [4-Phase Pipeline](https://talonic.com/docs/concepts/pipeline): Resolve \u2192 Agent \u2192 Validation \u2192 Re-read
17284
+ - [Document Ontology](https://talonic.com/docs/concepts/ontology): 529-type classification across 10 categories
17285
+ - [Schemas](https://talonic.com/docs/concepts/schemas): Generated schemas vs. user templates, field matching, versioning
17273
17286
 
17274
- ## Field Registry
17287
+ ## SDKs
17275
17288
 
17276
- The unified knowledge graph of all canonical fields, growing smarter with every document processed.
17289
+ - [Node SDK](https://talonic.com/docs/sdk): Official Node.js/TypeScript SDK \u2014 extract, documents, schemas, jobs
17290
+ - [MCP Server Docs](https://talonic.com/docs/mcp): 7 tools \u2014 extract, search, filter, get_document, to_markdown, list_schemas, save_schema
17291
+ - [Authentication](https://talonic.com/docs/authentication): API key creation, scopes, Bearer token usage
17292
+ - [Error Codes](https://talonic.com/docs/errors): Error response format and code reference
17293
+ - [Rate Limits](https://talonic.com/docs/rate-limits): Rate limit tiers by plan and headers
17277
17294
 
17278
- - **Tier 1 (Core)**: Universal fields across many document types. Most reliable.
17279
- - **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.
17280
- - **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.
17295
+ ## Platform Guide
17281
17296
 
17282
- Fields with similar meanings cluster automatically via AI embeddings (e.g., "Vendor Name", "Supplier Name", "Company Name" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.
17297
+ - [Platform Overview](https://talonic.com/docs/platform): Core concepts, platform flow, supported formats
17298
+ - [Field Intelligence](https://talonic.com/docs/platform/tier-system): Tier system, semantic clusters, master instructions
17299
+ - [Schemas & Templates](https://talonic.com/docs/platform/generated-schemas): Schema creation, field matching, versioning, dialects
17300
+ - [Extraction Jobs](https://talonic.com/docs/platform/creating-job): Creating jobs, reviewing results, confidence and provenance
17301
+ - [Linking & Cases](https://talonic.com/docs/platform/entity-linking): Entity linking, link keys, cases, anomaly detection
17302
+ - [Validation & Quality](https://talonic.com/docs/platform/validation-checks): Golden samples, approval gates, ground truth benchmarks
17303
+ - [Delivery](https://talonic.com/docs/platform/destinations): Webhooks, REST, SFTP, email, S3/R2, field mappings, triggers
17283
17304
 
17284
- ## Schemas
17305
+ ## Document Types
17285
17306
 
17286
- Two types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).
17307
+ - [Financial & Tax](https://talonic.com/document-types/financial-tax): Schedule K-1, VAT Returns, Balance Sheets, SWIFT MT103, and 49 more
17308
+ - [Procurement & Invoicing](https://talonic.com/document-types/procurement-invoicing): Purchase Orders, Commercial Invoices, RFPs, and 50 more
17309
+ - [Trade & Logistics](https://talonic.com/document-types/trade-logistics): Bills of Lading, Air Waybills, Customs Declarations, and 50 more
17310
+ - [Legal & Contracts](https://talonic.com/document-types/legal-contracts): NDAs, MSAs, Loan Agreements, DPAs, and 48 more
17311
+ - [Corporate & Governance](https://talonic.com/document-types/corporate-governance): KYC Packages, Board Resolutions, ESG Reports, and 50 more
17312
+ - [Healthcare & Life Sciences](https://talonic.com/document-types/healthcare-life-sciences): Discharge Summaries, Clinical Trials, 510(k), and 50 more
17313
+ - [Manufacturing & Quality](https://talonic.com/document-types/manufacturing-quality): Certificates of Analysis, FMEA, BOM, PPAP, and 49 more
17314
+ - [Insurance & Claims](https://talonic.com/document-types/insurance-claims): FNOL Reports, Policy Declarations, Reinsurance, and 50 more
17315
+ - [Real Estate & Construction](https://talonic.com/document-types/real-estate-construction): Property Deeds, AIA G702, Lien Waivers, and 50 more
17316
+ - [HR & Employee Records](https://talonic.com/document-types/hr-employee-records): I-9 Forms, FMLA Certifications, Payroll Registers, and 49 more
17287
17317
 
17288
- Template workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.
17318
+ ## Company
17289
17319
 
17290
- Field features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.
17320
+ - [Product](https://talonic.com/product): Full platform walkthrough for technical evaluation
17321
+ - [Pricing](https://talonic.com/pricing): Free (5,000 credits/mo), Pro (\u20AC49/mo), Enterprise (custom)
17322
+ - [Developers](https://talonic.com/developers): Three Modes, API, SDK, MCP server overview
17323
+ - [DIN SPEC 91491](https://talonic.com/din-91491): Europe's first standard for AI-ready data
17324
+ - [Security](https://talonic.com/security): GDPR, HIPAA, ISO 27001, ISO 42001, EU data residency
17325
+ - [Contact](https://talonic.com/contact): Book a demo or request a schema audit
17291
17326
 
17292
- Versioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). Test extraction compares draft vs. live results before publishing.
17327
+ ## Optional
17293
17328
 
17294
- ## Extraction Jobs (Runs)
17295
-
17296
- A job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.
17297
-
17298
- **4-phase pipeline:**
17299
- 1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.
17300
- 2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.
- 3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.
- 4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.
-
- Per-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).
-
- ## Cases & Document Linking
-
- Registry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.
-
- A **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. High-frequency entities (>30% of documents) are auto-excluded from case formation.
-
- Case detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.
-
- ## Smart Matching
-
- Upload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.
-
- Results: top 5 candidates per document with confidence scores and per-field evidence breakdown.
-
- ## Validation & Quality
-
- - **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.
- - **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.
- - **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.
-
- ## Delivery
-
- Destinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).
-
- Field mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.
-
- Dialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).
-
- ## Search & Navigation
-
- **Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.
-
- Document filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. URL-serializable and saveable as presets.
-
- Keyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).
-
- ## Team & Settings
-
- 4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.
-
- Usage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.
-
- ## API
-
- Base URL: \`https://api.talonic.com\`. Auth: \`Authorization: Bearer tlnc_...\` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.
-
- Key endpoints:
- - POST /v1/extract \u2014 Synchronous/async document extraction (\`include_markdown=true\` returns OCR text, \`processing_mode=batch\` for 50% cost)
- - GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text
- - GET /v1/extractions \u2014 Query results and field corrections
- - POST /v1/schemas \u2014 Create/manage extraction schemas
- - GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions
- - POST /v1/sources \u2014 Manage API sources and document ingest
- - POST /v1/webhooks \u2014 Configure webhook endpoints
- - /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results
- - /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map
- - /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize
- - /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger
- - /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns
- - /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)
- - /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log
- - /v1/cases \u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents
- - /v1/batches \u2014 Sync with provider, cancel
- - /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review
- - /v1/review \u2014 Assign, stats
- - /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison
- - /v1/reference-data \u2014 JSON upload (POST create)
-
- Webhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.
-
- ## Agent
-
- The embedded AI assistant accessible from any page. Two modes:
- - **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.
- - **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.
-
- Document upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.
-
- ## Documentation
-
- - [API Documentation](https://talonic.com/docs): Complete REST API reference
- - [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide
- - [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification
+ - [Full Documentation for AI Models](https://talonic.com/llms-full.txt): Concatenated full reference \u2014 API, platform, pricing, compliance
+ - [Talonic vs Reducto](https://talonic.com/vs/reducto): Schema-first vs. parse-first comparison
+ - [Talonic vs Instabase](https://talonic.com/vs/instabase): Platform architecture comparison
  `;
  var LLMS_FULL_TXT_HEADER = `# Talonic \u2014 Full Documentation
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@talonic/docs",
- "version": "0.20.4",
+ "version": "0.20.6",
  "description": "Talonic documentation components — API Reference & Platform Guide",
  "license": "UNLICENSED",
  "private": false,
@@ -1,45 +0,0 @@
- /**
- * Tailwind CSS preset for @talonic/docs consumers.
- * Adds the Void design system color tokens and font families
- * so doc components render correctly in any host app.
- */
- declare const voidDocsPreset: {
- darkMode: "class";
- theme: {
- extend: {
- colors: {
- 'void-bg': string;
- 'void-bg-elevated': string;
- 'void-surface': string;
- 'void-surface-2': string;
- 'void-surface-3': string;
- 'void-border': string;
- 'void-border-hover': string;
- 'void-text-primary': string;
- 'void-text-secondary': string;
- 'void-text-muted': string;
- 'void-text-tertiary': string;
- 'void-accent': string;
- 'void-accent-hover': string;
- 'void-accent-dim': string;
- 'void-accent-tint': string;
- 'void-danger': string;
- 'void-danger-solid': string;
- 'void-divider': string;
- 'void-warning': string;
- 'void-warning-dim': string;
- 'void-tier-1': string;
- 'void-tier-2': string;
- 'void-tier-3': string;
- };
- fontFamily: {
- space: string[];
- body: string[];
- mono: string[];
- };
- };
- };
- plugins: never[];
- };
-
- export { voidDocsPreset as default };
@@ -1,45 +0,0 @@
- /**
- * Tailwind CSS preset for @talonic/docs consumers.
- * Adds the Void design system color tokens and font families
- * so doc components render correctly in any host app.
- */
- declare const voidDocsPreset: {
- darkMode: "class";
- theme: {
- extend: {
- colors: {
- 'void-bg': string;
- 'void-bg-elevated': string;
- 'void-surface': string;
- 'void-surface-2': string;
- 'void-surface-3': string;
- 'void-border': string;
- 'void-border-hover': string;
- 'void-text-primary': string;
- 'void-text-secondary': string;
- 'void-text-muted': string;
- 'void-text-tertiary': string;
- 'void-accent': string;
- 'void-accent-hover': string;
- 'void-accent-dim': string;
- 'void-accent-tint': string;
- 'void-danger': string;
- 'void-danger-solid': string;
- 'void-divider': string;
- 'void-warning': string;
- 'void-warning-dim': string;
- 'void-tier-1': string;
- 'void-tier-2': string;
- 'void-tier-3': string;
- };
- fontFamily: {
- space: string[];
- body: string[];
- mono: string[];
- };
- };
- };
- plugins: never[];
- };
-
- export { voidDocsPreset as default };