@talonic/docs 0.20.2 → 0.20.4

package/dist/seo.d.ts CHANGED
@@ -26,6 +26,18 @@ declare const API_FAQ: {
  question: string;
  answer: string;
  }[];
+ declare const SDK_NAV_SECTIONS: NavSection[];
+ declare const MCP_NAV_SECTIONS: NavSection[];
+ declare const SDK_SECTION_META: SectionMeta[];
+ declare const MCP_SECTION_META: SectionMeta[];
+ declare const SDK_FAQ: {
+ question: string;
+ answer: string;
+ }[];
+ declare const MCP_FAQ: {
+ question: string;
+ answer: string;
+ }[];
  declare const PLATFORM_FAQ: {
  question: string;
  answer: string;
@@ -33,4 +45,4 @@ declare const PLATFORM_FAQ: {
  declare const LLMS_TXT = "# Talonic\n\n> AI-powered document structuring platform that turns unstructured files into schema-validated, provenance-tracked structured data.\n\nTalonic ingests documents in 25+ formats (PDFs, scans, images, spreadsheets, emails, archives), discovers every data field through AI extraction and semantic clustering, and produces structured datasets with per-cell confidence scores, reasoning traces, and source provenance. It runs a single deployable stack with Postgres + pgvector, Anthropic Claude for extraction, and Mistral Document AI for OCR.\n\n## How It Works\n\n1. **Upload** \u2014 Drag files/folders into Inputs or ingest via API. ZIP archives unpack automatically. Files are deduplicated via SHA-256 hashing.\n2. **Extract** \u2014 Each document goes through Document AI OCR (converts to Markdown), classification against a 529-type ontology, and AI field extraction (discovers every data point with confidence and source text).\n3. **Build Schema** \u2014 Extracted fields resolve into the Field Registry (canonical names, semantic clusters, master instructions). Define a user template selecting the fields you need.\n4. **Run Job** \u2014 A 4-phase pipeline fills every cell in a documents \u00D7 fields grid. ~30% filled instantly from graph matches, ~70% from AI agents.\n5. **Deliver** \u2014 Push approved data to webhooks, REST APIs, SFTP, email, or S3/R2 cloud storage.\n\n## Sources & Documents\n\nSupported formats across three processing paths:\n- **Text fast-path** (direct read): TXT, MD, HTML, XML, JSON, EML, CSV\n- **AI Vision** (multimodal): PNG, JPG, JPEG, GIF, WEBP\n- **OCR** (Mistral Document AI \u2192 Markdown): PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, MSG, BMP\n- **Archives**: ZIP (recursive unpack)\n\nUpload methods: drag-and-drop UI, folder upload (preserves file paths), API upload (single, batch up to 200 files, or archive up to 500 MB). 
Batch mode available at 50% cost with 48-hour delivery window.\n\nEvery document is classified into a canonical type from the 529-type ontology (e.g., \"Employment Contract\", \"Invoice\", \"Bill of Lading\"). Classification is language-agnostic \u2014 a German Arbeitsvertrag maps to the same type as an English Employment Contract. Unresolvable documents get \"Unclassified Document\".\n\nDocument detail tabs: Raw Extraction (every field with confidence), Resolved Data (mapped to registry), Processing Log (per-stage timing), Original File.\n\n## Field Registry\n\nThe unified knowledge graph of all canonical fields, growing smarter with every document processed.\n\n- **Tier 1 (Core)**: Universal fields across many document types. Most reliable.\n- **Tier 2 (Established)**: Promoted from Tier 3 after frequency thresholds. Production-ready.\n- **Tier 3 (Emerging)**: Newly discovered from a few documents. May promote as more data arrives.\n\nFields with similar meanings cluster automatically via AI embeddings (e.g., \"Vendor Name\", \"Supplier Name\", \"Company Name\" \u2192 same cluster). Master instructions are AI-synthesized extraction directives that improve accuracy over time.\n\n## Schemas\n\nTwo types: **Generated schemas** (auto-created per document type from Tier 1+2 fields) and **User templates** (user-defined output structures).\n\nTemplate workflow: name it \u2192 add fields (display name, data type, extraction instructions) \u2192 map to registry (exact/semantic/composite matching) \u2192 add reference tables \u2192 publish an immutable version.\n\nField features: format constraints (regex validation with empty/flag/constant fallback), modifiers (date/number format, alias mapping, max_length), bypass strategies (constant, generator, reference lookup \u2014 skip LLM), capture submoves (match \u2192 compute \u2192 reason), output name remapping.\n\nVersioning: Live (published, read-only), Workshop (mutable draft), Version History (timeline with diff). 
Test extraction compares draft vs. live results before publishing.\n\n## Extraction Jobs (Runs)\n\nA job applies a schema to documents, producing a grid (rows = documents, columns = fields). Navigate to Structuring \u2192 Runs \u2192 New.\n\n**4-phase pipeline:**\n1. **Resolve** \u2014 ~30% of cells in seconds. Graph matches, fuzzy name matching, concept-synonym expansion, 3-tier reference lookup (normalize \u2192 fuzzy \u2192 AI), description scan. No AI calls (except rare Haiku fallback). Values normalized: dates \u2192 YYYY/MM/DD, numbers \u2192 2 decimal places.\n2. **Agent** \u2014 AI reviews gap patterns and produces typed strategy per field: compute (formula from grid values), transfer (copy from equivalent field), extract (re-read document with instructions, 5 concurrent), skip (with reasoning). Fields with manual instructions are always extracted, never skipped.\n3. **Validation** \u2014 Cross-field sanity checks: date_sanity, amount_mismatch, lookup_failed, low_confidence_outlier, unexpected_empty. Flags are informational only \u2014 never block output.\n4. **Re-read** \u2014 Context-aware gap filling. For each empty/low-confidence cell, AI re-reads the original document with field instruction + full grid context. Respects the confidence gate: cells \u2265 0.7 confidence are permanently protected.\n\nPer-cell provenance: confidence (0.0\u20131.0), resolution_type (graph_match | agent_derived | source_reread | unresolved), phase (1\u20134), reasoning trace, source reference (document, page, field).\n\n## Cases & Document Linking\n\nRegistry fields can be link keys: Identity (company/person names), Transaction (contract/PO/invoice numbers), Reference (project codes, cost centers). The linking pipeline normalizes values and builds a bipartite graph of documents \u2194 entities.\n\nA **case** = 2+ documents connected through transaction/reference entities. An **entity group** = 2+ documents connected through identity-only entities. 
High-frequency entities (>30% of documents) are auto-excluded from case formation.\n\nCase detail: documents, shared entities, evidence chain, timeline, AI-generated narration. Document Graph provides a D3-force visual layout. Case templates auto-discovered after 3+ cases form.\n\n## Smart Matching\n\nUpload CSV/Excel as reference datasets. Define field-to-field comparisons with weighted strategies: exact (case-insensitive), fuzzy (token-based with similarity threshold), date_range (configurable tolerance), numeric_range (percentage or absolute tolerance). AI can auto-suggest field mappings.\n\nResults: top 5 candidates per document with confidence scores and per-field evidence breakdown.\n\n## Validation & Quality\n\n- **Validation checks**: Schema-level rules (field format, value range, cross-field consistency, AI-proposed coherence). Run during Phase 3.\n- **Golden samples**: Manually-created reference datasets. Benchmark runs compare extraction vs. golden for per-field accuracy with AI judge verdicts.\n- **Approval gates**: Threshold-based auto-approve/flag (minimum confidence, validation pass rate, field coverage). Failed rows go to manual review queue.\n\n## Delivery\n\nDestinations: Webhook (HMAC-SHA256 signed), REST API (configurable headers), SFTP, Email (attachments), S3/R2 (cloud storage).\n\nField mappings transform output fields to match destination format. Triggers: auto on approval (stage/push), scheduled (cron), or manual. Failed exports retry with exponential backoff.\n\nDialects control serialization: date_format, number_locale, CSV delimiter, null representation, boolean format, encoding (UTF-8, UTF-8-BOM, ISO-8859-1).\n\n## Search & Navigation\n\n**Omnisearch** (Cmd+K / Ctrl+K): searches across documents, extracted values, field names, schema names, and sources simultaneously.\n\nDocument filters: field-value conditions with autocomplete, comparison operators (eq, contains, gt, between, is_empty), combinable. 
URL-serializable and saveable as presets.\n\nKeyboard shortcuts: Cmd+K (search), Cmd+J (quick extract), Escape (close overlays).\n\n## Team & Settings\n\n4 roles: Viewer (read-only), Member (full CRUD), Admin (+ team management), Owner (+ billing, API keys, org settings). New members auto-match by email domain with pending approval.\n\nUsage & Registry: per-feature cost breakdown, daily cost chart, call log with model/tokens/cost. Admin master view for cross-tenant stats.\n\n## API\n\nBase URL: `https://api.talonic.com`. Auth: `Authorization: Bearer tlnc_...` (SHA-256 hashed, shown once at creation). Scopes: extract, read, write.\n\nKey endpoints:\n- POST /v1/extract \u2014 Synchronous/async document extraction (`include_markdown=true` returns OCR text, `processing_mode=batch` for 50% cost)\n- GET /v1/documents \u2014 List with cursor pagination; GET /v1/documents/:id/markdown for OCR text\n- GET /v1/extractions \u2014 Query results and field corrections\n- POST /v1/schemas \u2014 Create/manage extraction schemas\n- GET /v1/jobs \u2014 Track async jobs and results; N-Shot comparisons, overrides, judge decisions\n- POST /v1/sources \u2014 Manage API sources and document ingest\n- POST /v1/webhooks \u2014 Configure webhook endpoints\n- /v1/resolutions \u2014 Resolution runs: list, create, get, execute, delete, results\n- /v1/linking \u2014 Link keys, document links, entity graph, classify, backfill, document-case map\n- /v1/schema-graph \u2014 Schema classes, versions, diffs (approve/reject), edges, aliases, visualize\n- /v1/structuring \u2014 Validation checks CRUD, approval gates CRUD with rules, result checks, pending approvals, approve/reject, delivery trigger\n- /v1/telemetry \u2014 Per-schema and per-run summaries, trends, field-level breakdowns\n- /v1/validation \u2014 Golden samples (list, get, delete), validation runs (list, create, get, delete, results)\n- /v1/credits \u2014 Balance, history, usage summary, daily usage, per-request usage log\n- /v1/cases 
\u2014 Status updates, edges, edge confirm/reject, split/merge, completeness, pin/remove documents\n- /v1/batches \u2014 Sync with provider, cancel\n- /v1/matching \u2014 Smart run, AI resolve, strategies CRUD, run results/progress, review\n- /v1/review \u2014 Assign, stats\n- /v1/quality \u2014 Ground truth entries CRUD, benchmark results, benchmark comparison\n- /v1/reference-data \u2014 JSON upload (POST create)\n\nWebhook events: extraction.completed, job.completed, export.completed, validation.completed. All HMAC-SHA256 signed with retry on failure.\n\n## Agent\n\nThe embedded AI assistant accessible from any page. Two modes:\n- **Chat mode** \u2014 Ask questions about the platform, your documents, extraction results, schemas, or workflows. Grounded in platform documentation.\n- **Planning mode** \u2014 Request actions (create schemas, run jobs, configure exports). The agent builds a plan, confirms with you, then executes.\n\nDocument upload flow: Cmd+J or the upload button opens quick extract. Drop a file, select a schema (or let AI discover fields), and get structured results.\n\n## Documentation\n\n- [API Documentation](https://talonic.com/docs): Complete REST API reference\n- [Platform Guide](https://talonic.com/docs/platform): Product documentation and feature guide\n- [OpenAPI Spec](https://talonic.com/docs/openapi.json): Machine-readable API specification\n";
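LLMS_TXT states that webhook deliveries are HMAC-SHA256 signed, but it does not document the signature format. The sketch below shows receiver-side verification using Node's `crypto` module. It assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook secret. The header that carries the signature, the encoding, and the helper names `computeSignature` and `verifyWebhookSignature` are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature is the hex-encoded HMAC-SHA256 of the raw request
// body, keyed with the webhook secret. Header name and encoding are not
// documented in this package; check the API reference before relying on this.
function computeSignature(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compares in constant time so the check does not leak how many bytes matched.
function verifyWebhookSignature(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(computeSignature(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws on unequal lengths, so reject those up front.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Verify against the raw body bytes as received, before any JSON parsing. Re-serializing a parsed payload can change whitespace or key order and break the digest.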
  declare const LLMS_FULL_TXT_HEADER = "# Talonic \u2014 Full Documentation\n\n> This file contains the complete Talonic documentation for LLM consumption.\n> For a summary, see llms.txt.\n\n";
 
- export { API_FAQ, API_NAV_SECTIONS, API_SECTION_META, LLMS_FULL_TXT_HEADER, LLMS_TXT, OPENAPI_SPEC, PLATFORM_FAQ, PLATFORM_NAV_SECTIONS, PLATFORM_SECTION_META, type SectionMeta };
+ export { API_FAQ, API_NAV_SECTIONS, API_SECTION_META, LLMS_FULL_TXT_HEADER, LLMS_TXT, MCP_FAQ, MCP_NAV_SECTIONS, MCP_SECTION_META, OPENAPI_SPEC, PLATFORM_FAQ, PLATFORM_NAV_SECTIONS, PLATFORM_SECTION_META, SDK_FAQ, SDK_NAV_SECTIONS, SDK_SECTION_META, type SectionMeta };
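The new `SDK_FAQ` and `MCP_FAQ` exports share the `{ question: string; answer: string }[]` shape declared above. That shape maps directly onto a schema.org `FAQPage` structured-data object. The following sketch shows how a docs site might build one. In practice the arrays would be imported from the package (the exact import path, for example a `seo` subpath, is an assumption). A local sample is used here so the snippet runs on its own. The `toFaqJsonLd` helper is hypothetical.

```typescript
// Same shape as the SDK_FAQ / MCP_FAQ declarations in seo.d.ts.
type FaqEntry = { question: string; answer: string };

// Builds a schema.org FAQPage object, ready to serialize into a
// <script type="application/ld+json"> tag.
function toFaqJsonLd(entries: FaqEntry[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: entries.map((e) => ({
      "@type": "Question",
      name: e.question,
      acceptedAnswer: { "@type": "Answer", text: e.answer },
    })),
  };
}

// Local stand-in for an imported SDK_FAQ array (entry copied from seo.js).
const sample: FaqEntry[] = [
  {
    question: "How do I install the Talonic Node SDK?",
    answer: "Run npm install @talonic/node. Requires Node.js 18 or newer. Zero runtime dependencies.",
  },
];

const jsonLd = JSON.stringify(toFaqJsonLd(sample));
```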
package/dist/seo.js CHANGED
@@ -17149,10 +17149,93 @@ var API_FAQ = [
  { question: "What file formats does the Talonic API support?", answer: "PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, PNG, JPG, JPEG, GIF, WEBP, TXT, MD, HTML, XML, JSON, EML, CSV, MSG, BMP, and ZIP archives." },
  { question: "How does authentication work?", answer: "All API requests require a Bearer token in the Authorization header. API keys carry the tlnc_ prefix and are scoped to a source. Create and manage keys from Settings \u2192 API Keys." },
  { question: "What schema formats are supported?", answer: "Three formats: JSON Schema (full control), simplified fields (recommended), and flat key-type maps (quick prototyping). Supported types: string, number, integer, boolean, date, array, object, enum." },
- { question: "What are the rate limits?", answer: "Free: 50 extractions/day, 5/min burst, 10MB max. Pro: 2,000/day, 30/min, 50MB max. Enterprise: unlimited with custom rates." },
+ { question: "What are the rate limits?", answer: "Per-key rate limits: 100 req/s extraction, 1,000 req/s read, 200 req/s write. Rate-limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are included on every response." },
  { question: "How do webhooks work?", answer: "Webhooks deliver POST requests with HMAC-SHA256 signed JSON payloads. Events: extraction.complete, extraction.failed, document.ingested. Failed deliveries retry with exponential backoff (1min, 5min, 30min, 4hr)." },
  { question: "Can I extract data asynchronously?", answer: "Yes. Set async: true or provide a webhook_url in the extract request. Returns a 202 with a job ID that you can poll at /v1/jobs/:id." }
  ];
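The updated rate-limit answer names three response headers but does not define their units. The sketch below shows one way a client could read those headers and decide how long to wait. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds; it could equally be "seconds until reset", so confirm against the API reference. The `HeaderReader` interface is satisfied by the Fetch API's `Headers` object. The helper names are hypothetical.

```typescript
// Minimal interface satisfied by the Fetch API's Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetEpochSeconds: number; // assumption: Unix time in seconds
}

// Returns null for a missing, empty, or non-numeric header.
function readNumber(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

function parseRateLimit(headers: HeaderReader): RateLimitInfo | null {
  const limit = readNumber(headers, "X-RateLimit-Limit");
  const remaining = readNumber(headers, "X-RateLimit-Remaining");
  const reset = readNumber(headers, "X-RateLimit-Reset");
  if (limit === null || remaining === null || reset === null) return null;
  return { limit, remaining, resetEpochSeconds: reset };
}

// Milliseconds to wait before the next request: zero while budget remains.
function msUntilReset(info: RateLimitInfo, nowMs: number): number {
  if (info.remaining > 0) return 0;
  return Math.max(0, info.resetEpochSeconds * 1000 - nowMs);
}
```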
+ var SDK_NAV_SECTIONS = [
+ { id: "sdk-overview", label: "Overview", children: [
+ { id: "sdk-introduction", label: "Introduction" },
+ { id: "sdk-installation", label: "Installation" },
+ { id: "sdk-authentication", label: "Authentication" },
+ { id: "sdk-quickstart", label: "Quick Start" }
+ ] },
+ { id: "sdk-client", label: "Client", children: [
+ { id: "sdk-configuration", label: "Configuration" },
+ { id: "sdk-extract", label: "Extract" }
+ ] },
+ { id: "sdk-resources", label: "Resources", children: [
+ { id: "sdk-documents", label: "Documents" },
+ { id: "sdk-extractions", label: "Extractions" },
+ { id: "sdk-schemas", label: "Schemas" },
+ { id: "sdk-jobs", label: "Jobs" }
+ ] },
+ { id: "sdk-cli", label: "CLI", children: [
+ { id: "sdk-cli-usage", label: "Usage" }
+ ] },
+ { id: "sdk-errors", label: "Error Handling", children: [
+ { id: "sdk-error-classes", label: "Error Classes" },
+ { id: "sdk-retries", label: "Retries" }
+ ] }
+ ];
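`SDK_NAV_SECTIONS` is a two-level tree: top-level sections with `children`. A table of contents or sitemap usually needs this tree flattened into display order. The sketch below does that with a depth-first walk. The `NavItem` shape is inferred from the literal above; the real `NavSection` type may carry more fields. The sample reuses the CLI and Error Handling branches.

```typescript
// Assumed shape, inferred from the SDK_NAV_SECTIONS literal.
interface NavItem {
  id: string;
  label: string;
  children?: NavItem[];
}

interface FlatEntry {
  id: string;
  label: string;
  depth: number;
}

// Depth-first walk: each parent is emitted before its children,
// matching the order in which the sidebar renders them.
function flattenNav(sections: NavItem[], depth = 0, out: FlatEntry[] = []): FlatEntry[] {
  for (const s of sections) {
    out.push({ id: s.id, label: s.label, depth });
    if (s.children) flattenNav(s.children, depth + 1, out);
  }
  return out;
}

const sdkNavSample: NavItem[] = [
  { id: "sdk-cli", label: "CLI", children: [{ id: "sdk-cli-usage", label: "Usage" }] },
  {
    id: "sdk-errors",
    label: "Error Handling",
    children: [
      { id: "sdk-error-classes", label: "Error Classes" },
      { id: "sdk-retries", label: "Retries" },
    ],
  },
];

const flat = flattenNav(sdkNavSample);
```

Because every `id` doubles as a page anchor, the flattened list is also a convenient place to check that no id repeats.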
+ var MCP_NAV_SECTIONS = [
+ { id: "mcp-overview", label: "Overview", children: [
+ { id: "mcp-introduction", label: "Introduction" },
+ { id: "mcp-installation", label: "Installation" },
+ { id: "mcp-authentication", label: "Authentication" }
+ ] },
+ { id: "mcp-clients", label: "Client Setup", children: [
+ { id: "mcp-claude-desktop", label: "Claude Desktop" },
+ { id: "mcp-cursor", label: "Cursor" },
+ { id: "mcp-cline", label: "Cline" },
+ { id: "mcp-continue", label: "Continue" },
+ { id: "mcp-cowork", label: "Cowork" }
+ ] },
+ { id: "mcp-tools", label: "Tools", children: [
+ { id: "mcp-talonic-extract", label: "talonic_extract" },
+ { id: "mcp-talonic-search", label: "talonic_search" },
+ { id: "mcp-talonic-filter", label: "talonic_filter" },
+ { id: "mcp-talonic-get-document", label: "talonic_get_document" },
+ { id: "mcp-talonic-to-markdown", label: "talonic_to_markdown" },
+ { id: "mcp-talonic-list-schemas", label: "talonic_list_schemas" },
+ { id: "mcp-talonic-save-schema", label: "talonic_save_schema" }
+ ] },
+ { id: "mcp-resources", label: "Resources", children: [
+ { id: "mcp-schemas-resource", label: "talonic://schemas" }
+ ] },
+ { id: "mcp-advanced", label: "Advanced", children: [
+ { id: "mcp-drag-drop", label: "Drag & Drop in Chat" },
+ { id: "mcp-architecture", label: "Architecture" },
+ { id: "mcp-configuration", label: "Configuration" },
+ { id: "mcp-troubleshooting", label: "Troubleshooting" }
+ ] }
+ ];
+ var SDK_SECTION_META = [
+ { id: "sdk-overview", title: "Node SDK Overview", description: "Install and configure the official Talonic Node.js SDK. Extract structured data from any document with a single function call." },
+ { id: "sdk-client", title: "SDK Client", description: "Configure the Talonic client with API key, base URL, timeout, retries, and custom fetch. Top-level extract method for single-call extraction." },
+ { id: "sdk-resources", title: "SDK Resources", description: "Documents, extractions, schemas, and jobs resource APIs in the Talonic Node SDK." },
+ { id: "sdk-cli", title: "SDK CLI", description: "The talonic CLI binary for command-line extraction, schema management, and document operations." },
+ { id: "sdk-errors", title: "SDK Error Handling", description: "Typed error classes, automatic retries with exponential backoff, and rate limit handling in the Talonic Node SDK." }
+ ];
+ var MCP_SECTION_META = [
+ { id: "mcp-overview", title: "MCP Server Overview", description: "Install the official Talonic MCP server to give AI agents structured document extraction via the Model Context Protocol." },
+ { id: "mcp-clients", title: "MCP Client Setup", description: "Step-by-step setup for Claude Desktop, Cursor, Cline, Continue, and Cowork MCP clients." },
+ { id: "mcp-tools", title: "MCP Tools", description: "Seven MCP tools: extract, search, filter, get document, to markdown, list schemas, and save schema." },
+ { id: "mcp-resources", title: "MCP Resources", description: "The talonic://schemas MCP resource for browsing saved schemas in compatible clients." },
+ { id: "mcp-advanced", title: "MCP Advanced", description: "Drag-and-drop file handling, architecture overview, configuration options, and troubleshooting for the Talonic MCP server." }
+ ];
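Each top-level id in `SDK_NAV_SECTIONS` and `MCP_NAV_SECTIONS` has a matching entry in the corresponding `*_SECTION_META` array. The sketch below shows a small consistency check that could run in a test suite, so that a future nav section does not ship without SEO metadata. The `NavLike` and `MetaLike` shapes are trimmed assumptions; the real types also carry `children` and `description`. The sample data is copied from the MCP arrays above.

```typescript
// Trimmed, assumed shapes; the real types carry more fields.
interface NavLike { id: string; label: string }
interface MetaLike { id: string; title: string }

// Reports top-level nav ids without metadata, and metadata ids without a nav entry.
function diffSectionIds(nav: NavLike[], meta: MetaLike[]): { missingMeta: string[]; orphanMeta: string[] } {
  const navIds = new Set(nav.map((n) => n.id));
  const metaIds = new Set(meta.map((m) => m.id));
  return {
    missingMeta: nav.map((n) => n.id).filter((id) => !metaIds.has(id)),
    orphanMeta: meta.map((m) => m.id).filter((id) => !navIds.has(id)),
  };
}

// Top-level entries copied from MCP_NAV_SECTIONS and MCP_SECTION_META.
const mcpNav: NavLike[] = [
  { id: "mcp-overview", label: "Overview" },
  { id: "mcp-clients", label: "Client Setup" },
  { id: "mcp-tools", label: "Tools" },
  { id: "mcp-resources", label: "Resources" },
  { id: "mcp-advanced", label: "Advanced" },
];
const mcpMeta: MetaLike[] = [
  { id: "mcp-overview", title: "MCP Server Overview" },
  { id: "mcp-clients", title: "MCP Client Setup" },
  { id: "mcp-tools", title: "MCP Tools" },
  { id: "mcp-resources", title: "MCP Resources" },
  { id: "mcp-advanced", title: "MCP Advanced" },
];

const report = diffSectionIds(mcpNav, mcpMeta);
```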
+ var SDK_FAQ = [
+ { question: "How do I install the Talonic Node SDK?", answer: "Run npm install @talonic/node. Requires Node.js 18 or newer. Zero runtime dependencies." },
+ { question: "How do I extract data from a document with the SDK?", answer: "Call talonic.extract() with a file_path (or file_url/buffer) and a schema defining the fields you want. Returns structured JSON with confidence scores." },
+ { question: "Does the SDK retry on errors?", answer: "Yes. The SDK retries automatically on 429, 500, 502, 503, 504, network errors, and timeouts with exponential backoff capped at 16s. The API can mark errors as non-retryable." },
+ { question: "What error types does the SDK throw?", answer: "TalonicAuthError (401/403), TalonicNotFoundError (404), TalonicValidationError (400/409/413/422), TalonicRateLimitError (429), TalonicServerError (5xx), TalonicNetworkError, and TalonicTimeoutError." }
+ ];
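The last two SDK FAQ answers describe three things: which HTTP statuses map to which error class, which failures are retried, and that backoff is exponential with a 16 s cap. The sketch below restates those rules as plain functions so the behaviour is easy to reason about. It is not the SDK's implementation. Several details are assumptions: the 1 s base delay, the absence of jitter, the `nonRetryable` flag standing in for "the API can mark errors as non-retryable", and the `TalonicError` fallback name.

```typescript
// Status-to-class mapping as listed in the SDK FAQ (names only; the real SDK
// throws actual error class instances).
function errorClassFor(status: number): string {
  if (status === 401 || status === 403) return "TalonicAuthError";
  if (status === 404) return "TalonicNotFoundError";
  if (new Set([400, 409, 413, 422]).has(status)) return "TalonicValidationError";
  if (status === 429) return "TalonicRateLimitError";
  if (status >= 500 && status <= 599) return "TalonicServerError";
  return "TalonicError"; // hypothetical fallback for statuses the FAQ does not list
}

type Failure =
  | { kind: "http"; status: number; nonRetryable?: boolean } // nonRetryable is a hypothetical flag
  | { kind: "network" }
  | { kind: "timeout" };

const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function isRetryable(f: Failure): boolean {
  if (f.kind !== "http") return true; // network errors and timeouts are retried
  if (f.nonRetryable) return false; // the API can opt a response out of retries
  return RETRYABLE_STATUSES.has(f.status);
}

// Assumption: 1 s base delay doubling per attempt; only the 16 s cap is documented.
function backoffMs(attempt: number, baseMs = 1000, capMs = 16000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

Under these rules a 501 raises `TalonicServerError` but is not retried, because 501 is not in the retryable list.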
+ var MCP_FAQ = [
+ { question: "What is the Talonic MCP server?", answer: "An official Model Context Protocol server that gives AI agents seven tools for document extraction, search, filtering, and schema management via the Talonic API." },
+ { question: "How do I install the Talonic MCP server?", answer: 'Add a one-line npx invocation to your MCP client config: {"command": "npx", "args": ["-y", "@talonic/mcp@latest"], "env": {"TALONIC_API_KEY": "tlnc_..."}}. No clone or build required.' },
+ { question: "Which MCP clients are supported?", answer: "Claude Desktop, Cursor, Cline, Continue, and Cowork. Any MCP-compliant client can connect." },
+ { question: "How does file upload work in chat clients?", answer: "From v0.1.4, agents can pass file_data (base64) + filename instead of a file path. This handles drag-and-drop in sandboxed chat clients like Claude Desktop." }
+ ];
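The last MCP FAQ entry says that, from v0.1.4, agents can send a file as `file_data` (base64) plus `filename` instead of a path. The sketch below shows roughly what that looks like as an MCP `tools/call` JSON-RPC request to `talonic_extract`. Only `file_data` and `filename` come from the FAQ. Any other arguments the tool accepts, such as a schema, are not shown. The helper `buildExtractCall` is hypothetical; in practice the chat client constructs this message itself.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Inlines the file bytes as base64 so no filesystem path has to be shared
// with a sandboxed client.
function buildExtractCall(id: number, filename: string, bytes: Uint8Array): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "talonic_extract",
      arguments: {
        file_data: Buffer.from(bytes).toString("base64"),
        filename,
      },
    },
  };
}

const request = buildExtractCall(1, "invoice.txt", new TextEncoder().encode("hi"));
```

Base64 inflates the payload by about a third, so very large files may still be better passed by path where the client permits it.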
  var PLATFORM_FAQ = [
  { question: "What is the Field Registry?", answer: "The unified knowledge graph of all canonical fields discovered across documents. Fields are organized into three tiers based on frequency: Tier 1 (core), Tier 2 (established), Tier 3 (emerging)." },
  { question: "How does the 4-phase extraction pipeline work?", answer: "Phase 1 (Resolve) fills ~30% of cells from graph matches \u2014 no AI needed. Phase 2 (Agent) uses AI strategies. Phase 3 (Validation) runs cross-field checks. Phase 4 (Re-read) fills remaining gaps with targeted document re-reading." },
@@ -17316,8 +17399,14 @@ export {
  API_SECTION_META,
  LLMS_FULL_TXT_HEADER,
  LLMS_TXT,
+ MCP_FAQ,
+ MCP_NAV_SECTIONS,
+ MCP_SECTION_META,
  OPENAPI_SPEC,
  PLATFORM_FAQ,
  PLATFORM_NAV_SECTIONS,
- PLATFORM_SECTION_META
+ PLATFORM_SECTION_META,
+ SDK_FAQ,
+ SDK_NAV_SECTIONS,
+ SDK_SECTION_META
  };
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@talonic/docs",
- "version": "0.20.2",
+ "version": "0.20.4",
  "description": "Talonic documentation components — API Reference & Platform Guide",
  "license": "UNLICENSED",
  "private": false,