@claritylabs/cl-sdk 0.17.0 → 0.18.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
  # CL-SDK
 
- Open infrastructure for building AI agents that work with insurance. Pure TypeScript, provider-agnostic — bring any LLM, any embedding model, any storage backend.
+ Deterministic insurance intelligence primitives for regulated AI agents. Pure TypeScript, provider-agnostic — bring any LLM, any embedding model, any storage backend.
 
  **[Documentation](https://cl-sdk.claritylabs.inc/docs)** | **[npm](https://www.npmjs.com/package/@claritylabs/cl-sdk)** | **[GitHub](https://github.com/claritylabs-inc/cl-sdk)**
 
@@ -12,11 +12,14 @@ npm install @claritylabs/cl-sdk pdf-lib zod
 
  ## What It Does
 
- - **Document Extraction** — Agentic pipeline with 13 focused extractors that turns insurance PDFs into structured data with page-level provenance, quality gates, first-class definitions and covered reasons, and automatic declarations-to-schema promotion (limits, deductibles, locations, broker, loss payees, premium, taxes/fees, summary)
- - **Query Agent** — Citation-backed question answering over stored documents and inbound photos/PDFs/text with sub-question decomposition and grounding verification
- - **Application Processing** — Eight focused agents handle intake field extraction, auto-fill from prior answers, topic-based question batching, and PDF mapping
- - **Agent System** — Composable prompt modules for building insurance-aware conversational agents across email, chat, SMS, Slack, and Discord
- - **Storage** — DocumentStore and MemoryStore interfaces with SQLite reference implementation
+ - **Document Extraction** — Deterministic extraction pipeline with focused model calls that turns insurance PDFs into structured data with page-level provenance, quality gates, first-class definitions and covered reasons, referential coverage resolution, cost-aware formatting, and automatic declarations-to-schema promotion (limits, deductibles, locations, broker, loss payees, premium, taxes/fees, summary)
+ - **Source Grounding** — Shared source spans, source chunks, source stores, quoted evidence validation, and deterministic evidence ordering across extraction, query, application, PCE, and case workflows
+ - **Query Agent** — Citation-backed question answering over stored documents, source spans, and inbound photos/PDFs/text with sub-question decomposition, bounded retrieval planning, attachment-only reasoning when retrieval is unnecessary, and grounding verification
+ - **Application Processing** — Bounded workflows handle intake with deterministic planning — field extraction, prior-answer backfill, context auto-fill, document lookup gating, topic-based question batching, reply parsing, source-backed field provenance, and PDF mapping
+ - **Policy Change Endorsements** — PCE intake, evidence collection, missing-info handling, quality gates, execution mode selection, and reviewable submission packets
+ - **Case Workflows** — Shared primitives for evidence-backed proposals, missing information, validation issues, stable IDs, and packet artifacts
+ - **Agent System** — Composable prompt modules for building insurance-aware agents across email, chat, SMS, Slack, and Discord with human-reviewable behavior
+ - **Storage** — DocumentStore, MemoryStore, SourceStore, and ApplicationStore interfaces with reference implementations where appropriate
 
  ## Quick Start
 
@@ -38,9 +41,28 @@ const extractor = createExtractor({
  const result = await extractor.extract(pdfBase64);
  console.log(result.document); // Typed InsuranceDocument
  console.log(result.chunks); // DocumentChunk[] for vector storage
+ console.log(result.sourceSpans); // SourceSpan[] when supplied by the host
  console.log(result.reviewReport); // Quality gate results
  ```
 
+ ## Source Grounding
+
+ Source spans are the v1 evidence layer. Build spans from PDF text, OCR, emails, attachments, or structured fields, then pass them into extraction and downstream workflows:
+
+ ```typescript
+ import { buildPageSourceSpans, MemorySourceStore, createExtractor } from "@claritylabs/cl-sdk";
+
+ const pageOneText = "..."; // text from your PDF text/OCR pipeline
+ const sourceSpans = buildPageSourceSpans([
+   { documentId: "policy-123", sourceKind: "policy_pdf", pageNumber: 1, text: pageOneText },
+ ]);
+
+ const sourceStore = new MemorySourceStore();
+ const extractor = createExtractor({ generateText, generateObject, sourceStore });
+
+ const result = await extractor.extract(pdfBase64, "policy-123", { sourceSpans });
+ ```
+
  See the [full documentation](https://cl-sdk.claritylabs.inc/docs) for architecture, provider setup, API reference, and more.
 
  ## Multimodal Querying
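
Reviewer note on the Source Grounding hunk above: the new README text mentions source spans, quoted evidence validation, and deterministic evidence ordering without showing what those could look like. The following is a minimal sketch of the idea only, not the SDK's implementation. Every name in it (`SketchSourceSpan`, `sketchBuildPageSpans`, `sketchFindQuoteSpans`) and the span ID scheme are assumptions made for illustration; the real `buildPageSourceSpans` and `SourceSpan` may differ.

```typescript
// Hypothetical shapes for illustration only; the SDK's real SourceSpan type may differ.
interface SketchPageInput {
  documentId: string;
  sourceKind: string;
  pageNumber: number;
  text: string;
}

interface SketchSourceSpan extends SketchPageInput {
  id: string;
}

// Build one span per page with a stable ID derived from document and page (assumed scheme).
function sketchBuildPageSpans(pages: SketchPageInput[]): SketchSourceSpan[] {
  return pages.map((page) => ({
    id: `${page.documentId}:p${page.pageNumber}`,
    ...page,
  }));
}

// Collapse whitespace and case so PDF/OCR line breaks do not defeat matching.
function normalizeForMatch(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Return the IDs of spans that contain the quote, ordered by document then page
// so the same inputs always yield the same evidence order.
function sketchFindQuoteSpans(quote: string, spans: SketchSourceSpan[]): string[] {
  const needle = normalizeForMatch(quote);
  if (needle.length === 0) return [];
  return spans
    .filter((span) => normalizeForMatch(span.text).includes(needle))
    .sort(
      (a, b) =>
        a.documentId.localeCompare(b.documentId) || a.pageNumber - b.pageNumber,
    )
    .map((span) => span.id);
}

const spans = sketchBuildPageSpans([
  { documentId: "policy-123", sourceKind: "policy_pdf", pageNumber: 2, text: "Deductible:\n  $1,000 per occurrence" },
  { documentId: "policy-123", sourceKind: "policy_pdf", pageNumber: 1, text: "Policy period: 01/01/2025 to 01/01/2026" },
]);

// A quote that appears on page 2 (modulo whitespace and case) is supported.
const supported = sketchFindQuoteSpans("deductible: $1,000 per occurrence", spans);
// A quote that appears nowhere yields no supporting spans and should be rejected.
const unsupported = sketchFindQuoteSpans("Deductible: $5,000", spans);
```

In this sketch a quoted claim is kept only when at least one span contains it; an empty result is the signal to drop or flag the claim rather than cite it.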
@@ -59,6 +81,7 @@ const agent = createQueryAgent({
    generateObject,
    documentStore,
    memoryStore,
+   sourceRetriever,
  });
 
  const result = await agent.query({
@@ -81,16 +104,29 @@ const result = await agent.query({
  });
  ```
 
- The query pipeline first interprets each attachment into structured evidence, then combines that with retrieved chunks, document lookups, and conversation history before answering.
+ The query workflow first interprets each attachment into structured evidence, then uses the query classifier to decide whether stored-document retrieval is needed. Simple or attachment-only questions can skip retrieval and reason over the available evidence directly; document-backed questions still retrieve chunks, reason over citations, and run grounding verification. Verification can request targeted retry retrieval for weak sub-answers.
 
  Important: your `generateObject` callback must actually forward multimodal payloads from `providerOptions` to the model request:
 
  - `providerOptions.attachments` for generic image/pdf/text inputs
  - `providerOptions.pdfBase64` for PDF inputs
  - `providerOptions.images` for image inputs
+ - `providerOptions.sourceSpans` and `providerOptions.sourceChunks` for source evidence when your host passes them through
 
  If your callback ignores those fields, the model will only see the text prompt.
 
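
Reviewer note on the forwarding requirement above: the README names the `providerOptions` fields but does not show how a callback might forward them. The sketch below is one possible shape, written under stated assumptions. The field names `attachments`, `pdfBase64`, and `images` come from the README; the item shapes, the `SketchContentPart` type, and the `sketchBuildContentParts` helper are hypothetical and would need to be adapted to the actual SDK types and to your provider's request format.

```typescript
// Hypothetical shapes; the real callback parameter types come from the SDK and your provider.
interface SketchAttachment {
  kind: "image" | "pdf" | "text";
  mimeType?: string;
  base64?: string;
  text?: string;
}

interface SketchProviderOptions {
  attachments?: SketchAttachment[];
  pdfBase64?: string;
  images?: { mimeType: string; base64: string }[];
}

type SketchContentPart =
  | { type: "text"; text: string }
  | { type: "file"; mimeType: string; base64: string };

// Turn the text prompt plus any multimodal payloads into provider-neutral content parts.
function sketchBuildContentParts(
  prompt: string,
  options: SketchProviderOptions = {},
): SketchContentPart[] {
  const parts: SketchContentPart[] = [{ type: "text", text: prompt }];

  for (const attachment of options.attachments ?? []) {
    if (attachment.kind === "text" && attachment.text !== undefined) {
      parts.push({ type: "text", text: attachment.text });
    } else if (attachment.base64 !== undefined) {
      const fallbackMime =
        attachment.kind === "pdf" ? "application/pdf" : "application/octet-stream";
      parts.push({
        type: "file",
        mimeType: attachment.mimeType ?? fallbackMime,
        base64: attachment.base64,
      });
    }
  }

  if (options.pdfBase64 !== undefined) {
    parts.push({ type: "file", mimeType: "application/pdf", base64: options.pdfBase64 });
  }

  for (const image of options.images ?? []) {
    parts.push({ type: "file", mimeType: image.mimeType, base64: image.base64 });
  }

  return parts;
}

// With payloads present, the model request carries the prompt plus two files.
const multimodalParts = sketchBuildContentParts("What is the deductible?", {
  pdfBase64: "JVBERi0=",
  images: [{ mimeType: "image/jpeg", base64: "/9j/" }],
});

// With no payloads, only the text prompt is sent, which is the failure mode the README warns about.
const textOnlyParts = sketchBuildContentParts("What is the deductible?");
```

A real callback would map these parts onto whatever message format the chosen model client expects before sending the request.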
+ ## Bounded Agentic Workflows
+
+ CL-SDK uses deterministic scaffolding with agentic decision points rather than fixed all-tools-all-the-time chains:
+
+ - Extraction page mapping and review choose focused follow-up extractors from the live extractor catalog. Definitions and covered reasons can fall back through section extraction when a focused run returns no usable records.
+ - Supplementary extraction runs only when page assignments, form inventory, existing extracted text, or review follow-up tasks indicate regulatory, claims, notice, cancellation, or contact facts are likely present.
+ - Referential coverage resolution tries cheap local section/form matches first, then uses bounded target-specific actions for declarations, schedules, sections, page-location lookup, or skip.
+ - Formatting skips the LLM cleanup pass for plain prose and only formats long or noisy content that looks likely to contain markdown, spacing, list, heading, or table artifacts.
+ - Application processing plans optional backfill, context auto-fill, document search, batching, reply parsing, lookup, explanations, and next-batch email generation based on current state.
+
+ These gates reduce unnecessary provider calls while preserving reliability for edge cases where additional focused extraction or retrieval is needed.
+
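
Reviewer note on the gates listed above: the formatting gate is described only in prose, so here is a minimal sketch of one plausible reading of it. Nothing here is taken from the package source. The patterns, the 200-character threshold, and the name `sketchNeedsFormatting` are all assumptions; the rule used is "format when the text shows several kinds of artifact, or when it is long and shows at least one".

```typescript
// Hypothetical artifact detectors; the SDK's actual heuristics are not shown in this diff.
const SKETCH_ARTIFACT_PATTERNS: RegExp[] = [
  /^#{1,6}\s/m, // markdown heading markers at the start of a line
  /^\s*[-*•]\s+\S/m, // list bullets at the start of a line
  /\|.*\|/, // table-like pipe pairs
  / {3,}/, // runs of three or more spaces
  /\n{3,}/, // stacked blank lines
];

// Decide whether a cleanup model call is worth making for this text.
function sketchNeedsFormatting(text: string, minLength = 200): boolean {
  const artifactKinds = SKETCH_ARTIFACT_PATTERNS.filter((pattern) =>
    pattern.test(text),
  ).length;
  const isLong = text.length >= minLength;
  // Noisy: several artifact kinds. Long: one artifact kind is enough.
  return artifactKinds >= 2 || (isLong && artifactKinds >= 1);
}

// Plain prose skips the cleanup pass.
const plainProse = sketchNeedsFormatting("Coverage applies to direct physical loss.");

// Heading, bullet, spacing, and pipe artifacts together trigger formatting.
const noisyBlock = sketchNeedsFormatting(
  "## Limits\n- Building   $500,000\n| Item | Limit |",
);

// A short string with a single artifact kind is left alone under this rule.
const shortSingleArtifact = sketchNeedsFormatting("Limit | $500 | each");
```

The point of a gate like this is that the check is cheap and deterministic, so the provider call happens only when the text shows signs that cleanup would change anything.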
  ## Development
 
  ```bash