@claritylabs/cl-sdk 0.8.1 → 0.10.0

package/README.md CHANGED
@@ -1,661 +1,56 @@
1
1
  # CL-SDK
2
2
 
3
- [Clarity Labs](https://claritylabs.inc) allows insurers to understand their clients as well as they know themselves. A better understanding of clients means insurers can automate servicing to reduce costs and identify coverage gaps to cross-sell products.
3
+ Open infrastructure for building AI agents that work with insurance. Pure TypeScript, provider-agnostic: bring any LLM, any embedding model, any storage backend.
4
4
 
5
- CL-SDK is the open infrastructure layer that makes this possible — a pure TypeScript library for extracting, reasoning about, and acting on insurance documents. Provider-agnostic by design: bring any LLM, any embedding model, any storage backend.
5
+ **[Documentation](https://cl-sdk.claritylabs.inc/docs)** | **[npm](https://www.npmjs.com/package/@claritylabs/cl-sdk)** | **[GitHub](https://github.com/claritylabs-inc/cl-sdk)**
6
6
 
7
7
  ## Installation
8
8
 
9
9
  ```bash
10
- npm install @claritylabs/cl-sdk
10
+ npm install @claritylabs/cl-sdk pdf-lib zod
11
11
  ```
12
12
 
13
- ### Peer Dependencies
13
+ ## What It Does
14
14
 
15
- ```bash
16
- npm install pdf-lib zod
17
- ```
18
-
19
- Optional (for SQLite storage):
20
- ```bash
21
- npm install better-sqlite3
22
- ```
15
+ - **Document Extraction** — Agentic pipeline with 11 focused extractors that turns insurance PDFs into structured data with page-level provenance and quality gates
16
+ - **Query Agent** — Citation-backed question answering over stored documents with sub-question decomposition and grounding verification
17
+ - **Application Processing** — Eight focused agents handle intake — field extraction, auto-fill from prior answers, topic-based question batching, and PDF mapping
18
+ - **Agent System** — Composable prompt modules for building insurance-aware conversational agents across email, chat, SMS, Slack, and Discord
19
+ - **Storage** — DocumentStore and MemoryStore interfaces with SQLite reference implementation
23
20
 
24
21
  ## Quick Start
25
22
 
26
- ### Document Extraction
27
-
28
- CL-SDK extracts structured data from insurance PDFs using a multi-agent pipeline. You provide two callback functions — `generateText` and `generateObject` — and the SDK handles the rest:
29
-
30
23
  ```typescript
31
24
  import { createExtractor } from "@claritylabs/cl-sdk";
32
25
 
33
26
  const extractor = createExtractor({
34
27
  generateText: async ({ prompt, system, maxTokens, providerOptions }) => {
35
- // Wrap your preferred LLM provider
36
28
  const result = await yourProvider.generate({ prompt, system, maxTokens, providerOptions });
37
29
  return { text: result.text, usage: result.usage };
38
30
  },
39
31
  generateObject: async ({ prompt, system, schema, maxTokens, providerOptions }) => {
40
- // schema is a Zod schema; use it for structured output
41
- // IMPORTANT: pass providerOptions.pdfBase64 and/or providerOptions.images
42
- // through to your model as file/image message parts.
43
- const result = await yourProvider.generateStructured({
44
- prompt,
45
- system,
46
- schema,
47
- maxTokens,
48
- providerOptions,
49
- });
32
+ // Pass providerOptions.pdfBase64 and/or providerOptions.images to your model
33
+ const result = await yourProvider.generateStructured({ prompt, system, schema, maxTokens, providerOptions });
50
34
  return { object: result.object, usage: result.usage };
51
35
  },
52
36
  });
53
37
 
54
- const pdfBase64 = "..."; // base64-encoded insurance PDF
55
38
  const result = await extractor.extract(pdfBase64);
56
- console.log(result.document); // Typed InsuranceDocument (policy or quote)
57
- console.log(result.chunks); // DocumentChunk[] ready for vector storage
58
- ```
59
-
60
- ### With PDF-to-Image Conversion
61
-
62
- For providers that don't support native PDF input (e.g., OpenAI):
63
-
64
- ```typescript
65
- const extractor = createExtractor({
66
- generateText: /* ... */,
67
- generateObject: /* ... */,
68
- convertPdfToImages: async (pdfBase64, startPage, endPage) => {
69
- // Convert PDF pages to images using your preferred library
70
- return [{ imageBase64: "...", mimeType: "image/png" }]; // one per page
71
- },
72
- });
73
- ```
74
-
75
- ## Architecture
76
-
77
- ### Provider-Agnostic Callbacks
78
-
79
- CL-SDK has **zero framework dependencies**. All LLM interaction happens through two callback types:
80
-
81
- ```typescript
82
- type GenerateText = (params: {
83
- prompt: string;
84
- system?: string;
85
- maxTokens: number;
86
- providerOptions?: Record<string, unknown>;
87
- }) => Promise<{ text: string; usage?: { inputTokens: number; outputTokens: number } }>;
88
-
89
- type GenerateObject<T> = (params: {
90
- prompt: string;
91
- system?: string;
92
- schema: ZodSchema<T>;
93
- maxTokens: number;
94
- providerOptions?: Record<string, unknown>;
95
- }) => Promise<{ object: T; usage?: { inputTokens: number; outputTokens: number } }>;
96
- ```
97
-
98
- For extraction calls, `providerOptions` can carry document content:
99
-
100
- - `providerOptions.pdfBase64` — the PDF to send as a file part
101
- - `providerOptions.images` — page images to send as image parts
102
-
103
- The coordinator passes the full PDF to classify and plan. Worker extractors pass a page-scoped PDF produced by `extractPageRange()` unless `convertPdfToImages` is enabled, in which case they pass page images instead. Your callback must include that content in the actual model request; the prompt text alone is not sufficient.
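
The requirement above (the callback must put the document content into the actual model request) can be sketched as a small helper. This is a minimal illustration, not the SDK's code: `ContentPart`, `PageImage`, and `buildContentParts` are hypothetical names, the part shapes stand in for whatever your provider SDK expects, and the shape of `providerOptions.images` is assumed to match what `convertPdfToImages` returns.

```typescript
// Hypothetical message-part shapes; a real provider SDK defines its own.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "file"; mimeType: "application/pdf"; dataBase64: string }
  | { type: "image"; mimeType: string; dataBase64: string };

// Assumed shape of one entry in providerOptions.images.
type PageImage = { imageBase64: string; mimeType: string };

// Turn the prompt plus any document content into parts for one user message.
function buildContentParts(
  prompt: string,
  providerOptions: Record<string, unknown> = {},
): ContentPart[] {
  const parts: ContentPart[] = [];

  // Attach the PDF as a file part when the SDK supplied one.
  const pdfBase64 = providerOptions["pdfBase64"];
  if (typeof pdfBase64 === "string" && pdfBase64.length > 0) {
    parts.push({ type: "file", mimeType: "application/pdf", dataBase64: pdfBase64 });
  }

  // Attach page images when the SDK supplied them.
  const images = providerOptions["images"];
  if (Array.isArray(images)) {
    for (const image of images as PageImage[]) {
      parts.push({ type: "image", mimeType: image.mimeType, dataBase64: image.imageBase64 });
    }
  }

  // The prompt text goes last, after the document content.
  parts.push({ type: "text", text: prompt });
  return parts;
}
```

Inside `generateObject` or `generateText`, the returned parts would be mapped onto your provider's own message format before the request is sent.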
104
-
105
- Works with any provider: Anthropic, OpenAI, Google, Mistral, Bedrock, Azure, Ollama, etc. You write the adapter once; the SDK calls it throughout the pipeline.
106
-
107
- > **Strict structured output compatibility:** The SDK automatically transforms Zod schemas before passing them to `generateObject` — converting `.optional()` fields to `.nullable()` so all properties appear in the JSON Schema `required` array. This ensures compatibility with providers like OpenAI that enforce strict structured output validation. No adapter changes needed on your end.
108
-
109
- ### Extraction Pipeline
110
-
111
- The extraction system uses a **coordinator/worker pattern** — a coordinator agent plans the work, specialized extractor agents execute in parallel, and a review loop ensures completeness.
112
-
113
- ```
114
- ┌─────────────┐ ┌─────────────┐ ┌──────────────────────┐
115
- │ 1. CLASSIFY │────▶│ 2. PLAN │────▶│ 3. EXTRACT (parallel)│
116
- │ │ │ │ │ │
117
- │ Document │ │ Select │ │ Run focused │
118
- │ type, line │ │ template, │ │ extractors against │
119
- │ of business │ │ assign │ │ assigned page │
120
- │ │ │ extractors │ │ ranges │
121
- │ │ │ to pages │ │ │
122
- └─────────────┘ └─────────────┘ └──────────┬───────────┘
123
-
124
- ┌─────────────┐ ┌─────────────┐ ┌──────────▼───────────┐
125
- │ 6. FORMAT │◀────│ 5. ASSEMBLE │◀────│ 4. REVIEW │
126
- │ │ │ │ │ │
127
- │ Clean up │ │ Merge all │ │ Check completeness │
128
- │ markdown │ │ results │ │ against template, │
129
- │ tables, │ │ into final │ │ dispatch follow-up │
130
- │ spacing │ │ document │ │ extractors for gaps │
131
- └──────┬──────┘ └─────────────┘ └──────────────────────┘
132
-
133
- ┌──────▼──────┐
134
- │ 7. CHUNK │
135
- │ Break into │
136
- │ retrieval- │
137
- │ ready │
138
- │ chunks │
139
- └─────────────┘
140
- ```
141
-
142
- #### Phase 1: Classify
143
-
144
- The coordinator sends the document to `generateObject` with the `ClassifyResultSchema`. It determines:
145
- - **Document type** — policy or quote
146
- - **Policy types** — one or more lines of business (e.g., `general_liability`, `workers_comp`)
147
- - **Confidence score**
148
-
149
- The full document is passed through `providerOptions.pdfBase64` for this step, so your callback must attach that PDF to the model request as a real document/file part.
150
-
151
- #### Phase 2: Plan
152
-
153
- Based on the classification, the coordinator selects a **line-of-business template** (e.g., `workers_comp`, `cyber`, `homeowners_ho3`) that defines expected sections and page hints. It then generates an **extraction plan** — a list of tasks that map specific extractors to page ranges within the PDF.
154
-
155
- The planner also receives the full document through `providerOptions.pdfBase64`, not just prompt text.
156
-
157
- #### Phase 3: Extract
158
-
159
- Focused extractor agents are dispatched **in parallel** (concurrency-limited, default 2). Each extractor targets a specific data domain against its assigned page range. The 11 extractor types are:
160
-
161
- | Extractor | What It Extracts |
162
- |-----------|-----------------|
163
- | `carrier_info` | Carrier name, NAIC, AM Best rating, MGA, underwriter, broker |
164
- | `named_insured` | Insured name, DBA, address, entity type, FEIN, SIC/NAICS |
165
- | `declarations` | Line-specific structured declarations (varies by policy type) |
166
- | `coverage_limits` | Coverage names, limits, deductibles, forms, triggers |
167
- | `endorsements` | Form numbers, titles, types, content, affected parties |
168
- | `exclusions` | Exclusion titles, content, applicability |
169
- | `conditions` | Duties after loss, cancellation, other insurance, etc. |
170
- | `premium_breakdown` | Premium amounts, taxes, fees, payment plans, rating basis |
171
- | `loss_history` | Loss runs, claim records, experience modification |
172
- | `supplementary` | Regulatory context, contacts, TPA, claims contacts |
173
- | `sections` | Raw section content (fallback for unmatched sections) |
174
-
175
- Each extractor writes its results to an in-memory `Map`. Results accumulate across all extractors.
176
-
177
- Before each worker call, the SDK slices the requested page range with `extractPageRange()` and passes that page-scoped PDF through `providerOptions.pdfBase64`. If `convertPdfToImages` is configured, it passes `providerOptions.images` instead. The callback layer is responsible for actually including that content in the model input.
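
The concurrency-limited dispatch described for Phase 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: `ExtractorTask` and `runWithLimit` are hypothetical names, and grouping results by extractor name in the `Map` is an assumption about how results accumulate.

```typescript
// Hypothetical task shape: one extractor assigned to one page range.
type ExtractorTask = { extractor: string; startPage: number; endPage: number };

// Run `worker` over all tasks with at most `concurrency` calls in flight,
// collecting results in an in-memory Map keyed by extractor name.
async function runWithLimit<T>(
  tasks: ExtractorTask[],
  concurrency: number,
  worker: (task: ExtractorTask) => Promise<T>,
): Promise<Map<string, T[]>> {
  const results = new Map<string, T[]>();
  let next = 0; // index of the next task to claim

  // Each lane claims tasks one at a time until none are left.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const task = tasks[next++];
      if (task === undefined) break;
      const value = await worker(task);
      const bucket = results.get(task.extractor) ?? [];
      bucket.push(value);
      results.set(task.extractor, bucket);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With `concurrency` set to 2 (the documented default), at most two extractor calls run at once, which keeps request rates predictable for rate-limited providers.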
178
-
179
- #### Phase 4: Review
180
-
181
- After initial extraction, a review loop (up to `maxReviewRounds`, default 2) checks completeness against the template's expected sections. If gaps are found, additional extractor tasks are dispatched to fill missing data. This iterative refinement ensures comprehensive extraction.
182
-
183
- #### Phase 5: Assemble
184
-
185
- All extractor results are merged into a final validated `InsuranceDocument`.
186
-
187
- #### Phase 6: Format
188
-
189
- A formatting agent pass cleans up markdown in all content-bearing string fields (sections, subsections, endorsements, exclusions, conditions, summary). It fixes:
190
-
191
- - **Pipe tables missing separator rows** — adds `| --- | --- |` and leading/trailing pipes
192
- - **Space-aligned tables** — converts whitespace-padded columns into proper markdown tables
193
- - **Sub-items mixed into tables** — pulls indented sub-items out of tables into lists
194
- - **Mixed table/prose content** — handles each segment independently
195
- - **General cleanup** — excessive blank lines, trailing whitespace, orphaned formatting markers
196
-
197
- Content is batched (up to 20 fields per call) and sent through `generateText` for formatting cleanup. Token usage is tracked the same as other pipeline steps.
198
-
199
- #### Phase 7: Chunk
200
-
201
- The formatted document is chunked into `DocumentChunk[]` for vector storage. Chunks are deterministically IDed as `${documentId}:${type}:${index}`.
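
A minimal sketch of the deterministic ID scheme described above. `ChunkInput`, `IdentifiedChunk`, and `assignChunkIds` are hypothetical names, and counting `index` separately per chunk type is an assumption; the README text only gives the `${documentId}:${type}:${index}` format.

```typescript
interface ChunkInput {
  type: string; // e.g. "coverage", "exclusion"
  text: string;
}

interface IdentifiedChunk extends ChunkInput {
  id: string;
}

// Assign IDs of the form `${documentId}:${type}:${index}`.
// Assumption: `index` counts chunks of the same type, starting at 0.
function assignChunkIds(documentId: string, chunks: ChunkInput[]): IdentifiedChunk[] {
  const counters = new Map<string, number>();
  return chunks.map((chunk) => {
    const index = counters.get(chunk.type) ?? 0;
    counters.set(chunk.type, index + 1);
    return { ...chunk, id: `${documentId}:${chunk.type}:${index}` };
  });
}
```

Because the IDs depend only on the document ID and chunk order, re-chunking the same document yields the same IDs, so re-inserting chunks can overwrite earlier rows instead of duplicating them.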
202
-
203
- ### Configuration
204
-
205
- ```typescript
206
- const extractor = createExtractor({
207
- // Required: LLM callbacks
208
- generateText,
209
- generateObject,
210
-
211
- // Optional: PDF vision mode
212
- convertPdfToImages: async (pdfBase64, startPage, endPage) => [...],
213
-
214
- // Optional: storage backends
215
- documentStore, // Persist extracted documents
216
- memoryStore, // Vector search over chunks + conversation history
217
-
218
- // Optional: tuning
219
- concurrency: 2, // Max parallel extractors (default: 2)
220
- maxReviewRounds: 2, // Review loop iterations (default: 2)
221
-
222
- // Optional: observability
223
- onTokenUsage: (usage) => console.log(`${usage.inputTokens} in, ${usage.outputTokens} out`),
224
- onProgress: (message) => console.log(message),
225
- log: async (message) => logger.info(message),
226
- providerOptions: {}, // Passed through to every LLM call
227
- });
228
- ```
229
-
230
- ### Line-of-Business Templates
231
-
232
- Templates define what the extraction pipeline expects for each policy type. Each template specifies expected sections, page hints, and required vs. optional fields.
233
-
234
- **Personal lines:** homeowners (HO-3, HO-5), renters (HO-4), condo (HO-6), dwelling fire, personal auto, personal umbrella, personal inland marine, flood (NFIP + private), earthquake, watercraft, recreational vehicle, farm/ranch, mobile home
235
-
236
- **Commercial lines:** general liability, commercial property, commercial auto, workers' comp, umbrella/excess, professional liability, cyber, directors & officers, crime/fidelity
237
-
238
- ## Storage
239
-
240
- CL-SDK defines two storage interfaces (`DocumentStore` and `MemoryStore`) and ships a reference SQLite implementation. You can implement these interfaces with any backend.
241
-
242
- ### DocumentStore
243
-
244
- CRUD for extracted `InsuranceDocument` objects:
245
-
246
- ```typescript
247
- interface DocumentStore {
248
- save(doc: InsuranceDocument): Promise<void>;
249
- get(id: string): Promise<InsuranceDocument | null>;
250
- query(filters: DocumentFilters): Promise<InsuranceDocument[]>;
251
- delete(id: string): Promise<void>;
252
- }
253
- ```
254
-
255
- Filters support: `type` (policy/quote), `carrier` (fuzzy), `insuredName` (fuzzy), `policyNumber` (exact), `quoteNumber` (exact).
256
-
257
- ### MemoryStore
258
-
259
- Vector-searchable storage for document chunks and conversation history. Requires an `EmbedText` callback for generating embeddings:
260
-
261
- ```typescript
262
- type EmbedText = (text: string) => Promise<number[]>;
263
-
264
- interface MemoryStore {
265
- // Document chunks with embeddings
266
- addChunks(chunks: DocumentChunk[]): Promise<void>;
267
- search(query: string, options?: { limit?: number; filter?: ChunkFilter }): Promise<DocumentChunk[]>;
268
-
269
- // Conversation turns with embeddings
270
- addTurn(turn: ConversationTurn): Promise<void>;
271
- getHistory(conversationId: string, options?: { limit?: number }): Promise<ConversationTurn[]>;
272
- searchHistory(query: string, conversationId?: string): Promise<ConversationTurn[]>;
273
- }
274
- ```
275
-
276
- Search uses **cosine similarity** over embeddings to find semantically relevant chunks or conversation turns. Embedding failures are non-fatal — chunks are still stored, just not searchable by vector.
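
For readers implementing `MemoryStore` on another backend, cosine-similarity ranking can be sketched as below. This is an illustration only: `StoredChunk` and `rankChunks` are hypothetical names, and skipping chunks without an embedding mirrors the "stored but not searchable" behavior described above rather than the SDK's actual code.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical stored row; `embedding` is absent when embedding failed.
interface StoredChunk {
  id: string;
  text: string;
  embedding?: number[];
}

// Rank chunks by similarity to the query embedding, highest first.
function rankChunks(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  limit = 5,
): { id: string; score: number }[] {
  const scored: { id: string; score: number }[] = [];
  for (const chunk of chunks) {
    if (chunk.embedding === undefined) continue; // not searchable by vector
    scored.push({ id: chunk.id, score: cosineSimilarity(queryEmbedding, chunk.embedding) });
  }
  return scored.sort((x, y) => y.score - x.score).slice(0, limit);
}
```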
277
-
278
- ### SQLite Reference Implementation
279
-
280
- ```typescript
281
- import { createSqliteStore } from "@claritylabs/cl-sdk/storage/sqlite";
282
-
283
- const store = createSqliteStore({
284
- path: "./cl-sdk.db",
285
- embed: async (text) => {
286
- // Your embedding function (OpenAI, Cohere, local model, etc.)
287
- return await yourEmbeddingProvider.embed(text);
288
- },
289
- });
290
-
291
- // Use with extractor
292
- const extractor = createExtractor({
293
- generateText,
294
- generateObject,
295
- documentStore: store.documents,
296
- memoryStore: store.memory,
297
- });
298
-
299
- // Or use standalone
300
- await store.documents.save(document);
301
- const results = await store.memory.search("what is the deductible?", { limit: 5 });
302
-
303
- // Clean up
304
- store.close();
305
- ```
306
-
307
- ## Agent System
308
-
309
- CL-SDK includes a composable prompt system for building insurance-aware AI agents. The `buildAgentSystemPrompt` function assembles modular prompt segments based on the agent's context:
310
-
311
- ```typescript
312
- import { buildAgentSystemPrompt } from "@claritylabs/cl-sdk";
313
-
314
- const systemPrompt = buildAgentSystemPrompt({
315
- platform: "email", // email | chat | sms | slack | discord
316
- intent: "direct", // direct | mediated | observed
317
- userName: "John",
318
- companyName: "Acme Insurance",
319
- });
39
+ console.log(result.document); // Typed InsuranceDocument
40
+ console.log(result.chunks); // DocumentChunk[] for vector storage
41
+ console.log(result.reviewReport); // Quality gate results
320
42
  ```
321
43
 
322
- ### Prompt Modules
323
-
324
- The system prompt is composed from these modules:
325
-
326
- | Module | Purpose |
327
- |--------|---------|
328
- | **identity** | Agent role, company context, professional persona |
329
- | **intent** | Behavioral rules based on platform and interaction mode |
330
- | **formatting** | Output formatting rules (markdown for chat, plaintext for email/SMS) |
331
- | **safety** | Security guardrails, prompt injection resistance, data handling |
332
- | **coverage-gaps** | Coverage gap disclosure rules (only in mediated/observed mode) |
333
- | **coi-routing** | Certificate of Insurance request handling |
334
- | **quotes-policies** | Guidance for distinguishing quotes vs. active policies |
335
- | **conversation-memory** | Context about conversation history and document retrieval |
336
-
337
- ### Message Intent Classification
338
-
339
- Classify incoming messages to route them appropriately:
340
-
341
- ```typescript
342
- import { buildClassifyMessagePrompt } from "@claritylabs/cl-sdk";
343
-
344
- const prompt = buildClassifyMessagePrompt("email");
345
- // Returns classification prompt for intents:
346
- // policy_question, coi_request, renewal_inquiry, claim_report,
347
- // coverage_shopping, general, unrelated
348
- ```
349
-
350
- ## Application Processing Pipeline
351
-
352
- The application pipeline processes insurance applications through an agentic coordinator/worker system — small focused agents handle classification, field extraction, auto-fill, question batching, reply routing, and PDF mapping. Supports persistent state and vector-based answer backfill from prior applications.
353
-
354
- ### Quick Start
355
-
356
- ```typescript
357
- import { createApplicationPipeline } from "@claritylabs/cl-sdk";
358
-
359
- const pipeline = createApplicationPipeline({
360
- generateText,
361
- generateObject,
362
- applicationStore, // persistent state storage
363
- documentStore, // for policy/quote lookups during auto-fill
364
- memoryStore, // for vector-based answer backfill
365
- orgContext: [ // business context for auto-fill
366
- { key: "company_name", value: "Acme Corp", category: "company_info" },
367
- { key: "company_address", value: "123 Main St", category: "company_info" },
368
- ],
369
- });
370
-
371
- // Process a new application PDF
372
- const { state } = await pipeline.processApplication({
373
- pdfBase64: "...",
374
- applicationId: "app-123",
375
- });
376
- // state.fields → extracted fields, some already auto-filled
377
- // state.batches → question batches ready for user collection
378
-
379
- // Generate email for current batch
380
- const { text: emailBody } = await pipeline.generateCurrentBatchEmail("app-123", {
381
- companyName: "Acme Corp",
382
- });
383
-
384
- // Process user's reply
385
- const { state: updated, fieldsFilled, responseText } = await pipeline.processReply({
386
- applicationId: "app-123",
387
- replyText: "1. Yes\n2. $1,000,000\n3. Check our website for revenue",
388
- });
389
- ```
390
-
391
- ### Pipeline Phases
392
-
393
- ```
394
- ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐
395
- │ 1. CLASSIFY │────>│ 2. EXTRACT │────>│ 3. BACKFILL + │
396
- │ │ │ FIELDS │ │ AUTO-FILL │
397
- │ Is this an │ │ │ │ (parallel) │
398
- │ application? │ │ All fillable │ │ │
399
- │ │ │ fields as │ │ • vector backfill │
400
- │ │ │ structured │ │ • context auto-fill │
401
- │ │ │ data │ │ • document search │
402
- └──────────────┘ └──────────────┘ └──────────┬──────────┘
403
-
404
- ┌──────────────┐ ┌──────────v──────────┐
405
- │ REPLY LOOP │<────│ 4. BATCH QUESTIONS │
406
- │ │ │ │
407
- │ Route intent │ │ Group unfilled │
408
- │ Parse answers│ │ fields by topic │
409
- │ Handle lookup│ │ Generate emails │
410
- │ Explain field│ │ │
411
- └──────┬───────┘ └─────────────────────┘
412
-
413
- ┌──────v───────┐
414
- │ 5. CONFIRM + │
415
- │ MAP PDF │
416
- └──────────────┘
417
- ```
418
-
419
- ### Focused Agents (8 types)
420
-
421
- | Agent | Task | Model Size |
422
- |-------|------|-----------|
423
- | `classifier` | Detect if PDF is an application | Tiny |
424
- | `field-extractor` | Extract all form fields | Medium |
425
- | `auto-filler` | Match fields to business context | Small |
426
- | `batcher` | Group fields into topic batches | Small |
427
- | `reply-router` | Classify reply intent | Tiny |
428
- | `answer-parser` | Extract answers from replies | Small |
429
- | `lookup-filler` | Fill from policy/record lookups | Small |
430
- | `email-generator` | Generate professional batch emails | Small |
431
-
432
- ### Vector-Based Answer Backfill
433
-
434
- The `BackfillProvider` interface enables searching prior application answers and extracted document data to pre-fill new applications:
435
-
436
- ```typescript
437
- interface BackfillProvider {
438
- searchPriorAnswers(
439
- fields: { id: string; label: string; section: string; fieldType: string }[],
440
- options?: { limit?: number },
441
- ): Promise<PriorAnswer[]>;
442
- }
443
- ```
444
-
445
- This runs in parallel with context-based auto-fill, so the pipeline fills as many fields as possible before asking the user anything.
446
-
447
- ### Application Prompts (for advanced use)
448
-
449
- The individual prompt functions are still exported for custom pipelines:
450
-
451
- ```typescript
452
- import {
453
- buildFieldExtractionPrompt,
454
- buildAutoFillPrompt,
455
- buildQuestionBatchPrompt,
456
- buildAnswerParsingPrompt,
457
- buildConfirmationSummaryPrompt,
458
- buildBatchEmailGenerationPrompt,
459
- buildReplyIntentClassificationPrompt,
460
- buildFieldExplanationPrompt,
461
- buildFlatPdfMappingPrompt,
462
- buildAcroFormMappingPrompt,
463
- buildLookupFillPrompt,
464
- } from "@claritylabs/cl-sdk";
465
- ```
466
-
467
- ## Query Agent Pipeline
468
-
469
- The query agent answers user questions against stored documents with citation-backed provenance. It mirrors the extraction pipeline's coordinator/worker pattern: a classifier decomposes questions, retrievers pull evidence in parallel, reasoners answer from evidence only, and a verifier checks grounding.
470
-
471
- ### Quick Start
472
-
473
- ```typescript
474
- import { createQueryAgent } from "@claritylabs/cl-sdk";
475
-
476
- const agent = createQueryAgent({
477
- generateText,
478
- generateObject,
479
- documentStore, // where extracted documents are stored
480
- memoryStore, // where document chunks + conversation history live
481
- });
482
-
483
- const result = await agent.query({
484
- question: "What is the deductible on our GL policy?",
485
- conversationId: "conv-123",
486
- });
487
-
488
- console.log(result.answer); // Natural language answer
489
- console.log(result.citations); // Source references with exact quotes
490
- console.log(result.confidence); // 0-1 confidence score
491
- ```
492
-
493
- ### Pipeline Phases
494
-
495
- ```
496
- ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐
497
- │ 1. CLASSIFY │────>│ 2. RETRIEVE │────>│ 3. REASON │
498
- │ │ │ (parallel) │ │ (parallel) │
499
- │ Intent + │ │ │ │ │
500
- │ sub-question │ │ chunk search │ │ Answer each sub-Q │
501
- │ decomposition│ │ doc lookup │ │ from evidence only │
502
- │ │ │ conv history │ │ │
503
- └──────────────┘ └──────────────┘ └─────────┬──────────┘
504
-
505
- ┌──────────────┐ ┌─────────v──────────┐
506
- │ 5. RESPOND │<────│ 4. VERIFY │
507
- │ │ │ │
508
- │ Format with │ │ Grounding check │
509
- │ citations, │ │ Consistency check │
510
- │ store turn │ │ Completeness check │
511
- └──────────────┘ └────────────────────┘
512
- ```
513
-
514
- **Phase 1 — Classify:** Determines intent (`policy_question`, `coverage_comparison`, `document_search`, `claims_inquiry`, `general_knowledge`) and decomposes complex questions into atomic sub-questions. Each sub-question specifies which chunk types and document filters to use for retrieval.
515
-
516
- **Phase 2 — Retrieve (parallel):** For each sub-question, a retriever searches chunk embeddings, does structured document lookups, and pulls conversation history — all in parallel. Returns ranked evidence items.
517
-
518
- **Phase 3 — Reason (parallel):** For each sub-question, a reasoner receives only the retrieved evidence (never the full document) and produces a sub-answer with citations. Intent-specific prompts guide reasoning (e.g., coverage questions get prompts tuned for interpreting limits and endorsements).
519
-
520
- **Phase 4 — Verify:** The verifier checks that every claim is grounded in a citation, sub-answers don't contradict each other, and no evidence was overlooked. If issues are found, it can trigger re-retrieval with broader context.
521
-
522
- **Phase 5 — Respond:** Merges verified sub-answers into a single natural-language response with inline citations (`[1]`, `[2]`), deduplicates references, and stores the exchange as conversation turns.
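
The merge-and-deduplicate step in Phase 5 can be sketched as follows. This is a hedged illustration, not the SDK's implementation: `SubAnswer`, `SubCitation`, and `mergeSubAnswers` are hypothetical names, and it assumes each sub-answer numbers its citations locally from `[1]` and that two citations with the same `chunkId` count as the same reference.

```typescript
interface SubCitation {
  chunkId: string;
  quote: string;
}

interface SubAnswer {
  text: string; // uses [1], [2], ... to refer to its own `citations`
  citations: SubCitation[];
}

type NumberedCitation = SubCitation & { index: number };

// Merge sub-answers into one response with globally numbered citations.
function mergeSubAnswers(subAnswers: SubAnswer[]): {
  answer: string;
  citations: NumberedCitation[];
} {
  const merged: NumberedCitation[] = [];
  const indexByChunk = new Map<string, number>();

  const paragraphs = subAnswers.map((sub) => {
    // Map each local citation position to a global index, reusing
    // the index of any chunk that was already cited.
    const localToGlobal = sub.citations.map((citation) => {
      let globalIndex = indexByChunk.get(citation.chunkId);
      if (globalIndex === undefined) {
        globalIndex = merged.length + 1;
        indexByChunk.set(citation.chunkId, globalIndex);
        merged.push({ ...citation, index: globalIndex });
      }
      return globalIndex;
    });

    // Rewrite local markers; unknown markers are left untouched.
    return sub.text.replace(/\[(\d+)\]/g, (marker: string, n: string) => {
      const globalIndex = localToGlobal[Number(n) - 1];
      return globalIndex === undefined ? marker : `[${globalIndex}]`;
    });
  });

  return { answer: paragraphs.join("\n\n"), citations: merged };
}
```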
523
-
524
- ### Configuration
525
-
526
- ```typescript
527
- const agent = createQueryAgent({
528
- // Required
529
- generateText,
530
- generateObject,
531
- documentStore,
532
- memoryStore,
533
-
534
- // Optional: tuning
535
- concurrency: 3, // max parallel retrievers/reasoners (default: 3)
536
- maxVerifyRounds: 1, // verification loop iterations (default: 1)
537
- retrievalLimit: 10, // max evidence items per sub-question (default: 10)
538
-
539
- // Optional: observability
540
- onTokenUsage: (usage) => console.log(`${usage.inputTokens} in, ${usage.outputTokens} out`),
541
- onProgress: (message) => console.log(message),
542
- log: async (message) => logger.info(message),
543
- providerOptions: {},
544
- });
545
- ```
546
-
547
- ### Citations
548
-
549
- Every factual claim in the answer references its source:
550
-
551
- ```typescript
552
- interface Citation {
553
- index: number; // [1], [2], etc.
554
- chunkId: string; // e.g. "doc-123:coverage:2"
555
- documentId: string;
556
- documentType?: "policy" | "quote";
557
- field?: string; // e.g. "coverages[0].deductible"
558
- quote: string; // exact text from source
559
- relevance: number; // 0-1 similarity score
560
- }
561
- ```
562
-
563
- ## PDF Operations
564
-
565
- ```typescript
566
- import {
567
- extractPageRange, // Extract specific pages from a PDF
568
- getPdfPageCount, // Get total page count
569
- getAcroFormFields, // Enumerate form fields (text, checkbox, dropdown, radio)
570
- fillAcroForm, // Fill and flatten AcroForm fields
571
- overlayTextOnPdf, // Overlay text at coordinates on flat PDFs
572
- } from "@claritylabs/cl-sdk";
573
- ```
574
-
575
- ## Tool Definitions
576
-
577
- Claude `tool_use`-compatible schemas for agent integrations:
578
-
579
- ```typescript
580
- import {
581
- AGENT_TOOLS, // All tools as an array
582
- DOCUMENT_LOOKUP_TOOL, // Search/retrieve policies and quotes
583
- COI_GENERATION_TOOL, // Generate Certificates of Insurance
584
- COVERAGE_COMPARISON_TOOL, // Compare coverages across documents
585
- } from "@claritylabs/cl-sdk";
586
- ```
587
-
588
- These are schema-only definitions (input schemas + descriptions). You provide the implementations that call your storage and PDF layers.
589
-
590
- ## Document Types
591
-
592
- All types are derived from Zod schemas, providing both runtime validation and TypeScript types:
593
-
594
- ```typescript
595
- import type {
596
- InsuranceDocument, // PolicyDocument | QuoteDocument (discriminated union)
597
- PolicyDocument, // Extracted policy with all enrichments
598
- QuoteDocument, // Extracted quote with subjectivities, premium breakdown
599
- Coverage, // Coverage name, limits, deductibles, form
600
- EnrichedCoverage, // Coverage + additional metadata
601
- Endorsement, // Form number, title, type, content
602
- Exclusion, // Title, content, applicability
603
- Condition, // Type, title, content
604
- Declaration, // Line-specific declarations (19 types)
605
- Platform, // email | chat | sms | slack | discord
606
- AgentContext, // Platform + intent + user/company context
607
- } from "@claritylabs/cl-sdk";
608
- ```
609
-
610
- ### Supported Policy Types
611
-
612
- 42 policy types across personal and commercial lines — including general liability, commercial property, workers' comp, cyber, D&O, homeowners (HO-3/HO-5/HO-4/HO-6), personal auto, flood (NFIP + private), earthquake, and more.
613
-
614
- ## Core Utilities
615
-
616
- ```typescript
617
- import {
618
- withRetry, // Exponential backoff with jitter (5 retries, 2–32s) for rate limits + transient errors
619
- pLimit, // Concurrency limiter for parallel async tasks
620
- sanitizeNulls, // Recursively convert null → undefined (for database compatibility)
621
- stripFences, // Remove markdown code fences from LLM JSON responses
622
- safeGenerateObject, // generateObject wrapper with retry, schema strictification, and fallback
623
- toStrictSchema, // Convert .optional() → .nullable() for strict structured output APIs
624
- } from "@claritylabs/cl-sdk";
625
- ```
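
To illustrate the backoff schedule noted for `withRetry` (5 retries, 2 to 32 seconds), here is a minimal sketch. It is not the SDK's code: `backoffDelayMs`, `retryWithBackoff`, and `RetryOptions` are hypothetical names, the additive jitter of up to one second is an assumption (the README does not say how jitter is applied), and this version retries on every error instead of only rate limits and transient failures.

```typescript
// Exponential ceiling for retry `attempt` (0-based): 2s, 4s, 8s, 16s, 32s.
function backoffDelayMs(attempt: number, baseMs = 2000, maxMs = 32000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

interface RetryOptions {
  maxRetries?: number; // retries after the first attempt
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests, returns a value in [0, 1)
}

// Call `fn`, retrying with exponential backoff plus jitter on failure.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const maxRetries = options.maxRetries ?? 5;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = options.random ?? Math.random;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= maxRetries) throw error;
      // Assumed jitter: add up to 1s, never exceeding the 32s cap.
      const delay = Math.min(32000, backoffDelayMs(attempt) + random() * 1000);
      await sleep(delay);
    }
  }
}
```

Injecting `sleep` and `random` keeps the sketch testable without real waiting; production code would leave both at their defaults.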
626
-
627
- ### Schema Compatibility
628
-
629
- The SDK automatically handles schema compatibility with strict structured output APIs (like OpenAI). Two key mechanisms:
630
-
631
- **`toStrictSchema(schema)`** — Recursively transforms Zod schemas so `.optional()` properties become `.nullable()`. This ensures all properties appear in the JSON Schema `required` array, which OpenAI requires. Applied automatically inside the pipeline — you don't need to call this yourself unless building custom pipelines.
632
-
633
- **`safeGenerateObject(generateObject, params, options?)`** — Wraps a `generateObject` call with:
634
- 1. Automatic schema strictification via `toStrictSchema`
635
- 2. Retry on schema validation errors and transient API failures
636
- 3. Optional fallback value when all retries are exhausted
637
-
638
- ```typescript
639
- import { safeGenerateObject } from "@claritylabs/cl-sdk";
640
-
641
- const { object, usage } = await safeGenerateObject(
642
- myGenerateObject,
643
- { prompt: "...", schema: MySchema, maxTokens: 1024 },
644
- {
645
- fallback: { field: "default" }, // Return this if all retries fail
646
- maxRetries: 2, // Schema validation retries (default: 1)
647
- log: async (msg) => console.log(msg),
648
- },
649
- );
650
- ```
44
+ See the [full documentation](https://cl-sdk.claritylabs.inc/docs) for architecture, provider setup, API reference, and more.
651
45
 
652
46
  ## Development
653
47
 
654
48
  ```bash
655
49
  npm install
656
- npm run build # Build ESM + CJS + types via tsup
50
+ npm run build # ESM + CJS + types via tsup
657
51
  npm run dev # Watch mode
658
- npm run typecheck # Type check (tsc --noEmit)
52
+ npm run typecheck # tsc --noEmit
53
+ npm test # vitest
659
54
  ```
660
55
 
661
56
  Zero framework dependencies. Peer deps: `pdf-lib`, `zod`. Optional: `better-sqlite3`.