@claritylabs/cl-sdk 0.5.0 → 0.7.0
- package/README.md +33 -9
- package/dist/index.d.mts +382 -77
- package/dist/index.d.ts +382 -77
- package/dist/index.js +718 -205
- package/dist/index.js.map +1 -1
- package/dist/index.mjs +715 -205
- package/dist/index.mjs.map +1 -1
- package/dist/storage-sqlite.d.mts +52 -10
- package/dist/storage-sqlite.d.ts +52 -10
- package/package.json +1 -1
package/README.md
CHANGED
@@ -104,14 +104,22 @@ The extraction system uses a **coordinator/worker pattern** — a coordinator ag
 │             │     │ to pages    │     │                      │
 └─────────────┘     └─────────────┘     └──────────┬───────────┘
                                                    │
-
-
-
-
-
-
-
-
+┌─────────────┐     ┌─────────────┐     ┌──────────▼───────────┐
+│  6. FORMAT  │◀────│ 5. ASSEMBLE │◀────│      4. REVIEW       │
+│             │     │             │     │                      │
+│ Clean up    │     │ Merge all   │     │ Check completeness   │
+│ markdown    │     │ results     │     │ against template,    │
+│ tables,     │     │ into final  │     │ dispatch follow-up   │
+│ spacing     │     │ document    │     │ extractors for gaps  │
+└──────┬──────┘     └─────────────┘     └──────────────────────┘
+       │
+┌──────▼──────┐
+│  7. CHUNK   │
+│ Break into  │
+│ retrieval-  │
+│ ready       │
+│ chunks      │
+└─────────────┘
 ```
 
 #### Phase 1: Classify
@@ -151,7 +159,23 @@ After initial extraction, a review loop (up to `maxReviewRounds`, default 2) che
 
 #### Phase 5: Assemble
 
-All extractor results are merged into a final validated `InsuranceDocument
+All extractor results are merged into a final validated `InsuranceDocument`.
+
+#### Phase 6: Format
+
+A formatting agent pass cleans up markdown in all content-bearing string fields (sections, subsections, endorsements, exclusions, conditions, summary). It fixes:
+
+- **Pipe tables missing separator rows** — adds `| --- | --- |` and leading/trailing pipes
+- **Space-aligned tables** — converts whitespace-padded columns into proper markdown tables
+- **Sub-items mixed into tables** — pulls indented sub-items out of tables into lists
+- **Mixed table/prose content** — handles each segment independently
+- **General cleanup** — excessive blank lines, trailing whitespace, orphaned formatting markers
+
+Content is batched (up to 20 fields per call) and sent through `generateText` for formatting cleanup. Token usage is tracked the same as other pipeline steps.
+
+#### Phase 7: Chunk
+
+The formatted document is chunked into `DocumentChunk[]` for vector storage. Chunks are deterministically IDed as `${documentId}:${type}:${index}`.
 
 ### Configuration
 
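Reviewer sketch (not part of the package diff): the added README text names two concrete mechanics, batching up to 20 fields per formatting call and building chunk IDs as `${documentId}:${type}:${index}`. The TypeScript below is a minimal illustration of both under stated assumptions. The helper names `batch` and `chunkId`, the sample field names, and the `"section"` type value are hypothetical and do not come from `@claritylabs/cl-sdk`; whether `index` is counted per type or across the whole document is also not stated in the diff.

```typescript
// Hypothetical helpers for illustration only; not the SDK's actual implementation.

/** Split items into consecutive batches of at most `size` elements. */
function batch<T>(items: T[], size = 20): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

/** Build an ID in the `${documentId}:${type}:${index}` shape the README describes. */
function chunkId(documentId: string, type: string, index: number): string {
  return `${documentId}:${type}:${index}`;
}

// 45 made-up field names: expect two full batches of 20 and one batch of 5.
const fields = Array.from({ length: 45 }, (_, i) => `field-${i}`);
const batches = batch(fields);
console.log(batches.map((b) => b.length)); // [ 20, 20, 5 ]

// The same inputs always produce the same ID, so re-chunking an unchanged
// document would yield stable keys for a vector store.
console.log(chunkId("doc-123", "section", 0)); // doc-123:section:0
```

Because the ID depends only on its three inputs, re-running the chunker on the same document can overwrite existing entries instead of creating duplicates, which is presumably the point of making the IDs deterministic.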