npm - @claritylabs/cl-sdk - Versions diffs - 0.5.0 → 0.7.0 - Mend

@claritylabs/cl-sdk 0.5.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +33 -9
package/dist/index.d.mts +382 -77
package/dist/index.d.ts +382 -77
package/dist/index.js +718 -205
package/dist/index.js.map +1 -1
package/dist/index.mjs +715 -205
package/dist/index.mjs.map +1 -1
package/dist/storage-sqlite.d.mts +52 -10
package/dist/storage-sqlite.d.ts +52 -10
package/package.json +1 -1

package/README.md CHANGED

@@ -104,14 +104,22 @@ The extraction system uses a **coordinator/worker pattern** — a coordinator ag
 │              │     │  to pages   │     │                      │
 └─────────────┘     └─────────────┘     └──────────┬───────────┘
                                                    │
-                    ┌─────────────┐     ┌──────────▼───────────┐
-                    │ 5. ASSEMBLE │◀────│  4. REVIEW           │
-                    │             │     │                      │
-                    │  Merge all  │     │  Check completeness  │
-                    │  results,   │     │  against template,   │
-                    │  validate,  │     │  dispatch follow-up  │
-                    │  chunk      │     │  extractors for gaps │
-                    └─────────────┘     └──────────────────────┘
+┌─────────────┐     ┌─────────────┐     ┌──────────▼───────────┐
+│ 6. FORMAT   │◀────│ 5. ASSEMBLE │◀────│  4. REVIEW           │
+│             │     │             │     │                      │
+│  Clean up   │     │  Merge all  │     │  Check completeness  │
+│  markdown   │     │  results    │     │  against template,   │
+│  tables,    │     │  into final │     │  dispatch follow-up  │
+│  spacing    │     │  document   │     │  extractors for gaps │
+└──────┬──────┘     └─────────────┘     └──────────────────────┘
+       │
+┌──────▼──────┐
+│ 7. CHUNK    │
+│  Break into │
+│  retrieval- │
+│  ready      │
+│  chunks     │
+└─────────────┘
 ```
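The updated diagram takes chunking out of Assemble and places it after a new Format phase. Below is a minimal TypeScript sketch of that tail of the pipeline. `AssembledDocument`, `Chunk`, `finishPipeline`, and the injected `format`/`chunk` functions are hypothetical names used only for illustration. They are not exports of `@claritylabs/cl-sdk`.

```typescript
// Hypothetical shapes: the SDK's real InsuranceDocument and DocumentChunk
// types carry many more fields than this sketch.
interface AssembledDocument {
  id: string;
  sections: string[];
}

interface Chunk {
  id: string;
  text: string;
}

// Formatting is async because, per the README, it is a model-driven pass.
type FormatStep = (doc: AssembledDocument) => Promise<AssembledDocument>;
type ChunkStep = (doc: AssembledDocument) => Chunk[];

// Phases 5 -> 6 -> 7: chunking runs only after formatting has finished,
// so every chunk is built from the cleaned-up markdown.
async function finishPipeline(
  assembled: AssembledDocument,
  format: FormatStep,
  chunk: ChunkStep,
): Promise<Chunk[]> {
  const formatted = await format(assembled);
  return chunk(formatted);
}
```

Injecting the two steps makes the ordering dependency explicit: the chunker never sees pre-format text.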
 #### Phase 1: Classify
@@ -151,7 +159,23 @@ After initial extraction, a review loop (up to `maxReviewRounds`, default 2) che
 #### Phase 5: Assemble
-All extractor results are merged into a final validated `InsuranceDocument`, then chunked into `DocumentChunk[]` for vector storage. Chunks are deterministically IDed as `${documentId}:${type}:${index}`.
+All extractor results are merged into a final validated `InsuranceDocument`.
+#### Phase 6: Format
+A formatting agent pass cleans up markdown in all content-bearing string fields (sections, subsections, endorsements, exclusions, conditions, summary). It fixes:
+- **Pipe tables missing separator rows** — adds `| --- | --- |` and leading/trailing pipes
+- **Space-aligned tables** — converts whitespace-padded columns into proper markdown tables
+- **Sub-items mixed into tables** — pulls indented sub-items out of tables into lists
+- **Mixed table/prose content** — handles each segment independently
+- **General cleanup** — excessive blank lines, trailing whitespace, orphaned formatting markers
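Here is a deterministic TypeScript sketch of the first fix in that list. It adds missing edge pipes and inserts a separator row under the header. It is an illustration only. According to the README, the SDK performs this cleanup with a model-driven formatting pass. `ensurePipeTableSeparator` is a hypothetical helper and is not part of the package.

```typescript
// Hypothetical helper: covers only "pipe tables missing separator rows".
// Space-aligned tables and the other fixes are out of scope here.
function ensurePipeTableSeparator(table: string): string {
  const lines = table.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return table;

  // Normalize every row so it has leading and trailing pipes.
  const rows = lines.map((line) => {
    let row = line.trim();
    if (!row.startsWith("|")) row = "| " + row;
    if (!row.endsWith("|")) row = row + " |";
    return row;
  });

  // A separator row holds only pipes, dashes, colons, and whitespace.
  const isSeparator = (row: string): boolean =>
    row.includes("-") && /^\|[\s:|-]+\|$/.test(row);
  if (rows.length > 1 && isSeparator(rows[1])) return rows.join("\n");

  // Count header cells by splitting on "|" and dropping the empty edges.
  const cellCount = rows[0].split("|").slice(1, -1).length;
  const separator = "| " + Array(cellCount).fill("---").join(" | ") + " |";
  return [rows[0], separator, ...rows.slice(1)].join("\n");
}
```

For example, `"Coverage | Limit\nProperty | $1,000,000"` becomes a three-row table with `| --- | --- |` under the header. Running the helper again leaves it unchanged.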
+Content is batched (up to 20 fields per call) and sent through `generateText` for formatting cleanup. Token usage is tracked the same as other pipeline steps.
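The batching rule can be sketched as follows. The only detail taken from the README is the limit of 20 fields per call. `FieldRef`, `BatchResult`, `formatFields`, the per-batch token count, and the fallback to the original text are all assumptions. The hypothetical `FormatBatch` callback stands in for the `generateText` call, whose actual arguments are not shown here.

```typescript
// Hypothetical shapes for illustration; the SDK's internal types may differ.
interface FieldRef {
  path: string; // where the field lives, e.g. "sections[0].content"
  text: string;
}

interface BatchResult {
  texts: string[]; // cleaned text, same order as the input batch
  totalTokens: number;
}

// Stand-in for the model call (the README says generateText is used).
type FormatBatch = (batch: FieldRef[]) => Promise<BatchResult>;

const MAX_FIELDS_PER_CALL = 20; // limit stated in the README

function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function formatFields(
  fields: FieldRef[],
  formatBatch: FormatBatch,
): Promise<{ formatted: Map<string, string>; totalTokens: number }> {
  const formatted = new Map<string, string>();
  let totalTokens = 0;
  for (const batch of toBatches(fields, MAX_FIELDS_PER_CALL)) {
    const result = await formatBatch(batch);
    batch.forEach((field, i) => {
      // Keep the original text if the model returned fewer entries.
      formatted.set(field.path, result.texts[i] ?? field.text);
    });
    totalTokens += result.totalTokens; // accumulate usage across calls
  }
  return { formatted, totalTokens };
}
```

Batches run one after another here for clarity. The README does not say whether the SDK sends them sequentially or concurrently.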
+#### Phase 7: Chunk
+The formatted document is chunked into `DocumentChunk[]` for vector storage. Chunks are deterministically IDed as `${documentId}:${type}:${index}`.
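Below is a small sketch of the deterministic ID scheme. The README gives only the `${documentId}:${type}:${index}` pattern and does not say how the index is assigned. This sketch assumes it counts chunks within each type, starting from 0. `SimpleChunk`, `chunkId`, and `buildChunks` are hypothetical names.

```typescript
// Hypothetical chunk shape; the SDK's DocumentChunk has its own fields.
interface SimpleChunk {
  id: string;
  documentId: string;
  type: string; // illustrative values such as "section" or "exclusion"
  text: string;
}

// Pattern taken from the README: `${documentId}:${type}:${index}`.
function chunkId(documentId: string, type: string, index: number): string {
  return `${documentId}:${type}:${index}`;
}

// Assumption: the index restarts at 0 for each chunk type.
function buildChunks(
  documentId: string,
  pieces: { type: string; text: string }[],
): SimpleChunk[] {
  const counters = new Map<string, number>();
  return pieces.map(({ type, text }) => {
    const index = counters.get(type) ?? 0;
    counters.set(type, index + 1);
    return { id: chunkId(documentId, type, index), documentId, type, text };
  });
}
```

Each ID depends only on the document ID, chunk type, and position. As long as the chunk order is unchanged, re-chunking the same document reproduces the same IDs. A vector store keyed on chunk ID can then overwrite existing entries instead of accumulating duplicates.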
 ### Configuration