npm - aiex-cli - Versions diffs - 0.0.5-beta.5 → 0.0.5-beta.6 - Mend

aiex-cli 0.0.5-beta.5 → 0.0.5-beta.6

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (15)

package/README.md +4 -4
package/dist/cli.mjs +638 -377
package/dist/{doctor-collector-NTNBFeBw.mjs → doctor-collector-BpqhXNcO.mjs} +26 -91
package/dist/index.mjs +1 -1
package/dist/web/assets/AISettings-sVI4PTNB.js +264 -0
package/dist/web/assets/{DataBrowser-GAA-pGq0.js → DataBrowser-BGkZb9FV.js} +1 -1
package/dist/web/assets/{ExtractionViewer-BhhWrBs2.js → ExtractionViewer-DNrkSECj.js} +1 -1
package/dist/web/assets/{api-client-b4ZBXpNH.js → api-client-gQAAOw0v.js} +1 -1
package/dist/web/assets/{index-CKV2X6sS.js → index-BQKZKzzP.js} +3 -3
package/dist/web/assets/index-BU58oIRd.css +2 -0
package/dist/web/index.html +3 -3
package/dist/{zh-CN-Ca-Dv775.mjs → zh-CN-DkillGHx.mjs} +10 -23
package/package.json +1 -1
package/dist/web/assets/AISettings-BlyTFIIy.js +0 -272
package/dist/web/assets/index-Csdgio76.css +0 -2

package/README.md CHANGED

@@ -204,12 +204,12 @@ aiex completion fish | source
 ## 📄 Large Document Processing
-When processing very large documents (exceeding `40,000` characters), `aiex` runs an optimized **Pipeline Mode** to handle context window limits and control API costs:
+`aiex` uses a unified text extraction pipeline for both short and very large documents. Source files are converted to text or Markdown first; images are converted with OCR before structured extraction.
-- **Token-Aware AST Splitting**: Parses structural Markdown elements (headings, paragraphs, lists) using an AST-based parser (`marked.lexer`) and splits them using precise token counters (`js-tiktoken`). Active heading hierarchies are tracked and prepended to each chunk as context. Tables and code blocks are kept intact (atomic blocks) to avoid syntax corruption.
+- **Token-Aware AST Splitting**: Parses structural Markdown elements (headings, paragraphs, lists) using an AST-based parser (`marked.lexer`) and splits them using precise token counters (`js-tiktoken`). Active heading hierarchies are tracked and prepended to each chunk as context. Short documents run through the same pipeline as a single chunk.
 - **Concurrency Limiting**: To respect strict model rate limits, chunk extractions are processed in parallel with a strict concurrency limit (capped at 2 concurrent requests).
-- **Pre-filtering**: Integrates hybrid search-based pre-filtering to score and select only the most relevant document chunks based on schema queries, preventing unnecessary token usage on unrelated sections.
-- **Recursive Merging**: The final extracted JSON objects from each chunk are recursively merged, concatenating lists and deduplicating primitive fields.
+- **Candidate & Evidence Merging**: Chunk results are merged into schema-shaped candidates, with evidence coverage used to select scalar conflicts and preserve traceability.
+- **Schema Validation & Correction**: Merged output is validated against the JSON Schema. When correction is needed, the corrected output is rechecked against evidence before being written.
 <br>
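
The updated README text describes heading-aware splitting in which the active heading hierarchy is prepended to each chunk, and short documents pass through as a single chunk. The sketch below illustrates that idea only; it is not taken from the `aiex` source. It assumes a line-based heading scan in place of `marked.lexer` and a whitespace word count in place of `js-tiktoken`, and the names `Chunk`, `countTokens`, and `splitByHeadings` are hypothetical. Because it scans lines rather than an AST, it would misread `#` lines inside code fences as headings.

```typescript
// Hypothetical sketch, not the aiex implementation.
interface Heading {
  level: number;
  title: string;
}

interface Chunk {
  headingPath: string[]; // active headings, outermost first
  text: string; // heading context followed by body lines
}

// Stand-in token counter: whitespace-separated words.
// (The README names js-tiktoken for real token counts.)
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function splitByHeadings(markdown: string, maxTokens: number): Chunk[] {
  // Short documents go through the same function as a single chunk.
  if (countTokens(markdown) <= maxTokens) {
    return [{ headingPath: [], text: markdown }];
  }

  const chunks: Chunk[] = [];
  const stack: Heading[] = []; // currently active heading hierarchy
  let body: string[] = [];
  let bodyTokens = 0;

  // Emit the buffered body lines, prefixed with the active headings.
  const flush = (): void => {
    if (body.length === 0) return;
    const context = stack.map((h) => `${"#".repeat(h.level)} ${h.title}`);
    chunks.push({
      headingPath: stack.map((h) => h.title),
      text: [...context, ...body].join("\n"),
    });
    body = [];
    bodyTokens = 0;
  };

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the chunk that belongs to the previous heading path
      const level = match[1].length;
      // Drop headings at the same or a deeper level, then push the new one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) {
        stack.pop();
      }
      stack.push({ level, title: match[2].trim() });
      continue;
    }
    if (line.trim() === "") continue; // blank lines carry no content
    const tokens = countTokens(line);
    // The budget covers body lines only; a single oversized line still
    // becomes its own chunk rather than being cut mid-line.
    if (bodyTokens > 0 && bodyTokens + tokens > maxTokens) flush();
    body.push(line);
    bodyTokens += tokens;
  }
  flush();
  return chunks;
}
```

Calling `splitByHeadings(text, budget)` with a budget larger than the document returns one chunk holding the whole text. With a smaller budget, each chunk begins with the heading lines that were active where its body started, so a downstream extraction step can see which section the chunk came from.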