npm - aiex-cli - Versions diffs - 0.0.5-beta.3 → 0.0.5-beta.5 - Mend

aiex-cli 0.0.5-beta.3 → 0.0.5-beta.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/README.md +6 -16
package/dist/cli.mjs +340 -355
package/dist/{doctor-collector-CQPDBVTw.mjs → doctor-collector-NTNBFeBw.mjs} +12 -6
package/dist/index.mjs +1 -1
package/dist/web/assets/AISettings-BlyTFIIy.js +272 -0
package/dist/web/assets/ExtractionViewer-BhhWrBs2.js +1 -0
package/dist/web/assets/{index-BWm_fhNt.js → index-CKV2X6sS.js} +2 -2
package/dist/web/assets/index-Csdgio76.css +2 -0
package/dist/web/index.html +2 -2
package/dist/{zh-CN-CKxdpj8c.mjs → zh-CN-Ca-Dv775.mjs} +2 -3
package/package.json +3 -1
package/dist/web/assets/AISettings-DoDVYWfb.js +0 -272
package/dist/web/assets/ExtractionViewer-DqIrBGNK.js +0 -1
package/dist/web/assets/index-CvY9TGny.css +0 -2

package/README.md CHANGED

@@ -70,7 +70,6 @@ aiex extract -s <schema> -f <file>        # from file (txt, pdf, png, jpg, ...)
 aiex extract -s <schema> -f <file> -m <model>      # specify AI model (overrides auto-selection)
 aiex extract -s <schema> -f <file> --no-insert     # extract and save JSON without inserting into SQLite
 aiex extract -s <schema> -f <file> --force         # force re-extraction even if already processed
-aiex extract -s <schema> -f <file> --agent         # run ReAct agent mode (ideal for large documents)
 aiex extract -s <schema> -d <directory>            # batch extract all supported files in a directory
 aiex extract -s <schema> -d <dir> -g "*.pdf"       # batch with glob filter
 aiex extract history                               # list extraction audit records
@@ -129,7 +128,6 @@ Dumps all extracted data for a given schema (or table) from the SQLite database
 | `aiex extract -s <name> -f <file> -m <model>` | Extract with a specific AI model |
 | `aiex extract -s <name> -f <file> --no-insert` | Extract and save JSON without inserting into SQLite |
 | `aiex extract -s <name> -f <file> --force` | Force re-extraction even if the file has already been processed |
-| `aiex extract -s <name> -f <file> --agent` | Extract data in ReAct agent mode (using tool navigation) |
 | `aiex extract -s <name> -d <dir>` | Batch extract all supported files in a directory |
 | `aiex extract -s <name> -d <dir> -g "*.pdf"` | Batch extract with glob filter |
 | `aiex extract history` | List extraction audit records |
@@ -204,22 +202,14 @@ aiex completion fish | source
 <br>
-## 📄 Large Document Processing (Pipeline vs. ReAct Agent)
+## 📄 Large Document Processing
-When processing very large documents (exceeding `40,000` characters), `aiex` provides two separate modes to handle context window limits and cost:
+When processing very large documents (exceeding `40,000` characters), `aiex` runs an optimized **Pipeline Mode** to handle context window limits and control API costs:
-### 1. Pipeline Mode (Default)
-- **Mechanism**: Splits the document logically at Markdown headings or paragraph boundaries. It processes each chunk sequentially through the LLM, prepending active heading stacks as context to prevent losing track of document structure (like headers). Finally, it merges the outputs recursively.
-- **Best for**: Small-to-medium files or structures where every single section must be scanned completely (e.g. log files).
-### 2. ReAct Agent Mode
-- **Mechanism**: Spawns an agent equipped with document navigation tools:
-  - `listChunks()`: Returns a Table of Contents (headings, sizes, indices).
-  - `readChunk(chunkId)`: Fetches a specific section.
-  - `searchChunks(query)`: Matches keywords across all chunks.
-  - `submitExtraction(data)`: Submits the final structured JSON payload.
-  The agent uses these tools to dynamically browse and retrieve only the relevant parts, drastically reducing API token costs for giant documents.
-- **How to run**: Pass `--agent` / `-a` via the CLI, or toggle **Extraction Mode** under the **Prompts** tab in the Web UI.
+- **Token-Aware AST Splitting**: Parses structural Markdown elements (headings, paragraphs, lists) using an AST-based parser (`marked.lexer`) and splits them using precise token counters (`js-tiktoken`). Active heading hierarchies are tracked and prepended to each chunk as context. Tables and code blocks are kept intact (atomic blocks) to avoid syntax corruption.
+- **Concurrency Limiting**: To respect strict model rate limits, chunk extractions are processed in parallel with a strict concurrency limit (capped at 2 concurrent requests).
+- **Pre-filtering**: Integrates hybrid search-based pre-filtering to score and select only the most relevant document chunks based on schema queries, preventing unnecessary token usage on unrelated sections.
+- **Recursive Merging**: The final extracted JSON objects from each chunk are recursively merged, concatenating lists and deduplicating primitive fields.
 <br>
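
The "Recursive Merging" bullet added in this hunk describes how per-chunk JSON results are combined. The sketch below is one possible reading of that description, not the package's actual code: the names `Json`, `mergeValues`, and `mergeChunkResults` are hypothetical, and the conflict policy (the first non-null primitive wins, so a field repeated across chunks appears once) is an assumption about what "deduplicating primitive fields" means.

```typescript
// Hypothetical JSON value type for per-chunk extraction results.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// True for plain JSON objects (not arrays, not null).
function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge two values:
// - arrays are concatenated,
// - objects are merged key by key (recursively),
// - for primitives, the earlier non-null value is kept (assumed policy).
function mergeValues(a: Json | undefined, b: Json | undefined): Json {
  if (a === undefined || a === null) return b === undefined ? null : b;
  if (b === undefined || b === null) return a;
  if (Array.isArray(a) && Array.isArray(b)) return [...a, ...b];
  if (isPlainObject(a) && isPlainObject(b)) {
    const out: { [key: string]: Json } = { ...a };
    for (const key of Object.keys(b)) {
      out[key] = mergeValues(a[key], b[key]);
    }
    return out;
  }
  return a; // duplicate or conflicting primitives: keep the earlier chunk's value
}

// Fold the results of all chunks, in document order, into one object.
function mergeChunkResults(results: Json[]): Json {
  return results.reduce<Json>((acc, next) => mergeValues(acc, next), null);
}

// Usage: two chunks of the same document yield partial results.
const merged = mergeChunkResults([
  { vendor: "Acme", items: [{ sku: "A1" }], totals: { net: 10 } },
  { vendor: "Acme", items: [{ sku: "B2" }], totals: { tax: 2 } },
]);
console.log(JSON.stringify(merged));
// prints {"vendor":"Acme","items":[{"sku":"A1"},{"sku":"B2"}],"totals":{"net":10,"tax":2}}
```

Under this policy the order of chunks matters for conflicting primitives, so results would need to be merged in document order even if the chunks were extracted in parallel.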