hazo_pdf 1.6.7 → 1.7.0 (npm)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/README.md +83 -0
package/dist/{chunk-TJBBE34D.js → chunk-4JJOUQ62.js} +7 -6
package/dist/{chunk-TJBBE34D.js.map → chunk-4JJOUQ62.js.map} +1 -1
package/dist/chunk-KHB3VZJQ.js +157 -0
package/dist/chunk-KHB3VZJQ.js.map +1 -0
package/dist/index.d.ts +178 -1
package/dist/index.js +1545 -3
package/dist/index.js.map +1 -1
package/dist/{pdf_viewer-KMV7W3DA.js → pdf_viewer-B6S5PJJB.js} +2 -2
package/dist/server/index.d.ts +164 -1
package/dist/server/index.js +439 -1
package/dist/server/index.js.map +1 -1
package/dist/server/text_search-2OZOVUIP.js +154 -0
package/dist/server/text_search-2OZOVUIP.js.map +1 -0
package/dist/styles/full.css +184 -0
package/dist/styles/full.css.map +1 -1
package/dist/styles/index.css +136 -0
package/dist/styles/index.css.map +1 -1
package/dist/text_search-I2KZ7DTW.js +11 -0
package/package.json +7 -3
package/dist/chunk-FXOJ3DPX.js +0 -71
package/dist/chunk-FXOJ3DPX.js.map +0 -1
package/dist/text_search-GW2VYMU6.js +0 -9
package/dist/{pdf_viewer-KMV7W3DA.js.map → pdf_viewer-B6S5PJJB.js.map} +0 -0
package/dist/{text_search-GW2VYMU6.js.map → text_search-I2KZ7DTW.js.map} +0 -0

package/README.md CHANGED

@@ -21,6 +21,7 @@ A React component library for viewing and annotating PDF documents with support
 - ☁️ **Remote Storage** - Load and save PDFs from Google Drive, Dropbox, or local storage (via hazo_files)
 - 🖼️ **Dialog Component** - Ready-to-use modal dialog wrapper (`PdfViewerDialog`)
 - 🔧 **Server Utilities** - Server-side extraction utilities via `hazo_pdf/server` entry point
+- 🔎 **Text Snippet Extraction** - Server-side: find text in a PDF, highlight it, and return a cropped image snippet
 ## Installation
@@ -54,6 +55,14 @@ npm install hazo_llm_api
 The `hazo_llm_api` package is an optional peer dependency. When installed, it enables server-side document data extraction via the `hazo_pdf/server` entry point. See [Server-Side Extraction](#server-side-extraction) for details.
+**@napi-rs/canvas** (optional): For server-side text snippet extraction
+```bash
+npm install @napi-rs/canvas
+```
+Required only for the `extract_text_snippet()` server utility. Already installed as a transitive dependency of `pdfjs-dist` in most environments. See [Text Snippet Extraction](#text-snippet-extraction) for details.
 ## CSS Import Options
 The library provides two CSS files to choose from:
@@ -2461,6 +2470,80 @@ interface ExtractDocumentResult {
 ---
+## Text Snippet Extraction
+Extract a cropped image snippet from a PDF with the search text highlighted. Useful for attaching contextual evidence from documents.
+### Basic Usage
+```typescript
+import { extract_text_snippet } from 'hazo_pdf/server';
+const result = await extract_text_snippet(
+  { file_path: '/path/to/document.pdf' },
+  {
+    search_text: 'invoice total',
+    snippet_size: 'half',       // 'full' | 'half' | 'quarter'
+    match_mode: 'first',        // 'first' | 'all'
+  }
+);
+if (result.success) {
+  for (const snippet of result.snippets) {
+    // snippet.image_buffer  - PNG Buffer (for saving to file or API response)
+    // snippet.image_base64  - Base64 PNG (for embedding in HTML/JSON)
+    // snippet.page_index    - Which page (0-based)
+    // snippet.matches       - Array of match positions
+    // snippet.highlight_approximate - true for image-based PDFs
+  }
+}
+```
+### Snippet Sizes
+All snippet sizes use the full page width. Only the height varies:
+| Size | Dimensions |
+|------|-----------|
+| `full` | Full page (W x H) |
+| `half` | Full width, half height (W x H/2) |
+| `quarter` | Full width, quarter height (W x H/4) |
+The snippet is centered on the matched text. If multiple matches are found on the same page, the snippet expands to cover all of them.
+### Match Modes
+| Mode | Behavior |
+|------|----------|
+| `first` | Returns a single snippet from the first page where text is found |
+| `all` | Returns one snippet per page that contains the text |
+### Options
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `search_text` | `string` | Required | Text to search for |
+| `page_index` | `number` | `0` | Page to start searching (0-based) |
+| `snippet_size` | `SnippetSize` | `'half'` | Crop size relative to page |
+| `match_mode` | `SnippetMatchMode` | `'first'` | Return first match or all pages |
+| `render_scale` | `number` | `2.0` | Rendering quality (higher = sharper, larger file) |
+| `highlight_color` | `string` | `'rgba(255, 255, 0, 0.35)'` | Highlight color (CSS color string) |
+| `use_llm_for_image_pdf` | `boolean` | `false` | Use LLM vision for image-based PDFs |
+### Image-Based PDFs
+For scanned/image PDFs where text cannot be extracted from the text layer:
+- **Without LLM** (default): Returns full page as the snippet, no highlighting
+- **With LLM** (`use_llm_for_image_pdf: true`): Uses `hazo_llm_api` vision to find approximate text location and draw a rough highlight
+### Requirements
+- **`@napi-rs/canvas`**: Required for server-side canvas rendering. Usually already installed as a transitive dependency of `pdfjs-dist`.
+- **Server-only**: This utility runs in Node.js only (not in the browser).
+---
 ## Development
 ### Setup
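
The snippet-size rules added to the README (full page width; height of H, H/2, or H/4; centered on the match; expanded when several matches fall on one page) can be illustrated with a small crop calculation. This is only a sketch under stated assumptions, not code from the package: `compute_crop`, `MatchBox`, `CropRect`, the clamping at page edges, and the full-page fallback when there are no matches are all hypothetical, since the diff does not show how hazo_pdf computes the crop.

```typescript
// Hypothetical sketch; names and edge-case behavior are assumptions,
// not taken from hazo_pdf's source.
type SnippetSize = 'full' | 'half' | 'quarter';

interface MatchBox {
  y: number;      // top edge of the matched text, in page units
  height: number; // height of the matched text
}

interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Fraction of the page height that each size nominally covers.
const HEIGHT_FRACTION: Record<SnippetSize, number> = {
  full: 1,
  half: 0.5,
  quarter: 0.25,
};

function compute_crop(
  page_width: number,
  page_height: number,
  size: SnippetSize,
  matches: MatchBox[],
): CropRect {
  if (matches.length === 0) {
    // Assumed fallback: nothing to center on, so return the whole page.
    return { x: 0, y: 0, width: page_width, height: page_height };
  }

  // Vertical extent spanned by all matches on this page.
  const top = Math.min(...matches.map((m) => m.y));
  const bottom = Math.max(...matches.map((m) => m.y + m.height));

  // Start from the nominal height and grow it if the matches span more,
  // but never exceed the page itself.
  const nominal = page_height * HEIGHT_FRACTION[size];
  const height = Math.min(page_height, Math.max(nominal, bottom - top));

  // Center on the matches, then clamp so the crop stays on the page.
  const center = (top + bottom) / 2;
  const y = Math.min(Math.max(center - height / 2, 0), page_height - height);

  // Every size uses the full page width.
  return { x: 0, y, width: page_width, height };
}
```

Clamping instead of padding keeps the crop height constant for a given size, so snippets taken near the top or bottom of a page are shifted rather than shrunk.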

package/dist/{chunk-TJBBE34D.js → chunk-4JJOUQ62.js} RENAMED

@@ -2390,8 +2390,8 @@ async function load_pdf_config_async(config_file) {
     return load_config_browser(config_file);
   }
   try {
-    const { HazoConfig } = __require("hazo_config");
-    const hazo_config = new HazoConfig({ filePath: config_file });
+    const { AppConfig } = __require("hazo_config");
+    const hazo_config = new AppConfig({ filePath: config_file });
     logger.debug(`Using hazo_config to load: ${config_file}`);
     const get_value = (section, key) => {
       return hazo_config.get(section, key);
@@ -3000,8 +3000,8 @@ function load_pdf_config(config_file) {
     return default_config;
   }
   try {
-    const { HazoConfig } = __require("hazo_config");
-    const hazo_config = new HazoConfig({ filePath: config_file });
+    const { AppConfig } = __require("hazo_config");
+    const hazo_config = new AppConfig({ filePath: config_file });
     const get_value = (section, key) => {
       return hazo_config.get(section, key);
     };
@@ -5334,7 +5334,7 @@ ${suffix_line}`;
     });
     setAutoHighlightIds(/* @__PURE__ */ new Set());
     const perform_auto_highlights = async () => {
-      const { find_text_in_pdf } = await import("./text_search-GW2VYMU6.js");
+      const { find_text_in_pdf } = await import("./text_search-I2KZ7DTW.js");
       const new_ids = /* @__PURE__ */ new Set();
       const auto_config = config_ref.current?.auto_highlight || default_config.auto_highlight;
       const search_opts = {
@@ -6365,6 +6365,7 @@ PdfViewer.displayName = "PdfViewer";
 var pdf_viewer_default = PdfViewer;
 export {
+  cn,
   load_pdf_document,
   create_coordinate_mapper,
   get_viewport_dimensions,
@@ -6400,4 +6401,4 @@ export {
   PdfViewer,
   pdf_viewer_default
 };
-//# sourceMappingURL=chunk-TJBBE34D.js.map
+//# sourceMappingURL=chunk-4JJOUQ62.js.map
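
Both config loaders in this chunk now construct `AppConfig` instead of `HazoConfig` from `hazo_config`. Code that has to run against either export name could resolve the constructor defensively, roughly as sketched below. The sketch assumes, without confirmation from the diff, that some installed versions of `hazo_config` export `HazoConfig` and others export `AppConfig`, and that both accept `{ filePath }` and expose `get(section, key)` as seen in the hunks above. `resolve_config_class`, `ConfigLike`, `ConfigCtor`, and `ConfigModule` are hypothetical names.

```typescript
// Hypothetical helper; not part of hazo_pdf or hazo_config.
interface ConfigLike {
  get(section: string, key: string): unknown;
}

type ConfigCtor = new (options: { filePath: string }) => ConfigLike;

// Shape of the module object returned by require("hazo_config"),
// reduced to the two export names that appear in the diff.
interface ConfigModule {
  AppConfig?: ConfigCtor;
  HazoConfig?: ConfigCtor;
}

function resolve_config_class(mod: ConfigModule): ConfigCtor {
  // Prefer the name used by the 1.7.0 bundle, then fall back to the
  // name used by the 1.6.7 bundle.
  const ctor = mod.AppConfig ?? mod.HazoConfig;
  if (!ctor) {
    throw new Error('hazo_config exports neither AppConfig nor HazoConfig');
  }
  return ctor;
}
```

A caller would pass the loaded module to `resolve_config_class` and then construct the result with `{ filePath: config_file }`, mirroring the calls in the bundled loaders.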