hazo_files 1.4.6 → 1.5.1

package/README.md CHANGED
@@ -20,6 +20,7 @@ A powerful, modular file management package for Node.js and React applications w
  - **File Change Detection**: xxHash-based content hashing for efficient change detection
  - **Content Tagging**: Optional LLM-based content classification at upload time or on-demand via `content_tag` field
  - **Schema Migrations**: Built-in V2/V3 migration utilities for adding reference tracking and content tagging to existing databases
+ - **Background Upload Pipelines**: Framework-agnostic `UploadManager` + React `HazoFileUploadProvider` for multi-step upload pipelines that survive component unmount, with optional sonner toast bridge
  - **TypeScript**: Full type safety and IntelliSense support
  - **OAuth Integration**: Built-in Google Drive and Dropbox OAuth authentication
  - **Prompt Cache Invalidation**: Passthrough for hazo_llm_api prompt cache management via server instance
@@ -61,6 +62,12 @@ npm install server-only # Server-side safety (recommended)
  npm install xxhash-wasm # File change detection (optional)
  ```

+ For the background-upload sonner toast bridge (optional):
+
+ ```bash
+ npm install sonner # Toast notifications for background upload pipelines
+ ```
+
  ### Tailwind CSS v4 Setup (Required for UI Components)

  If you're using Tailwind CSS v4 with the UI components, you must add a `@source` directive to your CSS file to ensure Tailwind scans the package's files for utility classes.
@@ -510,6 +517,167 @@ const fileManager = await createInitializedFileManager({
  });
  ```

+ ## Database Schema
+
+ Database tables are only required if you use `TrackedFileManager`, `FileMetadataService`, `NamingConventionService`, or `UploadExtractService`. Plain `FileManager` (filesystem only) needs no tables.
+
+ There are two tables:
+
+ - **`hazo_files`**: file metadata, hashes, references, content tags
+ - **`hazo_files_naming`**: saved naming conventions
+
+ The DDL below is also exposed programmatically via `HAZO_FILES_TABLE_SCHEMA` and `HAZO_FILES_NAMING_TABLE_SCHEMA` (see [Programmatic Setup](#programmatic-setup) below). Run the raw SQL if you prefer to manage migrations with your existing tooling (psql, sqlite3, Flyway, Knex, etc.).
+
+ ### `hazo_files` Table
+
+ #### PostgreSQL
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS hazo_files (
+   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+   filename TEXT NOT NULL,
+   file_type TEXT NOT NULL,
+   file_data TEXT DEFAULT '{}',
+   created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+   changed_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+   file_path TEXT NOT NULL,
+   storage_type TEXT NOT NULL,
+   file_hash TEXT,
+   file_size BIGINT,
+   file_changed_at TIMESTAMP WITH TIME ZONE,
+   file_refs TEXT DEFAULT '[]',
+   ref_count INTEGER DEFAULT 0,
+   status TEXT DEFAULT 'active',
+   scope_id UUID,
+   uploaded_by UUID,
+   storage_verified_at TIMESTAMP WITH TIME ZONE,
+   deleted_at TIMESTAMP WITH TIME ZONE,
+   original_filename TEXT,
+   content_tag TEXT
+ );
+
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_path ON hazo_files (file_path);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_storage ON hazo_files (storage_type);
+ CREATE UNIQUE INDEX IF NOT EXISTS idx_hazo_files_path_storage ON hazo_files (file_path, storage_type);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_hash ON hazo_files (file_hash);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_status ON hazo_files (status);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_scope ON hazo_files (scope_id);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_ref_count ON hazo_files (ref_count);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_deleted ON hazo_files (deleted_at);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_content_tag ON hazo_files (content_tag);
+ ```
+
+ #### SQLite
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS hazo_files (
+   id TEXT PRIMARY KEY,
+   filename TEXT NOT NULL,
+   file_type TEXT NOT NULL,
+   file_data TEXT DEFAULT '{}',
+   created_at TEXT NOT NULL,
+   changed_at TEXT NOT NULL,
+   file_path TEXT NOT NULL,
+   storage_type TEXT NOT NULL,
+   file_hash TEXT,
+   file_size INTEGER,
+   file_changed_at TEXT,
+   file_refs TEXT DEFAULT '[]',
+   ref_count INTEGER DEFAULT 0,
+   status TEXT DEFAULT 'active',
+   scope_id TEXT,
+   uploaded_by TEXT,
+   storage_verified_at TEXT,
+   deleted_at TEXT,
+   original_filename TEXT,
+   content_tag TEXT
+ );
+
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_path ON hazo_files (file_path);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_storage ON hazo_files (storage_type);
+ CREATE UNIQUE INDEX IF NOT EXISTS idx_hazo_files_path_storage ON hazo_files (file_path, storage_type);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_hash ON hazo_files (file_hash);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_status ON hazo_files (status);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_scope ON hazo_files (scope_id);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_ref_count ON hazo_files (ref_count);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_deleted ON hazo_files (deleted_at);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_content_tag ON hazo_files (content_tag);
+ ```
+
+ ### `hazo_files_naming` Table
+
+ Required only if you use `NamingConventionService` to persist saved naming rules.
+
+ #### PostgreSQL
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS hazo_files_naming (
+   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+   scope_id UUID,
+   naming_title TEXT NOT NULL,
+   naming_type TEXT NOT NULL CHECK(naming_type IN ('file', 'folder', 'both')),
+   naming_value TEXT NOT NULL,
+   created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+   changed_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+   variables TEXT DEFAULT '[]'
+ );
+
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_scope ON hazo_files_naming (scope_id);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_type ON hazo_files_naming (naming_type);
+ ```
+
+ #### SQLite
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS hazo_files_naming (
+   id TEXT PRIMARY KEY,
+   scope_id TEXT,
+   naming_title TEXT NOT NULL,
+   naming_type TEXT NOT NULL CHECK(naming_type IN ('file', 'folder', 'both')),
+   naming_value TEXT NOT NULL,
+   created_at TEXT NOT NULL,
+   changed_at TEXT NOT NULL,
+   variables TEXT DEFAULT '[]'
+ );
+
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_scope ON hazo_files_naming (scope_id);
+ CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_type ON hazo_files_naming (naming_type);
+ ```
+
+ ### Programmatic Setup
+
+ The same DDL is available as exported constants so you can run it from your app's startup or migration script without hand-copying SQL:
+
+ ```typescript
+ import {
+   HAZO_FILES_TABLE_SCHEMA,
+   HAZO_FILES_NAMING_TABLE_SCHEMA,
+ } from 'hazo_files';
+
+ // `db` is your own database handle; this example assumes it exposes an async `run(sql)` method
+ // Pick 'sqlite' or 'postgres'
+ const dbType: 'sqlite' | 'postgres' = 'sqlite';
+
+ // Create hazo_files table
+ await db.run(HAZO_FILES_TABLE_SCHEMA[dbType].ddl);
+ for (const idx of HAZO_FILES_TABLE_SCHEMA[dbType].indexes) {
+   await db.run(idx);
+ }
+
+ // Create hazo_files_naming table (only if using NamingConventionService)
+ await db.run(HAZO_FILES_NAMING_TABLE_SCHEMA[dbType].ddl);
+ for (const idx of HAZO_FILES_NAMING_TABLE_SCHEMA[dbType].indexes) {
+   await db.run(idx);
+ }
+ ```
+
+ For PostgreSQL, set `dbType` to `'postgres'` and swap `db.run(...)` for `client.query(...)`.
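The table-then-indexes pattern in the snippet above is the same for both tables and both databases, so it can be folded into one helper that takes the SQL executor as an argument. This is a minimal sketch, not part of the package: `applySchema`, `SqlExecutor`, and `TableSchema` are hypothetical names, and the `{ ddl, indexes }` shape is assumed from how the constants are used above rather than taken from the package's type definitions.

```typescript
// Hypothetical shape, inferred from the `.ddl` / `.indexes` usage above.
interface TableSchema {
  ddl: string;
  indexes: string[];
}

// Any function that runs a single SQL statement, for example
// (sql) => db.run(sql) for SQLite or (sql) => client.query(sql) for PostgreSQL.
type SqlExecutor = (sql: string) => Promise<unknown>;

// Runs the CREATE TABLE statement first, then each index statement in order.
async function applySchema(exec: SqlExecutor, schema: TableSchema): Promise<void> {
  await exec(schema.ddl);
  for (const indexSql of schema.indexes) {
    await exec(indexSql);
  }
}
```

With a helper like this, the call site would reduce to something like `await applySchema((sql) => db.run(sql), HAZO_FILES_TABLE_SCHEMA[dbType])`, and only the executor changes between drivers.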
+
+ To use a custom table name, see `getSchemaForTable(name, dbType)` and `getNamingSchemaForTable(name, dbType)`.
+
+ ### Upgrading Existing Tables
+
+ If you already have a pre-V2 or pre-V3 `hazo_files` table, see [Database Migration (Existing Databases)](#database-migration-existing-databases) and [V3 Database Migration](#v3-database-migration) for the `ALTER TABLE` scripts and migration helpers.
+
  ## UI Components

  ### FileBrowser Component
@@ -1440,6 +1608,157 @@ const { fileManager, metadataService, namingService, extractionService, uploadEx
  invalidatePromptCache?.('classification', 'classify_document');
  ```

+ ## Background Upload Pipelines
+
+ A framework-agnostic upload pipeline engine that survives React component unmount. Useful when uploads include multi-step server work (upload → LLM extract → user confirmation → DB commit) and the user may navigate away mid-flight.
+
+ Two subpath exports:
+
+ - `hazo_files/background-upload`: core, no React dependency (`UploadManager`, `Job`, `PipelineExecutor`, `TypedEventEmitter`, all types)
+ - `hazo_files/background-upload/react`: React bindings (`HazoFileUploadProvider`, `useFileUpload`, `useJobStatus`, `useFileUploadToasts`)
+
+ ### Core API (framework-agnostic)
+
+ ```typescript
+ import { UploadManager } from 'hazo_files/background-upload';
+ import type { PipelineStep, PipelineContext, JobHandle } from 'hazo_files/background-upload';
+
+ const uploadStep: PipelineStep = {
+   name: 'upload',
+   async execute(ctx: PipelineContext, handle: JobHandle) {
+     handle.set_status('uploading');
+     for (let i = 0; i < ctx.files.length; i++) {
+       // ... POST file to your /api/files/upload route
+       handle.set_progress(i + 1, ctx.files.length);
+     }
+   },
+ };
+
+ const extractStep: PipelineStep = {
+   name: 'extract',
+   async execute(ctx, handle) {
+     handle.set_status('processing');
+     const extracted = await fetch('/api/extract', { /* ... */ }).then(r => r.json());
+     ctx.extracted_data = extracted;
+   },
+ };
+
+ const manager = new UploadManager({
+   max_concurrent: 2,
+   default_pipeline_steps: [uploadStep, extractStep],
+ });
+
+ manager.on('job:completed', ({ job }) => console.log('done', job.job_id));
+
+ // file1 and file2 are File objects you already hold, e.g. from an <input type="file">
+ const batch_id = manager.submit_batch({
+   files: [file1, file2],
+   group_id: 'project-123',
+   group_label: 'Q4 Tax Documents',
+ });
+ ```
+
+ ### React Provider + Hooks
+
+ ```tsx
+ // app/layout.tsx (Next.js) or your app root
+ 'use client';
+ import { HazoFileUploadProvider } from 'hazo_files/background-upload/react';
+ import { Toaster } from 'sonner';
+
+ export default function RootLayout({ children }: { children: React.ReactNode }) {
+   return (
+     <HazoFileUploadProvider config={{ max_concurrent: 2 }}>
+       <Toaster richColors />
+       {children}
+     </HazoFileUploadProvider>
+   );
+ }
+ ```
+
+ ```tsx
+ 'use client';
+ import { useFileUpload, useJobStatus } from 'hazo_files/background-upload/react';
+
+ export function UploadButton() {
+   const { submit_batch, active_jobs } = useFileUpload();
+
+   function onPick(files: FileList) {
+     submit_batch({
+       files: Array.from(files),
+       group_id: 'project-123',
+       group_label: 'Project 123',
+       pipeline_steps: [/* your PipelineStep[] */],
+     });
+   }
+
+   return (
+     <>
+       {/* e.target.files can be null, so guard before submitting */}
+       <input type="file" multiple onChange={(e) => e.target.files && onPick(e.target.files)} />
+       <ul>
+         {active_jobs.map((j) => (
+           <li key={j.job_id}>{j.group_label}: {j.status}</li>
+         ))}
+       </ul>
+     </>
+   );
+ }
+
+ // Track a single job
+ export function JobBadge({ job_id }: { job_id: string }) {
+   const job = useJobStatus(job_id);
+   if (!job) return null;
+   return <span>{job.status} {job.progress && `${job.progress.current}/${job.progress.total}`}</span>;
+ }
+ ```
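Hooks like the ones above need a way to re-render components when state held outside React changes. One common pattern for that is an external store exposing `subscribe` and `getSnapshot`, which is the pair of functions React's `useSyncExternalStore` expects. The sketch below is illustrative only and is not the package's implementation: `JobStore` and `JobSnapshot` are hypothetical names, and the job shape is reduced to two fields.

```typescript
// Illustrative only: a minimal external store, not hazo_files' implementation.
type Listener = () => void;

interface JobSnapshot {
  job_id: string;
  status: string;
}

class JobStore {
  private listeners = new Set<Listener>();
  // The snapshot array is replaced (never mutated) on each change, so
  // getSnapshot returns the same reference until something actually changes.
  private snapshot: readonly JobSnapshot[] = [];

  // Matches the subscribe contract: returns an unsubscribe function.
  subscribe = (listener: Listener): (() => void) => {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  };

  getSnapshot = (): readonly JobSnapshot[] => this.snapshot;

  // Inserts or updates a job, then notifies every subscriber.
  upsert(job: JobSnapshot): void {
    const others = this.snapshot.filter((j) => j.job_id !== job.job_id);
    this.snapshot = [...others, job];
    this.listeners.forEach((listener) => listener());
  }
}
```

In a component, such a store would be read with `useSyncExternalStore(store.subscribe, store.getSnapshot)`. Returning a stable reference between updates matters, because React compares snapshots by identity to decide whether to re-render.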
1713
+
1714
+ ### Confirmation Steps (user-in-the-loop)
1715
+
1716
+ ```typescript
1717
+ const confirmStep: PipelineStep = {
1718
+ name: 'confirm',
1719
+ async execute(ctx, handle) {
1720
+ handle.set_status('awaiting_confirmation');
1721
+ const result = await handle.request_confirmation({
1722
+ conflicts: ctx.extracted_data.conflicts,
1723
+ });
1724
+ if (!result.confirmed) throw new Error('User cancelled');
1725
+ // ctx.extracted_data now reflects user choices via result.data
1726
+ },
1727
+ };
1728
+ ```
1729
+
1730
+ In the UI, subscribe to `job:confirmation_needed` (or read jobs in `awaiting_confirmation` status), render a dialog, then:
1731
+
1732
+ ```typescript
1733
+ const { resolve_confirmation } = useFileUpload();
1734
+ resolve_confirmation(job_id, { confirmed: true, data: userChoices });
1735
+ ```
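The request/resolve pairing above behaves like a deferred promise: the pipeline step awaits a promise whose resolver is parked until the UI supplies an answer. The following is a rough sketch of that idea under stated assumptions, not the library's code: `ConfirmationRegistry` and `ConfirmationResult` are hypothetical names, and only the `confirmed` and `data` fields are taken from the snippets above.

```typescript
// Hypothetical types for illustration; the real ones come from hazo_files.
interface ConfirmationResult {
  confirmed: boolean;
  data?: unknown;
}

class ConfirmationRegistry {
  private pending = new Map<string, (result: ConfirmationResult) => void>();

  // Called from inside a pipeline step: the returned promise stays pending
  // until resolve() is called with the same job id.
  request(job_id: string): Promise<ConfirmationResult> {
    return new Promise((resolve) => {
      this.pending.set(job_id, resolve);
    });
  }

  // Called from the UI: settles the waiting promise if there is one.
  // Returns false when no confirmation is pending for that job.
  resolve(job_id: string, result: ConfirmationResult): boolean {
    const settle = this.pending.get(job_id);
    if (!settle) return false;
    this.pending.delete(job_id);
    settle(result);
    return true;
  }
}
```

Because the resolver is stored outside any component, the step keeps waiting even if the dialog that will eventually answer it has not been mounted yet.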
+
+ ### Sonner Toast Bridge
+
+ The provider mounts a `ToastBridge` by default (`enable_toasts={true}`) that uses sonner to notify on `job:completed`, `job:error`, `job:confirmation_needed`, and `batch:completed`. Sonner is a soft optional peer dependency: if it isn't installed, the bridge is a no-op. Set `enable_toasts={false}` on the provider to opt out, or wire `useFileUploadToasts(manager)` yourself for custom toast behavior.
+
+ ### Events
+
+ | Event | Payload | Fired when |
+ |-------|---------|------------|
+ | `job:created` | `{ job }` | Job enters the queue |
+ | `job:status_changed` | `{ job, previous_status }` | Status transitions (queued → uploading → processing → ...) |
+ | `job:progress` | `{ job }` | `handle.set_progress` called inside a pipeline step |
+ | `job:completed` | `{ job }` | All pipeline steps finished successfully |
+ | `job:error` | `{ job, error }` | A pipeline step threw |
+ | `job:confirmation_needed` | `{ job, payload }` | `handle.request_confirmation` called |
+ | `job:confirmation_resolved` | `{ job, result }` | `resolve_confirmation` called |
+ | `batch:progress` | `{ job }` | Any job in the batch settles |
+ | `batch:completed` | `{ batch }` | All jobs in the batch are `done` or `error` |
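An event table like the one above maps naturally onto a TypeScript event map, where each event name is tied to its payload type so that handlers are checked at compile time. The sketch below shows the general technique only: `TinyEmitter`, `UploadEvents`, and the reduced `Job` shape are hypothetical and cover just three of the listed events. It is not the package's `TypedEventEmitter`.

```typescript
// Hypothetical, reduced job shape and event map for illustration.
interface Job {
  job_id: string;
  status: string;
}

type UploadEvents = {
  'job:created': { job: Job };
  'job:status_changed': { job: Job; previous_status: string };
  'job:error': { job: Job; error: Error };
};

class TinyEmitter<Events extends Record<string, unknown>> {
  private handlers = new Map<keyof Events, Set<(payload: unknown) => void>>();

  // Registers a handler and returns a function that removes it again.
  on<K extends keyof Events>(event: K, handler: (payload: Events[K]) => void): () => void {
    const set = this.handlers.get(event) ?? new Set<(payload: unknown) => void>();
    this.handlers.set(event, set);
    // The payload type is erased for storage; on() and emit() keep callers type-safe.
    const erased = handler as (payload: unknown) => void;
    set.add(erased);
    return () => {
      set.delete(erased);
    };
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.handlers.get(event);
    if (!set) return;
    // Copy first so a handler may unsubscribe while the loop is running.
    Array.from(set).forEach((handler) => handler(payload));
  }
}
```

With this shape, `emitter.on('job:status_changed', ({ job, previous_status }) => ...)` type-checks, while a handler that reads `previous_status` from a `job:created` payload would be rejected by the compiler.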
+
+ ### Design Notes
+
+ - **Survives unmount**: `UploadManager` is held in a `useRef`; pipelines run on the manager, not on React state. Client-side navigation away from the uploading component does not abort uploads.
+ - **Single source of truth**: `useFileUpload` / `useJobStatus` subscribe via `useSyncExternalStore` against the manager's event emitter, so multiple components stay consistent.
+ - **Concurrency**: `max_concurrent` controls how many jobs the executor runs in parallel; the rest wait in the FIFO queue.
+ - **No DB writes**: This module is purely an in-memory pipeline runner; your pipeline steps own all server I/O.
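The concurrency note above describes a bounded FIFO queue: at most `max_concurrent` jobs run at once, and each one that settles frees a slot for the next job in arrival order. A minimal sketch of that scheduling idea follows, assuming tasks are plain async functions. `FifoRunner` is a hypothetical name and this is not the package's `PipelineExecutor`.

```typescript
type Task = () => Promise<void>;

// Illustrative FIFO runner with a concurrency cap; not the library's executor.
class FifoRunner {
  private queue: Task[] = [];
  private running = 0;
  private max_concurrent: number;

  constructor(max_concurrent: number) {
    this.max_concurrent = max_concurrent;
  }

  submit(task: Task): void {
    this.queue.push(task);
    this.drain();
  }

  // Starts queued tasks in arrival order until the limit is reached.
  private drain(): void {
    while (this.running < this.max_concurrent && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.running++;
      // Free the slot on success or failure so one bad task cannot stall the queue.
      const done = () => {
        this.running--;
        this.drain();
      };
      task().then(done, done);
    }
  }
}
```

Submitting five tasks to `new FifoRunner(2)` starts the first two immediately and holds the other three until a slot frees up.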
+
  ## API Reference

  ### FileManager