hazo_files 1.4.6 → 1.5.1
- package/CHANGE_LOG.md +481 -0
- package/README.md +319 -0
- package/SETUP_CHECKLIST.md +1309 -0
- package/dist/background-upload/index.d.mts +166 -0
- package/dist/background-upload/index.d.ts +166 -0
- package/dist/background-upload/index.js +301 -0
- package/dist/background-upload/index.mjs +271 -0
- package/dist/background-upload/react/index.d.mts +149 -0
- package/dist/background-upload/react/index.d.ts +149 -0
- package/dist/background-upload/react/index.js +473 -0
- package/dist/background-upload/react/index.mjs +432 -0
- package/package.json +33 -11
- package/docs/SETUP_CHECKLIST.md +0 -260
package/README.md
CHANGED
|
@@ -20,6 +20,7 @@ A powerful, modular file management package for Node.js and React applications w
|
|
|
20
20
|
- **File Change Detection**: xxHash-based content hashing for efficient change detection
|
|
21
21
|
- **Content Tagging**: Optional LLM-based content classification at upload time or on-demand via `content_tag` field
|
|
22
22
|
- **Schema Migrations**: Built-in V2/V3 migration utilities for adding reference tracking and content tagging to existing databases
|
|
23
|
+
- **Background Upload Pipelines**: Framework-agnostic `UploadManager` + React `HazoFileUploadProvider` for multi-step upload pipelines that survive component unmount, with optional sonner toast bridge
|
|
23
24
|
- **TypeScript**: Full type safety and IntelliSense support
|
|
24
25
|
- **OAuth Integration**: Built-in Google Drive and Dropbox OAuth authentication
|
|
25
26
|
- **Prompt Cache Invalidation**: Passthrough for hazo_llm_api prompt cache management via server instance
|
|
@@ -61,6 +62,12 @@ npm install server-only # Server-side safety (recommended)
|
|
|
61
62
|
npm install xxhash-wasm # File change detection (optional)
|
|
62
63
|
```
|
|
63
64
|
|
|
65
|
+
For the background-upload sonner toast bridge (optional):
|
|
66
|
+
|
|
67
|
+
```bash
|
|
68
|
+
npm install sonner # Toast notifications for background upload pipelines
|
|
69
|
+
```
|
|
70
|
+
|
|
64
71
|
### Tailwind CSS v4 Setup (Required for UI Components)
|
|
65
72
|
|
|
66
73
|
If you're using Tailwind CSS v4 with the UI components, you must add a `@source` directive to your CSS file to ensure Tailwind scans the package's files for utility classes.
|
|
@@ -510,6 +517,167 @@ const fileManager = await createInitializedFileManager({
|
|
|
510
517
|
});
|
|
511
518
|
```
|
|
512
519
|
|
|
520
|
+
## Database Schema
|
|
521
|
+
|
|
522
|
+
Database tables are only required if you use `TrackedFileManager`, `FileMetadataService`, `NamingConventionService`, or `UploadExtractService`. Plain `FileManager` (filesystem only) needs no tables.
|
|
523
|
+
|
|
524
|
+
There are two tables:
|
|
525
|
+
|
|
526
|
+
- **`hazo_files`** — file metadata, hashes, references, content tags
|
|
527
|
+
- **`hazo_files_naming`** — saved naming conventions
|
|
528
|
+
|
|
529
|
+
The DDL below is also exposed programmatically via `HAZO_FILES_TABLE_SCHEMA` and `HAZO_FILES_NAMING_TABLE_SCHEMA` (see [Programmatic Setup](#programmatic-setup) below). Run the raw SQL if you prefer to manage migrations with your existing tooling (psql, sqlite3, Flyway, Knex, etc.).
|
|
530
|
+
|
|
531
|
+
### `hazo_files` Table
|
|
532
|
+
|
|
533
|
+
#### PostgreSQL
|
|
534
|
+
|
|
535
|
+
```sql
|
|
536
|
+
CREATE TABLE IF NOT EXISTS hazo_files (
|
|
537
|
+
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
538
|
+
filename TEXT NOT NULL,
|
|
539
|
+
file_type TEXT NOT NULL,
|
|
540
|
+
file_data TEXT DEFAULT '{}',
|
|
541
|
+
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
|
|
542
|
+
changed_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
|
|
543
|
+
file_path TEXT NOT NULL,
|
|
544
|
+
storage_type TEXT NOT NULL,
|
|
545
|
+
file_hash TEXT,
|
|
546
|
+
file_size BIGINT,
|
|
547
|
+
file_changed_at TIMESTAMP WITH TIME ZONE,
|
|
548
|
+
file_refs TEXT DEFAULT '[]',
|
|
549
|
+
ref_count INTEGER DEFAULT 0,
|
|
550
|
+
status TEXT DEFAULT 'active',
|
|
551
|
+
scope_id UUID,
|
|
552
|
+
uploaded_by UUID,
|
|
553
|
+
storage_verified_at TIMESTAMP WITH TIME ZONE,
|
|
554
|
+
deleted_at TIMESTAMP WITH TIME ZONE,
|
|
555
|
+
original_filename TEXT,
|
|
556
|
+
content_tag TEXT
|
|
557
|
+
);
|
|
558
|
+
|
|
559
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_path ON hazo_files (file_path);
|
|
560
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_storage ON hazo_files (storage_type);
|
|
561
|
+
CREATE UNIQUE INDEX IF NOT EXISTS idx_hazo_files_path_storage ON hazo_files (file_path, storage_type);
|
|
562
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_hash ON hazo_files (file_hash);
|
|
563
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_status ON hazo_files (status);
|
|
564
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_scope ON hazo_files (scope_id);
|
|
565
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_ref_count ON hazo_files (ref_count);
|
|
566
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_deleted ON hazo_files (deleted_at);
|
|
567
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_content_tag ON hazo_files (content_tag);
|
|
568
|
+
```
|
|
569
|
+
|
|
570
|
+
#### SQLite
|
|
571
|
+
|
|
572
|
+
```sql
|
|
573
|
+
CREATE TABLE IF NOT EXISTS hazo_files (
|
|
574
|
+
id TEXT PRIMARY KEY,
|
|
575
|
+
filename TEXT NOT NULL,
|
|
576
|
+
file_type TEXT NOT NULL,
|
|
577
|
+
file_data TEXT DEFAULT '{}',
|
|
578
|
+
created_at TEXT NOT NULL,
|
|
579
|
+
changed_at TEXT NOT NULL,
|
|
580
|
+
file_path TEXT NOT NULL,
|
|
581
|
+
storage_type TEXT NOT NULL,
|
|
582
|
+
file_hash TEXT,
|
|
583
|
+
file_size INTEGER,
|
|
584
|
+
file_changed_at TEXT,
|
|
585
|
+
file_refs TEXT DEFAULT '[]',
|
|
586
|
+
ref_count INTEGER DEFAULT 0,
|
|
587
|
+
status TEXT DEFAULT 'active',
|
|
588
|
+
scope_id TEXT,
|
|
589
|
+
uploaded_by TEXT,
|
|
590
|
+
storage_verified_at TEXT,
|
|
591
|
+
deleted_at TEXT,
|
|
592
|
+
original_filename TEXT,
|
|
593
|
+
content_tag TEXT
|
|
594
|
+
);
|
|
595
|
+
|
|
596
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_path ON hazo_files (file_path);
|
|
597
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_storage ON hazo_files (storage_type);
|
|
598
|
+
CREATE UNIQUE INDEX IF NOT EXISTS idx_hazo_files_path_storage ON hazo_files (file_path, storage_type);
|
|
599
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_hash ON hazo_files (file_hash);
|
|
600
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_status ON hazo_files (status);
|
|
601
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_scope ON hazo_files (scope_id);
|
|
602
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_ref_count ON hazo_files (ref_count);
|
|
603
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_deleted ON hazo_files (deleted_at);
|
|
604
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_content_tag ON hazo_files (content_tag);
|
|
605
|
+
```
|
|
606
|
+
|
|
607
|
+
### `hazo_files_naming` Table
|
|
608
|
+
|
|
609
|
+
Required only if you use `NamingConventionService` to persist saved naming rules.
|
|
610
|
+
|
|
611
|
+
#### PostgreSQL
|
|
612
|
+
|
|
613
|
+
```sql
|
|
614
|
+
CREATE TABLE IF NOT EXISTS hazo_files_naming (
|
|
615
|
+
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
616
|
+
scope_id UUID,
|
|
617
|
+
naming_title TEXT NOT NULL,
|
|
618
|
+
naming_type TEXT NOT NULL CHECK(naming_type IN ('file', 'folder', 'both')),
|
|
619
|
+
naming_value TEXT NOT NULL,
|
|
620
|
+
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
|
|
621
|
+
changed_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
|
|
622
|
+
variables TEXT DEFAULT '[]'
|
|
623
|
+
);
|
|
624
|
+
|
|
625
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_scope ON hazo_files_naming (scope_id);
|
|
626
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_type ON hazo_files_naming (naming_type);
|
|
627
|
+
```
|
|
628
|
+
|
|
629
|
+
#### SQLite
|
|
630
|
+
|
|
631
|
+
```sql
|
|
632
|
+
CREATE TABLE IF NOT EXISTS hazo_files_naming (
|
|
633
|
+
id TEXT PRIMARY KEY,
|
|
634
|
+
scope_id TEXT,
|
|
635
|
+
naming_title TEXT NOT NULL,
|
|
636
|
+
naming_type TEXT NOT NULL CHECK(naming_type IN ('file', 'folder', 'both')),
|
|
637
|
+
naming_value TEXT NOT NULL,
|
|
638
|
+
created_at TEXT NOT NULL,
|
|
639
|
+
changed_at TEXT NOT NULL,
|
|
640
|
+
variables TEXT DEFAULT '[]'
|
|
641
|
+
);
|
|
642
|
+
|
|
643
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_scope ON hazo_files_naming (scope_id);
|
|
644
|
+
CREATE INDEX IF NOT EXISTS idx_hazo_files_naming_type ON hazo_files_naming (naming_type);
|
|
645
|
+
```
|
|
646
|
+
|
|
647
|
+
### Programmatic Setup
|
|
648
|
+
|
|
649
|
+
The same DDL is available as exported constants so you can run it from your app's startup or migration script without hand-copying SQL:
|
|
650
|
+
|
|
651
|
+
```typescript
|
|
652
|
+
import {
|
|
653
|
+
HAZO_FILES_TABLE_SCHEMA,
|
|
654
|
+
HAZO_FILES_NAMING_TABLE_SCHEMA,
|
|
655
|
+
} from 'hazo_files';
|
|
656
|
+
|
|
657
|
+
// Pick 'sqlite' or 'postgres'
|
|
658
|
+
const dbType: 'sqlite' | 'postgres' = 'sqlite';
|
|
659
|
+
|
|
660
|
+
// Create hazo_files table
|
|
661
|
+
await db.run(HAZO_FILES_TABLE_SCHEMA[dbType].ddl);
|
|
662
|
+
for (const idx of HAZO_FILES_TABLE_SCHEMA[dbType].indexes) {
|
|
663
|
+
await db.run(idx);
|
|
664
|
+
}
|
|
665
|
+
|
|
666
|
+
// Create hazo_files_naming table (only if using NamingConventionService)
|
|
667
|
+
await db.run(HAZO_FILES_NAMING_TABLE_SCHEMA[dbType].ddl);
|
|
668
|
+
for (const idx of HAZO_FILES_NAMING_TABLE_SCHEMA[dbType].indexes) {
|
|
669
|
+
await db.run(idx);
|
|
670
|
+
}
|
|
671
|
+
```
|
|
672
|
+
|
|
673
|
+
For PostgreSQL, swap `db.run(...)` for `client.query(...)`.
|
|
674
|
+
|
|
675
|
+
To use a custom table name, see `getSchemaForTable(name, dbType)` and `getNamingSchemaForTable(name, dbType)`.
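
The two loops in the snippet above have the same shape, so they can be folded into one helper. The sketch below is an illustration under stated assumptions and is not part of hazo_files: `applySchema`, `TableSchema`, and `SqlExec` are hypothetical names, and the schema shape (a `ddl` string plus an `indexes` array of strings) is inferred from how the snippet reads the exported constants. Passing the executor in as a function lets the same helper wrap `db.run` or `client.query`.

```typescript
// Hypothetical shape, inferred from how the README snippet reads
// HAZO_FILES_TABLE_SCHEMA[dbType].ddl and .indexes.
interface TableSchema {
  ddl: string;
  indexes: readonly string[];
}

// Hypothetical executor: wrap db.run (SQLite) or client.query (PostgreSQL).
type SqlExec = (sql: string) => Promise<unknown>;

// Runs the CREATE TABLE statement first, then each CREATE INDEX in order.
async function applySchema(exec: SqlExec, schema: TableSchema): Promise<number> {
  await exec(schema.ddl);
  for (const indexSql of schema.indexes) {
    await exec(indexSql);
  }
  return 1 + schema.indexes.length; // number of statements executed
}

// Stand-in schema and recording executor so the sketch runs without a database.
const demoSchema: TableSchema = {
  ddl: 'CREATE TABLE IF NOT EXISTS demo (id TEXT PRIMARY KEY)',
  indexes: ['CREATE INDEX IF NOT EXISTS idx_demo_id ON demo (id)'],
};
const executedSql: string[] = [];
const recordingExec: SqlExec = async (sql) => {
  executedSql.push(sql);
};
const applied: Promise<number> = applySchema(recordingExec, demoSchema);
```

With a real connection the call might look like `applySchema((sql) => db.run(sql), HAZO_FILES_TABLE_SCHEMA.sqlite)`, assuming the database handle exposes a promise-returning method as in the snippet above. Because every statement uses `IF NOT EXISTS`, running the helper on each startup should be harmless.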
|
|
676
|
+
|
|
677
|
+
### Upgrading Existing Tables
|
|
678
|
+
|
|
679
|
+
If you already have a pre-V2 or pre-V3 `hazo_files` table, see [Database Migration (Existing Databases)](#database-migration-existing-databases) and [V3 Database Migration](#v3-database-migration) for the `ALTER TABLE` scripts and migration helpers.
|
|
680
|
+
|
|
513
681
|
## UI Components
|
|
514
682
|
|
|
515
683
|
### FileBrowser Component
|
|
@@ -1440,6 +1608,157 @@ const { fileManager, metadataService, namingService, extractionService, uploadEx
|
|
|
1440
1608
|
invalidatePromptCache?.('classification', 'classify_document');
|
|
1441
1609
|
```
|
|
1442
1610
|
|
|
1611
|
+
## Background Upload Pipelines
|
|
1612
|
+
|
|
1613
|
+
A framework-agnostic upload pipeline engine that survives React component unmount. Useful when uploads include multi-step server work (upload → LLM extract → user confirmation → DB commit) and the user may navigate away mid-flight.
|
|
1614
|
+
|
|
1615
|
+
Two subpath exports:
|
|
1616
|
+
|
|
1617
|
+
- `hazo_files/background-upload` — core, no React dependency (`UploadManager`, `Job`, `PipelineExecutor`, `TypedEventEmitter`, all types)
|
|
1618
|
+
- `hazo_files/background-upload/react` — React bindings (`HazoFileUploadProvider`, `useFileUpload`, `useJobStatus`, `useFileUploadToasts`)
|
|
1619
|
+
|
|
1620
|
+
### Core API (framework-agnostic)
|
|
1621
|
+
|
|
1622
|
+
```typescript
|
|
1623
|
+
import { UploadManager } from 'hazo_files/background-upload';
|
|
1624
|
+
import type { PipelineStep, PipelineContext, JobHandle } from 'hazo_files/background-upload';
|
|
1625
|
+
|
|
1626
|
+
const uploadStep: PipelineStep = {
|
|
1627
|
+
name: 'upload',
|
|
1628
|
+
async execute(ctx: PipelineContext, handle: JobHandle) {
|
|
1629
|
+
handle.set_status('uploading');
|
|
1630
|
+
for (let i = 0; i < ctx.files.length; i++) {
|
|
1631
|
+
// ... POST file to your /api/files/upload route
|
|
1632
|
+
handle.set_progress(i + 1, ctx.files.length);
|
|
1633
|
+
}
|
|
1634
|
+
},
|
|
1635
|
+
};
|
|
1636
|
+
|
|
1637
|
+
const extractStep: PipelineStep = {
|
|
1638
|
+
name: 'extract',
|
|
1639
|
+
async execute(ctx, handle) {
|
|
1640
|
+
handle.set_status('processing');
|
|
1641
|
+
const extracted = await fetch('/api/extract', { /* ... */ }).then(r => r.json());
|
|
1642
|
+
ctx.extracted_data = extracted;
|
|
1643
|
+
},
|
|
1644
|
+
};
|
|
1645
|
+
|
|
1646
|
+
const manager = new UploadManager({
|
|
1647
|
+
max_concurrent: 2,
|
|
1648
|
+
default_pipeline_steps: [uploadStep, extractStep],
|
|
1649
|
+
});
|
|
1650
|
+
|
|
1651
|
+
manager.on('job:completed', ({ job }) => console.log('done', job.job_id));
|
|
1652
|
+
|
|
1653
|
+
const batch_id = manager.submit_batch({
|
|
1654
|
+
files: [file1, file2],
|
|
1655
|
+
group_id: 'project-123',
|
|
1656
|
+
group_label: 'Q4 Tax Documents',
|
|
1657
|
+
});
|
|
1658
|
+
```
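
To make the step contract concrete: each step receives the same mutable context plus a handle, and steps run in order, so a later step can read what an earlier one wrote (as `extractStep` does with `ctx.extracted_data`). The sketch below is a simplified stand-in and not the package's `PipelineExecutor`; `MiniStep`, `MiniContext`, `MiniHandle`, and `runSteps` are hypothetical names, and files are replaced by plain strings so it runs anywhere.

```typescript
// Hypothetical, simplified mirrors of the step contract used above.
interface MiniContext {
  files: string[];
  extracted_data?: unknown;
}
interface MiniHandle {
  set_status(status: string): void;
  set_progress(current: number, total: number): void;
}
interface MiniStep {
  name: string;
  execute(ctx: MiniContext, handle: MiniHandle): Promise<void>;
}

// Runs steps one after another; a throwing step stops the pipeline.
async function runSteps(
  steps: MiniStep[],
  ctx: MiniContext,
  handle: MiniHandle,
): Promise<'done' | 'error'> {
  try {
    for (const step of steps) {
      await step.execute(ctx, handle);
    }
    return 'done';
  } catch {
    return 'error';
  }
}

// Handle that records every status and progress update it receives.
const statusLog: string[] = [];
const loggingHandle: MiniHandle = {
  set_status: (status) => { statusLog.push(status); },
  set_progress: (current, total) => { statusLog.push(`${current}/${total}`); },
};

const demoSteps: MiniStep[] = [
  {
    name: 'upload',
    async execute(ctx, handle) {
      handle.set_status('uploading');
      ctx.files.forEach((_file, i) => handle.set_progress(i + 1, ctx.files.length));
    },
  },
  {
    name: 'extract',
    async execute(ctx, handle) {
      handle.set_status('processing');
      ctx.extracted_data = { file_count: ctx.files.length }; // visible to later steps
    },
  },
];

const demoContext: MiniContext = { files: ['a.pdf', 'b.pdf'] };
const pipelineOutcome = runSteps(demoSteps, demoContext, loggingHandle);
```

Keeping the context mutable and shared is what lets a confirmation or commit step added later see the extraction result without extra plumbing.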
|
|
1659
|
+
|
|
1660
|
+
### React Provider + Hooks
|
|
1661
|
+
|
|
1662
|
+
```tsx
|
|
1663
|
+
// app/layout.tsx (Next.js) or your app root
|
|
1664
|
+
'use client';
|
|
1665
|
+
import { HazoFileUploadProvider } from 'hazo_files/background-upload/react';
|
|
1666
|
+
import { Toaster } from 'sonner';
|
|
1667
|
+
|
|
1668
|
+
export default function RootLayout({ children }: { children: React.ReactNode }) {
|
|
1669
|
+
return (
|
|
1670
|
+
<HazoFileUploadProvider config={{ max_concurrent: 2 }}>
|
|
1671
|
+
<Toaster richColors />
|
|
1672
|
+
{children}
|
|
1673
|
+
</HazoFileUploadProvider>
|
|
1674
|
+
);
|
|
1675
|
+
}
|
|
1676
|
+
```
|
|
1677
|
+
|
|
1678
|
+
```tsx
|
|
1679
|
+
'use client';
|
|
1680
|
+
import { useFileUpload, useJobStatus } from 'hazo_files/background-upload/react';
|
|
1681
|
+
|
|
1682
|
+
export function UploadButton() {
|
|
1683
|
+
const { submit_batch, active_jobs } = useFileUpload();
|
|
1684
|
+
|
|
1685
|
+
function onPick(files: FileList) {
|
|
1686
|
+
submit_batch({
|
|
1687
|
+
files: Array.from(files),
|
|
1688
|
+
group_id: 'project-123',
|
|
1689
|
+
group_label: 'Project 123',
|
|
1690
|
+
pipeline_steps: [/* your PipelineStep[] */],
|
|
1691
|
+
});
|
|
1692
|
+
}
|
|
1693
|
+
|
|
1694
|
+
return (
|
|
1695
|
+
<>
|
|
1696
|
+
<input type="file" multiple onChange={(e) => onPick(e.target.files!)} />
|
|
1697
|
+
<ul>
|
|
1698
|
+
{active_jobs.map((j) => (
|
|
1699
|
+
<li key={j.job_id}>{j.group_label}: {j.status}</li>
|
|
1700
|
+
))}
|
|
1701
|
+
</ul>
|
|
1702
|
+
</>
|
|
1703
|
+
);
|
|
1704
|
+
}
|
|
1705
|
+
|
|
1706
|
+
// Track a single job
|
|
1707
|
+
export function JobBadge({ job_id }: { job_id: string }) {
|
|
1708
|
+
const job = useJobStatus(job_id);
|
|
1709
|
+
if (!job) return null;
|
|
1710
|
+
return <span>{job.status} {job.progress && `${job.progress.current}/${job.progress.total}`}</span>;
|
|
1711
|
+
}
|
|
1712
|
+
```
|
|
1713
|
+
|
|
1714
|
+
### Confirmation Steps (user-in-the-loop)
|
|
1715
|
+
|
|
1716
|
+
```typescript
|
|
1717
|
+
const confirmStep: PipelineStep = {
|
|
1718
|
+
name: 'confirm',
|
|
1719
|
+
async execute(ctx, handle) {
|
|
1720
|
+
handle.set_status('awaiting_confirmation');
|
|
1721
|
+
const result = await handle.request_confirmation({
|
|
1722
|
+
conflicts: ctx.extracted_data.conflicts,
|
|
1723
|
+
});
|
|
1724
|
+
if (!result.confirmed) throw new Error('User cancelled');
|
|
1725
|
+
// ctx.extracted_data now reflects user choices via result.data
|
|
1726
|
+
},
|
|
1727
|
+
};
|
|
1728
|
+
```
|
|
1729
|
+
|
|
1730
|
+
In the UI, subscribe to `job:confirmation_needed` (or read jobs in `awaiting_confirmation` status), render a dialog, then:
|
|
1731
|
+
|
|
1732
|
+
```typescript
|
|
1733
|
+
const { resolve_confirmation } = useFileUpload();
|
|
1734
|
+
resolve_confirmation(job_id, { confirmed: true, data: userChoices });
|
|
1735
|
+
```
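
One way to picture the hand-off: `request_confirmation` returns a promise that stays pending until the UI calls `resolve_confirmation` for the same job. The following is a minimal sketch of that deferred-promise pattern under stated assumptions; `ConfirmationGate` and `ConfirmationResult` are hypothetical names, and the source does not say the package is implemented this way.

```typescript
// Hypothetical result shape, mirroring the { confirmed, data } object above.
interface ConfirmationResult {
  confirmed: boolean;
  data?: unknown;
}

// Parks one pending promise resolver per job id until the UI answers.
class ConfirmationGate {
  private pending = new Map<string, (result: ConfirmationResult) => void>();

  // Called from inside a pipeline step; settles when resolve() is called.
  request(jobId: string): Promise<ConfirmationResult> {
    return new Promise((resolve) => {
      this.pending.set(jobId, resolve);
    });
  }

  // Called from the UI; returns false if no step is waiting on this job.
  resolve(jobId: string, result: ConfirmationResult): boolean {
    const resolver = this.pending.get(jobId);
    if (!resolver) return false;
    this.pending.delete(jobId);
    resolver(result);
    return true;
  }

  isWaiting(jobId: string): boolean {
    return this.pending.has(jobId);
  }
}

const gate = new ConfirmationGate();
const stepResult = gate.request('job-1'); // the step would await this promise
const wasWaiting = gate.isWaiting('job-1');
const delivered = gate.resolve('job-1', { confirmed: true, data: { keep: 'new' } });
```

Because the pending resolver lives outside any component, the dialog that answers it can mount long after the step started waiting.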
|
|
1736
|
+
|
|
1737
|
+
### Sonner Toast Bridge
|
|
1738
|
+
|
|
1739
|
+
The provider mounts a `ToastBridge` by default (`enable_toasts={true}`) that uses sonner to notify on `job:completed`, `job:error`, `job:confirmation_needed`, and `batch:completed`. Sonner is a soft optional peer dependency — if it isn't installed, the bridge is a no-op. Set `enable_toasts={false}` on the provider to opt out, or wire `useFileUploadToasts(manager)` yourself for custom toast behavior.
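
A soft optional dependency is commonly handled by attempting a dynamic import and falling back to a no-op when it fails. The sketch below shows that general technique only; whether the package's `ToastBridge` loads sonner this way is an assumption, and `loadOptionalModule` and `createNotifier` are hypothetical helpers.

```typescript
// Tries to load an optional module; resolves to null when it is not installed.
async function loadOptionalModule(
  specifier: string,
): Promise<Record<string, unknown> | null> {
  try {
    return await import(specifier);
  } catch {
    return null;
  }
}

type Notify = (message: string) => void;

// Builds a notifier that calls the module's toast() export if present,
// and silently does nothing otherwise.
async function createNotifier(
  specifier: string,
): Promise<{ notify: Notify; active: boolean }> {
  const mod = await loadOptionalModule(specifier);
  const toast = mod?.toast;
  if (typeof toast === 'function') {
    return { notify: (message) => { toast(message); }, active: true };
  }
  return { notify: () => {}, active: false };
}

// Demo with a module name that is assumed not to exist, so the no-op path runs.
const missingNotifier = createNotifier('hazo-files-demo-missing-module');
```

Calling `createNotifier('sonner')` would take the active path in a project where sonner is installed, since sonner exports a `toast` function.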
|
|
1740
|
+
|
|
1741
|
+
### Events
|
|
1742
|
+
|
|
1743
|
+
| Event | Payload | Fired when |
|
|
1744
|
+
|-------|---------|------------|
|
|
1745
|
+
| `job:created` | `{ job }` | Job enters the queue |
|
|
1746
|
+
| `job:status_changed` | `{ job, previous_status }` | Status transitions (queued → uploading → processing → ...) |
|
|
1747
|
+
| `job:progress` | `{ job }` | `handle.set_progress` called inside a pipeline step |
|
|
1748
|
+
| `job:completed` | `{ job }` | All pipeline steps finished successfully |
|
|
1749
|
+
| `job:error` | `{ job, error }` | A pipeline step threw |
|
|
1750
|
+
| `job:confirmation_needed` | `{ job, payload }` | `handle.request_confirmation` called |
|
|
1751
|
+
| `job:confirmation_resolved` | `{ job, result }` | `resolve_confirmation` called |
|
|
1752
|
+
| `batch:progress` | `{ batch }` | Any job in the batch settles |
|
|
1753
|
+
| `batch:completed` | `{ batch }` | All jobs in the batch are `done` or `error` |
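
The table reads naturally as a map from event name to payload type. The sketch below shows how such a map can drive a typed emitter so that listeners get the right payload shape; it is a stand-in for illustration, not the package's `TypedEventEmitter`. `MiniEmitter`, `DemoEvents`, and `DemoJob` are hypothetical, only three events are modelled, and returning an unsubscribe function from `on` is an assumption of this sketch.

```typescript
// Hypothetical job shape and event map covering three rows of the table.
interface DemoJob {
  job_id: string;
  status: string;
}
type DemoEvents = {
  'job:created': { job: DemoJob };
  'job:status_changed': { job: DemoJob; previous_status: string };
  'job:error': { job: DemoJob; error: Error };
};

// Minimal typed emitter: the event name determines the payload type.
class MiniEmitter<Events extends object> {
  // Stored loosely; the public methods below enforce the payload types.
  private listeners = new Map<keyof Events, Array<(payload: any) => void>>();

  // Registers a listener and returns a function that removes it again.
  on<K extends keyof Events>(
    eventName: K,
    listener: (payload: Events[K]) => void,
  ): () => void {
    const list = this.listeners.get(eventName) ?? [];
    this.listeners.set(eventName, [...list, listener]);
    return () => {
      const current = this.listeners.get(eventName) ?? [];
      this.listeners.set(eventName, current.filter((l) => l !== listener));
    };
  }

  // Calls every listener registered for the event, in registration order.
  emit<K extends keyof Events>(eventName: K, payload: Events[K]): void {
    for (const listener of this.listeners.get(eventName) ?? []) {
      listener(payload);
    }
  }
}

const emitter = new MiniEmitter<DemoEvents>();
const seenTransitions: string[] = [];
const unsubscribe = emitter.on('job:status_changed', ({ job, previous_status }) => {
  seenTransitions.push(`${previous_status} -> ${job.status}`);
});

emitter.emit('job:status_changed', {
  job: { job_id: 'j1', status: 'uploading' },
  previous_status: 'queued',
});
unsubscribe(); // later events are no longer delivered to this listener
emitter.emit('job:status_changed', {
  job: { job_id: 'j1', status: 'processing' },
  previous_status: 'uploading',
});
```

With this typing, subscribing to `'job:error'` and reading `previous_status` from its payload would be a compile-time error.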
|
|
1754
|
+
|
|
1755
|
+
### Design Notes
|
|
1756
|
+
|
|
1757
|
+
- **Survives unmount**: `UploadManager` lives on a `useRef`; pipelines run on the manager, not on React state. Navigating away does not abort uploads.
|
|
1758
|
+
- **Single source of truth**: `useFileUpload` / `useJobStatus` subscribe via `useSyncExternalStore` against the manager's event emitter, so multiple components stay consistent.
|
|
1759
|
+
- **Concurrency**: `max_concurrent` controls how many jobs the executor runs in parallel; the rest wait in the FIFO queue.
|
|
1760
|
+
- **No DB writes**: This module is purely an in-memory pipeline runner — your pipeline steps own all server I/O.
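
The concurrency note can be illustrated with a small limiter: at most `max_concurrent` tasks run at once, and the rest start in submission order as slots free up. This is a sketch of the general technique, not the package's executor; `FifoLimiter` and its members are hypothetical names.

```typescript
// Hypothetical limiter: runs at most maxConcurrent tasks, queues the rest FIFO.
class FifoLimiter {
  private readonly maxConcurrent: number;
  private running = 0;
  private queue: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  get runningCount(): number {
    return this.running;
  }

  get queuedCount(): number {
    return this.queue.length;
  }

  // Schedules a task; the returned promise settles with the task's outcome.
  submit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.running += 1;
        task()
          .then(resolve, reject)
          .then(() => {
            this.running -= 1;
            const next = this.queue.shift(); // oldest waiting task first
            if (next) next();
          });
      };
      if (this.running < this.maxConcurrent) start();
      else this.queue.push(start);
    });
  }
}

// Demo: four tasks that stay "in flight" until released by hand.
const limiter = new FifoLimiter(2);
const startOrder: number[] = [];
const releases: Array<() => void> = [];
for (let i = 1; i <= 4; i++) {
  void limiter.submit(
    () =>
      new Promise<void>((release) => {
        startOrder.push(i);
        releases.push(release);
      }),
  );
}
```

Right after the loop, tasks 1 and 2 are running and tasks 3 and 4 are queued; calling `releases[0]()` frees a slot, and task 3 starts next.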
|
|
1761
|
+
|
|
1443
1762
|
## API Reference
|
|
1444
1763
|
|
|
1445
1764
|
### FileManager
|