npm - @semiont/jobs - Versions diffs - 0.2.45 → 0.2.46 - Mend

@semiont/jobs 0.2.45 → 0.2.46

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/README.md CHANGED Viewed

@@ -6,18 +6,13 @@
 [![npm downloads](https://img.shields.io/npm/dm/@semiont/jobs.svg)](https://www.npmjs.com/package/@semiont/jobs)
 [![License](https://img.shields.io/npm/l/@semiont/jobs.svg)](https://github.com/The-AI-Alliance/semiont/blob/main/LICENSE)
-Filesystem-based job queue and worker infrastructure for [Semiont](https://github.com/The-AI-Alliance/semiont) - provides async job processing, background workers, and long-running task management.
+Filesystem-based job queue, worker infrastructure, and annotation workers for [Semiont](https://github.com/The-AI-Alliance/semiont).
-## What is a Job Queue?
+## Architecture Context
-A job queue is a pattern for processing work asynchronously outside of the HTTP request/response cycle. Jobs are persisted to storage, processed by workers, and can be monitored for progress and completion.
+In production, the job queue and workers are created by `@semiont/make-meaning`'s `startMakeMeaning()` function. Workers emit commands on the **EventBus** — the **Stower** actor (in @semiont/make-meaning) handles all persistence to the Knowledge Base.
-**Benefits:**
-- **Decoupled processing** - HTTP responses return immediately while work continues
-- **Reliability** - Jobs are persisted to disk and survive process restarts
-- **Progress tracking** - Long-running tasks can report status updates
-- **Retry logic** - Failed jobs can be retried with exponential backoff
-- **Scalability** - Multiple workers can process jobs concurrently
+Workers are **not** actors. They use a polling loop, not RxJS subscriptions. But they emit the same EventBus commands as any other caller in the system.
 ## Installation
@@ -25,43 +20,33 @@ A job queue is a pattern for processing work asynchronously outside of the HTTP
 npm install @semiont/jobs
 ```
-**Prerequisites:**
-- Node.js >= 20.18.1
-- `@semiont/core` and `@semiont/api-client` (peer dependencies)
-## Architecture Context
-**Infrastructure Ownership**: In production applications, the job queue is **created and managed by [@semiont/make-meaning](../make-meaning/)'s `startMakeMeaning()` function**, which serves as the single orchestration point for all infrastructure components (EventStore, GraphDB, RepStore, InferenceClient, JobQueue, Workers).
-The quick start example below shows direct initialization for **testing, CLI tools, or standalone workers**. For backend integration, access the job queue through the `makeMeaning` context object.
+**Dependencies:**
+- `@semiont/core` — Core types, EventBus
+- `@semiont/api-client` — OpenAPI types
+- `@semiont/inference` — InferenceClient for AI operations
 ## Quick Start
 ```typescript
-import {
-  JobQueue,
-  initializeJobQueue,
-  getJobQueue,
-  JobWorker,
-  type PendingJob,
-  type RunningJob,
-  type GenerationParams,
-  type AnyJob,
-} from '@semiont/jobs';
+import { JobQueue, type PendingJob, type GenerationParams } from '@semiont/jobs';
+import { EventBus, userId, resourceId, annotationId } from '@semiont/core';
 import { jobId } from '@semiont/api-client';
-import { userId, resourceId, annotationId } from '@semiont/core';
-// 1. Initialize job queue
-await initializeJobQueue({ dataDir: './data' });
+// Initialize
+const eventBus = new EventBus();
+const jobQueue = new JobQueue({ dataDir: './data' }, logger, eventBus);
+await jobQueue.initialize();
-// 2. Create a job
-const jobQueue = getJobQueue();
+// Create a job
 const job: PendingJob<GenerationParams> = {
   status: 'pending',
   metadata: {
     id: jobId('job-abc123'),
     type: 'generation',
     userId: userId('user@example.com'),
+    userName: 'Jane Doe',
+    userEmail: 'jane@example.com',
+    userDomain: 'example.com',
     created: new Date().toISOString(),
     retryCount: 0,
     maxRetries: 3,
@@ -69,6 +54,8 @@ const job: PendingJob<GenerationParams> = {
   params: {
     referenceId: annotationId('ref-123'),
     sourceResourceId: resourceId('doc-456'),
+    sourceResourceName: 'Source Document',
+    annotation: { /* full W3C Annotation */ },
     title: 'Generated Article',
     prompt: 'Write about AI',
     language: 'en-US',
@@ -76,562 +63,120 @@ const job: PendingJob<GenerationParams> = {
 };
 await jobQueue.createJob(job);
-// 3. Create a worker to process jobs
-class MyGenerationWorker extends JobWorker {
-  protected getWorkerName(): string {
-    return 'MyGenerationWorker';
-  }
-  protected canProcessJob(job: AnyJob): boolean {
-    return job.metadata.type === 'generation';
-  }
-  protected async executeJob(job: AnyJob): Promise<void> {
-    // Type guard ensures job is running
-    if (job.status !== 'running') {
-      throw new Error('Job must be running');
-    }
-    const genJob = job as RunningJob<GenerationParams>;
-    console.log(`Generating resource: ${genJob.params.title}`);
-    // Your processing logic here
-  }
-}
-// 4. Start worker
-const worker = new MyGenerationWorker();
-await worker.start();
-```
-## Architecture
-The jobs package follows a simple status-directory pattern:
-```
-data/
-  jobs/
-    pending/        ← Jobs waiting to be processed
-      job-123.json
-      job-456.json
-    running/        ← Jobs currently being processed
-      job-789.json
-    complete/       ← Successfully completed jobs
-      job-111.json
-    failed/         ← Failed jobs (with error info)
-      job-222.json
-    cancelled/      ← Cancelled jobs
-      job-333.json
 ```
-**Key Components:**
-- **JobQueue** - Manages job lifecycle and persistence
-- **JobWorker** - Abstract base class for workers that process jobs
-- **Job Types** - Strongly-typed job definitions for different task types
-## Core Concepts
-### Jobs
-Jobs use discriminated unions based on their status, ensuring type safety and preventing invalid state access:
+## Job Types
 ```typescript
-import type { PendingJob, RunningJob, CompleteJob, GenerationParams, GenerationProgress, GenerationResult } from '@semiont/jobs';
-// Pending job - waiting to be processed
-const pendingJob: PendingJob<GenerationParams> = {
-  status: 'pending',
-  metadata: {
-    id: jobId('job-123'),
-    type: 'generation',
-    userId: userId('user@example.com'),
-    created: '2024-01-01T00:00:00Z',
-    retryCount: 0,
-    maxRetries: 3,
-  },
-  params: {
-    referenceId: annotationId('ref-456'),
-    sourceResourceId: resourceId('doc-789'),
-    title: 'AI Generated Article',
-    prompt: 'Write about quantum computing',
-    language: 'en-US',
-  },
-};
-// Running job - currently being processed
-const runningJob: RunningJob<GenerationParams, GenerationProgress> = {
-  status: 'running',
-  metadata: { /* same as above */ },
-  params: { /* same as above */ },
-  startedAt: '2024-01-01T00:01:00Z',
-  progress: {
-    stage: 'generating',
-    percentage: 45,
-    message: 'Generating content...',
-  },
-};
-// Complete job - successfully finished
-const completeJob: CompleteJob<GenerationParams, GenerationResult> = {
-  status: 'complete',
-  metadata: { /* same as above */ },
-  params: { /* same as above */ },
-  startedAt: '2024-01-01T00:01:00Z',
-  completedAt: '2024-01-01T00:05:00Z',
-  result: {
-    resourceId: resourceId('doc-new'),
-    resourceName: 'Generated Article',
-  },
-};
-// TypeScript prevents accessing progress on pending jobs!
-// pendingJob.progress  // ❌ Compile error
-// runningJob.progress  // ✅ Available
-// completeJob.result   // ✅ Available
+type JobType =
+  | 'reference-annotation'     // Entity reference detection
+  | 'generation'               // AI content generation
+  | 'highlight-annotation'     // Key passage highlighting
+  | 'assessment-annotation'    // Evaluative assessments
+  | 'comment-annotation'       // Explanatory comments
+  | 'tag-annotation'           // Structural role tagging
 ```
-### Job Types
+## Job Metadata
-The package supports multiple job types for different tasks, each with their own parameter types:
+All jobs share common metadata:
 ```typescript
-import type {
-  DetectionParams,           // Entity detection in resources
-  GenerationParams,          // AI content generation
-  HighlightDetectionParams,  // Identify key passages
-  AssessmentDetectionParams, // Generate evaluative comments
-  CommentDetectionParams,    // Generate explanatory comments
-  TagDetectionParams,        // Structural role detection
-} from '@semiont/jobs';
-```
-### Job Status
-Jobs progress through status states stored as directories:
-```typescript
-type JobStatus =
-  | 'pending'    // Waiting to be processed
-  | 'running'    // Currently being processed
-  | 'complete'   // Successfully finished
-  | 'failed'     // Failed with error
-  | 'cancelled'  // Cancelled by user
-```
-### Workers
-Workers poll the queue and process jobs:
-```typescript
-import { JobWorker, type AnyJob, type RunningJob, type CustomParams } from '@semiont/jobs';
-class CustomWorker extends JobWorker {
-  // Worker identification
-  protected getWorkerName(): string {
-    return 'CustomWorker';
-  }
-  // Filter which jobs this worker processes
-  protected canProcessJob(job: AnyJob): boolean {
-    return job.metadata.type === 'custom-type';
-  }
-  // Implement job processing logic
-  protected async executeJob(job: AnyJob): Promise<void> {
-    // 1. Type guard - job must be running
-    if (job.status !== 'running') {
-      throw new Error('Job must be running');
-    }
-    // 2. Access typed job data
-    const customJob = job as RunningJob<CustomParams>;
-    const params = customJob.params;
-    // 3. Perform async work
-    const result = await doWork(params);
-    // 4. Create updated job with result (immutable pattern)
-    const updatedJob: RunningJob<CustomParams> = {
-      ...customJob,
-      progress: { stage: 'complete', percentage: 100 },
-    };
-    await this.updateJobProgress(updatedJob);
-  }
+interface JobMetadata {
+  id: JobId;
+  type: JobType;
+  userId: UserId;
+  userName: string;       // For building W3C Agent creator
+  userEmail: string;      // For building W3C Agent creator
+  userDomain: string;     // For building W3C Agent creator
+  created: string;
+  retryCount: number;
+  maxRetries: number;
 }
 ```
-## Documentation
-📚 **[Job Queue Guide](./docs/JobQueue.md)** - JobQueue API and job management
-👷 **[Workers Guide](./docs/Workers.md)** - Building custom workers
-📝 **[Job Types Guide](./docs/JobTypes.md)** - All job type definitions and usage
-🔷 **[Type System Guide](./docs/TYPES.md)** - Discriminated unions and type safety
-⚙️ **[Configuration Guide](./docs/Configuration.md)** - Setup and options
-## Key Features
-- **Type-safe** - Full TypeScript support with discriminated union types
-- **Filesystem-based** - No external database required (JSON files for jobs)
-- **Status directories** - Jobs organized by status for easy polling
-- **Atomic operations** - Safe concurrent access to job files
-- **Progress tracking** - Jobs can report progress updates during processing
-- **Retry logic** - Built-in retry handling with configurable max attempts
-- **Framework-agnostic** - Pure TypeScript, no web framework dependencies
-## Use Cases
-✅ **AI generation** - Long-running LLM inference tasks
-✅ **Background processing** - Resource analysis, entity detection
-✅ **Worker microservices** - Separate processes for compute-intensive work
+The `userName`, `userEmail`, and `userDomain` fields are used by workers to build the W3C `Agent` for annotation `creator` attribution via `userToAgent()`.
-✅ **CLI tools** - Command-line tools that queue batch operations
+## Annotation Workers
-✅ **Testing** - Isolated job queues for unit/integration tests
+Six workers process different annotation types:
-❌ **Not for frontend** - Backend infrastructure only (workers need filesystem access)
+| Worker | Job Type | Constructor |
+|--------|----------|------------|
+| `ReferenceAnnotationWorker` | `reference-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
+| `GenerationWorker` | `generation` | `(jobQueue, config, inferenceClient, eventBus, logger)` |
+| `HighlightAnnotationWorker` | `highlight-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
+| `AssessmentAnnotationWorker` | `assessment-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
+| `CommentAnnotationWorker` | `comment-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
+| `TagAnnotationWorker` | `tag-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
-## API Overview
+Workers emit EventBus commands (`mark:create`, `job:start`, `job:complete`, etc.) — the Stower actor in @semiont/make-meaning handles persistence.
-### JobQueue
+## Custom Workers
 ```typescript
-const queue = getJobQueue();
+import { JobWorker, type AnyJob } from '@semiont/jobs';
+import type { Logger } from '@semiont/core';
-// Create job
-await queue.createJob(job);
-// Get job by ID
-const job = await queue.getJob(jobId);
-// Poll for next pending job
-const next = await queue.pollNextPendingJob();
-// Update job status
-job.status = 'complete';
-await queue.updateJob(job, 'running');
-// Query jobs by status
-const pending = await queue.queryJobs({ status: 'pending' });
-const failed = await queue.queryJobs({ status: 'failed' });
-// Cleanup old jobs
-await queue.cleanupCompletedJobs(Date.now() - 86400000); // 1 day ago
-```
-### JobWorker
-```typescript
-// Create worker
 class MyWorker extends JobWorker {
-  constructor() {
-    super(
-      1000,  // Poll interval (ms)
-      5000   // Error backoff (ms)
-    );
+  constructor(jobQueue: JobQueue, logger: Logger) {
+    super(jobQueue, 1000, 5000, logger);
+    //              ^^^^  ^^^^
+    //              poll   error backoff
   }
   protected getWorkerName(): string {
     return 'MyWorker';
   }
-  protected canProcessJob(job: Job): boolean {
-    return job.type === 'my-type';
-  }
-  protected async executeJob(job: Job): Promise<void> {
-    // Process job
+  protected canProcessJob(job: AnyJob): boolean {
+    return job.metadata.type === 'generation';
   }
-}
-// Start worker
-const worker = new MyWorker();
-await worker.start();
-// Stop worker (graceful shutdown)
-await worker.stop();
-```
-### Singleton Pattern
-```typescript
-import { initializeJobQueue, getJobQueue } from '@semiont/jobs';
-// Initialize once at startup
-await initializeJobQueue({ dataDir: './data' });
-// Get queue instance anywhere
-const queue = getJobQueue();
-```
-## Storage Format
-Jobs are stored as individual JSON files:
-```
-data/
-  jobs/
-    pending/
-      job-abc123.json
-    running/
-      job-def456.json
-    complete/
-      job-ghi789.json
-```
-Each job file contains the complete job object using the discriminated union structure:
-```json
-{
-  "status": "complete",
-  "metadata": {
-    "id": "job-abc123",
-    "type": "generation",
-    "userId": "user@example.com",
-    "created": "2024-01-01T00:00:00Z",
-    "retryCount": 0,
-    "maxRetries": 3
-  },
-  "params": {
-    "referenceId": "ref-456",
-    "sourceResourceId": "doc-789",
-    "title": "Generated Article",
-    "prompt": "Write about AI",
-    "language": "en-US"
-  },
-  "startedAt": "2024-01-01T00:01:00Z",
-  "completedAt": "2024-01-01T00:05:00Z",
-  "result": {
-    "resourceId": "doc-new",
-    "resourceName": "Generated Article"
+  protected async executeJob(job: AnyJob): Promise<any> {
+    // Your processing logic — return result object
   }
 }
 ```
-## Performance
-- **Polling-based** - Workers poll pending directory at configurable intervals
-- **Filesystem limits** - Performance degrades with >1000 pending jobs per directory
-- **Atomic moves** - Jobs move between status directories atomically (delete + write)
-- **No locks needed** - Status-based organization prevents race conditions
-**Scaling considerations:**
-- Multiple workers can run concurrently (same or different machines)
-- Workers use `pollNextPendingJob()` for FIFO processing
-- Completed jobs should be cleaned up periodically
-- For high throughput (>1000 jobs/min), consider Redis/database-backed queue
-## Error Handling
+## Discriminated Unions
-### Worker Error Recovery
+Jobs use TypeScript discriminated unions for type safety:
 ```typescript
-class ResilientWorker extends JobWorker {
-  protected async executeJob(job: AnyJob): Promise<void> {
-    if (job.status !== 'running') {
-      throw new Error('Job must be running');
-    }
-    try {
-      await doWork(job);
-    } catch (error) {
-      // JobWorker base class handles:
-      // 1. Moving job to 'failed' status
-      // 2. Recording error message
-      // 3. Retry logic (if retryCount < maxRetries)
-      throw error; // Let base class handle it
-    }
+function handleJob(job: AnyJob) {
+  if (job.status === 'running') {
+    console.log(job.progress);    // Available
+    // console.log(job.result);   // Compile error
   }
-}
-```
-### Manual Retry
-```typescript
-const queue = getJobQueue();
-const failedJobs = await queue.queryJobs({ status: 'failed' });
-for (const job of failedJobs) {
-  if (job.status === 'failed' && job.metadata.retryCount < job.metadata.maxRetries) {
-    // Create new pending job from failed job
-    const retryJob: PendingJob<any> = {
-      status: 'pending',
-      metadata: {
-        ...job.metadata,
-        retryCount: job.metadata.retryCount + 1,
-      },
-      params: job.params,
-    };
-    await queue.updateJob(retryJob, 'failed');
+  if (job.status === 'complete') {
+    console.log(job.result);      // Available
+    // console.log(job.progress); // Compile error
   }
 }
 ```
-## Testing
-```typescript
-import { initializeJobQueue, getJobQueue } from '@semiont/jobs';
-import type { PendingJob, GenerationParams } from '@semiont/jobs';
-import { describe, it, beforeEach } from 'vitest';
-describe('Job queue', () => {
-  beforeEach(async () => {
-    await initializeJobQueue({ dataDir: './test-data' });
-  });
-  it('should create and retrieve jobs', async () => {
-    const queue = getJobQueue();
-    const job: PendingJob<GenerationParams> = {
-      status: 'pending',
-      metadata: {
-        id: jobId('test-1'),
-        type: 'generation',
-        userId: userId('user@test.com'),
-        created: new Date().toISOString(),
-        retryCount: 0,
-        maxRetries: 3,
-      },
-      params: {
-        referenceId: annotationId('ref-1'),
-        sourceResourceId: resourceId('doc-1'),
-        title: 'Test',
-        prompt: 'Test prompt',
-        language: 'en-US',
-      },
-    };
-    await queue.createJob(job);
-    const retrieved = await queue.getJob(jobId('test-1'));
-    expect(retrieved).toEqual(job);
-  });
-});
-```
-## Examples
-### Building a Background Worker
-```typescript
-import { JobWorker, type AnyJob, type RunningJob, type GenerationParams, type GenerationProgress } from '@semiont/jobs';
-import { InferenceService } from './inference';
-class GenerationWorker extends JobWorker {
-  private inference: InferenceService;
-  constructor(inference: InferenceService) {
-    super(1000, 5000);
-    this.inference = inference;
-  }
+## Storage Format
-  protected getWorkerName(): string {
-    return 'GenerationWorker';
-  }
+Jobs are stored as individual JSON files organized by status:
-  protected canProcessJob(job: AnyJob): boolean {
-    return job.metadata.type === 'generation';
-  }
-  protected async executeJob(job: AnyJob): Promise<void> {
-    // Type guard
-    if (job.status !== 'running') {
-      throw new Error('Job must be running');
-    }
-    const genJob = job as RunningJob<GenerationParams, GenerationProgress>;
-    // Report progress (create new object - immutable pattern)
-    const updatedJob1: RunningJob<GenerationParams, GenerationProgress> = {
-      ...genJob,
-      progress: {
-        stage: 'generating',
-        percentage: 0,
-        message: 'Starting generation...',
-      },
-    };
-    await getJobQueue().updateJob(updatedJob1);
-    // Generate content
-    const content = await this.inference.generate({
-      prompt: genJob.params.prompt,
-      context: genJob.params.context,
-      temperature: genJob.params.temperature,
-      maxTokens: genJob.params.maxTokens,
-    });
-    // Update progress
-    const updatedJob2: RunningJob<GenerationParams, GenerationProgress> = {
-      ...updatedJob1,
-      progress: {
-        stage: 'creating',
-        percentage: 75,
-        message: 'Creating resource...',
-      },
-    };
-    await getJobQueue().updateJob(updatedJob2);
-    // Create resource (simplified)
-    const resourceId = await createResource(content, genJob.params.title);
-    // Set result (will be handled by base class transition to complete)
-    return {
-      resourceId,
-      resourceName: genJob.params.title,
-    };
-  }
-}
+```
+data/jobs/
+  pending/job-abc123.json
+  running/job-def456.json
+  complete/job-ghi789.json
+  failed/job-jkl012.json
+  cancelled/job-mno345.json
 ```
-### Progress Monitoring
-```typescript
-import { getJobQueue } from '@semiont/jobs';
-async function monitorJob(jobId: JobId): Promise<void> {
-  const queue = getJobQueue();
-  while (true) {
-    const job = await queue.getJob(jobId);
-    if (!job) {
-      console.log('Job not found');
-      break;
-    }
-    console.log(`Status: ${job.status}`);
-    // Type-safe progress access - only available on running jobs
-    if (job.status === 'running') {
-      console.log(`Progress: ${job.progress.percentage}%`);
-      console.log(`Stage: ${job.progress.stage}`);
-      console.log(`Message: ${job.progress.message || 'Processing...'}`);
-    }
-    // Type-safe result access - only available on complete jobs
-    if (job.status === 'complete') {
-      console.log(`Result: ${JSON.stringify(job.result)}`);
-    }
-    // Type-safe error access - only available on failed jobs
-    if (job.status === 'failed') {
-      console.log(`Error: ${job.error}`);
-    }
+## Documentation
-    if (job.status === 'complete' || job.status === 'failed') {
-      break;
-    }
-    await new Promise(resolve => setTimeout(resolve, 1000));
-  }
-}
-```
+- **[Job Queue Guide](./docs/JobQueue.md)** — JobQueue API and job management
+- **[Workers Guide](./docs/Workers.md)** — Building custom workers
+- **[Job Types Guide](./docs/JobTypes.md)** — All job type definitions
+- **[Type System Guide](./docs/TYPES.md)** — Discriminated unions and type safety
+- **[Configuration Guide](./docs/Configuration.md)** — Setup and options
+- **[API Reference](./docs/API.md)** — Complete API reference
 ## License
@@ -639,13 +184,7 @@ Apache-2.0
 ## Related Packages
-- [`@semiont/api-client`](../api-client/) - API types and utilities
-- [`@semiont/core`](../core/) - Domain types and utilities
-- [`@semiont/event-sourcing`](../event-sourcing/) - Event persistence
-- [`semiont-backend`](../../apps/backend/) - Backend API server
-## Learn More
-- [Background Jobs Pattern](https://www.enterpriseintegrationpatterns.com/patterns/messaging/MessageQueueing.html) - Queue-based processing
-- [Job Types Guide](./docs/JobTypes.md) - Detailed job type documentation
-- [Workers Guide](./docs/Workers.md) - Building custom workers
+- [`@semiont/core`](../core/) — Domain types, EventBus
+- [`@semiont/api-client`](../api-client/) — OpenAPI types
+- [`@semiont/inference`](../inference/) — AI inference client
+- [`@semiont/make-meaning`](../make-meaning/) — Actor model, Knowledge Base, service orchestration