@semiont/jobs 0.5.5 → 0.5.7

package/README.md CHANGED
@@ -10,7 +10,7 @@ Job queue, worker infrastructure, and annotation workers for [Semiont](https://g
 
  ## Architecture Context
 
- Workers run in a separate separate process and connect to the Knowledge System (KS) over HTTP/SSE using `WorkerStateUnit` from `@semiont/api-client`. Workers receive job assignments via SSE push, claim jobs atomically, and emit domain events back to the KS via HTTP. The KS ingests these events onto its EventBus for SSE delivery to the frontend.
+ Workers run in a separate process and connect to the Knowledge System (KS) over HTTP/SSE using a `SemiontSession` (from `@semiont/sdk`) driven by a `JobClaimAdapter`. Workers receive job assignments via an SSE `job:queued` subscription, claim jobs atomically, and emit domain events back to the KS via `session.client.transport.emit(...)`. The KS ingests these events onto its EventBus for SSE delivery to the frontend.
 
  ## Installation
 
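The new architecture paragraph says workers receive `job:queued` over SSE and then claim jobs atomically. The sketch below only illustrates what that guarantee means when two workers see the same announcement. It assumes a hypothetical in-memory registry: `InMemoryClaimRegistry`, `claim()`, and `ClaimResult` are illustrative names, not APIs of `@semiont/sdk`, `JobClaimAdapter`, or the KS, and nothing here describes how the KS actually implements claiming.

```typescript
// Hypothetical result of a claim attempt (illustrative, not a package type).
type ClaimResult = { ok: true; workerId: string } | { ok: false; heldBy: string };

// Hypothetical in-memory stand-in for the KS side of a claim.
class InMemoryClaimRegistry {
  // jobId -> id of the worker that won the claim
  private owners = new Map<string, string>();

  // Check and set happen in one synchronous call with no await in between,
  // so two callers on the same event loop cannot both see the job as free.
  claim(jobId: string, workerId: string): ClaimResult {
    const owner = this.owners.get(jobId);
    if (owner !== undefined) return { ok: false, heldBy: owner };
    this.owners.set(jobId, workerId);
    return { ok: true, workerId };
  }
}

// Two workers receive the same `job:queued` announcement and both try to claim it.
const registry = new InMemoryClaimRegistry();
const first = registry.claim("job-abc123", "worker-a");
const second = registry.claim("job-abc123", "worker-b");
console.log(first.ok, second.ok); // true false
```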
@@ -19,20 +19,25 @@ npm install @semiont/jobs
  ```
 
  **Dependencies:**
- - `@semiont/core` — Core types, EventBus
- - `@semiont/api-client` — OpenAPI types
+ - `@semiont/core` — Core types, `SemiontProject`, EventBus
+ - `@semiont/sdk` — `SemiontSession`, `WorkerBus` (worker process)
+ - `@semiont/http-transport` — HTTP transport, OpenAPI types
  - `@semiont/inference` — InferenceClient for AI operations
+ - `@semiont/content` — Content storage URI derivation
+ - `@semiont/event-sourcing` — Annotation id generation
+ - `@semiont/observability` — Spans and job-outcome metrics
 
  ## Quick Start
 
  ```typescript
- import { JobQueue, type PendingJob, type GenerationParams } from '@semiont/jobs';
- import { EventBus, userId, resourceId, annotationId } from '@semiont/core';
- import { jobId } from '@semiont/api-client';
+ import { FsJobQueue, type PendingJob, type GenerationParams } from '@semiont/jobs';
+ import { EventBus, userId, resourceId, annotationId, jobId } from '@semiont/core';
+ import { SemiontProject } from '@semiont/core/node';
 
- // Initialize
+ // Initialize — jobs are stored under project.jobsDir
  const eventBus = new EventBus();
- const jobQueue = new JobQueue({ dataDir: './data' }, logger, eventBus);
+ const project = new SemiontProject('/path/to/project');
+ const jobQueue = new FsJobQueue(project, logger, eventBus);
  await jobQueue.initialize();
 
  // Create a job
@@ -84,58 +89,43 @@ interface JobMetadata {
  id: JobId;
  type: JobType;
  userId: UserId;
- userName: string; // For building W3C Agent creator
- userEmail: string; // For building W3C Agent creator
- userDomain: string; // For building W3C Agent creator
+ userName: string; // Audit-only snapshot of the requesting user
+ userEmail: string; // Audit-only snapshot of the requesting user
+ userDomain: string; // Audit-only snapshot of the requesting user
  created: string;
  retryCount: number;
  maxRetries: number;
  }
  ```
 
- The `userName`, `userEmail`, and `userDomain` fields are used by workers to build the W3C `Agent` for annotation `creator` attribution via `userToAgent()`.
+ The `userName`, `userEmail`, and `userDomain` fields are an audit-only snapshot of the requesting user, persisted in the on-disk job file. Workers derive annotation `creator` attribution from `userId` via `didToAgent()`.
 
  ## Annotation Workers
 
- Six workers process different annotation types:
+ The worker process (`worker-main.ts` and `startWorkerProcess` in `worker-process.ts`) claims jobs over the bus via a `JobClaimAdapter` and dispatches by `jobType` to a processor function. There are no per-type worker classes; each job type maps to one `process*Job` function:
 
- | Worker | Job Type | Constructor |
- |--------|----------|------------|
- | `ReferenceAnnotationWorker` | `reference-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
- | `GenerationWorker` | `generation` | `(jobQueue, config, inferenceClient, eventBus, logger)` |
- | `HighlightAnnotationWorker` | `highlight-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
- | `AssessmentAnnotationWorker` | `assessment-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
- | `CommentAnnotationWorker` | `comment-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
- | `TagAnnotationWorker` | `tag-annotation` | `(jobQueue, config, inferenceClient, eventBus, contentFetcher, logger)` |
+ | Job Type | Processor |
+ |----------|-----------|
+ | `reference-annotation` | `processReferenceJob` |
+ | `generation` | `processGenerationJob` |
+ | `highlight-annotation` | `processHighlightJob` |
+ | `assessment-annotation` | `processAssessmentJob` |
+ | `comment-annotation` | `processCommentJob` |
+ | `tag-annotation` | `processTagJob` |
 
- Workers emit EventBus commands (`mark:create`, `job:start`, `job:complete`, etc.) — the Stower actor in @semiont/make-meaning handles persistence.
+ Detection logic lives in the `AnnotationDetection` class (`src/workers/annotation-detection.ts`); generation synthesis in `generateResourceFromTopic()` (`src/workers/generation/resource-generation.ts`). Processors never fetch content themselves — the worker process fetches it via `session.client.browse.resourceContent(resourceId)` and passes it in.
 
- ## Custom Workers
+ Workers emit bus events via `session.client.transport.emit('mark:create' | 'job:start' | 'job:report-progress' | 'job:complete' | 'job:fail', payload)` — the Stower actor in @semiont/make-meaning handles persistence to the event log, and the job command handlers mirror the same events into the queue files (completion, retry-on-failure with `maxRetries`, progress-as-heartbeat).
 
- ```typescript
- import { JobWorker, type AnyJob } from '@semiont/jobs';
- import type { Logger } from '@semiont/core';
-
- class MyWorker extends JobWorker {
- constructor(jobQueue: JobQueue, logger: Logger) {
- super(jobQueue, 1000, 5000, logger);
- // ^^^^ ^^^^
- // poll error backoff
- }
+ ## Adding a Job Type
 
- protected getWorkerName(): string {
- return 'MyWorker';
- }
+ Workers are not subclassed. To add a job type:
 
- protected canProcessJob(job: AnyJob): boolean {
- return job.metadata.type === 'generation';
- }
+ 1. Add the new `JobType` and its params/result/progress types in `src/types.ts`.
+ 2. Add a `process*Job` function in `src/processors.ts` that runs the inference and returns the annotations/result.
+ 3. Dispatch the new `jobType` to that processor in `handleJobInner()` in `src/worker-process.ts`.
 
- protected async executeJob(job: AnyJob): Promise<any> {
- // Your processing logic — return result object
- }
- }
- ```
+ Processors are transport-agnostic: they take content, an `InferenceClient`, the job params, the user id, the `generator` (W3C SoftwareAgent), and an `onProgress` callback, and return annotations plus a result. The worker process handles claiming, content fetching, and lifecycle event emission.
 
  ## Discriminated Unions
 
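Step 3 of "Adding a Job Type" is the easiest one to forget. Below is a sketch of a `jobType` dispatch with a compile-time exhaustiveness check, under stated simplifications: `processorFor`, `Processor`, and `stub` are hypothetical, and the stand-in processors are synchronous and content-only, whereas the real `process*Job` functions are async and take richer arguments. `handleJobInner()` itself is not shown in this diff, so this is not its implementation.

```typescript
// Local copy of the JobType union from @semiont/jobs, so the sketch compiles alone.
type JobType =
  | "reference-annotation"
  | "generation"
  | "highlight-annotation"
  | "assessment-annotation"
  | "comment-annotation"
  | "tag-annotation";

// Hypothetical, simplified processor shape: synchronous and content-only.
type Processor = (content: string) => string;

// Stub that just reports which processor would have run.
const stub =
  (name: string): Processor =>
  (content) =>
    `${name}(${content.length} chars)`;

function processorFor(jobType: JobType): Processor {
  switch (jobType) {
    case "reference-annotation":
      return stub("processReferenceJob");
    case "generation":
      return stub("processGenerationJob");
    case "highlight-annotation":
      return stub("processHighlightJob");
    case "assessment-annotation":
      return stub("processAssessmentJob");
    case "comment-annotation":
      return stub("processCommentJob");
    case "tag-annotation":
      return stub("processTagJob");
    default: {
      // If a JobType member has no case, `jobType` is not `never` here
      // and this assignment fails to compile.
      const unhandled: never = jobType;
      throw new Error(`Unhandled job type: ${String(unhandled)}`);
    }
  }
}

console.log(processorFor("tag-annotation")("hello")); // processTagJob(5 chars)
```

With the `never` assignment in the `default` branch, adding a member to `JobType` (step 1) without a matching `case` fails type-checking instead of surfacing at runtime.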
@@ -144,12 +134,12 @@ Jobs use TypeScript discriminated unions for type safety:
  ```typescript
  function handleJob(job: AnyJob) {
  if (job.status === 'running') {
- console.log(job:progress); // Available
+ console.log(job.progress); // Available
  // console.log(job.result); // Compile error
  }
  if (job.status === 'complete') {
  console.log(job.result); // Available
- // console.log(job:progress); // Compile error
+ // console.log(job.progress); // Compile error
  }
  }
  ```
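The `handleJob` example narrows on `status`. Here is a self-contained version of the same pattern, assuming a cut-down local model of the job union: the interfaces below are simplified stand-ins for the package's `PendingJob`, `RunningJob`, `CompleteJob`, `FailedJob`, and `CancelledJob` (which also carry metadata, params, and per-type progress and result types), and `isRunning` is a local guard in the spirit of the exported `isRunningJob`.

```typescript
// Simplified local model of the job union (not the package's real types).
interface BaseJob {
  id: string;
}
interface PendingJob extends BaseJob {
  status: "pending";
}
interface RunningJob extends BaseJob {
  status: "running";
  progress: { percent: number };
}
interface CompleteJob extends BaseJob {
  status: "complete";
  result: { created: number };
}
interface FailedJob extends BaseJob {
  status: "failed";
  error: string;
}
interface CancelledJob extends BaseJob {
  status: "cancelled";
}
type AnyJob = PendingJob | RunningJob | CompleteJob | FailedJob | CancelledJob;

// User-defined type guard, analogous to the exported isRunningJob().
function isRunning(job: AnyJob): job is RunningJob {
  return job.status === "running";
}

function summarize(job: AnyJob): string {
  if (isRunning(job)) return `${job.id}: ${job.progress.percent}%`; // progress is in scope
  if (job.status === "complete") return `${job.id}: created ${job.result.created}`;
  if (job.status === "failed") return `${job.id}: ${job.error}`;
  if (job.status === "cancelled") return `${job.id}: cancelled`;
  // Only PendingJob remains, so neither progress nor result is accessible here.
  return `${job.id}: waiting`;
}

console.log(summarize({ id: "job-1", status: "running", progress: { percent: 40 } })); // job-1: 40%
```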
@@ -159,7 +149,7 @@ function handleJob(job: AnyJob) {
  Jobs are stored as individual JSON files organized by status:
 
  ```
- data/jobs/
+ {project.jobsDir}/
  pending/job-abc123.json
  running/job-def456.json
  complete/job-ghi789.json
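Each job lives in exactly one status directory, so a status change amounts to moving one file. A sketch of the path arithmetic, assuming a file name of `${jobId}.json` as in the sample listing. `jobFilePath` and `transitionPaths` are hypothetical helpers (the queue's own path handling is private), and `/srv/project/jobs` stands in for whatever `project.jobsDir` resolves to.

```typescript
import * as path from "node:path";

// Mirrors the package's JobStatus union.
type JobStatus = "pending" | "running" | "complete" | "failed" | "cancelled";

// Assumption: one `${jobId}.json` file per job inside its status directory.
function jobFilePath(jobsDir: string, status: JobStatus, jobId: string): string {
  // posix.join keeps the example's output identical on every platform.
  return path.posix.join(jobsDir, status, `${jobId}.json`);
}

// A status change touches two paths: one file is written, the other removed.
function transitionPaths(jobsDir: string, jobId: string, from: JobStatus, to: JobStatus) {
  return { write: jobFilePath(jobsDir, to, jobId), remove: jobFilePath(jobsDir, from, jobId) };
}

console.log(jobFilePath("/srv/project/jobs", "pending", "job-abc123"));
// /srv/project/jobs/pending/job-abc123.json
```

The d.ts describes transitions as an atomic delete plus write across directories; this sketch only computes the two paths involved and does not touch the filesystem.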
@@ -182,7 +172,8 @@ Apache-2.0
 
  ## Related Packages
 
- - [`@semiont/core`](../core/) — Domain types, EventBus
- - [`@semiont/api-client`](../api-client/) — OpenAPI types
+ - [`@semiont/core`](../core/) — Domain types, `SemiontProject`, EventBus
+ - [`@semiont/sdk`](../sdk/) — `SemiontSession`, `WorkerBus`
+ - [`@semiont/http-transport`](../http-transport/) — HTTP transport, OpenAPI types
  - [`@semiont/inference`](../inference/) — AI inference client
  - [`@semiont/make-meaning`](../make-meaning/) — Actor model, Knowledge Base, service orchestration
package/dist/index.d.ts CHANGED
@@ -1,5 +1,4 @@
- import { Readable } from 'stream';
- import { ResourceId, JobId, UserId, EntityType, AnnotationId, Annotation, GatheredContext, TagSchema, Logger, EventBus, components } from '@semiont/core';
+ import { JobId, UserId, ResourceId, EntityType, AnnotationId, Annotation, GatheredContext, TagSchema, Logger, EventBus, components, SupportedMediaType } from '@semiont/core';
  import { SemiontProject } from '@semiont/core/node';
  import { InferenceClient } from '@semiont/inference';
 
@@ -16,12 +15,6 @@ import { InferenceClient } from '@semiont/inference';
  * - State machine is explicit and type-safe
  */
 
- /**
- * Content fetcher - turns a ResourceId into a readable stream.
- * Workers use this to access resource content on demand.
- * The implementation is provided by the backend at startup.
- */
- type ContentFetcher = (resourceId: ResourceId) => Promise<Readable | null>;
  type JobType = 'reference-annotation' | 'generation' | 'highlight-annotation' | 'assessment-annotation' | 'comment-annotation' | 'tag-annotation';
  type JobStatus = 'pending' | 'running' | 'complete' | 'failed' | 'cancelled';
  /**
@@ -31,6 +24,13 @@ interface JobMetadata {
  id: JobId;
  type: JobType;
  userId: UserId;
+ /**
+ * Audit-only snapshot of the requesting user (with `userEmail` and
+ * `userDomain` below), stamped at job creation and persisted in the
+ * on-disk job file. No code path reads these back — annotation
+ * `creator` attribution is derived from `userId` via `didToAgent()`.
+ * Kept intentionally so job files are self-describing to a human.
+ */
  userName: string;
  userEmail: string;
  userDomain: string;
@@ -327,7 +327,22 @@ interface JobQueue {
  createJob(job: AnyJob): Promise<void>;
  getJob(jobId: JobId): Promise<AnyJob | null>;
  updateJob(job: AnyJob, oldStatus?: JobStatus): Promise<void>;
- pollNextPendingJob(predicate?: (job: AnyJob) => boolean): Promise<AnyJob | null>;
+ /** Move a running job to `complete`. Returns false if the job isn't running. */
+ completeJob(jobId: JobId, result: Record<string, unknown>): Promise<boolean>;
+ /**
+ * Move a running job back to `pending` (retry, re-announced) while
+ * `retryCount < maxRetries`, else to `failed`. Returns what happened,
+ * or null if the job isn't running.
+ */
+ failJob(jobId: JobId, error: string): Promise<'retried' | 'failed' | null>;
+ /** Write progress into a running job's file (throttled, best-effort). */
+ recordProgress(jobId: JobId, progress: Record<string, unknown>): Promise<void>;
+ /**
+ * Cancel all pending jobs in a category — 'generation' is the
+ * `generation` type; 'annotation' is every `*-annotation` type.
+ * Running jobs are left to finish. Returns the number cancelled.
+ */
+ cancelPendingJobs(category: 'annotation' | 'generation'): Promise<number>;
  cancelJob(jobId: JobId): Promise<boolean>;
  getStats(): Promise<{
  pending: number;
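`completeJob` and `failJob` only act on running jobs, and `failJob` chooses between retry and failure based on `retryCount < maxRetries`. A pure-function sketch of those documented rules, assuming a minimal job record and that `retryCount` starts at 0. `JobLike`, `failTransition`, and `completeTransition` are illustrative names, not package exports, and the real methods also move files and re-announce retried jobs.

```typescript
type JobStatus = "pending" | "running" | "complete" | "failed" | "cancelled";

// Minimal fields needed to model the transitions (hypothetical shape).
interface JobLike {
  id: string;
  status: JobStatus;
  retryCount: number;
  maxRetries: number;
  error?: string;
}

type FailOutcome = "retried" | "failed" | null;

function failTransition(job: JobLike, error: string): { outcome: FailOutcome; next: JobLike } {
  // Not running: nothing changes (failJob returns null).
  if (job.status !== "running") return { outcome: null, next: job };
  // Retries remain: back to pending with the count bumped.
  if (job.retryCount < job.maxRetries) {
    return { outcome: "retried", next: { ...job, status: "pending", retryCount: job.retryCount + 1 } };
  }
  // Retries exhausted: park in failed with the error.
  return { outcome: "failed", next: { ...job, status: "failed", error } };
}

function completeTransition(job: JobLike): { ok: boolean; next: JobLike } {
  // Not running: no-op, which makes a duplicate `job:complete` harmless.
  if (job.status !== "running") return { ok: false, next: job };
  return { ok: true, next: { ...job, status: "complete" } };
}

// A job with maxRetries = 2 that fails on every attempt.
let job: JobLike = { id: "job-1", status: "running", retryCount: 0, maxRetries: 2 };
const outcomes: FailOutcome[] = [];
for (let attempt = 0; attempt < 3; attempt++) {
  const { outcome, next } = failTransition(job, "inference timeout");
  outcomes.push(outcome);
  // Simulate a worker re-claiming a retried job (pending -> running).
  job = next.status === "pending" ? { ...next, status: "running" } : next;
}
console.log(outcomes); // [ 'retried', 'retried', 'failed' ]
```

Under these assumptions a job with `maxRetries: 2` that fails every time runs three times in total before landing in `failed`, and any later `failJob` or duplicate `job:complete` for it is a no-op.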
@@ -342,33 +357,40 @@ interface JobQueue {
  * Job Queue Manager
  *
  * Filesystem-based job queue with atomic operations.
- * Jobs are stored in directories by status for easy polling.
+ * Jobs are stored in directories by status; status transitions are
+ * atomic delete + write across directories.
  */
 
  declare class FsJobQueue implements JobQueue {
  private eventBus?;
  private jobsDir;
  private logger;
- private pendingQueue;
- private watcher;
- private loadDebounceTimer;
+ private reannounceTimer;
+ private cleanupTimer;
+ /** Per-job timestamp of the last progress write, for throttling. */
+ private lastProgressWrite;
  constructor(project: SemiontProject, logger: Logger, eventBus?: EventBus | undefined);
  /**
- * Initialize job queue directories, load pending jobs, and start fs.watch
+ * Initialize job queue directories, announce any pending backlog,
+ * and start the re-announce interval. Idempotent.
  */
  initialize(): Promise<void>;
  /**
- * Clean up watcher
+ * Stop the re-announce and retention intervals
  */
  destroy(): void;
  /**
- * Load pending jobs from disk into in-memory queue
+ * Emit `job:queued` for a pending job, if an EventBus is wired and
+ * the job carries a `resourceId` (every current job type does).
  */
- private loadPendingJobs;
+ private announce;
  /**
- * Debounced version of loadPendingJobs fs.watch can fire rapidly
+ * Announce every job currently in `pending/`. Files that vanish or
+ * fail to parse mid-scan (claimed, cancelled, partially written)
+ * are skipped — they're either gone for a good reason or picked up
+ * on the next tick.
  */
- private debouncedLoadPendingJobs;
+ private announcePendingJobs;
  /**
  * Create a new job
  */
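The new `announce` and `announcePendingJobs` comments describe a scan of `pending/` that emits `job:queued` and skips files that vanish or fail to parse. A sketch of one such pass over already-read file contents, assuming a simplified file shape and event payload: `PendingFile`, `QueuedPayload`, `Emit`, and `announcePending` are hypothetical, and where the real job stores its `resourceId` is not shown in these typings.

```typescript
// Assumed shape of a pending-job file, for this sketch only.
interface PendingFile {
  jobId: string;
  resourceId?: string;
}

// Assumed payload; the real `job:queued` event schema is defined elsewhere.
type QueuedPayload = { jobId: string; resourceId: string };
type Emit = (event: "job:queued", payload: QueuedPayload) => void;

// One announce pass: emit `job:queued` for every parseable pending job
// that carries a resourceId. Returns how many announcements were emitted.
function announcePending(rawFiles: string[], emit: Emit | undefined): number {
  if (!emit) return 0; // no EventBus wired: nothing to announce
  let announced = 0;
  for (const raw of rawFiles) {
    let job: PendingFile | null;
    try {
      job = JSON.parse(raw) as PendingFile | null;
    } catch {
      continue; // e.g. a partially written file; a later pass can retry it
    }
    if (!job || !job.resourceId) continue;
    emit("job:queued", { jobId: job.jobId, resourceId: job.resourceId });
    announced++;
  }
  return announced;
}

const seen: string[] = [];
const count = announcePending(
  ['{"jobId":"job-a","resourceId":"res-1"}', '{"jobId":"job-b"', '{"jobId":"job-c"}'],
  (_event, payload) => {
    seen.push(payload.jobId);
  },
);
console.log(count, seen); // 1 [ 'job-a' ]
```

Because `initialize()` also starts a re-announce interval, a file skipped on one pass (for example, one caught mid-write) gets another chance on a later one.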
@@ -382,10 +404,25 @@ declare class FsJobQueue implements JobQueue {
  */
  updateJob(job: AnyJob, oldStatus?: JobStatus): Promise<void>;
  /**
- * Poll for next pending job (FIFO) from in-memory queue.
- * If a predicate is provided, returns the first matching job (skipping non-matching ones).
+ * Move a running job to `complete`. Returns false (and changes
+ * nothing) if the job is missing or not running, which also makes
+ * duplicate `job:complete` events harmless.
+ */
+ completeJob(jobId: JobId, result: Record<string, unknown>): Promise<boolean>;
+ /**
+ * Retry-or-fail a running job. While `retryCount < maxRetries` the
+ * job goes back to `pending` with the count bumped (and is
+ * re-announced); after that it lands in `failed` with the error.
+ * Returns null (and changes nothing) if the job isn't running.
+ */
+ failJob(jobId: JobId, error: string): Promise<'retried' | 'failed' | null>;
+ /**
+ * Write progress into a running job's file. Throttled per job, and
+ * a no-op for jobs that aren't running. Beyond surfacing live
+ * progress to `job:status-requested`, each write refreshes the
+ * file's mtime — the heartbeat `recoverStaleRunningJobs` watches.
  */
- pollNextPendingJob(predicate?: (job: AnyJob) => boolean): Promise<AnyJob | null>;
+ recordProgress(jobId: JobId, progress: Record<string, unknown>): Promise<void>;
  /**
  * List jobs with filters
  */
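`recordProgress` is throttled per job, and each accepted write doubles as a heartbeat. A sketch of per-job throttling keyed by job id, in the spirit of the private `lastProgressWrite` field; the `ProgressThrottle` class, its `forget` method, and the 1000 ms interval are assumptions, since the typings state none of them.

```typescript
// Hypothetical per-job write throttle; the interval is an assumption.
class ProgressThrottle {
  // jobId -> time (ms) of the last accepted write
  private lastWrite = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // True if a write should go to disk now; records the write when it does.
  shouldWrite(jobId: string, nowMs: number): boolean {
    const last = this.lastWrite.get(jobId);
    if (last !== undefined && nowMs - last < this.minIntervalMs) return false;
    this.lastWrite.set(jobId, nowMs);
    return true;
  }

  // Drop bookkeeping once a job leaves `running`, so the map stays small.
  forget(jobId: string): void {
    this.lastWrite.delete(jobId);
  }
}

const throttle = new ProgressThrottle(1_000);
console.log(
  throttle.shouldWrite("job-a", 0), // true: first write
  throttle.shouldWrite("job-a", 500), // false: inside the interval
  throttle.shouldWrite("job-b", 500), // true: throttling is per job
  throttle.shouldWrite("job-a", 1_000), // true: interval elapsed
);
```

One design constraint follows from the heartbeat role: the throttle interval has to be comfortably shorter than the stale window used by `recoverStaleRunningJobs`, otherwise a worker that reports steadily could still look idle on disk.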
@@ -394,6 +431,21 @@ declare class FsJobQueue implements JobQueue {
  * Cancel a job
  */
  cancelJob(jobId: JobId): Promise<boolean>;
+ /**
+ * Cancel all pending jobs in a category — the granularity of the
+ * `job:cancel-requested` UI signal. Running jobs are left to finish:
+ * interrupting a worker mid-inference would need a worker-side kill
+ * channel that doesn't exist.
+ */
+ cancelPendingJobs(category: 'annotation' | 'generation'): Promise<number>;
+ /**
+ * Recover running jobs orphaned by a dead worker: any `running/`
+ * file whose mtime is older than the stale window is fed through
+ * the same retry-or-fail path as `job:fail`. Progress writes
+ * refresh the mtime, so a live worker is never recovered out from
+ * under itself as long as it reports within the window.
+ */
+ recoverStaleRunningJobs(): Promise<number>;
  /**
  * Clean up old completed/failed jobs (older than retention period)
  */
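The two new bulk operations each rest on a simple selection rule: `cancelPendingJobs` selects by category (`generation` alone, or every `*-annotation` type), and `recoverStaleRunningJobs` selects `running/` files whose mtime is older than a stale window. A sketch of both rules as pure functions; `inCategory`, `findStale`, `RunningFile`, and the 5000 ms window in the example are hypothetical, and in Node the mtime would come from `fs.promises.stat(path).mtimeMs`.

```typescript
type JobType =
  | "reference-annotation"
  | "generation"
  | "highlight-annotation"
  | "assessment-annotation"
  | "comment-annotation"
  | "tag-annotation";

type CancelCategory = "annotation" | "generation";

// Category rule as documented on cancelPendingJobs.
function inCategory(type: JobType, category: CancelCategory): boolean {
  return category === "generation" ? type === "generation" : type.endsWith("-annotation");
}

// Assumed input: one entry per file in running/, with its mtime in ms.
interface RunningFile {
  jobId: string;
  mtimeMs: number;
}

// Staleness rule as documented on recoverStaleRunningJobs: files not
// touched within the window are treated as orphaned.
function findStale(files: RunningFile[], nowMs: number, staleWindowMs: number): string[] {
  return files.filter((f) => nowMs - f.mtimeMs > staleWindowMs).map((f) => f.jobId);
}

const pendingTypes: JobType[] = ["generation", "tag-annotation", "comment-annotation"];
console.log(pendingTypes.filter((t) => inCategory(t, "annotation")));
// [ 'tag-annotation', 'comment-annotation' ]
console.log(findStale([{ jobId: "job-old", mtimeMs: 0 }, { jobId: "job-live", mtimeMs: 9_000 }], 10_000, 5_000));
// [ 'job-old' ]
```

Per the d.ts, jobs selected by the staleness rule then go through the same retry-or-fail path as `job:fail`, while the category rule is only ever applied to pending jobs.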
@@ -415,80 +467,14 @@ declare class FsJobQueue implements JobQueue {
  }
 
  /**
- * Job Worker Base Class
- *
- * Abstract worker that polls the job queue and processes jobs.
- * Subclasses implement specific job processing logic.
- */
-
- declare abstract class JobWorker {
- private running;
- private currentJob;
- private pollIntervalMs;
- private errorBackoffMs;
- protected jobQueue: JobQueue;
- protected logger: Logger;
- constructor(jobQueue: JobQueue, pollIntervalMs: number | undefined, errorBackoffMs: number | undefined, logger: Logger);
- /**
- * Start the worker (polls queue in loop)
- */
- start(): Promise<void>;
- /**
- * Stop the worker (graceful shutdown)
- */
- stop(): Promise<void>;
- /**
- * Poll for next job to process
- */
- private pollNextJob;
- /**
- * Process a job (handles state transitions and error handling)
- */
- private processJob;
- /**
- * Handle job failure (retry or move to failed)
- */
- protected handleJobFailure(job: AnyJob, error: any): Promise<void>;
- /**
- * Update job progress (best-effort, doesn't throw)
- */
- protected updateJobProgress(job: AnyJob): Promise<void>;
- /**
- * Sleep utility
- */
- protected sleep(ms: number): Promise<void>;
- /**
- * Emit completion event (optional hook for subclasses)
- * Override this to emit job-specific completion events (e.g., job.completed)
- */
- protected emitCompletionEvent(_job: RunningJob<any, any>, _result: any): Promise<void>;
- /**
- * Get worker name (for logging)
- */
- protected abstract getWorkerName(): string;
- /**
- * Check if this worker can process the given job
- */
- protected abstract canProcessJob(job: AnyJob): boolean;
- /**
- * Execute the job (job-specific logic)
- * This is where the actual work happens
- * Return the result object (or void for jobs without results)
- * Throw an error to trigger retry logic
- */
- protected abstract executeJob(job: AnyJob): Promise<any>;
- }
-
- /**
- * Job Processors — extracted from JobWorker subclasses
+ * Job Processors
  *
  * Pure functions that take content + inference client + params,
  * report progress via callback, and return annotations + results.
  *
  * No EventBus, no JobQueue, no side effects except calling inference.
- * Two callers:
- * 1. In-process JobWorker subclasses (existing path)
- * 2. Remote WorkerStateUnit via worker-process.ts (new path)
+ * Driven by the remote worker process (worker-process.ts), which claims
+ * jobs over SSE and dispatches by jobType to these functions.
  */
 
  type Agent = components['schemas']['Agent'];
@@ -513,7 +499,7 @@ declare function processTagJob(content: string, inferenceClient: InferenceClient
  declare function processGenerationJob(inferenceClient: InferenceClient, params: GenerationParams, onProgress: OnProgress, logger: Logger): Promise<{
  content: string;
  title: string;
- format: string;
+ format: SupportedMediaType;
  result: GenerationResult;
  }>;
 
@@ -569,16 +555,11 @@ interface TagMatch {
  * 2. Call AI inference
  * 3. Parse and validate results using MotivationParsers
  *
- * All methods take content as a string parameter.
- * Workers are responsible for fetching content via ContentFetcher.
+ * All methods take content as a string parameter — the worker process
+ * fetches it and hands it in.
  */
 
  declare class AnnotationDetection {
- /**
- * Fetch content from a ContentFetcher and read the stream to a string.
- * Shared helper for all workers.
- */
- static fetchContent(contentFetcher: ContentFetcher, resourceId: ResourceId): Promise<string>;
  /**
  * Detect comments in content.
  *
@@ -641,5 +622,5 @@ declare function generateResourceFromTopic(topic: string, entityTypes: string[],
  content: string;
  }>;
 
- export { AnnotationDetection, FsJobQueue, JobWorker, generateResourceFromTopic, isCancelledJob, isCompleteJob, isFailedJob, isPendingJob, isRunningJob, processAssessmentJob, processCommentJob, processGenerationJob, processHighlightJob, processReferenceJob, processTagJob };
- export type { AnyJob, AssessmentDetectionJob, AssessmentDetectionParams, AssessmentDetectionProgress, AssessmentDetectionResult, CancelledJob, CommentDetectionJob, CommentDetectionParams, CommentDetectionProgress, CommentDetectionResult, CompleteJob, ContentFetcher, DetectionJob, DetectionParams, DetectionProgress, DetectionResult, FailedJob, GenerationJob, GenerationParams, GenerationResult, HighlightDetectionJob, HighlightDetectionParams, HighlightDetectionProgress, HighlightDetectionResult, JobMetadata, JobQueryFilters, JobQueue, JobStatus, JobType, OnProgress, PendingJob, ProcessorResult, RunningJob, TagDetectionJob, TagDetectionParams, TagDetectionProgress, TagDetectionResult, YieldProgress };
+ export { AnnotationDetection, FsJobQueue, generateResourceFromTopic, isCancelledJob, isCompleteJob, isFailedJob, isPendingJob, isRunningJob, processAssessmentJob, processCommentJob, processGenerationJob, processHighlightJob, processReferenceJob, processTagJob };
+ export type { AnyJob, AssessmentDetectionJob, AssessmentDetectionParams, AssessmentDetectionProgress, AssessmentDetectionResult, CancelledJob, CommentDetectionJob, CommentDetectionParams, CommentDetectionProgress, CommentDetectionResult, CompleteJob, DetectionJob, DetectionParams, DetectionProgress, DetectionResult, FailedJob, GenerationJob, GenerationParams, GenerationResult, HighlightDetectionJob, HighlightDetectionParams, HighlightDetectionProgress, HighlightDetectionResult, JobMetadata, JobQueryFilters, JobQueue, JobStatus, JobType, OnProgress, PendingJob, ProcessorResult, RunningJob, TagDetectionJob, TagDetectionParams, TagDetectionProgress, TagDetectionResult, YieldProgress };