@memberjunction/content-autotagging 5.22.0 → 5.23.0
- package/README.md +78 -18
- package/dist/CloudStorage/generic/CloudStorageBase.d.ts +2 -2
- package/dist/CloudStorage/generic/CloudStorageBase.d.ts.map +1 -1
- package/dist/CloudStorage/generic/CloudStorageBase.js +2 -2
- package/dist/CloudStorage/generic/CloudStorageBase.js.map +1 -1
- package/dist/Core/generic/AutotagBase.d.ts +3 -1
- package/dist/Core/generic/AutotagBase.d.ts.map +1 -1
- package/dist/Core/generic/AutotagBase.js.map +1 -1
- package/dist/Engine/generic/AutotagBaseEngine.d.ts +89 -7
- package/dist/Engine/generic/AutotagBaseEngine.d.ts.map +1 -1
- package/dist/Engine/generic/AutotagBaseEngine.js +462 -76
- package/dist/Engine/generic/AutotagBaseEngine.js.map +1 -1
- package/dist/Entity/generic/AutotagEntity.d.ts +2 -2
- package/dist/Entity/generic/AutotagEntity.d.ts.map +1 -1
- package/dist/Entity/generic/AutotagEntity.js +2 -2
- package/dist/Entity/generic/AutotagEntity.js.map +1 -1
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.d.ts +2 -2
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.d.ts.map +1 -1
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.js +2 -2
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.js.map +1 -1
- package/dist/RSSFeed/generic/AutotagRSSFeed.d.ts +2 -2
- package/dist/RSSFeed/generic/AutotagRSSFeed.d.ts.map +1 -1
- package/dist/RSSFeed/generic/AutotagRSSFeed.js +2 -2
- package/dist/RSSFeed/generic/AutotagRSSFeed.js.map +1 -1
- package/dist/Websites/generic/AutotagWebsite.d.ts +2 -2
- package/dist/Websites/generic/AutotagWebsite.d.ts.map +1 -1
- package/dist/Websites/generic/AutotagWebsite.js +2 -2
- package/dist/Websites/generic/AutotagWebsite.js.map +1 -1
- package/package.json +11 -8
package/README.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
# @memberjunction/content-autotagging
|
|
2
2
|
|
|
3
|
-
AI-powered content ingestion and
|
|
3
|
+
AI-powered content ingestion, autotagging, and vectorization engine for MemberJunction. Scans content from multiple sources (local files, websites, RSS feeds, cloud storage), extracts text from documents, uses LLMs to generate weighted tags and metadata attributes, and vectorizes content for semantic search.
|
|
4
4
|
|
|
5
5
|
## Overview
|
|
6
6
|
|
|
7
|
-
The `@memberjunction/content-autotagging` package provides an extensible framework for ingesting content from diverse sources and leveraging AI models to extract meaningful tags, summaries, and metadata. Built on the MemberJunction platform, it helps organizations automatically organize and categorize their content.
|
|
7
|
+
The `@memberjunction/content-autotagging` package provides an extensible framework for ingesting content from diverse sources and leveraging AI models to extract meaningful tags, summaries, and metadata. Built on the MemberJunction platform, it helps organizations automatically organize and categorize their content. The engine uses the managed **"Content Autotagging"** AI prompt via `AIPromptRunner` (rather than direct `BaseLLM` calls), enabling prompt versioning, model routing, and centralized prompt management.
|
|
8
8
|
|
|
9
9
|
```mermaid
|
|
10
10
|
graph TD
|
|
@@ -19,10 +19,14 @@ graph TD
|
|
|
19
19
|
G --> I["Office Parser"]
|
|
20
20
|
G --> J["HTML Parser<br/>(Cheerio)"]
|
|
21
21
|
|
|
22
|
-
A --> K["
|
|
23
|
-
K --> L["Tag Generation"]
|
|
22
|
+
A --> K["AIPromptRunner<br/>(Content Autotagging prompt)"]
|
|
23
|
+
K --> L["Tag Generation<br/>(with weights)"]
|
|
24
24
|
K --> M["Attribute Extraction"]
|
|
25
25
|
|
|
26
|
+
A --> V["Vectorization"]
|
|
27
|
+
V --> W["Embedding Model"]
|
|
28
|
+
V --> X["Vector DB Upsert"]
|
|
29
|
+
|
|
26
30
|
A --> N["Content Items<br/>(Database)"]
|
|
27
31
|
A --> O["Content Item Attributes<br/>(Database)"]
|
|
28
32
|
|
|
@@ -34,10 +38,21 @@ graph TD
|
|
|
34
38
|
style F fill:#2d8659,stroke:#1a5c3a,color:#fff
|
|
35
39
|
style G fill:#b8762f,stroke:#8a5722,color:#fff
|
|
36
40
|
style K fill:#7c5295,stroke:#563a6b,color:#fff
|
|
41
|
+
style V fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
37
42
|
style N fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
38
43
|
style O fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
39
44
|
```
|
|
40
45
|
|
|
46
|
+
## Key Features
|
|
47
|
+
|
|
48
|
+
- **AIPromptRunner integration**: Uses the managed "Content Autotagging" prompt, enabling prompt versioning and model routing through MJ's prompt management system (no direct `BaseLLM` calls)
|
|
49
|
+
- **Tag weights**: Each generated tag includes a relevance weight (0.0--1.0) indicating how strongly the tag relates to the content
|
|
50
|
+
- **Batch processing**: Configurable batch size (default: 20) with concurrent processing within each batch
|
|
51
|
+
- **Parallel tagging + vectorization**: Tagging and vectorization run in parallel for maximum throughput
|
|
52
|
+
- **Per-source/type embedding model selection**: Cascade resolution for embedding model and vector index -- source override, then content type default, then global fallback (first active vector index)
|
|
53
|
+
- **Real-time progress reporting**: `AutotagProgressCallback` provides per-item progress updates during processing
|
|
54
|
+
- **Graceful provider skip**: Providers skip gracefully when no content sources are configured for their type
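The batch-processing and progress-reporting features listed above can be pictured with a small sketch. This is not the engine's code: `processInBatches`, its `worker` parameter, and the `label` helper are hypothetical names introduced for illustration, and only the callback shape `(processed, total, currentItem?)` and the default batch size of 20 come from this README.

```typescript
// Same shape as the package's AutotagProgressCallback type.
type ProgressCallback = (processed: number, total: number, currentItem?: string) => void;

/**
 * Hypothetical helper (not part of the package): runs `worker` over `items`
 * in fixed-size batches. Items inside one batch run concurrently; batches
 * run one after another, so at most `batchSize` workers are in flight.
 */
async function processInBatches<T>(
  items: T[],
  worker: (item: T) => Promise<void>,
  batchSize = 20,
  onProgress?: ProgressCallback,
  label: (item: T) => string = String,
): Promise<void> {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  let processed = 0;
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Promise.all rejects on the first failure; a real engine may prefer
    // to record per-item errors and continue instead.
    await Promise.all(
      batch.map(async (item) => {
        await worker(item);
        processed += 1;
        onProgress?.(processed, items.length, label(item));
      }),
    );
  }
}
```

A caller would pass its own per-item function as `worker` and, optionally, a callback that logs or forwards progress. Keeping concurrency bounded per batch is one simple way to limit simultaneous LLM and database calls.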
|
|
55
|
+
|
|
41
56
|
## Installation
|
|
42
57
|
|
|
43
58
|
```bash
|
|
@@ -51,7 +66,8 @@ sequenceDiagram
|
|
|
51
66
|
participant Source as Content Source
|
|
52
67
|
participant Engine as AutotagBaseEngine
|
|
53
68
|
participant Extract as Text Extractor
|
|
54
|
-
participant
|
|
69
|
+
participant Prompt as AIPromptRunner
|
|
70
|
+
participant Vec as Embedding + VectorDB
|
|
55
71
|
participant DB as Database
|
|
56
72
|
|
|
57
73
|
Source->>Engine: Provide content items
|
|
@@ -59,9 +75,14 @@ sequenceDiagram
|
|
|
59
75
|
Engine->>Extract: Extract text (PDF/Office/HTML)
|
|
60
76
|
Extract-->>Engine: Raw text
|
|
61
77
|
Engine->>Engine: Chunk text for token limits
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
78
|
+
par Tagging
|
|
79
|
+
Engine->>Prompt: Run "Content Autotagging" prompt
|
|
80
|
+
Prompt-->>Engine: Tags (with weights) + Attributes
|
|
81
|
+
and Vectorization
|
|
82
|
+
Engine->>Vec: Embed text + upsert to vector DB
|
|
83
|
+
Vec-->>Engine: Vectorization result
|
|
84
|
+
end
|
|
85
|
+
Engine->>DB: Save ContentItem + Tags + Attributes
|
|
65
86
|
Engine->>DB: Create ProcessRun record
|
|
66
87
|
```
|
|
67
88
|
|
|
@@ -74,7 +95,7 @@ sequenceDiagram
|
|
|
74
95
|
| RSS Feeds | `AutotagRSSFeed` | Parses RSS/Atom feeds for articles |
|
|
75
96
|
| Azure Blob | `AutotagAzureBlob` | Processes files from Azure Blob Storage |
|
|
76
97
|
|
|
77
|
-
All sources extend `AutotagBase`, which provides the common interface for content discovery and ingestion.
|
|
98
|
+
All sources extend `AutotagBase`, which provides the common interface for content discovery and ingestion. Each source's `Autotag()` method accepts an optional `AutotagProgressCallback` for real-time progress reporting. Sources skip gracefully when no content sources of their type are configured in the database.
|
|
78
99
|
|
|
79
100
|
## Supported File Formats
|
|
80
101
|
|
|
@@ -85,6 +106,32 @@ All sources extend `AutotagBase`, which provides the common interface for conten
|
|
|
85
106
|
| HTML/Web Pages | `cheerio` | .html, .htm |
|
|
86
107
|
| Plain Text | Native | .txt, .md, .csv |
|
|
87
108
|
|
|
109
|
+
## Tag Weights
|
|
110
|
+
|
|
111
|
+
The LLM prompt returns tags with relevance weights between 0.0 and 1.0 indicating how strongly each tag relates to the content. Both old-style (plain string array) and new-style (object with `tag` + `weight`) responses are supported:
|
|
112
|
+
|
|
113
|
+
```json
|
|
114
|
+
// New format (preferred) — returned by the "Content Autotagging" prompt
|
|
115
|
+
[
|
|
116
|
+
{ "tag": "machine learning", "weight": 0.95 },
|
|
117
|
+
{ "tag": "neural networks", "weight": 0.82 },
|
|
118
|
+
{ "tag": "data science", "weight": 0.70 }
|
|
119
|
+
]
|
|
120
|
+
|
|
121
|
+
// Legacy format — auto-normalized with weight 1.0
|
|
122
|
+
["machine learning", "neural networks", "data science"]
|
|
123
|
+
```
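As a rough illustration of how both response shapes could be reduced to one form, here is a minimal sketch. `WeightedTag` and `normalizeTags` are hypothetical names, not the package's API. Clamping weights to the 0.0 to 1.0 range and defaulting a missing weight to 1.0 are assumptions made for this example; the README only states that legacy string tags are normalized to weight 1.0.

```typescript
interface WeightedTag {
  tag: string;
  weight: number;
}

/**
 * Hypothetical normalizer: accepts either the legacy string array or the
 * newer array of { tag, weight } objects and returns weighted tags.
 */
function normalizeTags(raw: unknown): WeightedTag[] {
  if (!Array.isArray(raw)) return [];
  const result: WeightedTag[] = [];
  for (const entry of raw) {
    if (typeof entry === "string") {
      // Legacy format: a bare string gets full weight.
      const tag = entry.trim();
      if (tag) result.push({ tag, weight: 1.0 });
    } else if (typeof entry === "object" && entry !== null) {
      const { tag, weight } = entry as { tag?: unknown; weight?: unknown };
      if (typeof tag !== "string" || !tag.trim()) continue;
      // Assumption: a missing or non-numeric weight falls back to 1.0,
      // and out-of-range values are clamped into [0, 1].
      const w = typeof weight === "number" && Number.isFinite(weight) ? weight : 1.0;
      result.push({ tag: tag.trim(), weight: Math.min(1, Math.max(0, w)) });
    }
  }
  return result;
}
```

Normalizing once at the boundary means the code that saves tags only ever deals with `{ tag, weight }` pairs.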
|
|
124
|
+
|
|
125
|
+
## Embedding Model and Vector Index Resolution
|
|
126
|
+
|
|
127
|
+
The engine resolves the embedding model and vector index for each content item using a three-level cascade:
|
|
128
|
+
|
|
129
|
+
1. **Content Source override**: If the source has `EmbeddingModelID` and `VectorIndexID` set, those are used
|
|
130
|
+
2. **Content Type default**: If the source has no override, the content type's defaults are used
|
|
131
|
+
3. **Global fallback**: If neither source nor type specifies, the first active vector index in the system is used
|
|
132
|
+
|
|
133
|
+
Items sharing the same (embeddingModel, vectorIndex) pair are grouped and processed together for efficient batching.
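The cascade and grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: `ItemRef`, `InfraOverride`, `resolveGroupKey`, and `groupByInfrastructure` are hypothetical names, the `ContentSourceID`/`ContentTypeID` fields on the item are assumed, and the `"default|default"` key stands in for the global fallback (first active vector index), which this sketch does not resolve.

```typescript
// Assumed shape of the override fields on a content source or content type.
interface InfraOverride {
  EmbeddingModelID?: string | null;
  VectorIndexID?: string | null;
}

// Assumed minimal view of a content item for grouping purposes.
interface ItemRef {
  ID: string;
  ContentSourceID: string;
  ContentTypeID: string;
}

const GLOBAL_FALLBACK_KEY = "default|default";

/** Source override first, then content type default, then the global fallback. */
function resolveGroupKey(
  item: ItemRef,
  sources: Map<string, InfraOverride>,
  types: Map<string, InfraOverride>,
): string {
  const candidates = [sources.get(item.ContentSourceID), types.get(item.ContentTypeID)];
  for (const candidate of candidates) {
    // A level only counts when it sets both IDs.
    if (candidate && candidate.EmbeddingModelID && candidate.VectorIndexID) {
      return `${candidate.EmbeddingModelID}|${candidate.VectorIndexID}`;
    }
  }
  return GLOBAL_FALLBACK_KEY;
}

/** Groups items that resolve to the same (embedding model, vector index) pair. */
function groupByInfrastructure(
  items: ItemRef[],
  sources: Map<string, InfraOverride>,
  types: Map<string, InfraOverride>,
): Map<string, ItemRef[]> {
  const groups = new Map<string, ItemRef[]>();
  for (const item of items) {
    const key = resolveGroupKey(item, sources, types);
    const group = groups.get(key) ?? [];
    group.push(item);
    groups.set(key, group);
  }
  return groups;
}
```

Grouping by a string key lets each group create its embedding and vector database clients once and reuse them for every batch in that group.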
|
|
134
|
+
|
|
88
135
|
## Usage
|
|
89
136
|
|
|
90
137
|
### RSS Feed Processing
|
|
@@ -93,7 +140,9 @@ All sources extend `AutotagBase`, which provides the common interface for conten
|
|
|
93
140
|
import { AutotagRSSFeed } from '@memberjunction/content-autotagging';
|
|
94
141
|
|
|
95
142
|
const rssTagger = new AutotagRSSFeed();
|
|
96
|
-
await rssTagger.Autotag(contextUser)
|
|
143
|
+
await rssTagger.Autotag(contextUser, (processed, total, currentItem) => {
|
|
144
|
+
console.log(`[${processed}/${total}] Processing: ${currentItem}`);
|
|
145
|
+
});
|
|
97
146
|
```
|
|
98
147
|
|
|
99
148
|
### Website Content Processing
|
|
@@ -133,13 +182,19 @@ await blobTagger.Autotag(contextUser);
|
|
|
133
182
|
import { AutotagBaseEngine } from '@memberjunction/content-autotagging';
|
|
134
183
|
|
|
135
184
|
const engine = AutotagBaseEngine.Instance;
|
|
136
|
-
|
|
185
|
+
|
|
186
|
+
// Process content items with custom batch size
|
|
187
|
+
await engine.ExtractTextAndProcessWithLLM(contentItems, contextUser, batchSize);
|
|
188
|
+
|
|
189
|
+
// Vectorize content items (runs in parallel with tagging)
|
|
190
|
+
const result = await engine.VectorizeContentItems(contentItems, contextUser, undefined, batchSize);
|
|
191
|
+
console.log(`Vectorized: ${result.vectorized}, Skipped: ${result.skipped}`);
|
|
137
192
|
```
|
|
138
193
|
|
|
139
194
|
## Creating a Custom Content Source
|
|
140
195
|
|
|
141
196
|
```typescript
|
|
142
|
-
import { AutotagBase } from '@memberjunction/content-autotagging';
|
|
197
|
+
import { AutotagBase, AutotagProgressCallback } from '@memberjunction/content-autotagging';
|
|
143
198
|
import { RegisterClass } from '@memberjunction/global';
|
|
144
199
|
|
|
145
200
|
@RegisterClass(AutotagBase, 'AutotagCustomSource')
|
|
@@ -149,13 +204,14 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
149
204
|
return contentItems;
|
|
150
205
|
}
|
|
151
206
|
|
|
152
|
-
public async Autotag(contextUser) {
|
|
207
|
+
public async Autotag(contextUser, onProgress?: AutotagProgressCallback) {
|
|
153
208
|
const contentSourceTypeID = await this.engine.setSubclassContentSourceType(
|
|
154
209
|
'Custom Source', contextUser
|
|
155
210
|
);
|
|
156
211
|
const contentSources = await this.engine.getAllContentSources(
|
|
157
212
|
contextUser, contentSourceTypeID
|
|
158
213
|
);
|
|
214
|
+
if (contentSources.length === 0) return; // Skip gracefully
|
|
159
215
|
const contentItems = await this.SetContentItemsToProcess(contentSources);
|
|
160
216
|
await this.engine.ExtractTextAndProcessWithLLM(contentItems, contextUser);
|
|
161
217
|
}
|
|
@@ -166,12 +222,12 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
166
222
|
|
|
167
223
|
| Entity | Purpose |
|
|
168
224
|
|--------|---------|
|
|
169
|
-
| Content Sources | Configuration for each content source |
|
|
225
|
+
| Content Sources | Configuration for each content source (with optional EmbeddingModelID/VectorIndexID overrides) |
|
|
170
226
|
| Content Items | Individual pieces of content with extracted text |
|
|
171
|
-
| Content Item Tags | AI-generated tags |
|
|
227
|
+
| Content Item Tags | AI-generated tags with relevance weights (0.0--1.0) |
|
|
172
228
|
| Content Item Attributes | Additional extracted metadata |
|
|
173
229
|
| Content Process Runs | Processing history and audit trail |
|
|
174
|
-
| Content Types | Content categorization definitions |
|
|
230
|
+
| Content Types | Content categorization definitions (with default EmbeddingModelID/VectorIndexID) |
|
|
175
231
|
| Content Source Types | Source type definitions |
|
|
176
232
|
| Content File Types | Supported file format definitions |
|
|
177
233
|
|
|
@@ -182,8 +238,12 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
182
238
|
| `@memberjunction/core` | Entity system and metadata |
|
|
183
239
|
| `@memberjunction/global` | Class registration |
|
|
184
240
|
| `@memberjunction/core-entities` | Content entity types |
|
|
185
|
-
| `@memberjunction/ai` |
|
|
186
|
-
| `@memberjunction/aiengine` | AI Engine
|
|
241
|
+
| `@memberjunction/ai` | Embedding model integration |
|
|
242
|
+
| `@memberjunction/aiengine` | AI Engine for prompt cache access |
|
|
243
|
+
| `@memberjunction/ai-prompts` | AIPromptRunner for managed prompt execution |
|
|
244
|
+
| `@memberjunction/ai-core-plus` | AIPromptParams types |
|
|
245
|
+
| `@memberjunction/ai-vectors` | TextChunker for content chunking |
|
|
246
|
+
| `@memberjunction/ai-vectordb` | VectorDBBase for vector storage |
|
|
187
247
|
| `pdf-parse` | PDF text extraction |
|
|
188
248
|
| `officeparser` | Office document parsing |
|
|
189
249
|
| `cheerio` | HTML parsing |
|
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import { AutotagBase } from "../../Core/index.js";
|
|
1
|
+
import { AutotagBase, AutotagProgressCallback } from "../../Core/index.js";
|
|
2
2
|
import { AutotagBaseEngine } from "../../Engine/index.js";
|
|
3
3
|
import { ContentSourceParams } from "../../Engine/index.js";
|
|
4
4
|
import { UserInfo } from "@memberjunction/core";
|
|
@@ -22,7 +22,7 @@ export declare abstract class CloudStorageBase extends AutotagBase {
|
|
|
22
22
|
* @returns - An array of content source items that have been modified or added after the most recent process run for that content source
|
|
23
23
|
*/
|
|
24
24
|
abstract SetNewAndModifiedContentItems(contentSourceParams: ContentSourceParams, lastRunDate: Date, contextUser: UserInfo): Promise<MJContentItemEntity[]>;
|
|
25
|
-
Autotag(contextUser: UserInfo): Promise<void>;
|
|
25
|
+
Autotag(contextUser: UserInfo, onProgress?: AutotagProgressCallback): Promise<void>;
|
|
26
26
|
SetContentItemsToProcess(contentSources: MJContentSourceEntity[]): Promise<MJContentItemEntity[]>;
|
|
27
27
|
}
|
|
28
28
|
//# sourceMappingURL=CloudStorageBase.d.ts.map
|
|
package/dist/CloudStorage/generic/CloudStorageBase.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"CloudStorageBase.d.ts","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,MAAM,YAAY,CAAC;
|
|
1
|
+
{"version":3,"file":"CloudStorageBase.d.ts","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,uBAAuB,EAAE,MAAM,YAAY,CAAC;AAClE,OAAO,EAAE,iBAAiB,EAAE,MAAM,cAAc,CAAC;AACjD,OAAO,EAAE,mBAAmB,EAAE,MAAM,cAAc,CAAC;AACnD,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAI3F,8BAAsB,gBAAiB,SAAQ,WAAW;IACtD,SAAS,CAAC,WAAW,EAAE,QAAQ,CAAC;IAChC,SAAS,CAAC,MAAM,EAAE,iBAAiB,CAAC;IACpC,SAAS,CAAC,mBAAmB,EAAE,MAAM,CAAA;;IAOrC;;MAEE;aACc,YAAY,IAAI,OAAO,CAAC,IAAI,CAAC;IAE7C;;;;;;;;MAQE;aACc,6BAA6B,CAAC,mBAAmB,EAAE,mBAAmB,EAAE,WAAW,EAAE,IAAI,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;IAEpJ,OAAO,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,CAAC,EAAE,uBAAuB,GAAG,OAAO,CAAC,IAAI,CAAC;IAQnF,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;CAyBjH"}
|
|
package/dist/CloudStorage/generic/CloudStorageBase.js
CHANGED
|
@@ -7,12 +7,12 @@ export class CloudStorageBase extends AutotagBase {
|
|
|
7
7
|
super();
|
|
8
8
|
this.engine = AutotagBaseEngine.Instance;
|
|
9
9
|
}
|
|
10
|
-
async Autotag(contextUser) {
|
|
10
|
+
async Autotag(contextUser, onProgress) {
|
|
11
11
|
this.contextUser = contextUser;
|
|
12
12
|
this.contentSourceTypeID = this.engine.SetSubclassContentSourceType('Cloud Storage');
|
|
13
13
|
const contentSources = await this.engine.getAllContentSources(this.contextUser, this.contentSourceTypeID) || [];
|
|
14
14
|
const contentItemsToProcess = await this.SetContentItemsToProcess(contentSources);
|
|
15
|
-
await this.engine.ExtractTextAndProcessWithLLM(contentItemsToProcess, this.contextUser);
|
|
15
|
+
await this.engine.ExtractTextAndProcessWithLLM(contentItemsToProcess, this.contextUser, undefined, onProgress);
|
|
16
16
|
}
|
|
17
17
|
async SetContentItemsToProcess(contentSources) {
|
|
18
18
|
const contentItemsToProcess = [];
|
|
package/dist/CloudStorage/generic/CloudStorageBase.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"CloudStorageBase.js","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,
|
|
1
|
+
{"version":3,"file":"CloudStorageBase.js","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAA2B,MAAM,YAAY,CAAC;AAClE,OAAO,EAAE,iBAAiB,EAAE,MAAM,cAAc,CAAC;AAIjD,OAAO,MAAM,MAAM,QAAQ,CAAC;AAC5B,MAAM,CAAC,MAAM,CAAC,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAA;AAE9B,MAAM,OAAgB,gBAAiB,SAAQ,WAAW;IAKtD;QACI,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,MAAM,GAAG,iBAAiB,CAAC,QAAQ,CAAC;IAC7C,CAAC;IAkBM,KAAK,CAAC,OAAO,CAAC,WAAqB,EAAE,UAAoC;QAC5E,IAAI,CAAC,WAAW,GAAG,WAAW,CAAC;QAC/B,IAAI,CAAC,mBAAmB,GAAG,IAAI,CAAC,MAAM,CAAC,4BAA4B,CAAC,eAAe,CAAC,CAAC;QACrF,MAAM,cAAc,GAA4B,MAAM,IAAI,CAAC,MAAM,CAAC,oBAAoB,CAAC,IAAI,CAAC,WAAW,EAAE,IAAI,CAAC,mBAAmB,CAAC,IAAI,EAAE,CAAC;QACzI,MAAM,qBAAqB,GAA0B,MAAM,IAAI,CAAC,wBAAwB,CAAC,cAAc,CAAC,CAAA;QACxG,MAAM,IAAI,CAAC,MAAM,CAAC,4BAA4B,CAAC,qBAAqB,EAAE,IAAI,CAAC,WAAW,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;IACnH,CAAC;IAEM,KAAK,CAAC,wBAAwB,CAAC,cAAuC;QACzE,MAAM,qBAAqB,GAA0B,EAAE,CAAA;QAEvD,KAAK,MAAM,aAAa,IAAI,cAAc,EAAE,CAAC;YACzC,MAAM,IAAI,CAAC,YAAY,EAAE,CAAC;YAE1B,MAAM,mBAAmB,GAAwB;gBAC7C,eAAe,EAAE,aAAa,CAAC,EAAE;gBACjC,IAAI,EAAE,aAAa,CAAC,IAAI;gBACxB,aAAa,EAAE,aAAa,CAAC,aAAa;gBAC1C,mBAAmB,EAAE,aAAa,CAAC,mBAAmB;gBACtD,iBAAiB,EAAE,aAAa,CAAC,iBAAiB;gBAClD,GAAG,EAAE,aAAa,CAAC,GAAG;aACzB,CAAA;YAED,MAAM,WAAW,GAAS,MAAM,IAAI,CAAC,MAAM,CAAC,2BAA2B,CAAC,mBAAmB,CAAC,eAAe,EAAE,IAAI,CAAC,WAAW,CAAC,CAAC;YAE/H,IAAI,WAAW,EAAE,CAAC;gBACd,MAAM,YAAY,GAAG,MAAM,IAAI,CAAC,6BAA6B,CAAC,mBAAmB,EAAE,WAAW,EAAE,IAAI,CAAC,WAAW,CAAC,CAAC;gBAClH,qBAAqB,CAAC,IAAI,CAAC,GAAG,YAAY,CAAC,CAAC;YAChD,CAAC;QACL,CAAC;QAED,OAAO,qBAAqB,CAAC;IACjC,CAAC;CACJ"}
|
|
package/dist/Core/generic/AutotagBase.d.ts
CHANGED
|
@@ -1,7 +1,9 @@
|
|
|
1
1
|
import { UserInfo } from '@memberjunction/core';
|
|
2
2
|
import { MJContentSourceEntity, MJContentItemEntity } from '@memberjunction/core-entities';
|
|
3
|
+
/** Progress callback for per-item updates during autotagging */
|
|
4
|
+
export type AutotagProgressCallback = (processed: number, total: number, currentItem?: string) => void;
|
|
3
5
|
export declare abstract class AutotagBase {
|
|
4
6
|
abstract SetContentItemsToProcess(contentSources: MJContentSourceEntity[]): Promise<MJContentItemEntity[]>;
|
|
5
|
-
abstract Autotag(contextUser: UserInfo): Promise<void>;
|
|
7
|
+
abstract Autotag(contextUser: UserInfo, onProgress?: AutotagProgressCallback): Promise<void>;
|
|
6
8
|
}
|
|
7
9
|
//# sourceMappingURL=AutotagBase.d.ts.map
|
|
package/dist/Core/generic/AutotagBase.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBase.d.ts","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAE3F,8BAAsB,WAAW;aACb,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;aACjG,OAAO,CAAC,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;
|
|
1
|
+
{"version":3,"file":"AutotagBase.d.ts","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAE3F,gEAAgE;AAChE,MAAM,MAAM,uBAAuB,GAAG,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,WAAW,CAAC,EAAE,MAAM,KAAK,IAAI,CAAC;AAEvG,8BAAsB,WAAW;aACb,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;aACjG,OAAO,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,CAAC,EAAE,uBAAuB,GAAG,OAAO,CAAC,IAAI,CAAC;CACtG"}
|
|
package/dist/Core/generic/AutotagBase.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBase.js","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"AutotagBase.js","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAMA,MAAM,OAAgB,WAAW;CAGhC"}
|
|
package/dist/Engine/generic/AutotagBaseEngine.d.ts
CHANGED
|
@@ -2,7 +2,7 @@ import { BaseEngine, IMetadataProvider, UserInfo } from '@memberjunction/core';
|
|
|
2
2
|
import { MJContentSourceEntity, MJContentItemEntity, MJContentFileTypeEntity, MJContentTypeEntity, MJContentSourceTypeEntity, MJContentTypeAttributeEntity, MJContentSourceTypeParamEntity } from '@memberjunction/core-entities';
|
|
3
3
|
import { ContentSourceParams, ContentSourceTypeParams, ContentSourceTypeParamValue } from './content.types.js';
|
|
4
4
|
import { ProcessRunParams, JsonObject, ContentItemProcessParams } from './process.types.js';
|
|
5
|
-
import {
|
|
5
|
+
import type { MJAIPromptEntityExtended } from '@memberjunction/ai-core-plus';
|
|
6
6
|
/**
|
|
7
7
|
* Core engine for content autotagging. Extends BaseEngine to cache content metadata
|
|
8
8
|
* (types, source types, file types, attributes) at startup. Uses AIEngine via composition
|
|
@@ -28,8 +28,9 @@ export declare class AutotagBaseEngine extends BaseEngine<AutotagBaseEngine> {
|
|
|
28
28
|
Config(forceRefresh?: boolean, contextUser?: UserInfo, provider?: IMetadataProvider): Promise<unknown>;
|
|
29
29
|
/**
|
|
30
30
|
* Given a list of content items, extract the text from each and process with LLM for tagging.
|
|
31
|
+
* Items are processed in configurable batches with controlled concurrency within each batch.
|
|
31
32
|
*/
|
|
32
|
-
ExtractTextAndProcessWithLLM(contentItems: MJContentItemEntity[], contextUser: UserInfo): Promise<void>;
|
|
33
|
+
ExtractTextAndProcessWithLLM(contentItems: MJContentItemEntity[], contextUser: UserInfo, batchSize?: number, onProgress?: (processed: number, total: number, currentItem?: string) => void): Promise<void>;
|
|
33
34
|
/**
|
|
34
35
|
* Builds processing parameters for a single content item
|
|
35
36
|
*/
|
|
@@ -38,12 +39,27 @@ export declare class AutotagBaseEngine extends BaseEngine<AutotagBaseEngine> {
|
|
|
38
39
|
* Process a content item's text with the LLM and save results.
|
|
39
40
|
*/
|
|
40
41
|
ProcessContentItemText(params: ContentItemProcessParams, contextUser: UserInfo): Promise<void>;
|
|
42
|
+
/**
|
|
43
|
+
* Resolves the "Content Autotagging" prompt from the AIEngine cache.
|
|
44
|
+
* Throws if the prompt is not found or not active.
|
|
45
|
+
*/
|
|
46
|
+
private getAutotagPrompt;
|
|
47
|
+
/**
|
|
48
|
+
* Builds template data for the autotagging prompt from processing params and chunk context.
|
|
49
|
+
*/
|
|
50
|
+
private buildPromptData;
|
|
41
51
|
promptAndRetrieveResultsFromLLM(params: ContentItemProcessParams, contextUser: UserInfo): Promise<JsonObject>;
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
52
|
+
/**
|
|
53
|
+
* Resolves the input token limit for chunking. Uses the model specified by modelID if available,
|
|
54
|
+
* otherwise falls back to a conservative default.
|
|
55
|
+
*/
|
|
56
|
+
private resolveTokenLimit;
|
|
57
|
+
/**
|
|
58
|
+
* Processes a single text chunk using AIPromptRunner and merges results.
|
|
59
|
+
* Uses the prompt's configured model by default. If ContentType.AIModelID is set,
|
|
60
|
+
* it is passed as a runtime model override via AIPromptParams.override.
|
|
61
|
+
*/
|
|
62
|
+
processChunkWithPromptRunner(prompt: MJAIPromptEntityExtended, params: ContentItemProcessParams, chunk: string, LLMResults: JsonObject, contextUser: UserInfo): Promise<JsonObject>;
|
|
47
63
|
saveLLMResults(LLMResults: JsonObject, contextUser: UserInfo): Promise<void>;
|
|
48
64
|
deleteInvalidContentItem(contentItemID: string, contextUser: UserInfo): Promise<void>;
|
|
49
65
|
/**
|
|
@@ -104,5 +120,71 @@ export declare class AutotagBaseEngine extends BaseEngine<AutotagBaseEngine> {
|
|
|
104
120
|
parseDOCX(dataBuffer: Buffer): Promise<string>;
|
|
105
121
|
parseHTML(data: string): Promise<string>;
|
|
106
122
|
parseFileFromPath(filePath: string): Promise<string>;
|
|
123
|
+
/**
|
|
124
|
+
* Embeds content items and upserts them to the appropriate vector index.
|
|
125
|
+
* Items are grouped by their resolved (embeddingModel + vectorIndex) pair — derived
|
|
126
|
+
* from per-ContentSource overrides, per-ContentType defaults, or the global fallback
|
|
127
|
+
* (first active VectorIndex). Each group is processed in configurable batches with
|
|
128
|
+
* parallel upserts within each batch.
|
|
129
|
+
*/
|
|
130
|
+
VectorizeContentItems(items: MJContentItemEntity[], contextUser: UserInfo, onProgress?: (processed: number, total: number) => void, batchSize?: number): Promise<{
|
|
131
|
+
vectorized: number;
|
|
132
|
+
skipped: number;
|
|
133
|
+
}>;
|
|
134
|
+
/**
|
|
135
|
+
* Process a single infrastructure group: embed texts in batches and upsert to vector DB.
|
|
136
|
+
* Upserts within each batch run in parallel for throughput.
|
|
137
|
+
*/
|
|
138
|
+
private vectorizeGroup;
|
|
139
|
+
/**
|
|
140
|
+
* Load content source and content type records for all unique source/type IDs
|
|
141
|
+
* referenced by the given items. Returns maps keyed by normalized ID.
|
|
142
|
+
*/
|
|
143
|
+
private loadContentSourceAndTypeMaps;
|
|
144
|
+
/**
|
|
145
|
+
* Resolve the (embeddingModelID, vectorIndexID) pair for a content item using
|
|
146
|
+
* the cascade: ContentSource override -> ContentType default -> null (global fallback).
|
|
147
|
+
*/
|
|
148
|
+
private resolveItemInfrastructureIds;
|
|
149
|
+
/**
|
|
150
|
+
* Group items by their resolved (embeddingModelID + vectorIndexID) key.
|
|
151
|
+
* Items with the same pair share infrastructure and can be batched together.
|
|
152
|
+
*/
|
|
153
|
+
private groupItemsByInfrastructure;
|
|
154
|
+
/** Create a stable cache key for an (embeddingModelID, vectorIndexID) pair */
|
|
155
|
+
private infraGroupKey;
|
|
156
|
+
/**
|
|
157
|
+
* Resolve a group key into concrete infrastructure instances. For the 'default|default'
|
|
158
|
+
* key, falls back to the first active VectorIndex (original behavior).
|
|
159
|
+
*/
|
|
160
|
+
private resolveGroupInfrastructure;
|
|
161
|
+
/**
|
|
162
|
+
* Build infrastructure from explicit embeddingModelID and vectorIndexID.
|
|
163
|
+
* Looks up the vector index by ID and the embedding model from AIEngine.
|
|
164
|
+
*/
|
|
165
|
+
private buildVectorInfrastructure;
|
|
166
|
+
/**
|
|
167
|
+
* Fallback: resolve infrastructure from the first active VectorIndex (original behavior).
|
|
168
|
+
*/
|
|
169
|
+
private getDefaultVectorInfrastructure;
|
|
170
|
+
/**
|
|
171
|
+
* Shared helper: given a vector index record and embedding model ID, resolve all
|
|
172
|
+
* driver instances needed for embedding + upsert.
|
|
173
|
+
*/
|
|
174
|
+
private createInfrastructureFromIndex;
|
|
175
|
+
/** Find an embedding model by ID in AIEngine, with helpful error reporting */
|
|
176
|
+
private findEmbeddingModel;
|
|
177
|
+
/** Create a BaseEmbeddings instance for a given driver class */
|
|
178
|
+
private createEmbeddingInstance;
|
|
179
|
+
/** Create a VectorDBBase instance for a given class key */
|
|
180
|
+
private createVectorDBInstance;
|
|
181
|
+
/** SHA-1 deterministic vector ID for a content item */
|
|
182
|
+
private contentItemVectorId;
|
|
183
|
+
/** Build the text that gets embedded: Title + Description + full Text */
|
|
184
|
+
private buildEmbeddingText;
|
|
185
|
+
/** Build metadata stored alongside the vector — truncate large text fields */
|
|
186
|
+
private buildVectorMetadata;
|
|
187
|
+
/** Load all tags for the given items in a single RunView call */
|
|
188
|
+
private loadTagsForItems;
|
|
107
189
|
}
|
|
108
190
|
//# sourceMappingURL=AutotagBaseEngine.d.ts.map
|
|
package/dist/Engine/generic/AutotagBaseEngine.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBaseEngine.d.ts","sourceRoot":"","sources":["../../../src/Engine/generic/AutotagBaseEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAA4B,iBAAiB,EAAqB,QAAQ,EAAuB,MAAM,sBAAsB,CAAA;AAEhJ,OAAO,EACH,qBAAqB,EAAE,mBAAmB,EAAE,uBAAuB,EACxC,mBAAmB,EAAE,yBAAyB,EACzE,4BAA4B,EACE,8BAA8B,EAC/D,MAAM,+BAA+B,CAAA;AACtC,OAAO,EAAE,mBAAmB,EAAE,uBAAuB,EAAE,2BAA2B,EAAE,MAAM,iBAAiB,CAAA;AAI3G,OAAO,EAAE,gBAAgB,EAAE,UAAU,EAAE,wBAAwB,EAAE,MAAM,iBAAiB,CAAA;
|
|
1
|
+
{"version":3,"file":"AutotagBaseEngine.d.ts","sourceRoot":"","sources":["../../../src/Engine/generic/AutotagBaseEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAA4B,iBAAiB,EAAqB,QAAQ,EAAuB,MAAM,sBAAsB,CAAA;AAEhJ,OAAO,EACH,qBAAqB,EAAE,mBAAmB,EAAE,uBAAuB,EACxC,mBAAmB,EAAE,yBAAyB,EACzE,4BAA4B,EACE,8BAA8B,EAC/D,MAAM,+BAA+B,CAAA;AACtC,OAAO,EAAE,mBAAmB,EAAE,uBAAuB,EAAE,2BAA2B,EAAE,MAAM,iBAAiB,CAAA;AAI3G,OAAO,EAAE,gBAAgB,EAAE,UAAU,EAAE,wBAAwB,EAAE,MAAM,iBAAiB,CAAA;AASxF,OAAO,KAAK,EAAE,wBAAwB,EAAE,MAAM,8BAA8B,CAAA;AAkB5E;;;;GAIG;AACH,qBACa,iBAAkB,SAAQ,UAAU,CAAC,iBAAiB,CAAC;IAChE,WAAkB,QAAQ,IAAI,iBAAiB,CAE9C;IAGD,OAAO,CAAC,aAAa,CAA6B;IAClD,OAAO,CAAC,mBAAmB,CAAmC;IAC9D,OAAO,CAAC,iBAAiB,CAAiC;IAC1D,OAAO,CAAC,sBAAsB,CAAsC;IACpE,OAAO,CAAC,wBAAwB,CAAwC;IAExE,2CAA2C;IAC3C,IAAW,YAAY,IAAI,mBAAmB,EAAE,CAA+B;IAC/E,kDAAkD;IAClD,IAAW,kBAAkB,IAAI,yBAAyB,EAAE,CAAqC;IACjG,gDAAgD;IAChD,IAAW,gBAAgB,IAAI,uBAAuB,EAAE,CAAmC;IAC3F,qDAAqD;IACrD,IAAW,qBAAqB,IAAI,4BAA4B,EAAE,CAAwC;IAC1G,wDAAwD;IACxD,IAAW,uBAAuB,IAAI,8BAA8B,EAAE,CAA0C;IAEnG,MAAM,CAAC,YAAY,CAAC,EAAE,OAAO,EAAE,WAAW,CAAC,EAAE,QAAQ,EAAE,QAAQ,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC,OAAO,CAAC;IAgCnH;;;OAGG;IACU,4BAA4B,CACrC,YAAY,EAAE,mBAAmB,EAAE,EACnC,WAAW,EAAE,QAAQ,EACrB,SAAS,GAAE,MAAqC,EAChD,UAAU,CAAC,EAAE,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,WAAW,CAAC,EAAE,MAAM,KAAK,IAAI,GAC9E,OAAO,CAAC,IAAI,CAAC;IA4ChB;;OAEG;YACW,qBAAqB;IAgBnC;;OAEG;IACU,sBAAsB,CAAC,MAAM,EAAE,wBAAwB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAK3G;;;OAGG;IACH,OAAO,CAAC,gBAAgB;IAWxB;;OAEG;IACH,OAAO,CAAC,eAAe;IAqBV,+BAA+B,CAAC,MAAM,EAAE,wBAAwB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,UAAU,CAAC;IAuB1H;;;OAGG;IACH,OAAO,CAAC,iBAAiB;IAWzB;;;;OAIG;IACU,4BAA4B,CACrC,MAAM,EAAE,wBAAwB,EAChC,MAAM,EAAE,wBAAwB,EAChC,KAAK,EAAE,MAAM,EACb,UAAU,EAAE,UAAU,EACtB,WAAW,EAAE,QAAQ,GACtB,OAAO,CAAC,UAAU,CAAC;IAkDT,cAAc,CAAC,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAS5E,wBAAwB,CAAC,aAAa,EAAE,MAAM,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAOlG;;;OAGG;IACI,kBAAkB,CAAC
,IAAI,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,GAAG,MAAM,EAAE;IA0BrE;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAYzB;;;OAGG;IACU,mBAAmB,CAAC,aAAa,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAiCrH;;;OAGG;IACU,iCAAiC,CAAC,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IA+B5G;;OAEG;IACU,oBAAoB,CAAC,WAAW,EAAE,QAAQ,EAAE,mBAAmB,EAAE,MAAM,GAAG,OAAO,CAAC,qBAAqB,EAAE,CAAC;IAehH,4BAA4B,CAAC,QAAQ,EAAE,MAAM,GAAG,MAAM;IAQhD,sBAAsB,CAAC,aAAa,EAAE,qBAAqB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,GAAG,CAAC,MAAM,EAAE,2BAA2B,CAAC,CAAC;IA2B5I,iCAAiC,CAAC,wBAAwB,EAAE,MAAM,GAAG,uBAAuB;IAa5F,sBAAsB,CAAC,KAAK,EAAE,MAAM,EAAE,IAAI,EAAE,MAAM,GAAG,2BAA2B;IAiBhF,eAAe,CAAC,GAAG,EAAE,MAAM,GAAG,OAAO;IAIrC,gBAAgB,CAAC,KAAK,EAAE,MAAM,GAAG,MAAM,EAAE;IAIhD;;OAEG;IACU,4BAA4B,CAAC,WAAW,EAAE,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC;IAK3E;;OAEG;IACU,2BAA2B,CAAC,eAAe,EAAE,MAAM,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAqBhG,oBAAoB,CAAC,aAAa,EAAE,MAAM,GAAG;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAA;KAAE;IAYlG,wBAAwB,CAAC,mBAAmB,EAAE,MAAM,GAAG,MAAM;IAQ7D,kBAAkB,CAAC,aAAa,EAAE,MAAM,GAAG,MAAM;IAQjD,sBAAsB,CAAC,iBAAiB,EAAE,MAAM,GAAG,MAAM;IAQzD,8BAA8B,CAAC,aAAa,EAAE,MAAM,GAAG,MAAM;IAS7D,yBAAyB,CAAC,mBAAmB,EAAE,mBAAmB,GAAG,MAAM;IAOrE,kBAAkB,CAAC,GAAG,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAMhD,mBAAmB,CAAC,IAAI,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAIlD,uBAAuB,CAAC,mBAAmB,EAAE,mBAAmB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,MAAM,CAAC;IAgBtH;;OAEG;IACU,cAAc,CAAC,gBAAgB,EAAE,gBAAgB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAYxF,QAAQ,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAK7C,SAAS,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAK9C,SAAS,CAAC,IAAI,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAWxC,iBAAiB,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAejE;;;;;;OAMG;IACU,qBAAqB,CAC9B,KAAK,EAAE,mBAAmB,EAAE,EAC5B,WAAW,EAAE,QAAQ,EACrB,UAAU,CAAC,EAAE,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,KAAK,IAAI,EACvD,SAAS,GAAE,MAAqC,GACjD,OAAO,CAAC;QAAE,UAAU,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAA;KAAE,CAAC
;IAmCnD;;;OAGG;YACW,cAAc;IAmD5B;;;OAGG;YACW,4BAA4B;IA2C1C;;;OAGG;IACH,OAAO,CAAC,4BAA4B;IA2BpC;;;OAGG;IACH,OAAO,CAAC,0BAA0B;IAkBlC,8EAA8E;IAC9E,OAAO,CAAC,aAAa;IAMrB;;;OAGG;YACW,0BAA0B;IAcxC;;;OAGG;YACW,yBAAyB;IAqBvC;;OAEG;YACW,8BAA8B;IAiB5C;;;OAGG;YACW,6BAA6B;IAgC3C,8EAA8E;IAC9E,OAAO,CAAC,kBAAkB;IAU1B,gEAAgE;IAChE,OAAO,CAAC,uBAAuB;IAU/B,2DAA2D;IAC3D,OAAO,CAAC,sBAAsB;IAU9B,uDAAuD;IACvD,OAAO,CAAC,mBAAmB;IAI3B,yEAAyE;IACzE,OAAO,CAAC,kBAAkB;IAQ1B,8EAA8E;IAC9E,OAAO,CAAC,mBAAmB;IAgB3B,iEAAiE;YACnD,gBAAgB;CAsBjC"}
|