@memberjunction/content-autotagging 5.21.0 → 5.23.0
- package/README.md +78 -18
- package/dist/CloudStorage/generic/CloudStorageBase.d.ts +2 -2
- package/dist/CloudStorage/generic/CloudStorageBase.d.ts.map +1 -1
- package/dist/CloudStorage/generic/CloudStorageBase.js +3 -3
- package/dist/CloudStorage/generic/CloudStorageBase.js.map +1 -1
- package/dist/CloudStorage/providers/AutotagAzureBlob.js +1 -1
- package/dist/CloudStorage/providers/AutotagAzureBlob.js.map +1 -1
- package/dist/Core/generic/AutotagBase.d.ts +3 -1
- package/dist/Core/generic/AutotagBase.d.ts.map +1 -1
- package/dist/Core/generic/AutotagBase.js.map +1 -1
- package/dist/Engine/generic/AutotagBaseEngine.d.ts +149 -90
- package/dist/Engine/generic/AutotagBaseEngine.d.ts.map +1 -1
- package/dist/Engine/generic/AutotagBaseEngine.js +701 -397
- package/dist/Engine/generic/AutotagBaseEngine.js.map +1 -1
- package/dist/Engine/generic/content.types.d.ts +2 -1
- package/dist/Engine/generic/content.types.d.ts.map +1 -1
- package/dist/Engine/generic/content.types.js.map +1 -1
- package/dist/Entity/generic/AutotagEntity.d.ts +10 -2
- package/dist/Entity/generic/AutotagEntity.d.ts.map +1 -1
- package/dist/Entity/generic/AutotagEntity.js +34 -31
- package/dist/Entity/generic/AutotagEntity.js.map +1 -1
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.d.ts +2 -2
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.d.ts.map +1 -1
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.js +4 -4
- package/dist/LocalFileSystem/generic/AutotagLocalFileSystem.js.map +1 -1
- package/dist/RSSFeed/generic/AutotagRSSFeed.d.ts +2 -2
- package/dist/RSSFeed/generic/AutotagRSSFeed.d.ts.map +1 -1
- package/dist/RSSFeed/generic/AutotagRSSFeed.js +4 -4
- package/dist/RSSFeed/generic/AutotagRSSFeed.js.map +1 -1
- package/dist/Websites/generic/AutotagWebsite.d.ts +2 -2
- package/dist/Websites/generic/AutotagWebsite.d.ts.map +1 -1
- package/dist/Websites/generic/AutotagWebsite.js +4 -4
- package/dist/Websites/generic/AutotagWebsite.js.map +1 -1
- package/package.json +11 -7
package/README.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
# @memberjunction/content-autotagging
|
|
2
2
|
|
|
3
|
-
AI-powered content ingestion and
|
|
3
|
+
AI-powered content ingestion, autotagging, and vectorization engine for MemberJunction. Scans content from multiple sources (local files, websites, RSS feeds, cloud storage), extracts text from documents, uses LLMs to generate weighted tags and metadata attributes, and vectorizes content for semantic search.
|
|
4
4
|
|
|
5
5
|
## Overview
|
|
6
6
|
|
|
7
|
-
The `@memberjunction/content-autotagging` package provides an extensible framework for ingesting content from diverse sources and leveraging AI models to extract meaningful tags, summaries, and metadata. Built on the MemberJunction platform, it helps organizations automatically organize and categorize their content.
|
|
7
|
+
The `@memberjunction/content-autotagging` package provides an extensible framework for ingesting content from diverse sources and leveraging AI models to extract meaningful tags, summaries, and metadata. Built on the MemberJunction platform, it helps organizations automatically organize and categorize their content. The engine uses the managed **"Content Autotagging"** AI prompt via `AIPromptRunner` (rather than direct `BaseLLM` calls), enabling prompt versioning, model routing, and centralized prompt management.
|
|
8
8
|
|
|
9
9
|
```mermaid
|
|
10
10
|
graph TD
|
|
@@ -19,10 +19,14 @@ graph TD
|
|
|
19
19
|
G --> I["Office Parser"]
|
|
20
20
|
G --> J["HTML Parser<br/>(Cheerio)"]
|
|
21
21
|
|
|
22
|
-
A --> K["
|
|
23
|
-
K --> L["Tag Generation"]
|
|
22
|
+
A --> K["AIPromptRunner<br/>(Content Autotagging prompt)"]
|
|
23
|
+
K --> L["Tag Generation<br/>(with weights)"]
|
|
24
24
|
K --> M["Attribute Extraction"]
|
|
25
25
|
|
|
26
|
+
A --> V["Vectorization"]
|
|
27
|
+
V --> W["Embedding Model"]
|
|
28
|
+
V --> X["Vector DB Upsert"]
|
|
29
|
+
|
|
26
30
|
A --> N["Content Items<br/>(Database)"]
|
|
27
31
|
A --> O["Content Item Attributes<br/>(Database)"]
|
|
28
32
|
|
|
@@ -34,10 +38,21 @@ graph TD
|
|
|
34
38
|
style F fill:#2d8659,stroke:#1a5c3a,color:#fff
|
|
35
39
|
style G fill:#b8762f,stroke:#8a5722,color:#fff
|
|
36
40
|
style K fill:#7c5295,stroke:#563a6b,color:#fff
|
|
41
|
+
style V fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
37
42
|
style N fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
38
43
|
style O fill:#2d6a9f,stroke:#1a4971,color:#fff
|
|
39
44
|
```
|
|
40
45
|
|
|
46
|
+
## Key Features
|
|
47
|
+
|
|
48
|
+
- **AIPromptRunner integration**: Uses the managed "Content Autotagging" prompt, enabling prompt versioning and model routing through MJ's prompt management system (no direct `BaseLLM` calls)
|
|
49
|
+
- **Tag weights**: Each generated tag includes a relevance weight (0.0--1.0) indicating how strongly the tag relates to the content
|
|
50
|
+
- **Batch processing**: Configurable batch size (default: 20) with concurrent processing within each batch
|
|
51
|
+
- **Parallel tagging + vectorization**: Tagging and vectorization run in parallel for maximum throughput
|
|
52
|
+
- **Per-source/type embedding model selection**: Cascade resolution for embedding model and vector index -- source override, then content type default, then global fallback (first active vector index)
|
|
53
|
+
- **Real-time progress reporting**: `AutotagProgressCallback` provides per-item progress updates during processing
|
|
54
|
+
- **Graceful provider skip**: Providers skip gracefully when no content sources are configured for their type
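The batch-processing bullet above can be pictured with a small sketch. This is not the engine's implementation: `toBatches` and `processInBatches` are hypothetical helper names, and the only detail taken from the README is the default batch size of 20 with items inside one batch running concurrently.

```typescript
// Hypothetical helpers for illustration only; they are not exported by the package.

/** Splits items into consecutive batches of at most `batchSize` elements. */
function toBatches<T>(items: readonly T[], batchSize = 20): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

/**
 * Runs `worker` over every item, one batch at a time.
 * Items within a batch run concurrently; the next batch starts only
 * after every item of the current batch has settled.
 * Returns the number of items whose worker rejected.
 */
async function processInBatches<T>(
  items: readonly T[],
  worker: (item: T) => Promise<void>,
  batchSize = 20,
): Promise<number> {
  let failures = 0;
  for (const batch of toBatches(items, batchSize)) {
    // Map each outcome to a boolean so one failing item does not abort the batch.
    const outcomes = await Promise.all(
      batch.map((item) => worker(item).then(() => true, () => false)),
    );
    failures += outcomes.filter((ok) => !ok).length;
  }
  return failures;
}
```

Bounding concurrency by batch keeps the number of in-flight LLM and embedding requests predictable, at the cost of waiting for the slowest item in each batch.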
|
|
55
|
+
|
|
41
56
|
## Installation
|
|
42
57
|
|
|
43
58
|
```bash
|
|
@@ -51,7 +66,8 @@ sequenceDiagram
|
|
|
51
66
|
participant Source as Content Source
|
|
52
67
|
participant Engine as AutotagBaseEngine
|
|
53
68
|
participant Extract as Text Extractor
|
|
54
|
-
participant
|
|
69
|
+
participant Prompt as AIPromptRunner
|
|
70
|
+
participant Vec as Embedding + VectorDB
|
|
55
71
|
participant DB as Database
|
|
56
72
|
|
|
57
73
|
Source->>Engine: Provide content items
|
|
@@ -59,9 +75,14 @@ sequenceDiagram
|
|
|
59
75
|
Engine->>Extract: Extract text (PDF/Office/HTML)
|
|
60
76
|
Extract-->>Engine: Raw text
|
|
61
77
|
Engine->>Engine: Chunk text for token limits
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
78
|
+
par Tagging
|
|
79
|
+
Engine->>Prompt: Run "Content Autotagging" prompt
|
|
80
|
+
Prompt-->>Engine: Tags (with weights) + Attributes
|
|
81
|
+
and Vectorization
|
|
82
|
+
Engine->>Vec: Embed text + upsert to vector DB
|
|
83
|
+
Vec-->>Engine: Vectorization result
|
|
84
|
+
end
|
|
85
|
+
Engine->>DB: Save ContentItem + Tags + Attributes
|
|
65
86
|
Engine->>DB: Create ProcessRun record
|
|
66
87
|
```
|
|
67
88
|
|
|
@@ -74,7 +95,7 @@ sequenceDiagram
|
|
|
74
95
|
| RSS Feeds | `AutotagRSSFeed` | Parses RSS/Atom feeds for articles |
|
|
75
96
|
| Azure Blob | `AutotagAzureBlob` | Processes files from Azure Blob Storage |
|
|
76
97
|
|
|
77
|
-
All sources extend `AutotagBase`, which provides the common interface for content discovery and ingestion.
|
|
98
|
+
All sources extend `AutotagBase`, which provides the common interface for content discovery and ingestion. Each source's `Autotag()` method accepts an optional `AutotagProgressCallback` for real-time progress reporting. Sources skip gracefully when no content sources of their type are configured in the database.
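As a rough sketch of how a caller might build such a callback: the type alias below mirrors the `AutotagProgressCallback` declaration shipped in `AutotagBase.d.ts`, while `formatProgress` and `createProgressReporter` are hypothetical helpers introduced only for this example.

```typescript
// Mirrors the exported declaration so the example is self-contained.
type AutotagProgressCallback = (processed: number, total: number, currentItem?: string) => void;

/** Hypothetical helper: renders one progress line such as "[3/12] 25% report.pdf". */
function formatProgress(processed: number, total: number, currentItem?: string): string {
  // Guard against division by zero when a source yields no items.
  const pct = total > 0 ? Math.round((processed / total) * 100) : 0;
  const suffix = currentItem ? ` ${currentItem}` : '';
  return `[${processed}/${total}] ${pct}%${suffix}`;
}

/** Hypothetical helper: builds a callback that forwards formatted lines to `sink`. */
function createProgressReporter(
  sink: (line: string) => void = console.log,
): AutotagProgressCallback {
  return (processed, total, currentItem) => sink(formatProgress(processed, total, currentItem));
}
```

A reporter built this way could be passed as the second argument to any source's `Autotag()` call, for example `rssTagger.Autotag(contextUser, createProgressReporter())`.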
|
|
78
99
|
|
|
79
100
|
## Supported File Formats
|
|
80
101
|
|
|
@@ -85,6 +106,32 @@ All sources extend `AutotagBase`, which provides the common interface for conten
|
|
|
85
106
|
| HTML/Web Pages | `cheerio` | .html, .htm |
|
|
86
107
|
| Plain Text | Native | .txt, .md, .csv |
|
|
87
108
|
|
|
109
|
+
## Tag Weights
|
|
110
|
+
|
|
111
|
+
The LLM prompt returns tags with relevance weights between 0.0 and 1.0 indicating how strongly each tag relates to the content. Both old-style (plain string array) and new-style (object with `tag` + `weight`) responses are supported:
|
|
112
|
+
|
|
113
|
+
```json
|
|
114
|
+
// New format (preferred) — returned by the "Content Autotagging" prompt
|
|
115
|
+
[
|
|
116
|
+
{ "tag": "machine learning", "weight": 0.95 },
|
|
117
|
+
{ "tag": "neural networks", "weight": 0.82 },
|
|
118
|
+
{ "tag": "data science", "weight": 0.70 }
|
|
119
|
+
]
|
|
120
|
+
|
|
121
|
+
// Legacy format — auto-normalized with weight 1.0
|
|
122
|
+
["machine learning", "neural networks", "data science"]
|
|
123
|
+
```
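A minimal sketch of how both response shapes could be normalized into one form follows. It is not the package's code: `RawTag`, `WeightedTag`, and `normalizeTags` are hypothetical names, and clamping out-of-range weights into 0.0 to 1.0 is an assumption of this example. Only the rule that legacy string tags receive weight 1.0 comes from the README.

```typescript
// Hypothetical types and helper for illustration only.
interface WeightedTag {
  tag: string;
  weight: number;
}

// A raw entry is either a legacy plain string or a { tag, weight } object.
type RawTag = string | { tag: string; weight?: number };

function normalizeTags(raw: readonly RawTag[]): WeightedTag[] {
  return raw.map((entry) => {
    // Legacy format: plain string, normalized to weight 1.0.
    if (typeof entry === 'string') {
      return { tag: entry, weight: 1.0 };
    }
    // New format: keep the weight when it is a finite number, else default to 1.0.
    const weight =
      typeof entry.weight === 'number' && Number.isFinite(entry.weight) ? entry.weight : 1.0;
    // Assumption: clamp into the documented 0.0 to 1.0 range.
    return { tag: entry.tag, weight: Math.min(1, Math.max(0, weight)) };
  });
}
```

Normalizing at the boundary means the code that saves Content Item Tags only ever deals with one shape, whichever format the prompt returned.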
|
|
124
|
+
|
|
125
|
+
## Embedding Model and Vector Index Resolution
|
|
126
|
+
|
|
127
|
+
The engine resolves the embedding model and vector index for each content item using a three-level cascade:
|
|
128
|
+
|
|
129
|
+
1. **Content Source override**: If the source has `EmbeddingModelID` and `VectorIndexID` set, those are used
|
|
130
|
+
2. **Content Type default**: If the source has no override, the content type's defaults are used
|
|
131
|
+
3. **Global fallback**: If neither source nor type specifies, the first active vector index in the system is used
|
|
132
|
+
|
|
133
|
+
Items sharing the same (embeddingModel, vectorIndex) pair are grouped and processed together for efficient batching.
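The cascade and grouping described above might look roughly like the sketch below. The field names `EmbeddingModelID` and `VectorIndexID` come from the README; `VectorTarget`, `resolveVectorTarget`, and `groupByTarget` are hypothetical, and passing the global fallback in as an already-resolved pair is an assumption of this example rather than the engine's actual lookup.

```typescript
// Hypothetical types and helpers for illustration only.
interface VectorTarget {
  embeddingModelID: string;
  vectorIndexID: string;
}

interface VectorOverrides {
  EmbeddingModelID?: string | null;
  VectorIndexID?: string | null;
}

/** Source override first, then content type default, then the global fallback. */
function resolveVectorTarget(
  source: VectorOverrides,
  contentType: VectorOverrides,
  globalFallback: VectorTarget,
): VectorTarget {
  if (source.EmbeddingModelID && source.VectorIndexID) {
    return { embeddingModelID: source.EmbeddingModelID, vectorIndexID: source.VectorIndexID };
  }
  if (contentType.EmbeddingModelID && contentType.VectorIndexID) {
    return {
      embeddingModelID: contentType.EmbeddingModelID,
      vectorIndexID: contentType.VectorIndexID,
    };
  }
  return globalFallback;
}

/** Groups items by their (embeddingModel, vectorIndex) pair so each group can be batched together. */
function groupByTarget<T>(
  items: readonly T[],
  targetOf: (item: T) => VectorTarget,
): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const item of items) {
    const target = targetOf(item);
    const key = `${target.embeddingModelID}|${target.vectorIndexID}`;
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(item);
    } else {
      groups.set(key, [item]);
    }
  }
  return groups;
}
```

Grouping by the resolved pair lets each embedding model be called once per group instead of being switched item by item.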
|
|
134
|
+
|
|
88
135
|
## Usage
|
|
89
136
|
|
|
90
137
|
### RSS Feed Processing
|
|
@@ -93,7 +140,9 @@ All sources extend `AutotagBase`, which provides the common interface for conten
|
|
|
93
140
|
import { AutotagRSSFeed } from '@memberjunction/content-autotagging';
|
|
94
141
|
|
|
95
142
|
const rssTagger = new AutotagRSSFeed();
|
|
96
|
-
await rssTagger.Autotag(contextUser)
|
|
143
|
+
await rssTagger.Autotag(contextUser, (processed, total, currentItem) => {
|
|
144
|
+
console.log(`[${processed}/${total}] Processing: ${currentItem}`);
|
|
145
|
+
});
|
|
97
146
|
```
|
|
98
147
|
|
|
99
148
|
### Website Content Processing
|
|
@@ -133,13 +182,19 @@ await blobTagger.Autotag(contextUser);
|
|
|
133
182
|
import { AutotagBaseEngine } from '@memberjunction/content-autotagging';
|
|
134
183
|
|
|
135
184
|
const engine = AutotagBaseEngine.Instance;
|
|
136
|
-
|
|
185
|
+
|
|
186
|
+
// Process content items with custom batch size
|
|
187
|
+
await engine.ExtractTextAndProcessWithLLM(contentItems, contextUser, batchSize);
|
|
188
|
+
|
|
189
|
+
// Vectorize content items (runs in parallel with tagging)
|
|
190
|
+
const result = await engine.VectorizeContentItems(contentItems, tagMap, contextUser, batchSize);
|
|
191
|
+
console.log(`Vectorized: ${result.vectorized}, Skipped: ${result.skipped}`);
|
|
137
192
|
```
|
|
138
193
|
|
|
139
194
|
## Creating a Custom Content Source
|
|
140
195
|
|
|
141
196
|
```typescript
|
|
142
|
-
import { AutotagBase } from '@memberjunction/content-autotagging';
|
|
197
|
+
import { AutotagBase, AutotagProgressCallback } from '@memberjunction/content-autotagging';
|
|
143
198
|
import { RegisterClass } from '@memberjunction/global';
|
|
144
199
|
|
|
145
200
|
@RegisterClass(AutotagBase, 'AutotagCustomSource')
|
|
@@ -149,13 +204,14 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
149
204
|
return contentItems;
|
|
150
205
|
}
|
|
151
206
|
|
|
152
|
-
public async Autotag(contextUser) {
|
|
207
|
+
public async Autotag(contextUser, onProgress?: AutotagProgressCallback) {
|
|
153
208
|
const contentSourceTypeID = await this.engine.setSubclassContentSourceType(
|
|
154
209
|
'Custom Source', contextUser
|
|
155
210
|
);
|
|
156
211
|
const contentSources = await this.engine.getAllContentSources(
|
|
157
212
|
contextUser, contentSourceTypeID
|
|
158
213
|
);
|
|
214
|
+
if (contentSources.length === 0) return; // Skip gracefully
|
|
159
215
|
const contentItems = await this.SetContentItemsToProcess(contentSources);
|
|
160
216
|
await this.engine.ExtractTextAndProcessWithLLM(contentItems, contextUser);
|
|
161
217
|
}
|
|
@@ -166,12 +222,12 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
166
222
|
|
|
167
223
|
| Entity | Purpose |
|
|
168
224
|
|--------|---------|
|
|
169
|
-
| Content Sources | Configuration for each content source |
|
|
225
|
+
| Content Sources | Configuration for each content source (with optional EmbeddingModelID/VectorIndexID overrides) |
|
|
170
226
|
| Content Items | Individual pieces of content with extracted text |
|
|
171
|
-
| Content Item Tags | AI-generated tags |
|
|
227
|
+
| Content Item Tags | AI-generated tags with relevance weights (0.0--1.0) |
|
|
172
228
|
| Content Item Attributes | Additional extracted metadata |
|
|
173
229
|
| Content Process Runs | Processing history and audit trail |
|
|
174
|
-
| Content Types | Content categorization definitions |
|
|
230
|
+
| Content Types | Content categorization definitions (with default EmbeddingModelID/VectorIndexID) |
|
|
175
231
|
| Content Source Types | Source type definitions |
|
|
176
232
|
| Content File Types | Supported file format definitions |
|
|
177
233
|
|
|
@@ -182,8 +238,12 @@ export class AutotagCustomSource extends AutotagBase {
|
|
|
182
238
|
| `@memberjunction/core` | Entity system and metadata |
|
|
183
239
|
| `@memberjunction/global` | Class registration |
|
|
184
240
|
| `@memberjunction/core-entities` | Content entity types |
|
|
185
|
-
| `@memberjunction/ai` |
|
|
186
|
-
| `@memberjunction/aiengine` | AI Engine
|
|
241
|
+
| `@memberjunction/ai` | Embedding model integration |
|
|
242
|
+
| `@memberjunction/aiengine` | AI Engine for prompt cache access |
|
|
243
|
+
| `@memberjunction/ai-prompts` | AIPromptRunner for managed prompt execution |
|
|
244
|
+
| `@memberjunction/ai-core-plus` | AIPromptParams types |
|
|
245
|
+
| `@memberjunction/ai-vectors` | TextChunker for content chunking |
|
|
246
|
+
| `@memberjunction/ai-vectordb` | VectorDBBase for vector storage |
|
|
187
247
|
| `pdf-parse` | PDF text extraction |
|
|
188
248
|
| `officeparser` | Office document parsing |
|
|
189
249
|
| `cheerio` | HTML parsing |
|
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import { AutotagBase } from "../../Core/index.js";
|
|
1
|
+
import { AutotagBase, AutotagProgressCallback } from "../../Core/index.js";
|
|
2
2
|
import { AutotagBaseEngine } from "../../Engine/index.js";
|
|
3
3
|
import { ContentSourceParams } from "../../Engine/index.js";
|
|
4
4
|
import { UserInfo } from "@memberjunction/core";
|
|
@@ -22,7 +22,7 @@ export declare abstract class CloudStorageBase extends AutotagBase {
|
|
|
22
22
|
* @returns - An array of content source items that have been modified or added after the most recent process run for that content source
|
|
23
23
|
*/
|
|
24
24
|
abstract SetNewAndModifiedContentItems(contentSourceParams: ContentSourceParams, lastRunDate: Date, contextUser: UserInfo): Promise<MJContentItemEntity[]>;
|
|
25
|
-
Autotag(contextUser: UserInfo): Promise<void>;
|
|
25
|
+
Autotag(contextUser: UserInfo, onProgress?: AutotagProgressCallback): Promise<void>;
|
|
26
26
|
SetContentItemsToProcess(contentSources: MJContentSourceEntity[]): Promise<MJContentItemEntity[]>;
|
|
27
27
|
}
|
|
28
28
|
//# sourceMappingURL=CloudStorageBase.d.ts.map
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"CloudStorageBase.d.ts","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,MAAM,YAAY,CAAC;
|
|
1
|
+
{"version":3,"file":"CloudStorageBase.d.ts","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAAE,uBAAuB,EAAE,MAAM,YAAY,CAAC;AAClE,OAAO,EAAE,iBAAiB,EAAE,MAAM,cAAc,CAAC;AACjD,OAAO,EAAE,mBAAmB,EAAE,MAAM,cAAc,CAAC;AACnD,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAI3F,8BAAsB,gBAAiB,SAAQ,WAAW;IACtD,SAAS,CAAC,WAAW,EAAE,QAAQ,CAAC;IAChC,SAAS,CAAC,MAAM,EAAE,iBAAiB,CAAC;IACpC,SAAS,CAAC,mBAAmB,EAAE,MAAM,CAAA;;IAOrC;;MAEE;aACc,YAAY,IAAI,OAAO,CAAC,IAAI,CAAC;IAE7C;;;;;;;;MAQE;aACc,6BAA6B,CAAC,mBAAmB,EAAE,mBAAmB,EAAE,WAAW,EAAE,IAAI,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;IAEpJ,OAAO,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,CAAC,EAAE,uBAAuB,GAAG,OAAO,CAAC,IAAI,CAAC;IAQnF,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;CAyBjH"}
|
|
@@ -7,12 +7,12 @@ export class CloudStorageBase extends AutotagBase {
|
|
|
7
7
|
super();
|
|
8
8
|
this.engine = AutotagBaseEngine.Instance;
|
|
9
9
|
}
|
|
10
|
-
async Autotag(contextUser) {
|
|
10
|
+
async Autotag(contextUser, onProgress) {
|
|
11
11
|
this.contextUser = contextUser;
|
|
12
|
-
this.contentSourceTypeID =
|
|
12
|
+
this.contentSourceTypeID = this.engine.SetSubclassContentSourceType('Cloud Storage');
|
|
13
13
|
const contentSources = await this.engine.getAllContentSources(this.contextUser, this.contentSourceTypeID) || [];
|
|
14
14
|
const contentItemsToProcess = await this.SetContentItemsToProcess(contentSources);
|
|
15
|
-
await this.engine.ExtractTextAndProcessWithLLM(contentItemsToProcess, this.contextUser);
|
|
15
|
+
await this.engine.ExtractTextAndProcessWithLLM(contentItemsToProcess, this.contextUser, undefined, onProgress);
|
|
16
16
|
}
|
|
17
17
|
async SetContentItemsToProcess(contentSources) {
|
|
18
18
|
const contentItemsToProcess = [];
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"CloudStorageBase.js","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,
|
|
1
|
+
{"version":3,"file":"CloudStorageBase.js","sourceRoot":"","sources":["../../../src/CloudStorage/generic/CloudStorageBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,WAAW,EAA2B,MAAM,YAAY,CAAC;AAClE,OAAO,EAAE,iBAAiB,EAAE,MAAM,cAAc,CAAC;AAIjD,OAAO,MAAM,MAAM,QAAQ,CAAC;AAC5B,MAAM,CAAC,MAAM,CAAC,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAA;AAE9B,MAAM,OAAgB,gBAAiB,SAAQ,WAAW;IAKtD;QACI,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,MAAM,GAAG,iBAAiB,CAAC,QAAQ,CAAC;IAC7C,CAAC;IAkBM,KAAK,CAAC,OAAO,CAAC,WAAqB,EAAE,UAAoC;QAC5E,IAAI,CAAC,WAAW,GAAG,WAAW,CAAC;QAC/B,IAAI,CAAC,mBAAmB,GAAG,IAAI,CAAC,MAAM,CAAC,4BAA4B,CAAC,eAAe,CAAC,CAAC;QACrF,MAAM,cAAc,GAA4B,MAAM,IAAI,CAAC,MAAM,CAAC,oBAAoB,CAAC,IAAI,CAAC,WAAW,EAAE,IAAI,CAAC,mBAAmB,CAAC,IAAI,EAAE,CAAC;QACzI,MAAM,qBAAqB,GAA0B,MAAM,IAAI,CAAC,wBAAwB,CAAC,cAAc,CAAC,CAAA;QACxG,MAAM,IAAI,CAAC,MAAM,CAAC,4BAA4B,CAAC,qBAAqB,EAAE,IAAI,CAAC,WAAW,EAAE,SAAS,EAAE,UAAU,CAAC,CAAC;IACnH,CAAC;IAEM,KAAK,CAAC,wBAAwB,CAAC,cAAuC;QACzE,MAAM,qBAAqB,GAA0B,EAAE,CAAA;QAEvD,KAAK,MAAM,aAAa,IAAI,cAAc,EAAE,CAAC;YACzC,MAAM,IAAI,CAAC,YAAY,EAAE,CAAC;YAE1B,MAAM,mBAAmB,GAAwB;gBAC7C,eAAe,EAAE,aAAa,CAAC,EAAE;gBACjC,IAAI,EAAE,aAAa,CAAC,IAAI;gBACxB,aAAa,EAAE,aAAa,CAAC,aAAa;gBAC1C,mBAAmB,EAAE,aAAa,CAAC,mBAAmB;gBACtD,iBAAiB,EAAE,aAAa,CAAC,iBAAiB;gBAClD,GAAG,EAAE,aAAa,CAAC,GAAG;aACzB,CAAA;YAED,MAAM,WAAW,GAAS,MAAM,IAAI,CAAC,MAAM,CAAC,2BAA2B,CAAC,mBAAmB,CAAC,eAAe,EAAE,IAAI,CAAC,WAAW,CAAC,CAAC;YAE/H,IAAI,WAAW,EAAE,CAAC;gBACd,MAAM,YAAY,GAAG,MAAM,IAAI,CAAC,6BAA6B,CAAC,mBAAmB,EAAE,WAAW,EAAE,IAAI,CAAC,WAAW,CAAC,CAAC;gBAClH,qBAAqB,CAAC,IAAI,CAAC,GAAG,YAAY,CAAC,CAAC;YAChD,CAAC;QACL,CAAC;QAED,OAAO,qBAAqB,CAAC;IACjC,CAAC;CACJ"}
|
|
@@ -38,7 +38,7 @@ export class AutotagAzureBlob extends CloudStorageBase {
|
|
|
38
38
|
const text = await this.extractText(blob.name);
|
|
39
39
|
contentItem.ContentSourceID = contentSourceParams.contentSourceID;
|
|
40
40
|
contentItem.Name = blob.name;
|
|
41
|
-
contentItem.Description =
|
|
41
|
+
contentItem.Description = this.engine.GetContentItemDescription(contentSourceParams);
|
|
42
42
|
contentItem.URL = filePath;
|
|
43
43
|
contentItem.ContentTypeID = contentSourceParams.ContentTypeID;
|
|
44
44
|
contentItem.ContentSourceTypeID = contentSourceParams.ContentSourceTypeID;
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagAzureBlob.js","sourceRoot":"","sources":["../../../src/CloudStorage/providers/AutotagAzureBlob.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,gBAAgB,EAAE,MAAM,6BAA6B,CAAC;AAE/D,OAAO,EAAE,iBAAiB,EAAmB,MAAM,qBAAqB,CAAC;AACzE,OAAO,MAAM,MAAM,QAAQ,CAAC;AAE5B,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,IAAI,MAAM,MAAM,CAAC;AAExB,MAAM,CAAC,MAAM,CAAC,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAA;AAE9B,MAAM,OAAO,gBAAiB,SAAQ,gBAAgB;IAMlD,YAAY,gBAAwB,EAAE,aAAqB;QACvD,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,gBAAgB,GAAG,gBAAgB,CAAA;QACxC,IAAI,CAAC,aAAa,GAAG,aAAa,CAAA;IACtC,CAAC;IAGD;;;MAGE;IACK,KAAK,CAAC,YAAY;QACrB,IAAI,CAAC;YACD,IAAI,CAAC,iBAAiB,GAAG,iBAAiB,CAAC,oBAAoB,CAAC,IAAI,CAAC,gBAAgB,CAAC,CAAC;YACvF,IAAI,KAAK,EAAE,MAAM,SAAS,IAAI,IAAI,CAAC,iBAAiB,CAAC,cAAc,EAAE,EAAE,CAAC;gBACpE,OAAO,CAAC,GAAG,CAAC,cAAc,SAAS,CAAC,IAAI,EAAE,CAAC,CAAC;YAChD,CAAC;YAED,IAAI,CAAC,eAAe,GAAG,IAAI,CAAC,iBAAiB,CAAC,kBAAkB,CAAC,IAAI,CAAC,aAAa,CAAC,CAAC;QACzF,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACb,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAA;YACpB,MAAM,IAAI,KAAK,CAAC,4CAA4C,CAAC,CAAA;QACjE,CAAC;IACL,CAAC;IAEM,KAAK,CAAC,6BAA6B,CAAC,mBAAwC,EAAE,WAAiB,EAAE,WAAqB,EAAE,MAAM,GAAC,EAAE;QACpI,MAAM,qBAAqB,GAA0B,EAAE,CAAA;QAEvD,IAAI,KAAK,EAAE,MAAM,IAAI,IAAI,IAAI,CAAC,eAAe,CAAC,aAAa,EAAE,EAAE,CAAC;YAC5D,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,aAAa,EAAE,IAAI,CAAC,IAAI,CAAC,CAAA;YACzD,IAAI,IAAI,CAAC,UAAU,CAAC,SAAS,IAAI,IAAI,CAAC,UAAU,CAAC,SAAS,GAAG,WAAW,EAAE,CAAC;gBACvE,4DAA4D;gBAC5D,MAAM,EAAE,GAAG,IAAI,QAAQ,EAAE,CAAA;gBACzB,MAAM,WAAW,GAAG,MAAM,EAAE,CAAC,eAAe,CAAsB,mBAAmB,EAAE,WAAW,CAAC,CAAA;gBACnG,MAAM,IAAI,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,CAAA;gBAC9C,WAAW,CAAC,eAAe,GAAG,mBAAmB,CAAC,eAAe,CAAA;gBACjE,WAAW,CAAC,IAAI,GAAG,IAAI,CAAC,IAAI,CAAA;gBAC5B,WAAW,CAAC,WAAW,GAAG,
|
|
1
|
+
{"version":3,"file":"AutotagAzureBlob.js","sourceRoot":"","sources":["../../../src/CloudStorage/providers/AutotagAzureBlob.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,gBAAgB,EAAE,MAAM,6BAA6B,CAAC;AAE/D,OAAO,EAAE,iBAAiB,EAAmB,MAAM,qBAAqB,CAAC;AACzE,OAAO,MAAM,MAAM,QAAQ,CAAC;AAE5B,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,IAAI,MAAM,MAAM,CAAC;AAExB,MAAM,CAAC,MAAM,CAAC,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAA;AAE9B,MAAM,OAAO,gBAAiB,SAAQ,gBAAgB;IAMlD,YAAY,gBAAwB,EAAE,aAAqB;QACvD,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,gBAAgB,GAAG,gBAAgB,CAAA;QACxC,IAAI,CAAC,aAAa,GAAG,aAAa,CAAA;IACtC,CAAC;IAGD;;;MAGE;IACK,KAAK,CAAC,YAAY;QACrB,IAAI,CAAC;YACD,IAAI,CAAC,iBAAiB,GAAG,iBAAiB,CAAC,oBAAoB,CAAC,IAAI,CAAC,gBAAgB,CAAC,CAAC;YACvF,IAAI,KAAK,EAAE,MAAM,SAAS,IAAI,IAAI,CAAC,iBAAiB,CAAC,cAAc,EAAE,EAAE,CAAC;gBACpE,OAAO,CAAC,GAAG,CAAC,cAAc,SAAS,CAAC,IAAI,EAAE,CAAC,CAAC;YAChD,CAAC;YAED,IAAI,CAAC,eAAe,GAAG,IAAI,CAAC,iBAAiB,CAAC,kBAAkB,CAAC,IAAI,CAAC,aAAa,CAAC,CAAC;QACzF,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACb,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAA;YACpB,MAAM,IAAI,KAAK,CAAC,4CAA4C,CAAC,CAAA;QACjE,CAAC;IACL,CAAC;IAEM,KAAK,CAAC,6BAA6B,CAAC,mBAAwC,EAAE,WAAiB,EAAE,WAAqB,EAAE,MAAM,GAAC,EAAE;QACpI,MAAM,qBAAqB,GAA0B,EAAE,CAAA;QAEvD,IAAI,KAAK,EAAE,MAAM,IAAI,IAAI,IAAI,CAAC,eAAe,CAAC,aAAa,EAAE,EAAE,CAAC;YAC5D,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,aAAa,EAAE,IAAI,CAAC,IAAI,CAAC,CAAA;YACzD,IAAI,IAAI,CAAC,UAAU,CAAC,SAAS,IAAI,IAAI,CAAC,UAAU,CAAC,SAAS,GAAG,WAAW,EAAE,CAAC;gBACvE,4DAA4D;gBAC5D,MAAM,EAAE,GAAG,IAAI,QAAQ,EAAE,CAAA;gBACzB,MAAM,WAAW,GAAG,MAAM,EAAE,CAAC,eAAe,CAAsB,mBAAmB,EAAE,WAAW,CAAC,CAAA;gBACnG,MAAM,IAAI,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,CAAA;gBAC9C,WAAW,CAAC,eAAe,GAAG,mBAAmB,CAAC,eAAe,CAAA;gBACjE,WAAW,CAAC,IAAI,GAAG,IAAI,CAAC,IAAI,CAAA;gBAC5B,WAAW,CAAC,WAAW,GAAG,IAAI,CAAC,MAAM,CAAC,yBAAyB,CAAC,mBAAmB,CAAC,CAAA;gBACpF,WAAW,CAAC,GAAG,GAAG,QAAQ,CAAA;gBAC1B,WAAW,CAAC,aAAa,GAAG,mBAAmB,CAAC,aAAa,CAAA;gBAC7D,WAAW,CAAC,mBAAmB,GAAI,mBAAmB,CAAC,mBAAmB,CAAA;gBAC1E,WAAW,CAAC,iBAAiB,GAAG,mBAAmB,CAAC,iBAAiB,C
AAA;gBACrE,WAAW,CAAC,QAAQ,GAAG,MAAM,IAAI,CAAC,MAAM,CAAC,mBAAmB,CAAC,IAAI,CAAC,CAAA;gBAClE,WAAW,CAAC,IAAI,GAAG,IAAI,CAAA;gBAEvB,MAAM,WAAW,CAAC,IAAI,EAAE,CAAA;gBACxB,qBAAqB,CAAC,IAAI,CAAC,WAAW,CAAC,CAAA;YAC3C,CAAC;iBACI,IAAI,IAAI,CAAC,UAAU,CAAC,YAAY,IAAI,IAAI,CAAC,UAAU,CAAC,YAAY,GAAG,WAAW,EAAE,CAAC;gBAClF,8DAA8D;gBAC9D,MAAM,EAAE,GAAG,IAAI,QAAQ,EAAE,CAAA;gBACzB,MAAM,WAAW,GAAG,MAAM,EAAE,CAAC,eAAe,CAAsB,mBAAmB,EAAE,WAAW,CAAC,CAAA;gBACnG,MAAM,aAAa,GAAG,MAAM,IAAI,CAAC,MAAM,CAAC,uBAAuB,CAAC,mBAAmB,EAAE,WAAW,CAAC,CAAA;gBACjG,MAAM,WAAW,CAAC,IAAI,CAAC,aAAa,CAAC,CAAA;gBACrC,MAAM,IAAI,GAAG,MAAM,IAAI,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,CAAA;gBAC9C,WAAW,CAAC,IAAI,GAAG,IAAI,CAAA;gBACvB,WAAW,CAAC,QAAQ,GAAG,MAAM,IAAI,CAAC,MAAM,CAAC,mBAAmB,CAAC,IAAI,CAAC,CAAA;gBAClE,WAAW,CAAC,IAAI,EAAE,CAAA;gBAClB,qBAAqB,CAAC,IAAI,CAAC,WAAW,CAAC,CAAA;YAC3C,CAAC;QACL,CAAC;QAED,OAAO,qBAAqB,CAAA;IAChC,CAAC;IAEM,KAAK,CAAC,WAAW,CAAC,IAAY;QACjC,MAAM,eAAe,GAAG,IAAI,CAAC,eAAe,CAAC,kBAAkB,CAAC,IAAI,CAAC,CAAA;QACrE,MAAM,yBAAyB,GAAG,MAAM,eAAe,CAAC,QAAQ,EAAE,CAAA;QAClE,MAAM,QAAQ,GAAW,MAAM,IAAI,CAAC,cAAc,CAAC,yBAAyB,CAAC,kBAAkB,CAAC,CAAA;QAChG,MAAM,IAAI,GAAW,MAAM,IAAI,CAAC,MAAM,CAAC,QAAQ,CAAC,QAAQ,CAAC,CAAA;QACzD,OAAO,IAAI,CAAA;IACf,CAAC;IAEM,KAAK,CAAC,cAAc,CAAC,cAAqC;QAC7D,OAAO,IAAI,OAAO,CAAC,CAAC,OAAO,EAAE,MAAM,EAAE,EAAE;YACnC,MAAM,MAAM,GAAa,EAAE,CAAC;YAC5B,cAAc,CAAC,EAAE,CAAC,MAAM,EAAE,CAAC,IAAI,EAAE,EAAE;gBAC/B,MAAM,CAAC,IAAI,CAAC,IAAI,YAAY,MAAM,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;YACnE,CAAC,CAAC,CAAC;YACH,cAAc,CAAC,EAAE,CAAC,KAAK,EAAE,GAAG,EAAE;gBAC1B,OAAO,CAAC,MAAM,CAAC,MAAM,CAAC,MAAiC,CAAC,CAAC,CAAC;YAC9D,CAAC,CAAC,CAAC;YACH,cAAc,CAAC,EAAE,CAAC,OAAO,EAAE,MAAM,CAAC,CAAC;QACvC,CAAC,CAAC,CAAC;IACP,CAAC;CACJ"}
|
|
@@ -1,7 +1,9 @@
|
|
|
1
1
|
import { UserInfo } from '@memberjunction/core';
|
|
2
2
|
import { MJContentSourceEntity, MJContentItemEntity } from '@memberjunction/core-entities';
|
|
3
|
+
/** Progress callback for per-item updates during autotagging */
|
|
4
|
+
export type AutotagProgressCallback = (processed: number, total: number, currentItem?: string) => void;
|
|
3
5
|
export declare abstract class AutotagBase {
|
|
4
6
|
abstract SetContentItemsToProcess(contentSources: MJContentSourceEntity[]): Promise<MJContentItemEntity[]>;
|
|
5
|
-
abstract Autotag(contextUser: UserInfo): Promise<void>;
|
|
7
|
+
abstract Autotag(contextUser: UserInfo, onProgress?: AutotagProgressCallback): Promise<void>;
|
|
6
8
|
}
|
|
7
9
|
//# sourceMappingURL=AutotagBase.d.ts.map
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBase.d.ts","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAE3F,8BAAsB,WAAW;aACb,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;aACjG,OAAO,CAAC,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;
|
|
1
|
+
{"version":3,"file":"AutotagBase.d.ts","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,sBAAsB,CAAC;AAChD,OAAO,EAAE,qBAAqB,EAAE,mBAAmB,EAAE,MAAM,+BAA+B,CAAC;AAE3F,gEAAgE;AAChE,MAAM,MAAM,uBAAuB,GAAG,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,WAAW,CAAC,EAAE,MAAM,KAAK,IAAI,CAAC;AAEvG,8BAAsB,WAAW;aACb,wBAAwB,CAAC,cAAc,EAAE,qBAAqB,EAAE,GAAG,OAAO,CAAC,mBAAmB,EAAE,CAAC;aACjG,OAAO,CAAC,WAAW,EAAE,QAAQ,EAAE,UAAU,CAAC,EAAE,uBAAuB,GAAG,OAAO,CAAC,IAAI,CAAC;CACtG"}
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBase.js","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"AutotagBase.js","sourceRoot":"","sources":["../../../src/Core/generic/AutotagBase.ts"],"names":[],"mappings":"AAMA,MAAM,OAAgB,WAAW;CAGhC"}
|
|
@@ -1,131 +1,190 @@
|
|
|
1
|
-
import { UserInfo } from '@memberjunction/core';
|
|
2
|
-
import { MJContentSourceEntity, MJContentItemEntity } from '@memberjunction/core-entities';
|
|
3
|
-
import { ContentSourceParams, ContentSourceTypeParams } from './content.types.js';
|
|
1
|
+
import { BaseEngine, IMetadataProvider, UserInfo } from '@memberjunction/core';
|
|
2
|
+
import { MJContentSourceEntity, MJContentItemEntity, MJContentFileTypeEntity, MJContentTypeEntity, MJContentSourceTypeEntity, MJContentTypeAttributeEntity, MJContentSourceTypeParamEntity } from '@memberjunction/core-entities';
|
|
3
|
+
import { ContentSourceParams, ContentSourceTypeParams, ContentSourceTypeParamValue } from './content.types.js';
|
|
4
4
|
import { ProcessRunParams, JsonObject, ContentItemProcessParams } from './process.types.js';
|
|
5
|
-
import {
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
5
|
+
import type { MJAIPromptEntityExtended } from '@memberjunction/ai-core-plus';
|
|
6
|
+
/**
|
|
7
|
+
* Core engine for content autotagging. Extends BaseEngine to cache content metadata
|
|
8
|
+
* (types, source types, file types, attributes) at startup. Uses AIEngine via composition
|
|
9
|
+
* for AI model access, then delegates to LLM for text analysis and tagging.
|
|
10
|
+
*/
|
|
11
|
+
export declare class AutotagBaseEngine extends BaseEngine<AutotagBaseEngine> {
|
|
9
12
|
static get Instance(): AutotagBaseEngine;
|
|
13
|
+
private _ContentTypes;
|
|
14
|
+
private _ContentSourceTypes;
|
|
15
|
+
private _ContentFileTypes;
|
|
16
|
+
private _ContentTypeAttributes;
|
|
17
|
+
private _ContentSourceTypeParams;
|
|
18
|
+
/** All content types, cached at startup */
|
|
19
|
+
get ContentTypes(): MJContentTypeEntity[];
|
|
20
|
+
/** All content source types, cached at startup */
|
|
21
|
+
get ContentSourceTypes(): MJContentSourceTypeEntity[];
|
|
22
|
+
/** All content file types, cached at startup */
|
|
23
|
+
get ContentFileTypes(): MJContentFileTypeEntity[];
|
|
24
|
+
/** All content type attributes, cached at startup */
|
|
25
|
+
get ContentTypeAttributes(): MJContentTypeAttributeEntity[];
|
|
26
|
+
/** All content source type params, cached at startup */
|
|
27
|
+
get ContentSourceTypeParams(): MJContentSourceTypeParamEntity[];
|
|
28
|
+
Config(forceRefresh?: boolean, contextUser?: UserInfo, provider?: IMetadataProvider): Promise<unknown>;
|
|
10
29
|
/**
|
|
11
|
-
* Given a list of content items, extract the text from each
|
|
12
|
-
*
|
|
13
|
-
* @returns
|
|
30
|
+
* Given a list of content items, extract the text from each and process with LLM for tagging.
|
|
31
|
+
* Items are processed in configurable batches with controlled concurrency within each batch.
|
|
14
32
|
*/
|
|
15
|
-
ExtractTextAndProcessWithLLM(contentItems: MJContentItemEntity[], contextUser: UserInfo): Promise<void>;
|
|
33
|
+
ExtractTextAndProcessWithLLM(contentItems: MJContentItemEntity[], contextUser: UserInfo, batchSize?: number, onProgress?: (processed: number, total: number, currentItem?: string) => void): Promise<void>;
|
|
16
34
|
/**
|
|
17
|
-
*
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
35
|
+
* Builds processing parameters for a single content item
|
|
36
|
+
*/
|
|
37
|
+
private buildProcessingParams;
|
|
38
|
+
/**
|
|
39
|
+
* Process a content item's text with the LLM and save results.
|
|
21
40
|
*/
|
|
22
41
|
ProcessContentItemText(params: ContentItemProcessParams, contextUser: UserInfo): Promise<void>;
|
|
42
|
+
/**
|
|
43
|
+
* Resolves the "Content Autotagging" prompt from the AIEngine cache.
|
|
44
|
+
* Throws if the prompt is not found or not active.
|
|
45
|
+
*/
|
|
46
|
+
private getAutotagPrompt;
|
|
47
|
+
/**
|
|
48
|
+
* Builds template data for the autotagging prompt from processing params and chunk context.
|
|
49
|
+
*/
|
|
50
|
+
private buildPromptData;
|
|
23
51
|
promptAndRetrieveResultsFromLLM(params: ContentItemProcessParams, contextUser: UserInfo): Promise<JsonObject>;
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
52
|
+
/**
|
|
53
|
+
* Resolves the input token limit for chunking. Uses the model specified by modelID if available,
|
|
54
|
+
* otherwise falls back to a conservative default.
|
|
55
|
+
*/
|
|
56
|
+
private resolveTokenLimit;
|
|
57
|
+
/**
|
|
58
|
+
* Processes a single text chunk using AIPromptRunner and merges results.
|
|
59
|
+
* Uses the prompt's configured model by default. If ContentType.AIModelID is set,
|
|
60
|
+
* it is passed as a runtime model override via AIPromptParams.override.
|
|
61
|
+
*/
|
|
62
|
+
processChunkWithPromptRunner(prompt: MJAIPromptEntityExtended, params: ContentItemProcessParams, chunk: string, LLMResults: JsonObject, contextUser: UserInfo): Promise<JsonObject>;
|
|
29
63
|
saveLLMResults(LLMResults: JsonObject, contextUser: UserInfo): Promise<void>;
|
|
30
64
|
deleteInvalidContentItem(contentItemID: string, contextUser: UserInfo): Promise<void>;
|
|
65
|
+
/**
|
|
66
|
+
* Chunks text using the shared TextChunker utility for token-aware splitting.
|
|
67
|
+
* Falls back to simple character-based splitting when TextChunker is not available.
|
|
68
|
+
*/
|
|
31
69
|
chunkExtractedText(text: string, tokenLimit: number): string[];
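The doc comment above says chunking falls back to simple character-based splitting when TextChunker is unavailable. A minimal sketch of that idea follows; it is not the package's `fallbackChunkText`, whose body is not visible in this declaration diff. The name `fallbackChunkTextSketch` and the `charsPerToken` heuristic (roughly 4 characters per token) are assumptions for illustration.

```typescript
// Hypothetical sketch of character-based fallback chunking.
// Assumption: ~4 characters per token; the real ratio depends on the tokenizer.
function fallbackChunkTextSketch(
  text: string,
  tokenLimit: number,
  charsPerToken: number = 4
): string[] {
  // Convert the token budget into a character budget (at least 1 to guarantee progress).
  const maxChars = Math.max(1, Math.floor(tokenLimit * charsPerToken));
  const chunks: string[] = [];
  // Slice the text into consecutive windows of at most maxChars characters.
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}
```

Splitting on raw character offsets can cut words or sentences in half, which is presumably why the token-aware TextChunker path is preferred when present.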
|
|
32
70
|
/**
|
|
33
|
-
*
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
*
|
|
38
|
-
*
|
|
71
|
+
* Simple character-based chunking as fallback
|
|
72
|
+
*/
|
|
73
|
+
private fallbackChunkText;
|
|
74
|
+
/**
|
|
75
|
+
* Saves keyword tags from LLM results as Content Item Tags.
|
|
76
|
+
* Uses batched saves for better performance.
|
|
39
77
|
*/
|
|
40
78
|
saveContentItemTags(contentItemID: string, LLMResults: JsonObject, contextUser: UserInfo): Promise<void>;
|
|
79
|
+
/**
|
|
80
|
+
* Saves LLM-extracted attributes to the database.
|
|
81
|
+
* Updates content item name/description, then creates attribute records for other fields.
|
|
82
|
+
*/
|
|
41
83
|
saveResultsToContentItemAttribute(LLMResults: JsonObject, contextUser: UserInfo): Promise<void>;
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
* @returns A list of content sources
|
|
46
|
-
*/
|
|
84
|
+
/**
|
|
85
|
+
* Retrieves all content sources for a given content source type.
|
|
86
|
+
*/
|
|
47
87
|
getAllContentSources(contextUser: UserInfo, contentSourceTypeID: string): Promise<MJContentSourceEntity[]>;
|
|
48
|
-
|
|
49
|
-
getContentSourceParams(contentSource: MJContentSourceEntity, contextUser: UserInfo): Promise<
|
|
50
|
-
|
|
51
|
-
castValueAsCorrectType(value: string, type: string):
|
|
52
|
-
stringToBoolean(
|
|
88
|
+
SetSubclassContentSourceType(subclass: string): string;
|
|
89
|
+
getContentSourceParams(contentSource: MJContentSourceEntity, contextUser: UserInfo): Promise<Map<string, ContentSourceTypeParamValue>>;
|
|
90
|
+
GetDefaultContentSourceTypeParams(contentSourceTypeParamID: string): ContentSourceTypeParams;
|
|
91
|
+
castValueAsCorrectType(value: string, type: string): ContentSourceTypeParamValue;
|
|
92
|
+
stringToBoolean(str: string): boolean;
|
|
53
93
|
parseStringArray(value: string): string[];
|
|
54
94
|
/**
|
|
55
|
-
*
|
|
56
|
-
* @param lastRunDate: The retrieved last run date from the database
|
|
57
|
-
* @returns The last run date converted to the user's timezone
|
|
95
|
+
* Converts a run date to the user's local timezone.
|
|
58
96
|
*/
|
|
59
97
|
convertLastRunDateToTimezone(lastRunDate: Date): Promise<Date>;
|
|
60
98
|
/**
|
|
61
|
-
* Retrieves the last run date
|
|
62
|
-
* @param contentSourceID: The ID of the content source to retrieve the last run date
|
|
63
|
-
* @param contextUser: The user context to retrieve the last run date
|
|
64
|
-
* @returns
|
|
99
|
+
* Retrieves the last run date for a content source. Returns epoch date if no runs exist.
|
|
65
100
|
*/
|
|
66
101
|
getContentSourceLastRunDate(contentSourceID: string, contextUser: UserInfo): Promise<Date>;
|
|
67
|
-
|
|
102
|
+
GetContentItemParams(contentTypeID: string): {
|
|
68
103
|
modelID: string;
|
|
69
104
|
minTags: number;
|
|
70
105
|
maxTags: number;
|
|
106
|
+
};
|
|
107
|
+
GetContentSourceTypeName(contentSourceTypeID: string): string;
|
|
108
|
+
GetContentTypeName(contentTypeID: string): string;
|
|
109
|
+
GetContentFileTypeName(contentFileTypeID: string): string;
|
|
110
|
+
GetAdditionalContentTypePrompt(contentTypeID: string): string;
|
|
111
|
+
GetContentItemDescription(contentSourceParams: ContentSourceParams): string;
|
|
112
|
+
getChecksumFromURL(url: string): Promise<string>;
|
|
113
|
+
getChecksumFromText(text: string): Promise<string>;
|
|
114
|
+
getContentItemIDFromURL(contentSourceParams: ContentSourceParams, contextUser: UserInfo): Promise<string>;
|
|
115
|
+
/**
|
|
116
|
+
* Saves process run metadata to the database.
|
|
117
|
+
*/
|
|
118
|
+
saveProcessRun(processRunParams: ProcessRunParams, contextUser: UserInfo): Promise<void>;
|
|
119
|
+
parsePDF(dataBuffer: Buffer): Promise<string>;
|
|
120
|
+
parseDOCX(dataBuffer: Buffer): Promise<string>;
|
|
121
|
+
parseHTML(data: string): Promise<string>;
|
|
122
|
+
parseFileFromPath(filePath: string): Promise<string>;
|
|
123
|
+
/**
|
|
124
|
+
* Embeds content items and upserts them to the appropriate vector index.
|
|
125
|
+
* Items are grouped by their resolved (embeddingModel + vectorIndex) pair — derived
|
|
126
|
+
* from per-ContentSource overrides, per-ContentType defaults, or the global fallback
|
|
127
|
+
* (first active VectorIndex). Each group is processed in configurable batches with
|
|
128
|
+
* parallel upserts within each batch.
|
|
129
|
+
*/
|
|
130
|
+
VectorizeContentItems(items: MJContentItemEntity[], contextUser: UserInfo, onProgress?: (processed: number, total: number) => void, batchSize?: number): Promise<{
|
|
131
|
+
vectorized: number;
|
|
132
|
+
skipped: number;
|
|
71
133
|
}>;
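The `VectorizeContentItems` signature above takes an optional `onProgress` callback and a `batchSize`, and its doc comment describes parallel upserts within each batch. The sketch below shows one way such a loop could be structured. `processInBatchesSketch` and its `handle` callback are hypothetical names, and treating a `false` result as "skipped" is an assumption, not the package's documented behavior.

```typescript
type ProgressFn = (processed: number, total: number) => void;

// Hypothetical sketch: run `handle` over items in fixed-size batches,
// with the calls inside each batch running in parallel.
async function processInBatchesSketch<T>(
  items: T[],
  batchSize: number,
  handle: (item: T) => Promise<boolean>, // assumption: true = vectorized, false = skipped
  onProgress?: ProgressFn
): Promise<{ vectorized: number; skipped: number }> {
  const size = Math.max(1, Math.floor(batchSize));
  let vectorized = 0;
  let skipped = 0;
  for (let start = 0; start < items.length; start += size) {
    const batch = items.slice(start, start + size);
    // Parallel within the batch, sequential across batches.
    const results = await Promise.all(batch.map((item) => handle(item)));
    for (const ok of results) {
      if (ok) {
        vectorized++;
      } else {
        skipped++;
      }
    }
    // Report cumulative progress after each completed batch.
    if (onProgress) {
      onProgress(Math.min(start + size, items.length), items.length);
    }
  }
  return { vectorized, skipped };
}
```

Running batches sequentially while parallelizing inside each one bounds concurrency at `batchSize`, which keeps load on an embedding or vector service predictable.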
|
|
72
134
|
/**
|
|
73
|
-
*
|
|
74
|
-
*
|
|
75
|
-
* @param contextUser
|
|
76
|
-
* @returns
|
|
135
|
+
* Process a single infrastructure group: embed texts in batches and upsert to vector DB.
|
|
136
|
+
* Upserts within each batch run in parallel for throughput.
|
|
77
137
|
*/
|
|
78
|
-
|
|
138
|
+
private vectorizeGroup;
|
|
79
139
|
/**
|
|
80
|
-
*
|
|
81
|
-
*
|
|
82
|
-
* @param contextUser
|
|
83
|
-
* @returns
|
|
140
|
+
* Load content source and content type records for all unique source/type IDs
|
|
141
|
+
* referenced by the given items. Returns maps keyed by normalized ID.
|
|
84
142
|
*/
|
|
85
|
-
|
|
143
|
+
private loadContentSourceAndTypeMaps;
|
|
86
144
|
/**
|
|
87
|
-
*
|
|
88
|
-
*
|
|
89
|
-
* @param contextUser
|
|
90
|
-
* @returns
|
|
145
|
+
* Resolve the (embeddingModelID, vectorIndexID) pair for a content item using
|
|
146
|
+
* the cascade: ContentSource override -> ContentType default -> null (global fallback).
|
|
91
147
|
*/
|
|
92
|
-
|
|
93
|
-
getAdditionalContentTypePrompt(contentTypeID: string, contextUser: UserInfo): Promise<string>;
|
|
148
|
+
private resolveItemInfrastructureIds;
|
|
94
149
|
/**
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
getChecksumFromText(text: string): Promise<string>;
|
|
102
|
-
getContentItemIDFromURL(contentSourceParams: ContentSourceParams, contextUser: UserInfo): Promise<string>;
|
|
150
|
+
* Group items by their resolved (embeddingModelID + vectorIndexID) key.
|
|
151
|
+
* Items with the same pair share infrastructure and can be batched together.
|
|
152
|
+
*/
|
|
153
|
+
private groupItemsByInfrastructure;
|
|
154
|
+
/** Create a stable cache key for an (embeddingModelID, vectorIndexID) pair */
|
|
155
|
+
private infraGroupKey;
|
|
103
156
|
/**
|
|
104
|
-
*
|
|
105
|
-
*
|
|
106
|
-
* @param contextUser: The user context to save the process run
|
|
107
|
-
* @returns
|
|
157
|
+
* Resolve a group key into concrete infrastructure instances. For the 'default|default'
|
|
158
|
+
* key, falls back to the first active VectorIndex (original behavior).
|
|
108
159
|
*/
|
|
109
|
-
|
|
160
|
+
private resolveGroupInfrastructure;
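The comments above describe grouping items by an (embeddingModelID, vectorIndexID) pair and mention a `'default|default'` key for the global fallback. A small sketch of that grouping follows. The `|` separator and the `'default'` placeholder are inferred from that one key; the `InfraIds` shape and both function names are hypothetical, since the private methods' bodies are not shown here.

```typescript
// Hypothetical shape for a resolved infrastructure pair; null means "use the global fallback".
interface InfraIds {
  embeddingModelID: string | null;
  vectorIndexID: string | null;
}

// Build a stable string key; a null ID maps to the literal 'default'.
function infraGroupKeySketch(ids: InfraIds): string {
  return `${ids.embeddingModelID ?? "default"}|${ids.vectorIndexID ?? "default"}`;
}

// Group items so that everything sharing a key can be embedded and upserted together.
function groupByInfrastructureSketch<T>(
  items: T[],
  resolve: (item: T) => InfraIds
): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const item of items) {
    const key = infraGroupKeySketch(resolve(item));
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(item);
    } else {
      groups.set(key, [item]);
    }
  }
  return groups;
}
```

Keying the map on a single string lets each embedding model and vector database instance be created once per group rather than once per item.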
|
|
110
161
|
/**
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
parsePDF(dataBuffer: Buffer): Promise<string>;
|
|
162
|
+
* Build infrastructure from explicit embeddingModelID and vectorIndexID.
|
|
163
|
+
* Looks up the vector index by ID and the embedding model from AIEngine.
|
|
164
|
+
*/
|
|
165
|
+
private buildVectorInfrastructure;
|
|
116
166
|
/**
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
*/
|
|
121
|
-
parseDOCX(dataBuffer: Buffer): Promise<string>;
|
|
122
|
-
parseHTML(data: string): Promise<string>;
|
|
167
|
+
* Fallback: resolve infrastructure from the first active VectorIndex (original behavior).
|
|
168
|
+
*/
|
|
169
|
+
private getDefaultVectorInfrastructure;
|
|
123
170
|
/**
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
*/
|
|
129
|
-
|
|
171
|
+
* Shared helper: given a vector index record and embedding model ID, resolve all
|
|
172
|
+
* driver instances needed for embedding + upsert.
|
|
173
|
+
*/
|
|
174
|
+
private createInfrastructureFromIndex;
|
|
175
|
+
/** Find an embedding model by ID in AIEngine, with helpful error reporting */
|
|
176
|
+
private findEmbeddingModel;
|
|
177
|
+
/** Create a BaseEmbeddings instance for a given driver class */
|
|
178
|
+
private createEmbeddingInstance;
|
|
179
|
+
/** Create a VectorDBBase instance for a given class key */
|
|
180
|
+
private createVectorDBInstance;
|
|
181
|
+
/** SHA-1 deterministic vector ID for a content item */
|
|
182
|
+
private contentItemVectorId;
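The comment above describes a deterministic SHA-1 vector ID for a content item. The sketch below shows the general technique with Node's built-in `crypto` module. It assumes the hash input is just the content item's ID string; the declaration does not say what the real `contentItemVectorId` hashes, and `contentItemVectorIdSketch` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: derive a stable vector ID from a content item ID.
// Assumption: only the ID string is hashed; the real method may include more fields.
function contentItemVectorIdSketch(contentItemID: string): string {
  // SHA-1 of the UTF-8 input, rendered as 40 lowercase hex characters.
  return createHash("sha1").update(contentItemID, "utf8").digest("hex");
}
```

Because the ID is a pure function of its input, re-running vectorization for the same item would overwrite the existing vector on upsert instead of creating a duplicate.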
|
|
183
|
+
/** Build the text that gets embedded: Title + Description + full Text */
|
|
184
|
+
private buildEmbeddingText;
|
|
185
|
+
/** Build metadata stored alongside the vector — truncate large text fields */
|
|
186
|
+
private buildVectorMetadata;
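The two comments above say the embedded text is Title + Description + full Text, and that large text fields are truncated in the stored metadata. A rough sketch of both steps follows. The `ContentItemLike` shape, the newline separator, and the explicit `maxChars` parameter are assumptions; the diff shows neither the real separator nor the truncation limit.

```typescript
// Hypothetical minimal shape; the real MJContentItemEntity carries many more fields.
interface ContentItemLike {
  Title?: string | null;
  Description?: string | null;
  Text?: string | null;
}

// Concatenate the non-empty parts in the order named by the doc comment.
function buildEmbeddingTextSketch(item: ContentItemLike): string {
  return [item.Title, item.Description, item.Text]
    .filter((part): part is string => typeof part === "string" && part.trim().length > 0)
    .join("\n");
}

// Cap a metadata field at maxChars characters so vector records stay small.
function truncateForMetadataSketch(value: string, maxChars: number): string {
  return value.length <= maxChars ? value : value.slice(0, maxChars);
}
```

Embedding the full text while storing only a truncated copy keeps search quality tied to the whole document without bloating the metadata payload.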
|
|
187
|
+
/** Load all tags for the given items in a single RunView call */
|
|
188
|
+
private loadTagsForItems;
|
|
130
189
|
}
|
|
131
190
|
//# sourceMappingURL=AutotagBaseEngine.d.ts.map
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"AutotagBaseEngine.d.ts","sourceRoot":"","sources":["../../../src/Engine/generic/AutotagBaseEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAqB,QAAQ,
|
|
1
|
+
{"version":3,"file":"AutotagBaseEngine.d.ts","sourceRoot":"","sources":["../../../src/Engine/generic/AutotagBaseEngine.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAA4B,iBAAiB,EAAqB,QAAQ,EAAuB,MAAM,sBAAsB,CAAA;AAEhJ,OAAO,EACH,qBAAqB,EAAE,mBAAmB,EAAE,uBAAuB,EACxC,mBAAmB,EAAE,yBAAyB,EACzE,4BAA4B,EACE,8BAA8B,EAC/D,MAAM,+BAA+B,CAAA;AACtC,OAAO,EAAE,mBAAmB,EAAE,uBAAuB,EAAE,2BAA2B,EAAE,MAAM,iBAAiB,CAAA;AAI3G,OAAO,EAAE,gBAAgB,EAAE,UAAU,EAAE,wBAAwB,EAAE,MAAM,iBAAiB,CAAA;AASxF,OAAO,KAAK,EAAE,wBAAwB,EAAE,MAAM,8BAA8B,CAAA;AAkB5E;;;;GAIG;AACH,qBACa,iBAAkB,SAAQ,UAAU,CAAC,iBAAiB,CAAC;IAChE,WAAkB,QAAQ,IAAI,iBAAiB,CAE9C;IAGD,OAAO,CAAC,aAAa,CAA6B;IAClD,OAAO,CAAC,mBAAmB,CAAmC;IAC9D,OAAO,CAAC,iBAAiB,CAAiC;IAC1D,OAAO,CAAC,sBAAsB,CAAsC;IACpE,OAAO,CAAC,wBAAwB,CAAwC;IAExE,2CAA2C;IAC3C,IAAW,YAAY,IAAI,mBAAmB,EAAE,CAA+B;IAC/E,kDAAkD;IAClD,IAAW,kBAAkB,IAAI,yBAAyB,EAAE,CAAqC;IACjG,gDAAgD;IAChD,IAAW,gBAAgB,IAAI,uBAAuB,EAAE,CAAmC;IAC3F,qDAAqD;IACrD,IAAW,qBAAqB,IAAI,4BAA4B,EAAE,CAAwC;IAC1G,wDAAwD;IACxD,IAAW,uBAAuB,IAAI,8BAA8B,EAAE,CAA0C;IAEnG,MAAM,CAAC,YAAY,CAAC,EAAE,OAAO,EAAE,WAAW,CAAC,EAAE,QAAQ,EAAE,QAAQ,CAAC,EAAE,iBAAiB,GAAG,OAAO,CAAC,OAAO,CAAC;IAgCnH;;;OAGG;IACU,4BAA4B,CACrC,YAAY,EAAE,mBAAmB,EAAE,EACnC,WAAW,EAAE,QAAQ,EACrB,SAAS,GAAE,MAAqC,EAChD,UAAU,CAAC,EAAE,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,WAAW,CAAC,EAAE,MAAM,KAAK,IAAI,GAC9E,OAAO,CAAC,IAAI,CAAC;IA4ChB;;OAEG;YACW,qBAAqB;IAgBnC;;OAEG;IACU,sBAAsB,CAAC,MAAM,EAAE,wBAAwB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAK3G;;;OAGG;IACH,OAAO,CAAC,gBAAgB;IAWxB;;OAEG;IACH,OAAO,CAAC,eAAe;IAqBV,+BAA+B,CAAC,MAAM,EAAE,wBAAwB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,UAAU,CAAC;IAuB1H;;;OAGG;IACH,OAAO,CAAC,iBAAiB;IAWzB;;;;OAIG;IACU,4BAA4B,CACrC,MAAM,EAAE,wBAAwB,EAChC,MAAM,EAAE,wBAAwB,EAChC,KAAK,EAAE,MAAM,EACb,UAAU,EAAE,UAAU,EACtB,WAAW,EAAE,QAAQ,GACtB,OAAO,CAAC,UAAU,CAAC;IAkDT,cAAc,CAAC,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAS5E,wBAAwB,CAAC,aAAa,EAAE,MAAM,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAOlG;;;OAGG;IACI,kBAAkB,CAAC
,IAAI,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,GAAG,MAAM,EAAE;IA0BrE;;OAEG;IACH,OAAO,CAAC,iBAAiB;IAYzB;;;OAGG;IACU,mBAAmB,CAAC,aAAa,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAiCrH;;;OAGG;IACU,iCAAiC,CAAC,UAAU,EAAE,UAAU,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IA+B5G;;OAEG;IACU,oBAAoB,CAAC,WAAW,EAAE,QAAQ,EAAE,mBAAmB,EAAE,MAAM,GAAG,OAAO,CAAC,qBAAqB,EAAE,CAAC;IAehH,4BAA4B,CAAC,QAAQ,EAAE,MAAM,GAAG,MAAM;IAQhD,sBAAsB,CAAC,aAAa,EAAE,qBAAqB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,GAAG,CAAC,MAAM,EAAE,2BAA2B,CAAC,CAAC;IA2B5I,iCAAiC,CAAC,wBAAwB,EAAE,MAAM,GAAG,uBAAuB;IAa5F,sBAAsB,CAAC,KAAK,EAAE,MAAM,EAAE,IAAI,EAAE,MAAM,GAAG,2BAA2B;IAiBhF,eAAe,CAAC,GAAG,EAAE,MAAM,GAAG,OAAO;IAIrC,gBAAgB,CAAC,KAAK,EAAE,MAAM,GAAG,MAAM,EAAE;IAIhD;;OAEG;IACU,4BAA4B,CAAC,WAAW,EAAE,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC;IAK3E;;OAEG;IACU,2BAA2B,CAAC,eAAe,EAAE,MAAM,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAqBhG,oBAAoB,CAAC,aAAa,EAAE,MAAM,GAAG;QAAE,OAAO,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAA;KAAE;IAYlG,wBAAwB,CAAC,mBAAmB,EAAE,MAAM,GAAG,MAAM;IAQ7D,kBAAkB,CAAC,aAAa,EAAE,MAAM,GAAG,MAAM;IAQjD,sBAAsB,CAAC,iBAAiB,EAAE,MAAM,GAAG,MAAM;IAQzD,8BAA8B,CAAC,aAAa,EAAE,MAAM,GAAG,MAAM;IAS7D,yBAAyB,CAAC,mBAAmB,EAAE,mBAAmB,GAAG,MAAM;IAOrE,kBAAkB,CAAC,GAAG,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAMhD,mBAAmB,CAAC,IAAI,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAIlD,uBAAuB,CAAC,mBAAmB,EAAE,mBAAmB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,MAAM,CAAC;IAgBtH;;OAEG;IACU,cAAc,CAAC,gBAAgB,EAAE,gBAAgB,EAAE,WAAW,EAAE,QAAQ,GAAG,OAAO,CAAC,IAAI,CAAC;IAYxF,QAAQ,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAK7C,SAAS,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAK9C,SAAS,CAAC,IAAI,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAWxC,iBAAiB,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IAejE;;;;;;OAMG;IACU,qBAAqB,CAC9B,KAAK,EAAE,mBAAmB,EAAE,EAC5B,WAAW,EAAE,QAAQ,EACrB,UAAU,CAAC,EAAE,CAAC,SAAS,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,KAAK,IAAI,EACvD,SAAS,GAAE,MAAqC,GACjD,OAAO,CAAC;QAAE,UAAU,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAA;KAAE,CAAC
;IAmCnD;;;OAGG;YACW,cAAc;IAmD5B;;;OAGG;YACW,4BAA4B;IA2C1C;;;OAGG;IACH,OAAO,CAAC,4BAA4B;IA2BpC;;;OAGG;IACH,OAAO,CAAC,0BAA0B;IAkBlC,8EAA8E;IAC9E,OAAO,CAAC,aAAa;IAMrB;;;OAGG;YACW,0BAA0B;IAcxC;;;OAGG;YACW,yBAAyB;IAqBvC;;OAEG;YACW,8BAA8B;IAiB5C;;;OAGG;YACW,6BAA6B;IAgC3C,8EAA8E;IAC9E,OAAO,CAAC,kBAAkB;IAU1B,gEAAgE;IAChE,OAAO,CAAC,uBAAuB;IAU/B,2DAA2D;IAC3D,OAAO,CAAC,sBAAsB;IAU9B,uDAAuD;IACvD,OAAO,CAAC,mBAAmB;IAI3B,yEAAyE;IACzE,OAAO,CAAC,kBAAkB;IAQ1B,8EAA8E;IAC9E,OAAO,CAAC,mBAAmB;IAgB3B,iEAAiE;YACnD,gBAAgB;CAsBjC"}
|