@memberjunction/content-autotagging 2.32.2 → 2.33.0
This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.
- package/README.md +143 -0
- package/package.json +6 -6
package/README.md ADDED
@@ -0,0 +1,143 @@
# MemberJunction Content Autotagging

A powerful package for automatically processing and tagging content from various sources using AI models.

## Overview

The `@memberjunction/content-autotagging` package is designed to automate the process of ingesting, analyzing, and tagging content from diverse sources such as RSS feeds, websites, local files, and cloud storage. It leverages AI capabilities to extract meaningful tags, summaries, and metadata from content, helping organizations better organize and retrieve information.

## Installation

```bash
npm install @memberjunction/content-autotagging
```

## Dependencies

This package is part of the MemberJunction ecosystem and relies on several core MemberJunction packages:

- `@memberjunction/ai` - For AI model integration
- `@memberjunction/aiengine` - For AI processing pipeline
- `@memberjunction/core` - For MemberJunction core functionality
- `@memberjunction/core-entities` - For entity models
- `@memberjunction/global` - For global utilities and configurations

External dependencies include:
- `axios` - For HTTP requests
- `cheerio` - For HTML parsing
- `pdf-parse` - For PDF document parsing
- `officeparser` - For Microsoft Office document parsing
- `rss-parser` - For RSS feed parsing
- `date-fns` - For date manipulation

## Architecture

The package follows a modular, extensible architecture with several key components:

1. **Content Sources** - Adapters for different content types (RSS, web, files)
2. **Content Processors** - Logic for processing content based on its type
3. **Tag Generators** - AI-powered systems for generating tags from content
4. **Storage Adapters** - For persisting processed content to the MemberJunction database
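
As a rough illustration of how the first two components in the list above might fit together, the sketch below dispatches raw content to a processor chosen by source kind. Every name in it (`SourceKind`, `RawContent`, `ContentProcessor`, `extractText`) is hypothetical and chosen for illustration; none is a confirmed export of the package, and the regex-based HTML stripping merely stands in for a real parser.

```typescript
// Hypothetical names throughout; not confirmed exports of the package.
type SourceKind = 'rss' | 'web' | 'file';

interface RawContent {
  kind: SourceKind;
  body: string;
}

interface ContentProcessor {
  // Turns raw source material into plain text ready for tagging.
  toText(body: string): string;
}

// Crude tag stripping for illustration; a real processor would use an HTML parser.
const markupProcessor: ContentProcessor = {
  toText: (body) => body.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim(),
};

const plainTextProcessor: ContentProcessor = {
  toText: (body) => body.trim(),
};

// One processor per source kind; adding a kind means adding one entry here.
const processorsByKind: Record<SourceKind, ContentProcessor> = {
  rss: markupProcessor,
  web: markupProcessor,
  file: plainTextProcessor,
};

function extractText(raw: RawContent): string {
  return processorsByKind[raw.kind].toText(raw.body);
}
```

Keying the lookup on a union type means the compiler flags any source kind that lacks a processor, which is one way to keep such a design extensible.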

## Usage

### Basic Example

```typescript
import { ContentAutotaggingProcessor } from '@memberjunction/content-autotagging';

// Initialize the processor
const processor = new ContentAutotaggingProcessor();

// Process content from various sources
async function processContent() {
  // Process RSS feeds
  await processor.processRSSFeeds();

  // Process web content
  await processor.processWebContent('https://example.com/article');

  // Process local documents
  await processor.processLocalDocument('/path/to/document.pdf');
}

processContent();
```

### Cloud Storage Integration

```typescript
import {
  ContentAutotaggingProcessor,
  AzureBlobStorageAdapter
} from '@memberjunction/content-autotagging';

// Create storage adapter
const storageAdapter = new AzureBlobStorageAdapter({
  connectionString: process.env.AZURE_STORAGE_CONNECTION_STRING,
  containerName: 'content'
});

// Initialize processor with storage adapter
const processor = new ContentAutotaggingProcessor({ storageAdapter });

// Process documents from cloud storage
async function processCloudDocuments() {
  await processor.processCloudDocuments();
}

processCloudDocuments();
```

## Content Processing Flow

1. **Content Acquisition** - Content is retrieved from the source (RSS feed, web page, file system, cloud storage)
2. **Parsing & Extraction** - Raw content is parsed based on its type (HTML, PDF, Office document, etc.)
3. **AI Processing** - Extracted text is sent to AI models for analysis, generating tags, summaries, and metadata
4. **Storage** - Processed content with its AI-generated tags is stored in the MemberJunction database
5. **Notification** - Optional notifications are sent upon completion (if configured)
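
The five steps above can be read as a small asynchronous pipeline. The following is a minimal sketch under stated assumptions, not the package's implementation: `PipelineSteps`, `ProcessedContent`, and `runPipeline` are hypothetical names, and each step is supplied by the caller as a stub so the ordering is the only thing being shown.

```typescript
// All names here are hypothetical; they are not confirmed exports of the package.
interface ProcessedContent {
  text: string;
  tags: string[];
  summary: string;
}

interface PipelineSteps {
  acquire: () => Promise<string>; // 1. Content Acquisition
  parse: (raw: string) => string; // 2. Parsing & Extraction
  analyze: (text: string) => Promise<{ tags: string[]; summary: string }>; // 3. AI Processing
  store: (record: ProcessedContent) => Promise<void>; // 4. Storage
  notify?: (record: ProcessedContent) => Promise<void>; // 5. Notification (optional)
}

async function runPipeline(steps: PipelineSteps): Promise<ProcessedContent> {
  const raw = await steps.acquire();
  const text = steps.parse(raw);
  const analysis = await steps.analyze(text);
  const record: ProcessedContent = { text, tags: analysis.tags, summary: analysis.summary };
  await steps.store(record);
  // Notification only runs when a handler was configured.
  if (steps.notify) {
    await steps.notify(record);
  }
  return record;
}

// Stubbed run: in-memory arrays stand in for the database and the notifier.
const storedRecords: ProcessedContent[] = [];
const sentNotifications: string[] = [];

const demoRun = runPipeline({
  acquire: async () => '<h1>Release notes</h1>',
  parse: (raw) => raw.replace(/<[^>]*>/g, '').trim(),
  analyze: async (text) => ({ tags: ['release'], summary: text }),
  store: async (record) => { storedRecords.push(record); },
  notify: async (record) => { sentNotifications.push(`processed: ${record.summary}`); },
});
```

Passing the steps in as functions keeps each stage replaceable in tests; a failure in any awaited step rejects the returned promise, so later steps such as storage and notification are skipped.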

## Extending the Package

### Adding Custom Content Sources

You can extend the package by creating custom content source adapters:

```typescript
import { BaseContentSource, ContentItem } from '@memberjunction/content-autotagging';

export class MyCustomSource extends BaseContentSource {
  async getContentItems(): Promise<ContentItem[]> {
    // Implementation for retrieving content items from your source
    return [
      {
        id: 'unique-id',
        title: 'Content Title',
        content: 'Content text...',
        url: 'https://source.url',
        publishDate: new Date(),
        author: 'Author Name',
        sourceType: 'custom'
      }
    ];
  }
}
```
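
To show how several sources of this shape might be consumed together, here is a self-contained sketch that gathers items from multiple sources and drops duplicates by `id`. The `ContentItem` and `BaseContentSource` definitions below are local stand-ins that mirror the fields in the example above; the real package types may differ. `StaticSource` and `collectItems` are hypothetical helpers introduced only for illustration.

```typescript
// Stand-in definitions: the field list mirrors the example above, but the
// real package types may differ.
interface ContentItem {
  id: string;
  title: string;
  content: string;
  url: string;
  publishDate: Date;
  author: string;
  sourceType: string;
}

abstract class BaseContentSource {
  abstract getContentItems(): Promise<ContentItem[]>;
}

// Hypothetical source that serves a fixed list, handy for tests.
class StaticSource extends BaseContentSource {
  private readonly items: ContentItem[];

  constructor(items: ContentItem[]) {
    super();
    this.items = items;
  }

  async getContentItems(): Promise<ContentItem[]> {
    return this.items;
  }
}

// Queries every source in parallel and keeps the first item seen for each id.
async function collectItems(sources: BaseContentSource[]): Promise<ContentItem[]> {
  const batches = await Promise.all(sources.map((source) => source.getContentItems()));
  const byId = new Map<string, ContentItem>();
  for (const batch of batches) {
    for (const item of batch) {
      if (!byId.has(item.id)) {
        byId.set(item.id, item);
      }
    }
  }
  return Array.from(byId.values());
}
```

Because `Promise.all` preserves the order of its inputs, the source listed first takes precedence when two sources report the same `id`.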

## Configuration

The package supports configuration through environment variables or a configuration object:

```typescript
const processor = new ContentAutotaggingProcessor({
  aiModel: 'gpt-4',
  maxContentLength: 10000,
  taggingPrompt: 'Extract relevant tags from the following content:',
  batchSize: 5,
  storageAdapter: customStorageAdapter
});
```
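
The README does not say which environment variables are read or how they interact with the configuration object, so the sketch below is only one plausible arrangement. The variable names (`AUTOTAG_AI_MODEL` and so on), the `resolveConfig` helper, and the precedence order (explicit object, then environment, then defaults) are all assumptions. The default values are borrowed from the example above, and `storageAdapter` is left out for brevity.

```typescript
// Hypothetical shape; storageAdapter from the example above is omitted here.
interface AutotagConfig {
  aiModel: string;
  maxContentLength: number;
  taggingPrompt: string;
  batchSize: number;
}

// Defaults borrowed from the README example; the package's real defaults are unknown.
const defaultConfig: AutotagConfig = {
  aiModel: 'gpt-4',
  maxContentLength: 10000,
  taggingPrompt: 'Extract relevant tags from the following content:',
  batchSize: 5,
};

// Returns undefined for missing or malformed values so a fallback can apply.
function parsePositiveInt(value: string | undefined): number | undefined {
  if (value === undefined) return undefined;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : undefined;
}

// Assumed precedence: explicit overrides, then environment, then defaults.
// The AUTOTAG_* variable names are invented for this sketch.
function resolveConfig(
  overrides: Partial<AutotagConfig> = {},
  env: Record<string, string | undefined> = {},
): AutotagConfig {
  return {
    aiModel: overrides.aiModel ?? env.AUTOTAG_AI_MODEL ?? defaultConfig.aiModel,
    maxContentLength:
      overrides.maxContentLength ??
      parsePositiveInt(env.AUTOTAG_MAX_CONTENT_LENGTH) ??
      defaultConfig.maxContentLength,
    taggingPrompt:
      overrides.taggingPrompt ?? env.AUTOTAG_TAGGING_PROMPT ?? defaultConfig.taggingPrompt,
    batchSize:
      overrides.batchSize ?? parsePositiveInt(env.AUTOTAG_BATCH_SIZE) ?? defaultConfig.batchSize,
  };
}
```

In a Node.js program, `process.env` could be passed as the second argument. Taking the environment as a parameter rather than reading it directly keeps the function easy to test.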

## License

ISC
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@memberjunction/content-autotagging",
-  "version": "2.
+  "version": "2.33.0",
   "description": "MemberJunction Content Autotagging Application",
   "main": "dist/src/index.js",
   "types": "dist/src/index.d.ts",
@@ -15,11 +15,11 @@
   "author": "MemberJunction.com",
   "license": "ISC",
   "dependencies": {
-    "@memberjunction/ai": "2.
-    "@memberjunction/aiengine": "2.
-    "@memberjunction/core": "2.
-    "@memberjunction/core-entities": "2.
-    "@memberjunction/global": "2.
+    "@memberjunction/ai": "2.33.0",
+    "@memberjunction/aiengine": "2.33.0",
+    "@memberjunction/core": "2.33.0",
+    "@memberjunction/core-entities": "2.33.0",
+    "@memberjunction/global": "2.33.0",
     "axios": "^1.7.2",
     "cheerio": "^1.0.0-rc.12",
     "crypto": "^1.0.1",