@memberjunction/content-autotagging 2.32.2 → 2.33.0

Files changed (2)
  1. package/README.md +143 -0
  2. package/package.json +6 -6
package/README.md ADDED
@@ -0,0 +1,143 @@
+ # MemberJunction Content Autotagging
+
+ A powerful package for automatically processing and tagging content from various sources using AI models.
+
+ ## Overview
+
+ The `@memberjunction/content-autotagging` package is designed to automate the process of ingesting, analyzing, and tagging content from diverse sources such as RSS feeds, websites, local files, and cloud storage. It leverages AI capabilities to extract meaningful tags, summaries, and metadata from content, helping organizations better organize and retrieve information.
+
+ ## Installation
+
+ ```bash
+ npm install @memberjunction/content-autotagging
+ ```
+
+ ## Dependencies
+
+ This package is part of the MemberJunction ecosystem and relies on several core MemberJunction packages:
+
+ - `@memberjunction/ai` - For AI model integration
+ - `@memberjunction/aiengine` - For AI processing pipeline
+ - `@memberjunction/core` - For MemberJunction core functionality
+ - `@memberjunction/core-entities` - For entity models
+ - `@memberjunction/global` - For global utilities and configurations
+
+ External dependencies include:
+ - `axios` - For HTTP requests
+ - `cheerio` - For HTML parsing
+ - `pdf-parse` - For PDF document parsing
+ - `officeparser` - For Microsoft Office document parsing
+ - `rss-parser` - For RSS feed parsing
+ - `date-fns` - For date manipulation
+
+ ## Architecture
+
+ The package follows a modular, extensible architecture with several key components:
+
+ 1. **Content Sources** - Adapters for different content types (RSS, web, files)
+ 2. **Content Processors** - Logic for processing content based on its type
+ 3. **Tag Generators** - AI-powered systems for generating tags from content
+ 4. **Storage Adapters** - For persisting processed content to the MemberJunction database
+
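The "Content Processors" item in the component list above describes processing that depends on content type. A minimal sketch of how such a dispatch could look follows. It is illustrative only: `ContentType`, `ContentProcessor`, `processorRegistry`, and `extractText` are hypothetical names, not confirmed exports of this package, and the regex-based HTML handling is a crude stand-in for the `cheerio` parsing the README lists.

```typescript
// Hypothetical types for illustration; none of these are confirmed package exports.
type ContentType = 'html' | 'text';

// A processor turns raw content of one type into plain text.
type ContentProcessor = (raw: string) => string;

// Registry keyed by content type, so a new type only needs a new entry here.
const processorRegistry: Record<ContentType, ContentProcessor> = {
  // Crude stand-in for real HTML parsing: replace tags with spaces, collapse whitespace.
  html: (raw) => raw.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim(),
  // Plain text only needs surrounding whitespace removed.
  text: (raw) => raw.trim(),
};

// Look up the processor for the given type and apply it.
function extractText(type: ContentType, raw: string): string {
  return processorRegistry[type](raw);
}
```

Keeping the mapping in one registry means the later stages (tag generation, storage) only ever see extracted text, whatever the original format was.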
+ ## Usage
+
+ ### Basic Example
+
+ ```typescript
+ import { ContentAutotaggingProcessor } from '@memberjunction/content-autotagging';
+
+ // Initialize the processor
+ const processor = new ContentAutotaggingProcessor();
+
+ // Process content from various sources
+ async function processContent() {
+   // Process RSS feeds
+   await processor.processRSSFeeds();
+
+   // Process web content
+   await processor.processWebContent('https://example.com/article');
+
+   // Process local documents
+   await processor.processLocalDocument('/path/to/document.pdf');
+ }
+
+ processContent();
+ ```
+
+ ### Cloud Storage Integration
+
+ ```typescript
+ import {
+   ContentAutotaggingProcessor,
+   AzureBlobStorageAdapter
+ } from '@memberjunction/content-autotagging';
+
+ // Create storage adapter
+ const storageAdapter = new AzureBlobStorageAdapter({
+   connectionString: process.env.AZURE_STORAGE_CONNECTION_STRING,
+   containerName: 'content'
+ });
+
+ // Initialize processor with storage adapter
+ const processor = new ContentAutotaggingProcessor({ storageAdapter });
+
+ // Process documents from cloud storage
+ async function processCloudDocuments() {
+   await processor.processCloudDocuments();
+ }
+
+ processCloudDocuments();
+ ```
+
+ ## Content Processing Flow
+
+ 1. **Content Acquisition** - Content is retrieved from the source (RSS feed, web page, file system, cloud storage)
+ 2. **Parsing & Extraction** - Raw content is parsed based on its type (HTML, PDF, Office document, etc.)
+ 3. **AI Processing** - Extracted text is sent to AI models for analysis, generating tags, summaries, and metadata
+ 4. **Storage** - Processed content with its AI-generated tags is stored in the MemberJunction database
+ 5. **Notification** - Optional notifications are sent upon completion (if configured)
+
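The five steps above can be sketched as a small pipeline in which each stage is an injected function. This is an assumption-laden illustration, not the package's implementation: `RawItem`, `TaggedItem`, `PipelineSteps`, and `runPipeline` are hypothetical names, and the README does not say whether items are processed sequentially as they are here.

```typescript
// Hypothetical shapes for illustration; not confirmed exports of the package.
interface RawItem { id: string; type: string; body: string; }
interface TaggedItem { id: string; text: string; tags: string[]; }

// One injected function per step of the flow described in the README.
interface PipelineSteps {
  acquire: () => Promise<RawItem[]>;          // 1. Content Acquisition
  parse: (item: RawItem) => Promise<string>;  // 2. Parsing & Extraction
  tag: (text: string) => Promise<string[]>;   // 3. AI Processing
  store: (item: TaggedItem) => Promise<void>; // 4. Storage
  notify?: (count: number) => Promise<void>;  // 5. Notification (optional)
}

async function runPipeline(steps: PipelineSteps): Promise<TaggedItem[]> {
  const results: TaggedItem[] = [];
  // Items are handled one at a time; a real system might batch or parallelize.
  for (const raw of await steps.acquire()) {
    const text = await steps.parse(raw);
    const tags = await steps.tag(text);
    const tagged: TaggedItem = { id: raw.id, text, tags };
    await steps.store(tagged);
    results.push(tagged);
  }
  // The notification step runs only when a notifier was configured.
  if (steps.notify) {
    await steps.notify(results.length);
  }
  return results;
}
```

Because every stage is passed in, the AI call can be replaced with a stub during testing, and the notification step can be left out entirely.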
+ ## Extending the Package
+
+ ### Adding Custom Content Sources
+
+ You can extend the package by creating custom content source adapters:
+
+ ```typescript
+ import { BaseContentSource, ContentItem } from '@memberjunction/content-autotagging';
+
+ export class MyCustomSource extends BaseContentSource {
+   async getContentItems(): Promise<ContentItem[]> {
+     // Implementation for retrieving content items from your source
+     return [
+       {
+         id: 'unique-id',
+         title: 'Content Title',
+         content: 'Content text...',
+         url: 'https://source.url',
+         publishDate: new Date(),
+         author: 'Author Name',
+         sourceType: 'custom'
+       }
+     ];
+   }
+ }
+ ```
+
+ ## Configuration
+
+ The package supports configuration through environment variables or a configuration object:
+
+ ```typescript
+ const processor = new ContentAutotaggingProcessor({
+   aiModel: 'gpt-4',
+   maxContentLength: 10000,
+   taggingPrompt: 'Extract relevant tags from the following content:',
+   batchSize: 5,
+   storageAdapter: customStorageAdapter
+ });
+ ```
+
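The README says configuration can come from environment variables or a configuration object, but does not name the variables or state which source wins. The sketch below shows one plausible arrangement under stated assumptions: the `AUTOTAG_*` variable names, the default values, the precedence order (explicit options, then environment, then defaults), and the helpers `resolveOptions` and `prepareBatches` are all hypothetical.

```typescript
// Option names mirror the README's configuration object; the defaults and the
// environment variable names are assumptions made for this sketch.
interface AutotagOptions {
  aiModel: string;
  maxContentLength: number;
  batchSize: number;
}

const DEFAULTS: AutotagOptions = { aiModel: 'gpt-4', maxContentLength: 10000, batchSize: 5 };

type Env = Record<string, string | undefined>;

// Assumed precedence: explicit options, then environment variables, then defaults.
function resolveOptions(explicit: Partial<AutotagOptions>, env: Env): AutotagOptions {
  const fromEnv: Partial<AutotagOptions> = {};
  const model = env['AUTOTAG_AI_MODEL'];
  if (model) fromEnv.aiModel = model;
  const maxLen = Number(env['AUTOTAG_MAX_CONTENT_LENGTH']);
  if (env['AUTOTAG_MAX_CONTENT_LENGTH'] && Number.isFinite(maxLen)) fromEnv.maxContentLength = maxLen;
  const batch = Number(env['AUTOTAG_BATCH_SIZE']);
  if (env['AUTOTAG_BATCH_SIZE'] && Number.isFinite(batch)) fromEnv.batchSize = batch;
  return { ...DEFAULTS, ...fromEnv, ...explicit };
}

// Truncate each text to maxContentLength, then group texts into batches of batchSize.
function prepareBatches(texts: string[], opts: AutotagOptions): string[][] {
  const size = Math.max(1, opts.batchSize); // guard against a zero or negative batch size
  const trimmed = texts.map((t) => t.slice(0, opts.maxContentLength));
  const batches: string[][] = [];
  for (let i = 0; i < trimmed.length; i += size) {
    batches.push(trimmed.slice(i, i + size));
  }
  return batches;
}
```

In a Node.js process, `process.env` could be passed as the `env` argument; taking it as a parameter keeps the sketch testable without touching the real environment.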
+ ## License
+
+ ISC
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@memberjunction/content-autotagging",
-   "version": "2.32.2",
+   "version": "2.33.0",
    "description": "MemberJunction Content Autotagging Application",
    "main": "dist/src/index.js",
    "types": "dist/src/index.d.ts",
@@ -15,11 +15,11 @@
    "author": "MemberJunction.com",
    "license": "ISC",
    "dependencies": {
-     "@memberjunction/ai": "2.32.2",
-     "@memberjunction/aiengine": "2.32.2",
-     "@memberjunction/core": "2.32.2",
-     "@memberjunction/core-entities": "2.32.2",
-     "@memberjunction/global": "2.32.2",
+     "@memberjunction/ai": "2.33.0",
+     "@memberjunction/aiengine": "2.33.0",
+     "@memberjunction/core": "2.33.0",
+     "@memberjunction/core-entities": "2.33.0",
+     "@memberjunction/global": "2.33.0",
      "axios": "^1.7.2",
      "cheerio": "^1.0.0-rc.12",
      "crypto": "^1.0.1",