@aikeytake/social-automation 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +39 -0
- package/CLAUDE.md +256 -0
- package/CURRENT_CAPABILITIES.md +493 -0
- package/DATA_ORGANIZATION.md +416 -0
- package/IMPLEMENTATION_SUMMARY.md +287 -0
- package/INSTRUCTIONS.md +316 -0
- package/MASTER_PLAN.md +1096 -0
- package/README.md +280 -0
- package/config/sources.json +296 -0
- package/package.json +37 -0
- package/src/cli.js +197 -0
- package/src/fetchers/api.js +232 -0
- package/src/fetchers/hackernews.js +86 -0
- package/src/fetchers/linkedin.js +400 -0
- package/src/fetchers/linkedin_browser.js +167 -0
- package/src/fetchers/reddit.js +77 -0
- package/src/fetchers/rss.js +50 -0
- package/src/fetchers/twitter.js +194 -0
- package/src/index.js +346 -0
- package/src/query.js +316 -0
- package/src/utils/logger.js +74 -0
- package/src/utils/storage.js +134 -0
- package/src/writing-agents/QUICK-REFERENCE.md +111 -0
- package/src/writing-agents/WRITING-SKILLS-IMPROVEMENTS.md +273 -0
- package/src/writing-agents/utils/prompt-templates-improved.js +665 -0
package/.env.example
ADDED
@@ -0,0 +1,39 @@
# Claude API for content rewriting
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Twitter/X API (optional - for posting)
TWITTER_API_KEY=your_twitter_api_key
TWITTER_API_SECRET=your_twitter_api_secret
TWITTER_ACCESS_TOKEN=your_twitter_access_token
TWITTER_ACCESS_SECRET=your_twitter_access_secret
TWITTER_BEARER_TOKEN=your_twitter_bearer_token

# LinkedIn (optional - for posting)
LINKEDIN_ACCESS_TOKEN=your_linkedin_access_token
LINKEDIN_PERSON_ID=your_linkedin_person_id

# BrightData API
BRIGHTDATA_API_KEY=your_brightdata_api_key_here
BRIGHTDATA_ZONE=mcp_unlocker

# Schedule (cron format)
FETCH_SCHEDULE=0 */2 * * *
PROCESS_SCHEDULE=30 */2 * * *
PUBLISH_SCHEDULE=0 9,15,21 * * *

# Content settings
MAX_POSTS_PER_RUN=5
MIN_POST_LENGTH=100
MAX_POST_LENGTH=3000
ENABLE_AUTO_PUBLISH=false

# Daily Post Generator Schedule
DAILY_POST_SCHEDULE=0 9 * * *

# Image Generation Settings
INFERENCE_SH_API_KEY=your_inference_sh_api_key
IMAGE_MODEL=falai/flux-dev-lora

# Product Hunt API - Register by creating an app at https://www.producthunt.com/v2/oauth/applications
PRODUCT_HUNT_API_KEY=your_product_hunt_api_key_here
PRODUCT_HUNT_API_SECRET=your_product_hunt_api_secret_here
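A consumer loading this file (e.g. via `dotenv`) could sanity-check it at startup. A minimal sketch, not part of the package: `validateEnv` is a hypothetical helper, and only the key names come from the file above; it checks that required keys are present and that schedule values have the five cron fields.

```javascript
// Hypothetical startup check for the variables in .env.example.
// Key names come from the file above; the validation logic is illustrative.
function validateEnv(env) {
  const required = ['ANTHROPIC_API_KEY', 'BRIGHTDATA_API_KEY'];
  const missing = required.filter((key) => !env[key]);

  // Standard cron expressions have exactly five whitespace-separated fields.
  const cronKeys = ['FETCH_SCHEDULE', 'PROCESS_SCHEDULE', 'PUBLISH_SCHEDULE'];
  const badCron = cronKeys.filter(
    (key) => env[key] && env[key].trim().split(/\s+/).length !== 5
  );

  return { ok: missing.length === 0 && badCron.length === 0, missing, badCron };
}

// Example: in the real tool you would load `.env` with `import 'dotenv/config'`
// and pass process.env instead of this literal object.
const result = validateEnv({
  ANTHROPIC_API_KEY: 'sk-ant-xxx',
  BRIGHTDATA_API_KEY: 'key',
  FETCH_SCHEDULE: '0 */2 * * *',
});
console.log(result.ok); // true
```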
package/CLAUDE.md
ADDED
@@ -0,0 +1,256 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Content research and aggregation tool that scrapes AI/tech news from multiple sources and stores structured JSON for AI agents to consume. The project has three main components:

1. **Content Scraping** - Scrapes RSS feeds, Reddit, Hacker News, and LinkedIn
2. **Newsletter System** - Email newsletter management with subscriber tracking
3. **Writing Agents** - AI-powered iterative writing using LangChain and Claude API

---

## Common Commands

### Content Scraping

```bash
# Scrape all sources (RSS, Reddit, Hacker News, LinkedIn)
npm run scrape

# Alternative:
node src/index.js scrape
```

Output saved to `data/YYYY-MM-DD/`:
- `trending.json` - Top 20 trending items ranked by engagement
- `all.json` - All items combined from all sources
- `rss.json`, `reddit.json`, `hackernews.json`, `linkedin.json` - Per-source data

### Newsletter Management

```bash
# Show newsletter statistics
npm run newsletter:stats

# Add a subscriber
npm run newsletter:add user@example.com "John Doe"

# List all newsletters
npm run newsletter

# Generate newsletter from trending data (basic)
npm run newsletter:generate 2026-03-22

# Generate AI-enhanced newsletter (using writing agents)
npm run newsletter:generate:enhanced 2026-03-22

# Send newsletter
npm run newsletter:send <newsletter-id>

# Test send to a single email
npm run newsletter:test <newsletter-id> test@example.com
```

### Query Data

```bash
# Interactive query mode
npm run query
```

---

## Architecture

### Content Scraping Flow

```
config/sources.json (configuration)
        ↓
src/index.js (ContentScraper orchestrator)
        ↓
src/fetchers/
├── rss.js         # 17 RSS feeds (OpenAI, Anthropic, Claude Blog, arXiv, etc.)
├── reddit.js      # 7 AI subreddits (min 100 upvotes)
├── hackernews.js  # AI-filtered stories (min 50 points)
└── linkedin.js    # LinkedIn KOL posts via BrightData SERP
        ↓
data/YYYY-MM-DD/*.json (daily output)
```
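The fan-out step in this flow can be sketched with `Promise.allSettled`, so one failing source does not abort the run. This is an illustrative skeleton, not the package's actual orchestrator (which, per the diagram, lives in `src/index.js`); `scrapeAll` and the stub fetchers are invented names.

```javascript
// Illustrative orchestrator sketch: run all fetchers concurrently and
// keep going when an individual source fails.
async function scrapeAll(fetchers) {
  const results = await Promise.allSettled(
    Object.entries(fetchers).map(async ([name, fetch]) => ({
      name,
      items: await fetch(),
    }))
  );
  const bySource = {};
  for (const r of results) {
    if (r.status === 'fulfilled') bySource[r.value.name] = r.value.items;
    else console.error('fetcher failed:', r.reason.message);
  }
  return bySource;
}

// Usage with stub fetchers: the failing "reddit" source is skipped,
// the succeeding "rss" source still lands in the result.
const data = await scrapeAll({
  rss: async () => [{ title: 'hello' }],
  reddit: async () => { throw new Error('rate limited'); },
});
console.log(Object.keys(data)); // only 'rss' survives
```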

### Newsletter System

```
src/newsletter/
├── api/newsletter-service.js   # Main service class
├── models/
│   ├── subscriber.js           # Subscriber data model
│   └── newsletter.js           # Newsletter model with generateIterative()
├── utils/
│   ├── email-sender.js         # SMTP/SendGrid email sending
│   └── helpers.js              # Helper functions
├── cli.js                      # CLI interface
└── data/
    ├── subscribers.json        # Subscriber storage
    └── newsletters.json        # Newsletter storage
```

### Writing Agents (LangChain + Claude API)

```
src/writing-agents/
├── core/
│   └── iterative-workflow.js   # Writer ↔ Critic orchestration loop
├── agents/
│   ├── writer-agent.js         # Generates/revises newsletter content
│   └── critic-agent.js         # Reviews content with 1-10 quality scoring
├── models/
│   ├── draft.js                # Tracks newsletter versions and history
│   └── critique.js             # Stores feedback and quality metrics
├── utils/
│   ├── prompt-templates.js     # Current agent prompts
│   ├── prompt-templates-improved.js  # Improved prompts (ready for testing)
│   └── cache-manager.js        # Response caching for cost reduction
└── config/
    └── agent-config.js         # Configuration & cost estimation
```

**Writing Agent Workflow:**
```
Articles (from trending.json)
        ↓
Writer Agent → Initial Draft
        ↓
Critic Agent → Review & Score (6 criteria: accuracy, clarity, value, completeness, voice, citations)
        ↓
Quality Check → Is score ≥ threshold? (default: 8/10)
        ↓ if NO: Writer Agent → Revise based on critique → loop back to Critic
        ↓ if YES: Finalize → Newsletter created
```
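The loop in the diagram reduces to a small skeleton. A hedged sketch, not the real `iterative-workflow.js`: `iterativeWrite`, `writeDraft`, and `critique` are stand-in names; only the defaults (threshold 8, max 3 iterations) are taken from the diagram and config.

```javascript
// Skeleton of the write → critique → revise loop from the diagram.
// writeDraft/critique are stand-ins for the real Writer/Critic agents.
async function iterativeWrite({ articles, writeDraft, critique, threshold = 8, maxIterations = 3 }) {
  let draft = await writeDraft(articles, null);
  let review = await critique(draft);
  let iteration = 1;
  // Terminate early once the score meets the threshold (saves API calls).
  while (review.score < threshold && iteration < maxIterations) {
    draft = await writeDraft(articles, review); // revise using the critique
    review = await critique(draft);
    iteration++;
  }
  return { draft, score: review.score, iterations: iteration };
}

// Usage with stubs: the critic scores 6 on the first pass, 9 on the second,
// so the loop revises once and then finalizes.
let pass = 0;
const out = await iterativeWrite({
  articles: [],
  writeDraft: async () => 'draft',
  critique: async () => ({ score: ++pass === 1 ? 6 : 9 }),
});
console.log(out.score, out.iterations); // 9 2
```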

---

## Configuration

### Source Configuration (`config/sources.json`)

- **rssFeeds**: 17 RSS sources with categories (ai-news, ai-research, company-news, etc.)
- **trendingSources.reddit**: 7 AI subreddits with minScore and maxAge filters
- **trendingSources.hackernews**: AI keyword filtering with minPoints threshold
- **linkedin**: KOL profiles file path, batch size, enrichment settings

### Environment Variables (`.env`)

```bash
# Content scraping
BRIGHTDATA_API_KEY=...           # Required for LinkedIn scraping
BRIGHTDATA_ZONE=mcp_unlocker     # BrightData zone name

# Newsletter
EMAIL_PROVIDER=smtp              # smtp, sendgrid, or console
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_SECURE=false
SMTP_USER=your-email@gmail.com
SMTP_PASS=your-app-password

# Writing Agents
WRITING_AGENTS_ENABLED=true
WRITING_AGENTS_MAX_ITERATIONS=3
WRITING_AGENTS_QUALITY_THRESHOLD=8
WRITING_AGENTS_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=sk-ant-...     # Required for writing agents
```

---

## Data Models

### Scraped Item Structure

```json
{
  "id": "unique_id",
  "source": "rss|reddit|hackernews|linkedin",
  "sourceName": "Source Name",
  "category": "ai-news|company-news|research",
  "title": "Article Title",
  "url": "https://...",
  "summary": "Content summary...",
  "content": "Full content...",
  "pubDate": "2026-03-22T10:00:00Z",
  "age_hours": 24,
  "engagement": {
    "upvotes": 4500,
    "comments": 823
  },
  "scraped_at": "2026-03-22T10:00:00Z"
}
```

### Trending Item Structure

```json
{
  "rank": 1,
  "score": 4750,
  "sources": ["reddit", "hackernews"],
  "title": "Article Title",
  "url": "https://...",
  "summary": "...",
  "keywords": ["AI", "GPT"],
  "engagement": { "upvotes": 4500, "comments": 823 }
}
```
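How `rank` and `score` might be derived from engagement can be illustrated with a naive sketch. The package's real scoring is not shown in this diff, and the sample above (score 4750 for 4500 upvotes and 823 comments) implies some other weighting, so `rankItems` and the plain upvotes-plus-comments formula here are assumptions.

```javascript
// Naive ranking sketch: score = upvotes + comments, sorted descending.
// The package's actual formula may weight sources, keywords, and age.
function rankItems(items) {
  return items
    .map((item) => ({
      ...item,
      score: (item.engagement?.upvotes ?? 0) + (item.engagement?.comments ?? 0),
    }))
    .sort((a, b) => b.score - a.score)
    .map((item, i) => ({ rank: i + 1, ...item }));
}

const trending = rankItems([
  { title: 'A', engagement: { upvotes: 100, comments: 20 } },
  { title: 'B', engagement: { upvotes: 4500, comments: 823 } },
]);
console.log(trending[0].title, trending[0].score); // B 5323
```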

---

## Important Design Decisions

1. **No Vercel Deployment** - This is a Node.js CLI tool, not a web application. Uses `dotenv` for config, not Vercel env vars.

2. **Daily Data Organization** - Each scrape run creates a new `data/YYYY-MM-DD/` folder. Output files are overwritten on re-scrape for the same date.

3. **Quality Threshold for Writing Agents** - Default is 8/10. The agent loop terminates early when the threshold is met to save API costs. Max iterations: 3.

4. **BrightData for LinkedIn** - LinkedIn scraping requires the BrightData SERP API with zone `mcp_unlocker`. KOL profiles are loaded from an external path.

5. **Cache Manager** - Writing agents cache LLM responses for 7 days (TTL) to reduce API costs. ~70% cache hit rate expected.

6. **ES Modules** - The project uses `"type": "module"` in package.json. All imports use `.js` extensions.

---
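Decision 2's date-keyed layout comes down to a one-line path helper; a sketch with a hypothetical `dailyDir`, not the package's actual utility:

```javascript
// Illustrative: derive the data/YYYY-MM-DD/ folder name for a scrape run.
// Re-running on the same date yields the same folder, so output files
// are overwritten, matching design decision 2 above.
function dailyDir(base = 'data', date = new Date()) {
  return `${base}/${date.toISOString().slice(0, 10)}`; // YYYY-MM-DD
}

const dir = dailyDir('data', new Date('2026-03-22T10:00:00Z'));
console.log(dir); // data/2026-03-22
```

Creating the folder with `fs.mkdir(dir, { recursive: true })` is idempotent, so repeated runs on the same date can simply write into it again.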

## Testing Writing Agents

The project has two prompt template sets for the writing agents:

1. **Current**: `prompt-templates.js` - Basic prompts
2. **Improved**: `prompt-templates-improved.js` - Enhanced with examples, rubrics, and anti-patterns

To test the improved prompts, update the import in `agents/writer-agent.js` and `agents/critic-agent.js` from `'./prompt-templates.js'` to `'./prompt-templates-improved.js'`.

---

## Key Dependencies

- `rss-parser` - RSS feed parsing
- `axios` - HTTP requests
- `cheerio` - HTML parsing
- `nodemailer` - Email sending
- `@langchain/anthropic` - LangChain integration with Claude
- `@langchain/langgraph` - Agent workflow orchestration

---

## LinkedIn KOL Configuration

LinkedIn KOL (Key Opinion Leader) profiles are stored in an external file:
```
/home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json
```

This path is configured in `config/sources.json` under `linkedin.profilesFile`.