@aikeytake/social-automation 2.0.0

package/.env.example ADDED
# Claude API for content rewriting
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Twitter/X API (optional - for posting)
TWITTER_API_KEY=your_twitter_api_key
TWITTER_API_SECRET=your_twitter_api_secret
TWITTER_ACCESS_TOKEN=your_twitter_access_token
TWITTER_ACCESS_SECRET=your_twitter_access_secret
TWITTER_BEARER_TOKEN=your_twitter_bearer_token

# LinkedIn (optional - for posting)
LINKEDIN_ACCESS_TOKEN=your_linkedin_access_token
LINKEDIN_PERSON_ID=your_linkedin_person_id

# BrightData API
BRIGHTDATA_API_KEY=your_brightdata_api_key_here
BRIGHTDATA_ZONE=mcp_unlocker

# Schedule (cron format)
FETCH_SCHEDULE=0 */2 * * *
PROCESS_SCHEDULE=30 */2 * * *
PUBLISH_SCHEDULE=0 9,15,21 * * *

# Content settings
MAX_POSTS_PER_RUN=5
MIN_POST_LENGTH=100
MAX_POST_LENGTH=3000
ENABLE_AUTO_PUBLISH=false

# Daily Post Generator Schedule
DAILY_POST_SCHEDULE=0 9 * * *

# Image Generation Settings
INFERENCE_SH_API_KEY=your_inference_sh_api_key
IMAGE_MODEL=falai/flux-dev-lora

# Product Hunt API - Register by creating an app at https://www.producthunt.com/v2/oauth/applications
PRODUCT_HUNT_API_KEY=your_product_hunt_api_key_here
PRODUCT_HUNT_API_SECRET=your_product_hunt_api_secret_here
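
The `*_SCHEDULE` values above are standard five-field cron expressions (minute, hour, day-of-month, month, day-of-week); annotated for reference:

```
FETCH_SCHEDULE=0 */2 * * *        → minute 0 of every 2nd hour
PROCESS_SCHEDULE=30 */2 * * *     → minute 30 of every 2nd hour (30 min after each fetch)
PUBLISH_SCHEDULE=0 9,15,21 * * *  → daily at 09:00, 15:00, and 21:00
DAILY_POST_SCHEDULE=0 9 * * *     → daily at 09:00
```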
package/CLAUDE.md ADDED
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Content research and aggregation tool that scrapes AI/tech news from multiple sources and stores structured JSON for AI agents to consume. The project has three main components:

1. **Content Scraping** - Scrapes RSS feeds, Reddit, Hacker News, and LinkedIn
2. **Newsletter System** - Email newsletter management with subscriber tracking
3. **Writing Agents** - AI-powered iterative writing using LangChain and the Claude API

---

## Common Commands

### Content Scraping

```bash
# Scrape all sources (RSS, Reddit, Hacker News, LinkedIn)
npm run scrape

# Alternative:
node src/index.js scrape
```

Output is saved to `data/YYYY-MM-DD/`:
- `trending.json` - Top 20 trending items ranked by engagement
- `all.json` - All items combined from all sources
- `rss.json`, `reddit.json`, `hackernews.json`, `linkedin.json` - Per-source data
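
A downstream script can consume the daily output directly. A minimal sketch, assuming only the folder layout described above (the helper names are illustrative, not part of the project):

```javascript
// Folder name for a given day, matching the data/YYYY-MM-DD/ layout
function dailyDir(d = new Date()) {
  return `data/${d.toISOString().slice(0, 10)}`;
}

// Load the ranked trending items for a given day
async function loadTrending(d = new Date()) {
  // dynamic import works in both CommonJS and ES-module contexts
  const { readFile } = await import('node:fs/promises');
  return JSON.parse(await readFile(`${dailyDir(d)}/trending.json`, 'utf8'));
}
```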
31
+
32
+ ### Newsletter Management
33
+
34
+ ```bash
35
+ # Show newsletter statistics
36
+ npm run newsletter:stats
37
+
38
+ # Add a subscriber
39
+ npm run newsletter:add user@example.com "John Doe"
40
+
41
+ # List all newsletters
42
+ npm run newsletter
43
+
44
+ # Generate newsletter from trending data (basic)
45
+ npm run newsletter:generate 2026-03-22
46
+
47
+ # Generate AI-enhanced newsletter (using writing agents)
48
+ npm run newsletter:generate:enhanced 2026-03-22
49
+
50
+ # Send newsletter
51
+ npm run newsletter:send <newsletter-id>
52
+
53
+ # Test send to a single email
54
+ npm run newsletter:test <newsletter-id> test@example.com
55
+ ```
56
+
57
+ ### Query Data
58
+
59
+ ```bash
60
+ # Interactive query mode
61
+ npm run query
62
+ ```
63
+
64
+ ---
65
+
66
+ ## Architecture
67
+
68
+ ### Content Scraping Flow
69
+
70
+ ```
71
+ config/sources.json (configuration)
72
+
73
+ src/index.js (ContentScraper orchestrator)
74
+
75
+ src/fetchers/
76
+ ├── rss.js # 17 RSS feeds (OpenAI, Anthropic, Claude Blog, arXiv, etc.)
77
+ ├── reddit.js # 7 AI subreddits (min 100 upvotes)
78
+ ├── hackernews.js # AI-filtered stories (min 50 points)
79
+ └── linkedin.js # LinkedIn KOL posts via BrightData SERP
80
+
81
+ data/YYYY-MM-DD/*.json (daily output)
82
+ ```
83
+
84
+ ### Newsletter System
85
+
86
+ ```
87
+ src/newsletter/
88
+ ├── api/newsletter-service.js # Main service class
89
+ ├── models/
90
+ │ ├── subscriber.js # Subscriber data model
91
+ │ └── newsletter.js # Newsletter model with generateIterative()
92
+ ├── utils/
93
+ │ ├── email-sender.js # SMTP/SendGrid email sending
94
+ │ └── helpers.js # Helper functions
95
+ ├── cli.js # CLI interface
96
+ └── data/
97
+ ├── subscribers.json # Subscriber storage
98
+ └── newsletters.json # Newsletter storage
99
+ ```
100
+
101
+ ### Writing Agents (LangChain + Claude API)
102
+
103
+ ```
104
+ src/writing-agents/
105
+ ├── core/
106
+ │ └── iterative-workflow.js # Writer ↔ Critic orchestration loop
107
+ ├── agents/
108
+ │ ├── writer-agent.js # Generates/revises newsletter content
109
+ │ └── critic-agent.js # Reviews content with 1-10 quality scoring
110
+ ├── models/
111
+ │ ├── draft.js # Tracks newsletter versions and history
112
+ │ └── critique.js # Stores feedback and quality metrics
113
+ ├── utils/
114
+ │ ├── prompt-templates.js # Current agent prompts
115
+ │ ├── prompt-templates-improved.js # Improved prompts (ready for testing)
116
+ │ └── cache-manager.js # Response caching for cost reduction
117
+ └── config/
118
+ └── agent-config.js # Configuration & cost estimation
119
+ ```
120
+
121
+ **Writing Agent Workflow:**
122
+ ```
123
+ Articles (from trending.json)
124
+
125
+ Writer Agent → Initial Draft
126
+
127
+ Critic Agent → Review & Score (6 criteria: accuracy, clarity, value, completeness, voice, citations)
128
+
129
+ Quality Check → Is score ≥ threshold? (default: 8/10)
130
+ ↓ if NO: Writer Agent → Revise based on critique → loop back to Critic
131
+ ↓ if YES: Finalize → Newsletter created
132
+ ```
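
The loop above can be sketched as follows. `writeDraft` and `reviewDraft` are hypothetical stand-ins for the real agent calls; the actual orchestration lives in `core/iterative-workflow.js` and may differ:

```javascript
// Sketch of the Writer ↔ Critic loop: draft, score, revise until the
// quality threshold is met or the iteration budget is spent.
async function iterativeWorkflow(articles, { maxIterations = 3, threshold = 8 } = {}, agents) {
  let draft = await agents.writeDraft(articles);           // Writer: initial draft
  for (let i = 0; i < maxIterations; i++) {
    const critique = await agents.reviewDraft(draft);      // Critic: 1-10 score on 6 criteria
    if (critique.score >= threshold) break;                // early exit saves API cost
    draft = await agents.writeDraft(articles, critique);   // Writer: revise from critique
  }
  return draft;
}
```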

---

## Configuration

### Source Configuration (`config/sources.json`)

- **rssFeeds**: 17 RSS sources with categories (ai-news, ai-research, company-news, etc.)
- **trendingSources.reddit**: 7 AI subreddits with minScore and maxAge filters
- **trendingSources.hackernews**: AI keyword filtering with minPoints threshold
- **linkedin**: KOL profiles file path, batch size, enrichment settings

### Environment Variables (`.env`)

```bash
# Content scraping
BRIGHTDATA_API_KEY=...        # Required for LinkedIn scraping
BRIGHTDATA_ZONE=mcp_unlocker  # BrightData zone name

# Newsletter
EMAIL_PROVIDER=smtp           # smtp, sendgrid, or console
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_SECURE=false
SMTP_USER=your-email@gmail.com
SMTP_PASS=your-app-password

# Writing Agents
WRITING_AGENTS_ENABLED=true
WRITING_AGENTS_MAX_ITERATIONS=3
WRITING_AGENTS_QUALITY_THRESHOLD=8
WRITING_AGENTS_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=sk-ant-...  # Required for writing agents
```
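
Once dotenv has loaded `.env` into `process.env`, the writing-agent settings can be read with the documented defaults as fallbacks. The `agentConfig` shape below is illustrative; the real logic lives in `config/agent-config.js`:

```javascript
// Read writing-agent settings from the environment, falling back to
// the defaults documented above when a variable is unset.
const agentConfig = {
  enabled: process.env.WRITING_AGENTS_ENABLED === 'true',
  maxIterations: Number(process.env.WRITING_AGENTS_MAX_ITERATIONS ?? 3),
  qualityThreshold: Number(process.env.WRITING_AGENTS_QUALITY_THRESHOLD ?? 8),
  model: process.env.WRITING_AGENTS_MODEL ?? 'claude-3-5-sonnet-20241022',
};
```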

---

## Data Models

### Scraped Item Structure

```json
{
  "id": "unique_id",
  "source": "rss|reddit|hackernews|linkedin",
  "sourceName": "Source Name",
  "category": "ai-news|company-news|research",
  "title": "Article Title",
  "url": "https://...",
  "summary": "Content summary...",
  "content": "Full content...",
  "pubDate": "2026-03-22T10:00:00Z",
  "age_hours": 24,
  "engagement": {
    "upvotes": 4500,
    "comments": 823
  },
  "scraped_at": "2026-03-22T10:00:00Z"
}
```

### Trending Item Structure

```json
{
  "rank": 1,
  "score": 4750,
  "sources": ["reddit", "hackernews"],
  "title": "Article Title",
  "url": "https://...",
  "summary": "...",
  "keywords": ["AI", "GPT"],
  "engagement": { "upvotes": 4500, "comments": 823 }
}
```
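
The scoring formula itself is not documented here, but given already-scored items, the ranked shape above can be produced with a sketch like this (the function name is illustrative):

```javascript
// Sort scored items descending, keep the top N (trending.json keeps 20),
// and assign 1-based ranks. Scoring happens upstream in the scraper.
function rankTrending(items, limit = 20) {
  return [...items]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((item, i) => ({ ...item, rank: i + 1 }));
}
```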

---

## Important Design Decisions

1. **No Vercel Deployment** - This is a Node.js CLI tool, not a web application. It uses `dotenv` for config, not Vercel env vars.

2. **Daily Data Organization** - Each scrape run creates a new `data/YYYY-MM-DD/` folder. Output files are overwritten on re-scrape for the same date.

3. **Quality Threshold for Writing Agents** - Default is 8/10. The agent loop terminates early when the threshold is met to save API costs. Max iterations: 3.

4. **BrightData for LinkedIn** - LinkedIn scraping requires the BrightData SERP API with zone `mcp_unlocker`. KOL profiles are loaded from an external path.

5. **Cache Manager** - Writing agents cache LLM responses for 7 days (TTL) to reduce API costs. ~70% cache hit rate expected.

6. **ES Modules** - The project uses `"type": "module"` in package.json. All imports use `.js` extensions.
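
The caching idea in decision 5 can be illustrated with a minimal in-memory TTL cache; class and method names here are hypothetical, and the real implementation (`utils/cache-manager.js`) may differ:

```javascript
// Minimal TTL cache sketch: identical prompts within the TTL window
// reuse the stored LLM response instead of triggering a new API call.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000; // the documented 7-day TTL

class ResponseCache {
  constructor(ttlMs = WEEK_MS) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key → { value, expiresAt }
  }
  get(key, now = Date.now()) {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt <= now) return undefined; // miss or expired
    return hit.value;
  }
  set(key, value, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```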

---

## Testing Writing Agents

The project has two prompt template sets for the writing agents:

1. **Current**: `prompt-templates.js` - Basic prompts
2. **Improved**: `prompt-templates-improved.js` - Enhanced with examples, rubrics, and anti-patterns

To test the improved prompts, update the import in `agents/writer-agent.js` and `agents/critic-agent.js` from `'./prompt-templates.js'` to `'./prompt-templates-improved.js'`.
235
+
236
+ ---
237
+
238
+ ## Key Dependencies
239
+
240
+ - `rss-parser` - RSS feed parsing
241
+ - `axios` - HTTP requests
242
+ - `cheerio` - HTML parsing
243
+ - `nodemailer` - Email sending
244
+ - `@langchain/anthropic` - LangChain integration with Claude
245
+ - `@langchain/langgraph` - Agent workflow orchestration
246
+
247
+ ---
248
+
249
+ ## LinkedIn KOL Configuration
250
+
251
+ LinkedIn KOL (Key Opinion Leader) profiles are stored in an external file:
252
+ ```
253
+ /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json
254
+ ```
255
+
256
+ This path is configured in `config/sources.json` under `linkedin.profilesFile`.