@aikeytake/social-automation 2.0.0
- package/.env.example +39 -0
- package/CLAUDE.md +256 -0
- package/CURRENT_CAPABILITIES.md +493 -0
- package/DATA_ORGANIZATION.md +416 -0
- package/IMPLEMENTATION_SUMMARY.md +287 -0
- package/INSTRUCTIONS.md +316 -0
- package/MASTER_PLAN.md +1096 -0
- package/README.md +280 -0
- package/config/sources.json +296 -0
- package/package.json +37 -0
- package/src/cli.js +197 -0
- package/src/fetchers/api.js +232 -0
- package/src/fetchers/hackernews.js +86 -0
- package/src/fetchers/linkedin.js +400 -0
- package/src/fetchers/linkedin_browser.js +167 -0
- package/src/fetchers/reddit.js +77 -0
- package/src/fetchers/rss.js +50 -0
- package/src/fetchers/twitter.js +194 -0
- package/src/index.js +346 -0
- package/src/query.js +316 -0
- package/src/utils/logger.js +74 -0
- package/src/utils/storage.js +134 -0
- package/src/writing-agents/QUICK-REFERENCE.md +111 -0
- package/src/writing-agents/WRITING-SKILLS-IMPROVEMENTS.md +273 -0
- package/src/writing-agents/utils/prompt-templates-improved.js +665 -0
@@ -0,0 +1,493 @@

# 🎯 Current Capabilities: AI Agent Research Tool

**Project:** Content Research & Aggregation Tool
**Purpose:** Feed AI agents with organized, scraped content
**Date:** 2025-03-06
**Status:** Scraping Works, Agent Interface To Build

---

## ✅ What This Tool Does Right Now

### 1. 📥 Data Scraping (FULLY FUNCTIONAL)

The tool can automatically scrape content from multiple sources:

#### RSS Feeds (10+ sources)
- ✅ TechCrunch AI
- ✅ OpenAI Blog
- ✅ Anthropic News
- ✅ Google AI Blog
- ✅ arXiv AI/ML papers
- ✅ And more...

**Use case:** Get official announcements and blog posts

#### Reddit (7 subreddits)
- ✅ r/MachineLearning
- ✅ r/artificial
- ✅ r/ArtificialIntelligence
- ✅ r/deeplearning
- ✅ r/OpenAI
- ✅ r/LocalLLaMA
- ✅ r/singularity

**Data collected:**
- Upvotes, comments, awards
- Post titles and content
- Engagement metrics
- Age tracking

**Use case:** Find what the AI community is discussing

#### Hacker News
- ✅ Top AI-related stories
- ✅ Best stories (filtered by keywords)

**Keywords tracked:**
AI, machine learning, GPT, LLM, OpenAI, Anthropic, Google AI

**Use case:** Discover tech industry trends

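The keyword filter above amounts to a title match. As an illustrative sketch only — the keyword list comes from this section, but the matching logic is an assumption, not the actual code in `src/fetchers/hackernews.js`:

```javascript
// Illustrative keyword filter for Hacker News titles. The keyword list is the
// one documented above; the case-insensitive substring match is an assumption,
// not the real logic in src/fetchers/hackernews.js.
const KEYWORDS = ['ai', 'machine learning', 'gpt', 'llm', 'openai', 'anthropic', 'google ai'];

function isAiStory(title) {
  const t = title.toLowerCase();
  return KEYWORDS.some((keyword) => t.includes(keyword));
}

console.log(isAiStory('Show HN: An LLM that runs in your browser')); // true
console.log(isAiStory('PostgreSQL 17 released')); // false
```

Note that a naive substring match on short keywords like `ai` also hits words such as "maintainer", so a real filter would likely use word boundaries.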
#### LinkedIn (Optional)
- ✅ KOL (Key Opinion Leader) tracking
- ⚠️ Requires BrightData configuration

---

### 2. 🔄 Data Organization (MOSTLY FUNCTIONAL)

The tool organizes scraped data:

#### Automatic Processing
- ✅ **Deduplication**: Same story across sources → one entry
- ✅ **Scoring**: By engagement (upvotes, points, comments)
- ✅ **Filtering**: By age and relevance
- ✅ **Timestamping**: Track when content was created

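The dedup-and-score pass can be pictured as follows. This is a hypothetical sketch of the behavior described above — the real logic lives in the package source and may differ, and the engagement weighting here is invented:

```javascript
// Hypothetical sketch of the deduplication pass described above: items that
// share a link collapse to one entry, keeping the higher-engagement copy.
// The scoring weights are illustrative, not the tool's actual formula.
function engagement(item) {
  const { upvotes = 0, comments = 0 } = item.metadata ?? {};
  return upvotes + 2 * comments; // treat comments as a stronger signal
}

function deduplicate(items) {
  const byLink = new Map();
  for (const item of items) {
    const seen = byLink.get(item.link);
    if (!seen || engagement(item) > engagement(seen)) {
      byLink.set(item.link, item);
    }
  }
  return [...byLink.values()];
}

// The same story picked up from two sources collapses to one entry.
const items = deduplicate([
  { link: 'https://example.com/gpt-5', source: 'reddit', metadata: { upvotes: 4500, comments: 823 } },
  { link: 'https://example.com/gpt-5', source: 'hackernews', metadata: { upvotes: 900, comments: 310 } },
  { link: 'https://example.com/gemini', source: 'rss', metadata: {} },
]);

console.log(items.length); // 2
console.log(items[0].source); // reddit
```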
#### Data Storage
- ✅ JSON format (easy to read/query)
- ✅ Structured data model
- ✅ Source tracking

---

### 3. 🔍 Query Interface (PARTIALLY FUNCTIONAL)

#### Current Status
- ✅ Raw data is saved to JSON files
- ✅ Can read files directly
- ❌ No CLI query interface yet
- ❌ No MCP server yet

#### Manual Query (Current Workflow)
```bash
# 1. Scrape data
npm run fetch

# 2. Read the JSON files
cat data/queue/*.json | jq '.[] | select(.metadata.score > 100)'

# 3. Or write a simple script
node -e "const data = require('./data/queue/latest.json'); console.log(data.filter(item => item.metadata.score > 100));"
```

---

## ❌ What This Tool Does NOT Do (By Design)

### NOT a Content Generator
- ❌ Does NOT write posts
- ❌ Does NOT create content
- ❌ Does NOT generate images
- ❌ Does NOT translate languages

**Why?** AI agents (like Claude) are better at this. The tool just feeds data to agents.

### NOT a Publisher
- ❌ Does NOT post to LinkedIn
- ❌ Does NOT post to Twitter
- ❌ Does NOT post to Facebook
- ❌ Does NOT schedule posts

**Why?** Human oversight is important for quality control.

### NOT an Automator
- ❌ Does NOT run on schedules
- ❌ Does NOT auto-post anything
- ❌ Does NOT have workflows

**Why?** You run it when you need data. Simple and on-demand.

---

## 🎯 How to Use This Tool (Right Now)

### For AI Agents

**Step 1: Scrape Data**
```bash
cd /home/vankhoa/projects/social-automation
npm run fetch
```

**Step 2: Access Data**
```bash
# View latest data
cat data/queue/*.json | jq
```

Or read programmatically:
```js
const fs = require('fs');
const data = fs.readFileSync('data/queue/latest.json', 'utf8');
const items = JSON.parse(data);
```

**Step 3: AI Agent Creates Content**
```
Agent: "I'll read the scraped data and create engaging posts"
- Reads JSON files
- Identifies trending topics
- Writes creative content
- Adds business angle
- Generates translations
```

### For Humans

**Step 1: Scrape Data**
```bash
npm run fetch
```

**Step 2: Review Data**
```bash
# View what was scraped
ls -la data/queue/

# Read specific file
cat data/queue/1772813421296-1jiel2fs7.json | jq
```

**Step 3: Share with AI**
```
"Here's the scraped data. Can you write a LinkedIn post about the GPT-5 rumors with a French translation?"
```

---

## 📊 Data Structure

### What You Get

```json
{
  "id": "1772813421296-1jiel2fs7",
  "source": "reddit",
  "sourceName": "r/MachineLearning",
  "title": "GPT-5 Release Confirmed by OpenAI",
  "link": "https://reddit.com/r/MachineLearning/comments/...",
  "content": "Full post content here...",
  "pubDate": "2025-03-06T10:00:00Z",
  "queuedAt": "2025-03-06T10:05:00Z",
  "status": "queued",
  "metadata": {
    "upvotes": 4500,
    "comments": 823,
    "score": 95
  }
}
```

### Fields Available

- `id`: Unique identifier
- `source`: Where it came from (rss, reddit, hackernews)
- `sourceName`: Specific source name
- `title`: Headline
- `link`: Original URL
- `content`: Full content or excerpt
- `pubDate`: When published
- `metadata`: Engagement metrics
- `status`: Processing status

---

## 🚀 Example Workflow

### Complete Workflow Example

**Scenario:** You want to create a LinkedIn post about trending AI news

**Step 1: Scrape**
```bash
npm run fetch

# Output:
# 📰 Fetching from RSS feeds...
# ✅ RSS: 25 items
# 📱 Fetching from Reddit...
# ✅ Reddit: 47 items
# 📰 Fetching from Hacker News...
# ✅ Hacker News: 12 items
# ✅ Total: 84 items queued
```

**Step 2: Review**
```bash
# View high-score items
cat data/queue/*.json | jq '[.[] | select(.metadata.score > 80)] | sort_by(.metadata.score) | reverse | .[0:5]'
```

**Step 3: Feed to AI Agent**
```
"Here are the top 5 trending AI stories from the last 24 hours:

1. GPT-5 Release Confirmed (4500 upvotes)
   [content from JSON]

2. Google's New Gemini Model (3200 upvotes)
   [content from JSON]

...

Please create:
1. An engaging LinkedIn post about the #1 story
2. Add a business angle for local companies
3. Include a French translation
4. Add relevant hashtags"
```

**Step 4: AI Agent Creates**
```
Agent generates:
- Compelling hook
- Key insights
- Business implications
- Call-to-action
- French translation
- Hashtags
```

**Step 5: Human Reviews & Posts**
- Review the AI-generated content
- Make edits if needed
- Post manually to LinkedIn
- Engage with comments

---

## 🎯 What Makes This Tool Useful

### 1. Time Saving
- No more manual browsing of 10+ sources
- All data in one place
- Already scored by engagement

### 2. Quality Focus
- High-engagement content surfaced first
- Multiple sources for verification
- Community-curated (Reddit upvotes)

### 3. Agent-Friendly
- Clean JSON structure
- Easy to parse programmatically
- Rich metadata for filtering

### 4. Flexibility
- Run when you need data
- Query what you want
- Feed to any AI agent

---

## 📋 Current Commands

### Scraping Commands
```bash
# Fetch from all sources
npm run fetch

# Original command (still works)
node src/cli.js fetch
```

### Viewing Data
```bash
# View queue (basic)
npm run queue

# View specific file
cat data/queue/FILE_ID.json | jq

# List all queued files
ls -la data/queue/
```

---

## 🛠️ Configuration

### Edit Sources
```bash
nano config/sources.json
```

### Add RSS Feed
```json
{
  "rssFeeds": [
    {
      "name": "My AI Blog",
      "url": "https://example.com/feed.xml",
      "category": "ai-news",
      "enabled": true
    }
  ]
}
```

### Add Subreddit
```json
{
  "trendingSources": {
    "reddit": {
      "subreddits": [
        "MachineLearning",
        "artificial",
        "MyNewSubreddit"
      ]
    }
  }
}
```

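Since `sources.json` is edited by hand, a quick sanity check catches typos before the next fetch. The `validateFeed` helper below is hypothetical — the package does not ship a validator; the required fields mirror the RSS example above:

```javascript
// Hypothetical validator for hand-edited RSS entries; the package does not
// ship one. Required fields mirror the "Add RSS Feed" example above.
function validateFeed(feed) {
  const problems = [];
  if (!feed.name) problems.push('missing name');
  if (!/^https?:\/\//.test(feed.url ?? '')) problems.push('url must start with http(s)://');
  if (typeof feed.enabled !== 'boolean') problems.push('enabled must be true or false');
  return problems;
}

console.log(validateFeed({ name: 'My AI Blog', url: 'https://example.com/feed.xml', enabled: true }));
// []
console.log(validateFeed({ name: 'Broken', url: 'example.com/feed.xml' }));
// [ 'url must start with http(s)://', 'enabled must be true or false' ]
```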
---

## 🎯 Next Steps to Build

### To Make This Tool Complete:

1. **Simple Query CLI** (Priority)
   ```bash
   npm run query --trending
   npm run query --topic=GPT
   npm run query --fresh=6h
   ```

2. **MCP Server** (For AI Agents)
   - Expose scraping tools
   - Expose query tools
   - Standard interface for agents

3. **Better Query Interface**
   - Filter by score
   - Filter by topic
   - Filter by age
   - Sort options

4. **Documentation for Agents**
   - How to query data
   - Data structure reference
   - Example workflows

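The planned query flags from item 1 reduce to three filters over the queued items. Nothing below exists yet — it is one possible shape for the unbuilt `npm run query` command, with the flag semantics guessed from the examples above:

```javascript
// One possible core for the unbuilt query CLI: filter by topic, freshness,
// and minimum score, then rank by score. Flag semantics are guessed from the
// `npm run query` examples above, not taken from real code.
function query(items, { topic, freshHours, minScore = 0 } = {}) {
  const cutoff = freshHours ? Date.now() - freshHours * 3600 * 1000 : null;
  return items
    .filter((it) => (it.metadata?.score ?? 0) >= minScore)
    .filter((it) => !topic || it.title.toLowerCase().includes(topic.toLowerCase()))
    .filter((it) => !cutoff || new Date(it.pubDate).getTime() >= cutoff)
    .sort((a, b) => (b.metadata?.score ?? 0) - (a.metadata?.score ?? 0));
}

// --topic=GPT --fresh=6h would translate to:
const hits = query([
  { title: 'GPT-5 Release Confirmed by OpenAI', pubDate: new Date().toISOString(), metadata: { score: 95 } },
  { title: 'GPT retrospective from 2020', pubDate: '2020-01-01T00:00:00Z', metadata: { score: 90 } },
  { title: 'New robotics benchmark', pubDate: new Date().toISOString(), metadata: { score: 80 } },
], { topic: 'gpt', freshHours: 6 });

console.log(hits.map((it) => it.title)); // [ 'GPT-5 Release Confirmed by OpenAI' ]
```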
---

## 📊 What the Tool Provides

### Input (What You Give)
- ⚙️ Configuration (sources, filters)
- 📝 List of sources to scrape

### Output (What You Get)
- 📦 JSON files with structured data
- 🎯 Scored by engagement
- 🏷️ Categorized by source
- ⏰ Timestamped for freshness

### The Middle (What the Tool Does)
- 🔍 Scrapes multiple sources
- 🔄 Deduplicates content
- 📊 Scores by engagement
- 💾 Saves as JSON

### NOT Included (By Design)
- ❌ Content writing (AI agents do this)
- ❌ Publishing (humans do this)
- ❌ Scheduling (run on-demand)

---

## 🎯 Real-World Use Cases

### Use Case 1: Daily Social Media
```
1. Run: npm run fetch
2. Review: Check top 5 stories
3. Ask AI: "Write a post about story #3"
4. Post: Publish manually to LinkedIn
```

### Use Case 2: Newsletter
```
1. Run: npm run fetch
2. Query: Get all stories from last week
3. Ask AI: "Create a roundup newsletter"
4. Send: Email the list manually
```

### Use Case 3: Research
```
1. Run: npm run fetch
2. Query: Find all GPT-related stories
3. Ask AI: "Summarize GPT developments"
4. Use: Inform strategy or clients
```

---

## 🚀 Quick Start

```bash
# 1. Install
cd /home/vankhoa/projects/social-automation
npm install

# 2. Configure (optional)
nano config/sources.json

# 3. Scrape
npm run fetch

# 4. View data
ls data/queue/
cat data/queue/*.json | jq

# 5. Use with AI agent, e.g.:
# "Read the scraped data and create a post about..."
```

---

## 📝 Summary

**This tool = data scraper + organizer**

**What it does:**
- ✅ Scrapes content from multiple sources
- ✅ Organizes and scores it
- ✅ Saves it as structured JSON
- ✅ Ready for AI agents to consume

**What you do:**
- Run the scraper when needed
- Feed data to AI agents
- Let agents create content
- Review and publish manually

**Simple, focused, effective.**

---

**Status:** ✅ Scraping Works | 🔨 Query Interface To Build
**Focus:** Research Tool, Not Publishing Platform
**Perfect for:** AI agents that need fresh, organized content data