@aikeytake/social-automation 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +39 -0
- package/CLAUDE.md +256 -0
- package/CURRENT_CAPABILITIES.md +493 -0
- package/DATA_ORGANIZATION.md +416 -0
- package/IMPLEMENTATION_SUMMARY.md +287 -0
- package/INSTRUCTIONS.md +316 -0
- package/MASTER_PLAN.md +1096 -0
- package/README.md +280 -0
- package/config/sources.json +296 -0
- package/package.json +37 -0
- package/src/cli.js +197 -0
- package/src/fetchers/api.js +232 -0
- package/src/fetchers/hackernews.js +86 -0
- package/src/fetchers/linkedin.js +400 -0
- package/src/fetchers/linkedin_browser.js +167 -0
- package/src/fetchers/reddit.js +77 -0
- package/src/fetchers/rss.js +50 -0
- package/src/fetchers/twitter.js +194 -0
- package/src/index.js +346 -0
- package/src/query.js +316 -0
- package/src/utils/logger.js +74 -0
- package/src/utils/storage.js +134 -0
- package/src/writing-agents/QUICK-REFERENCE.md +111 -0
- package/src/writing-agents/WRITING-SKILLS-IMPROVEMENTS.md +273 -0
- package/src/writing-agents/utils/prompt-templates-improved.js +665 -0
package/INSTRUCTIONS.md
ADDED
@@ -0,0 +1,316 @@
# Instructions: Social Automation — Content Research Tool

**Project:** Content scraping & aggregation for AI agents
**Last Updated:** 2026-03-07
**Version:** 2.0

---

## What This Tool Does

Scrapes AI/tech content from multiple sources and stores it as structured JSON. AI agents then read this data to create posts, research topics, or track industry trends.

**Sources:**
- 17 RSS feeds (OpenAI, Anthropic, Claude, Google AI, arXiv, and more)
- Reddit (7 AI subreddits)
- Hacker News (AI-filtered, 50+ points)
- LinkedIn KOL posts (via BrightData SERP)

**Output:** Structured JSON files in `data/YYYY-MM-DD/`

---

## The Only Command

```bash
npm run scrape
```

That's it. Run this whenever you need fresh data.

---

## Quick Start

```bash
cd /home/vankhoa/projects/social-automation
npm run scrape
```

**Expected output:**
```
🔄 Starting scrape cycle...
📰 Fetching from RSS feeds...
✅ RSS: 35 items
📱 Fetching from Reddit...
✅ Reddit: 17 items
📰 Fetching from Hacker News...
✅ Hacker News: 8 items
💼 Fetching from LinkedIn...
✅ LinkedIn: 12 items
✅ Generated: trending.json (20 items)
✅ Generated: all.json (72 items)
✨ Scraping complete: 72 total items
```

---

## Output Files

All files are saved to `data/YYYY-MM-DD/` (today's date folder):

| File | Description |
|------|-------------|
| `trending.json` | Top 20 items ranked by engagement score — **start here** |
| `all.json` | Every item from all sources combined |
| `rss.json` | RSS feed items only |
| `reddit.json` | Reddit posts only |
| `hackernews.json` | Hacker News stories only |
| `linkedin.json` | LinkedIn KOL posts only |

### trending.json structure

```json
{
  "date": "2026-03-07",
  "total_items": 20,
  "items": [
    {
      "rank": 1,
      "score": 4500,
      "title": "GPT-5 Released by OpenAI",
      "url": "https://...",
      "summary": "...",
      "sources": ["reddit", "r/MachineLearning"],
      "engagement": { "upvotes": 4500, "comments": 823 }
    }
  ]
}
```

### all.json structure

```json
{
  "date": "2026-03-07",
  "total_items": 72,
  "sources": { "rss": 35, "reddit": 17, "hackernews": 8, "linkedin": 12 },
  "items": [
    {
      "id": "abc123",
      "source": "rss",
      "sourceName": "Claude Blog",
      "category": "company-news",
      "title": "Common workflow patterns for AI agents",
      "link": "https://claude.com/blog/...",
      "content": "...",
      "summary": "...",
      "pubDate": "2026-03-05T00:00:00Z",
      "author": "Claude Blog",
      "age_hours": 48,
      "scraped_at": "2026-03-07T10:00:00Z"
    }
  ]
}
```
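
These files can also be consumed programmatically. A minimal Python sketch, assuming the JSON layouts shown above; the helper names and the relative `data/` path are illustrative, not part of the tool:

```python
import json
from datetime import date
from pathlib import Path


def load_day(data_dir="data", day=None):
    """Load all.json for a given day (defaults to today, YYYY-MM-DD)."""
    day = day or date.today().isoformat()
    path = Path(data_dir) / day / "all.json"
    return json.loads(path.read_text(encoding="utf-8"))


def by_source(payload):
    """Group items by their `source` field ("rss", "reddit", ...)."""
    groups = {}
    for item in payload.get("items", []):
        groups.setdefault(item["source"], []).append(item)
    return groups
```

The grouped result should mirror the `sources` counts in the file header, which is a quick consistency check.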

---

## Configuration

### Environment Variables (`.env`)

```bash
ANTHROPIC_API_KEY=sk-ant-...   # Claude API key
BRIGHTDATA_API_KEY=...         # Required for LinkedIn scraping
BRIGHTDATA_ZONE=mcp_unlocker   # BrightData zone (default: mcp_unlocker)
```

### Sources (`config/sources.json`)

**RSS Feeds — 17 sources:**

| Name | Category |
|------|----------|
| TechCrunch AI | ai-news |
| The Gradient | ai-research |
| MIT Technology Review AI | tech-news |
| OpenAI Blog | company-news |
| Anthropic Blog | company-news |
| Claude Blog | company-news |
| Google AI Blog | company-news |
| DeepMind Blog | research |
| Hugging Face Blog | ml-frameworks |
| Meta Engineering | company-engineering |
| Netflix Tech Blog | company-engineering |
| AWS Machine Learning Blog | cloud-ai |
| Microsoft AI Blog | company-news |
| NVIDIA Technical Blog | company-engineering |
| LinkedIn Engineering | company-engineering |
| arXiv AI (cs.AI) | research-papers |
| arXiv Machine Learning (cs.LG) | research-papers |

**Reddit — 7 subreddits** (min 100 upvotes, max 24h old):
`MachineLearning`, `artificial`, `ArtificialIntelligence`, `deeplearning`, `OpenAI`, `LocalLLaMA`, `singularity`

**Hacker News** — keyword-filtered, min 50 points

**LinkedIn** — top 20 KOLs from `workspace/marketing/linkedin_kol_clean.json`, scraped via BrightData Google SERP

---

## Adding/Changing Sources

### Add an RSS feed

Edit `config/sources.json` under `rssFeeds`:

```json
{
  "name": "My Blog",
  "url": "https://example.com/feed.xml",
  "category": "ai-news",
  "enabled": true
}
```

### Disable an RSS feed

Set `"enabled": false` for that feed.

### Change LinkedIn KOL limit

```json
"linkedin": {
  "limit": 30
}
```

### Add a Reddit subreddit

Add to the `trendingSources.reddit.subreddits` array.

### Add a Hacker News keyword

Add to the `trendingSources.hackernews.keywords` array.
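
After hand-editing `config/sources.json`, a quick sanity check can catch typos before the next scrape. A sketch that validates the `rssFeeds` shape documented above; the required-key list is an assumption based only on the example entry in this document:

```python
import json


def check_sources(config_text):
    """Return a list of problems found in a sources.json document."""
    cfg = json.loads(config_text)  # raises on malformed JSON
    problems = []
    for feed in cfg.get("rssFeeds", []):
        name = feed.get("name", "?")
        # Keys taken from the example feed entry above (assumed schema)
        for key in ("name", "url", "category", "enabled"):
            if key not in feed:
                problems.append(f"rssFeeds entry {name}: missing {key}")
        if feed.get("enabled") and not str(feed.get("url", "")).startswith("http"):
            problems.append(f"rssFeeds entry {name}: url does not look like a URL")
    return problems
```

An empty list means the feeds section at least parses and has the expected fields.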

---

## Reading the Data

```bash
# Today's trending items (titles only)
cat data/$(date +%Y-%m-%d)/trending.json | jq '.items[] | "\(.rank). \(.title)"'

# All items from LinkedIn
cat data/$(date +%Y-%m-%d)/linkedin.json | jq '.items[] | {author: .sourceName, title, link}'

# Filter all items by keyword
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.title | test("GPT|Claude|Anthropic"; "i"))]'

# Items from a specific source
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.source == "rss") | select(.sourceName == "Claude Blog")]'

# Sort by age (newest first)
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.age_hours != null)] | sort_by(.age_hours)'
```
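
If `jq` is not available, the same queries can be expressed in a few lines of Python. A sketch, assuming the item fields documented in the all.json structure above; the function names are illustrative:

```python
import re


def filter_by_keyword(items, pattern):
    """Keep items whose title matches the regex (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i in items if rx.search(i.get("title", ""))]


def from_source(items, source, source_name=None):
    """Keep items from one source, optionally narrowed to one named feed."""
    out = [i for i in items if i.get("source") == source]
    if source_name is not None:
        out = [i for i in out if i.get("sourceName") == source_name]
    return out


def newest_first(items):
    """Sort by age_hours ascending, dropping items without the field."""
    return sorted((i for i in items if i.get("age_hours") is not None),
                  key=lambda i: i["age_hours"])
```

Each helper mirrors one of the `jq` one-liners above, so an agent can pick whichever fits its toolchain.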

---

## Common AI Agent Workflows

### Daily content briefing

```
Read data/$(date +%Y-%m-%d)/trending.json and summarize the top 5 AI stories from today.
```

### LinkedIn post creation

```
Read data/$(date +%Y-%m-%d)/trending.json and create a LinkedIn post in French about the most impactful AI story. Target audience: local business owners in southern France.
```

### Competitive intelligence

```
Read data/$(date +%Y-%m-%d)/all.json and summarize everything related to Anthropic and Claude published in the last 48 hours.
```

### KOL content analysis

```
Read data/$(date +%Y-%m-%d)/linkedin.json and identify the main themes that AI thought leaders are discussing today.
```
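
For the time-bounded workflows above ("published in the last 48 hours"), an agent can pre-filter items before summarizing rather than passing the whole file to a model. A sketch using the `age_hours` field, assuming it is populated as shown in the all.json structure; the function name is illustrative:

```python
def recent_mentions(items, terms, max_age_hours=48):
    """Items mentioning any term (in title or summary) within the age window."""
    terms = [t.lower() for t in terms]
    hits = []
    for item in items:
        age = item.get("age_hours")
        if age is None or age > max_age_hours:
            continue  # too old, or age unknown
        text = f"{item.get('title', '')} {item.get('summary', '')}".lower()
        if any(t in text for t in terms):
            hits.append(item)
    return hits
```

Feeding only the matching items to the agent keeps the prompt small and the summary focused.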

---

## Troubleshooting

### LinkedIn returns 0 items

1. Check the logs for the specific error: `cat logs/*.log | grep -i linkedin`
2. Verify the KOL file exists: `ls /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json`
3. Confirm the `mcp_unlocker` zone exists in your BrightData account dashboard
4. Check the KOL file path in `config/sources.json` → `linkedin.profilesFile`

### An RSS feed fails

This is normal; feeds go down occasionally. The scraper logs the error and continues with the remaining sources. Check:
```bash
cat logs/*.log | grep ERROR
```

### No data folder for today

```bash
npm run scrape
ls data/$(date +%Y-%m-%d)/
```

### Stale data

Just re-run — output for the same date is overwritten each time:
```bash
npm run scrape
```

---

## Project Structure

```
social-automation/
├── src/
│   ├── fetchers/
│   │   ├── rss.js          # 17 RSS feeds
│   │   ├── reddit.js       # 7 AI subreddits
│   │   ├── hackernews.js   # HN top stories
│   │   └── linkedin.js     # LinkedIn via BrightData SERP
│   ├── utils/
│   │   └── logger.js
│   ├── cli.js
│   └── index.js            # Main orchestrator (npm run scrape)
├── config/
│   └── sources.json        # All source configuration
├── data/
│   └── YYYY-MM-DD/
│       ├── trending.json   # Top 20 ranked items
│       ├── all.json        # All items combined
│       ├── rss.json
│       ├── reddit.json
│       ├── hackernews.json
│       └── linkedin.json
├── logs/                   # Scrape logs
├── .env                    # API keys
├── .env.example            # Template
└── package.json
```

---

## Summary

1. Run `npm run scrape`
2. Read `data/YYYY-MM-DD/trending.json` (or `all.json` for full depth)
3. Feed the data to an AI agent to create content