@aikeytake/social-automation 2.0.0

# Instructions: Social Automation — Content Research Tool

**Project:** Content scraping & aggregation for AI agents
**Last Updated:** 2026-03-07
**Version:** 2.0

---

## What This Tool Does

Scrapes AI/tech content from multiple sources and stores it as structured JSON. AI agents then read this data to create posts, research topics, or track industry trends.

**Sources:**
- 17 RSS feeds (OpenAI, Anthropic, Claude, Google AI, arXiv, and more)
- Reddit (7 AI subreddits)
- Hacker News (AI-filtered, 50+ points)
- LinkedIn KOL posts (via BrightData SERP)

**Output:** Structured JSON files in `data/YYYY-MM-DD/`

---

## The Only Command

```bash
npm run scrape
```

That's it. Run this whenever you need fresh data.

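If you want fresh data on a schedule rather than on demand, the same command can run from cron. This is a sketch, not part of the project; the path and schedule are assumptions to adjust:

```
# crontab entry: scrape every morning at 07:00, appending output to the project's logs/
0 7 * * * cd /home/vankhoa/projects/social-automation && npm run scrape >> logs/cron.log 2>&1
```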
---

## Quick Start

```bash
cd /home/vankhoa/projects/social-automation
npm run scrape
```

**Expected output** (item counts vary by run):
```
🔄 Starting scrape cycle...
📰 Fetching from RSS feeds...
✅ RSS: 35 items
📱 Fetching from Reddit...
✅ Reddit: 17 items
📰 Fetching from Hacker News...
✅ Hacker News: 8 items
💼 Fetching from LinkedIn...
✅ LinkedIn: 12 items
✅ Generated: trending.json (20 items)
✅ Generated: all.json (72 items)
✨ Scraping complete: 72 total items
```

---

## Output Files

All files are saved to `data/YYYY-MM-DD/` (today's date folder):

| File | Description |
|------|-------------|
| `trending.json` | Top 20 items ranked by engagement score — **start here** |
| `all.json` | Every item from all sources combined |
| `rss.json` | RSS feed items only |
| `reddit.json` | Reddit posts only |
| `hackernews.json` | Hacker News stories only |
| `linkedin.json` | LinkedIn KOL posts only |

### trending.json structure

```json
{
  "date": "2026-03-07",
  "total_items": 20,
  "items": [
    {
      "rank": 1,
      "score": 4500,
      "title": "GPT-5 Released by OpenAI",
      "url": "https://...",
      "summary": "...",
      "sources": ["reddit", "r/MachineLearning"],
      "engagement": { "upvotes": 4500, "comments": 823 }
    }
  ]
}
```

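To pull just the top-ranked story out of this file, a one-liner using the fields above:

```shell
# Top trending item: rank, score, and title only
jq '.items[0] | {rank, score, title}' data/$(date +%Y-%m-%d)/trending.json
```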
### all.json structure

```json
{
  "date": "2026-03-07",
  "total_items": 72,
  "sources": { "rss": 35, "reddit": 17, "hackernews": 8, "linkedin": 12 },
  "items": [
    {
      "id": "abc123",
      "source": "rss",
      "sourceName": "Claude Blog",
      "category": "company-news",
      "title": "Common workflow patterns for AI agents",
      "link": "https://claude.com/blog/...",
      "content": "...",
      "summary": "...",
      "pubDate": "2026-03-05T00:00:00Z",
      "author": "Claude Blog",
      "age_hours": 48,
      "scraped_at": "2026-03-07T10:00:00Z"
    }
  ]
}
```

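The `age_hours` field makes freshness filters easy. For example, to keep only items published in the last 24 hours:

```shell
# Items younger than 24 hours (age_hours may be null for some sources)
jq '[.items[] | select(.age_hours != null and .age_hours < 24)]' data/$(date +%Y-%m-%d)/all.json
```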
---

## Configuration

### Environment Variables (`.env`)

```bash
ANTHROPIC_API_KEY=sk-ant-...   # Claude API key
BRIGHTDATA_API_KEY=...         # Required for LinkedIn scraping
BRIGHTDATA_ZONE=mcp_unlocker   # BrightData zone (default: mcp_unlocker)
```

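If `.env` doesn't exist yet, a quick way to set it up from the bundled template (run from the project root; fill in the values afterwards):

```shell
cp .env.example .env       # start from the template
grep -oE '^[A-Z_]+' .env   # list the keys you still need to fill in
```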
### Sources (`config/sources.json`)

**RSS Feeds — 17 sources:**

| Name | Category |
|------|----------|
| TechCrunch AI | ai-news |
| The Gradient | ai-research |
| MIT Technology Review AI | tech-news |
| OpenAI Blog | company-news |
| Anthropic Blog | company-news |
| Claude Blog | company-news |
| Google AI Blog | company-news |
| DeepMind Blog | research |
| Hugging Face Blog | ml-frameworks |
| Meta Engineering | company-engineering |
| Netflix Tech Blog | company-engineering |
| AWS Machine Learning Blog | cloud-ai |
| Microsoft AI Blog | company-news |
| NVIDIA Technical Blog | company-engineering |
| LinkedIn Engineering | company-engineering |
| arXiv AI (cs.AI) | research-papers |
| arXiv Machine Learning (cs.LG) | research-papers |

**Reddit — 7 subreddits** (min 100 upvotes, max 24h old):
`MachineLearning`, `artificial`, `ArtificialIntelligence`, `deeplearning`, `OpenAI`, `LocalLLaMA`, `singularity`

**Hacker News** — keyword-filtered, min 50 points

**LinkedIn** — top 20 KOLs from `workspace/marketing/linkedin_kol_clean.json`, scraped via BrightData Google SERP

---

## Adding/Changing Sources

### Add an RSS feed

Edit `config/sources.json` under `rssFeeds`:

```json
{
  "name": "My Blog",
  "url": "https://example.com/feed.xml",
  "category": "ai-news",
  "enabled": true
}
```

### Disable an RSS feed

Set `"enabled": false` for that feed.

### Change the LinkedIn KOL limit

```json
"linkedin": {
  "limit": 30
}
```

### Add a Reddit subreddit

Add it to the `trendingSources.reddit.subreddits` array.

### Add a Hacker News keyword

Add it to the `trendingSources.hackernews.keywords` array.

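Taken together, the trending section of `config/sources.json` looks roughly like this. Only the two array paths above are documented; the surrounding shape and the keyword value are illustrative:

```json
{
  "trendingSources": {
    "reddit": {
      "subreddits": ["MachineLearning", "artificial", "ArtificialIntelligence",
                     "deeplearning", "OpenAI", "LocalLLaMA", "singularity"]
    },
    "hackernews": {
      "keywords": ["your-new-keyword"]
    }
  }
}
```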
---

## Reading the Data

```bash
# Today's trending items (titles only)
jq -r '.items[] | "\(.rank). \(.title)"' data/$(date +%Y-%m-%d)/trending.json

# All items from LinkedIn
jq '.items[] | {author: .sourceName, title, link}' data/$(date +%Y-%m-%d)/linkedin.json

# Filter all items by keyword
jq '[.items[] | select(.title | test("GPT|Claude|Anthropic"; "i"))]' data/$(date +%Y-%m-%d)/all.json

# Items from a specific source
jq '[.items[] | select(.source == "rss") | select(.sourceName == "Claude Blog")]' data/$(date +%Y-%m-%d)/all.json

# Sort by age (newest first)
jq '[.items[] | select(.age_hours != null)] | sort_by(.age_hours)' data/$(date +%Y-%m-%d)/all.json
```

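A useful complement to the queries above: count how many items each source contributed, grouping on the `source` field from `all.json`:

```shell
# Item counts per source for today's run
jq '.items | group_by(.source) | map({source: .[0].source, count: length})' data/$(date +%Y-%m-%d)/all.json
```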
---

## Common AI Agent Workflows

### Daily content briefing

```
Read data/$(date +%Y-%m-%d)/trending.json and summarize the top 5 AI stories from today.
```

### LinkedIn post creation

```
Read data/$(date +%Y-%m-%d)/trending.json and create a LinkedIn post in French about the most impactful AI story. Target audience: local business owners in southern France.
```

### Competitive intelligence

```
Read data/$(date +%Y-%m-%d)/all.json and summarize everything related to Anthropic and Claude published in the last 48 hours.
```

### KOL content analysis

```
Read data/$(date +%Y-%m-%d)/linkedin.json and identify the main themes that AI thought leaders are discussing today.
```

---

## Troubleshooting

### LinkedIn returns 0 items

1. Check logs for the specific error: `cat logs/*.log | grep -i linkedin`
2. Verify the KOL file exists: `ls /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json`
3. Confirm the `mcp_unlocker` zone exists in your BrightData account dashboard
4. Check the KOL file path in `config/sources.json` → `linkedin.profilesFile`

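If the file from step 2 exists but scraping still yields nothing, it can help to confirm the file is valid JSON (assumes `jq` is installed; a parse error here points to a corrupt KOL file):

```shell
# Prints an error and exits non-zero if the file is not valid JSON
jq empty /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json && echo "KOL file OK"
```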
### An RSS feed fails

Normal — some feeds go down. The scraper logs the error and continues. Check:
```bash
cat logs/*.log | grep ERROR
```

### No data folder for today

```bash
npm run scrape
ls data/$(date +%Y-%m-%d)/
```

### Stale data

Just re-run — output is overwritten each time for the same date:
```bash
npm run scrape
```

---

## Project Structure

```
social-automation/
├── src/
│   ├── fetchers/
│   │   ├── rss.js           # 17 RSS feeds
│   │   ├── reddit.js        # 7 AI subreddits
│   │   ├── hackernews.js    # HN top stories
│   │   └── linkedin.js      # LinkedIn via BrightData SERP
│   ├── utils/
│   │   └── logger.js
│   ├── cli.js
│   └── index.js             # Main orchestrator (npm run scrape)
├── config/
│   └── sources.json         # All source configuration
├── data/
│   └── YYYY-MM-DD/
│       ├── trending.json    # Top 20 ranked items
│       ├── all.json         # All items combined
│       ├── rss.json
│       ├── reddit.json
│       ├── hackernews.json
│       └── linkedin.json
├── logs/                    # Scrape logs
├── .env                     # API keys
├── .env.example             # Template
└── package.json
```

---

## Summary

1. Run `npm run scrape`
2. Read `data/YYYY-MM-DD/trending.json` (or `all.json` for full depth)
3. Feed the data to an AI agent to create content