@aikeytake/social-automation 2.0.0

# 🎯 Current Capabilities: AI Agent Research Tool

**Project:** Content Research & Aggregation Tool
**Purpose:** Feed AI agents with organized, scraped content
**Date:** 2025-03-06
**Status:** Scraping Works, Agent Interface To Build

---

## ✅ What This Tool Does Right Now

### 1. 📥 Data Scraping (FULLY FUNCTIONAL)

The tool can automatically scrape content from multiple sources:

#### RSS Feeds (10+ sources)
- ✅ TechCrunch AI
- ✅ OpenAI Blog
- ✅ Anthropic News
- ✅ Google AI Blog
- ✅ arXiv AI/ML papers
- ✅ And more...

**Use case:** Get official announcements and blog posts

#### Reddit (7 subreddits)
- ✅ r/MachineLearning
- ✅ r/artificial
- ✅ r/ArtificialIntelligence
- ✅ r/deeplearning
- ✅ r/OpenAI
- ✅ r/LocalLLaMA
- ✅ r/singularity

**Data collected:**
- Upvotes, comments, awards
- Post titles and content
- Engagement metrics
- Age tracking

**Use case:** Find what the AI community is discussing

#### Hacker News
- ✅ Top AI-related stories
- ✅ Best stories (filtered by keywords)

**Keywords tracked:** AI, machine learning, GPT, LLM, OpenAI, Anthropic, Google AI

**Use case:** Discover tech industry trends
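
The keyword filtering described above can be pictured as a simple case-insensitive, word-boundary match (a rough sketch; the keyword list is copied from above, but the tool's actual matching logic may differ):

```javascript
// Sketch: keyword-based story filtering, as used for the Hacker News source.
// Illustration only -- the matching logic here is an assumption.
const KEYWORDS = ['AI', 'machine learning', 'GPT', 'LLM', 'OpenAI', 'Anthropic', 'Google AI'];

// Word-boundary match so 'AI' does not fire on words like 'maintain'.
function matchesKeywords(title) {
  return KEYWORDS.some((k) => {
    const escaped = k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${escaped}\\b`, 'i').test(title);
  });
}
```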

#### LinkedIn (Optional)
- ✅ KOL (Key Opinion Leader) tracking
- ⚠️ Requires BrightData configuration

---

### 2. 🔄 Data Organization (MOSTLY FUNCTIONAL)

The tool organizes scraped data:

#### Automatic Processing
- ✅ **Deduplication**: Same story across sources → one entry
- ✅ **Scoring**: By engagement (upvotes, points, comments)
- ✅ **Filtering**: By age and relevance
- ✅ **Timestamping**: Track when content was created

#### Data Storage
- ✅ JSON format (easy to read/query)
- ✅ Structured data model
- ✅ Source tracking
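
The deduplication and scoring steps can be sketched roughly as follows (a minimal illustration; `normalizeTitle` and the score weights are assumptions, not the tool's actual implementation):

```javascript
// Sketch: merge duplicate stories and rank by engagement (illustrative only).

// Collapse punctuation/case so the same story from two sources gets one key.
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^a-z0-9 ]/g, '').trim();
}

// Hypothetical engagement score: upvotes plus weighted comments.
function engagementScore(item) {
  const { upvotes = 0, comments = 0 } = item.metadata || {};
  return upvotes + comments * 2;
}

function dedupeAndScore(items) {
  const seen = new Map();
  for (const item of items) {
    const key = normalizeTitle(item.title);
    const existing = seen.get(key);
    // Keep whichever copy of a duplicated story has higher engagement.
    if (!existing || engagementScore(item) > engagementScore(existing)) {
      seen.set(key, item);
    }
  }
  return [...seen.values()].sort((a, b) => engagementScore(b) - engagementScore(a));
}
```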

---

### 3. 🔍 Query Interface (PARTIALLY FUNCTIONAL)

#### Current Status
- ✅ Raw data is saved to JSON files
- ✅ Can read files directly
- ❌ No CLI query interface yet
- ❌ No MCP server yet

#### Manual Query (Current Workflow)
```bash
# 1. Scrape data
npm run fetch

# 2. Read the JSON files (slurp them into one array, then filter by score)
cat data/queue/*.json | jq -s 'map(select(.metadata.score > 100))'

# 3. Or write a simple script
node -e "const data = require('./data/queue/latest.json'); console.log(data.filter(item => item.metadata.score > 100));"
```

---

## ❌ What This Tool Does NOT Do (By Design)

### NOT a Content Generator
- ❌ Does NOT write posts
- ❌ Does NOT create content
- ❌ Does NOT generate images
- ❌ Does NOT translate languages

**Why?** AI agents (like Claude) are better at this. The tool just feeds data to agents.

### NOT a Publisher
- ❌ Does NOT post to LinkedIn
- ❌ Does NOT post to Twitter
- ❌ Does NOT post to Facebook
- ❌ Does NOT schedule posts

**Why?** Human oversight is important for quality control.

### NOT an Automator
- ❌ Does NOT run on schedules
- ❌ Does NOT auto-post anything
- ❌ Does NOT have workflows

**Why?** You run it when you need data. Simple and on-demand.

---

## 🎯 How to Use This Tool (Right Now)

### For AI Agents

**Step 1: Scrape Data**
```bash
cd /home/vankhoa/projects/social-automation
npm run fetch
```

**Step 2: Access Data**
```bash
# View latest data
cat data/queue/*.json | jq
```

Or read it programmatically (Node.js):
```javascript
const fs = require('fs');
const data = fs.readFileSync('data/queue/latest.json', 'utf8');
const items = JSON.parse(data);
```

**Step 3: AI Agent Creates Content**
```
Agent: "I'll read the scraped data and create engaging posts"
- Reads JSON files
- Identifies trending topics
- Writes creative content
- Adds business angle
- Generates translations
```

### For Humans

**Step 1: Scrape Data**
```bash
npm run fetch
```

**Step 2: Review Data**
```bash
# View what was scraped
ls -la data/queue/

# Read a specific file
cat data/queue/1772813421296-1jiel2fs7.json | jq
```

**Step 3: Share with AI**
```
"Here's the scraped data. Can you write a LinkedIn post about the GPT-5 rumors with a French translation?"
```

---

## 📊 Data Structure

### What You Get

```json
{
  "id": "1772813421296-1jiel2fs7",
  "source": "reddit",
  "sourceName": "r/MachineLearning",
  "title": "GPT-5 Release Confirmed by OpenAI",
  "link": "https://reddit.com/r/MachineLearning/comments/...",
  "content": "Full post content here...",
  "pubDate": "2025-03-06T10:00:00Z",
  "queuedAt": "2025-03-06T10:05:00Z",
  "status": "queued",
  "metadata": {
    "upvotes": 4500,
    "comments": 823,
    "score": 95
  }
}
```

### Fields Available

- `id`: Unique identifier
- `source`: Where it came from (rss, reddit, hackernews)
- `sourceName`: Specific source name
- `title`: Headline
- `link`: Original URL
- `content`: Full content or excerpt
- `pubDate`: When published
- `queuedAt`: When added to the queue
- `metadata`: Engagement metrics
- `status`: Processing status
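
Programmatic consumers can sanity-check items against this field list before use (a sketch; which fields are treated as required is an assumption):

```javascript
// Sketch: validate that a queue item carries the documented fields.
// The required-field set below is assumed, mirroring the field list above.
const REQUIRED_FIELDS = ['id', 'source', 'sourceName', 'title', 'link', 'pubDate', 'status'];

function isValidQueueItem(item) {
  if (typeof item !== 'object' || item === null) return false;
  // Every required top-level field must be present...
  if (!REQUIRED_FIELDS.every((f) => f in item)) return false;
  // ...and metadata, when present, should be an object of engagement metrics.
  return item.metadata === undefined || typeof item.metadata === 'object';
}
```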

---

## 🚀 Example Workflow

### Complete Workflow Example

**Scenario:** You want to create a LinkedIn post about trending AI news

**Step 1: Scrape**
```bash
npm run fetch

# Output:
# 📰 Fetching from RSS feeds...
# ✅ RSS: 25 items
# 📱 Fetching from Reddit...
# ✅ Reddit: 47 items
# 📰 Fetching from Hacker News...
# ✅ Hacker News: 12 items
# ✅ Total: 84 items queued
```

**Step 2: Review**
```bash
# View the top five high-score items
cat data/queue/*.json | jq -s '[.[] | select(.metadata.score > 80)] | sort_by(.metadata.score) | reverse | .[0:5]'
```

**Step 3: Feed to AI Agent**
```
"Here are the top 5 trending AI stories from the last 24 hours:

1. GPT-5 Release Confirmed (4500 upvotes)
   [content from JSON]

2. Google's New Gemini Model (3200 upvotes)
   [content from JSON]

...

Please create:
1. An engaging LinkedIn post about the #1 story
2. Add a business angle for local companies
3. Include a French translation
4. Add relevant hashtags"
```

**Step 4: AI Agent Creates**
```
Agent generates:
- Compelling hook
- Key insights
- Business implications
- Call-to-action
- French translation
- Hashtags
```

**Step 5: Human Reviews & Posts**
- Review the AI-generated content
- Make edits if needed
- Post manually to LinkedIn
- Engage with comments

---

## 🎯 What Makes This Tool Useful

### 1. Time Saving
- No more manual browsing of 10+ sources
- All data in one place
- Already scored by engagement

### 2. Quality Focus
- High-engagement content surfaced first
- Multiple sources for verification
- Community-curated (Reddit upvotes)

### 3. Agent-Friendly
- Clean JSON structure
- Easy to parse programmatically
- Rich metadata for filtering

### 4. Flexibility
- Run when you need data
- Query what you want
- Feed to any AI agent

---

## 📋 Current Commands

### Scraping Commands
```bash
# Fetch from all sources
npm run fetch

# Original command (still works)
node src/cli.js fetch
```

### Viewing Data
```bash
# View queue (basic)
npm run queue

# View specific file
cat data/queue/FILE_ID.json | jq

# View all queued items
ls -la data/queue/
```

---

## 🛠️ Configuration

### Edit Sources
```bash
nano config/sources.json
```

### Add RSS Feed
```json
{
  "rssFeeds": [
    {
      "name": "My AI Blog",
      "url": "https://example.com/feed.xml",
      "category": "ai-news",
      "enabled": true
    }
  ]
}
```

### Add Subreddit
```json
{
  "trendingSources": {
    "reddit": {
      "subreddits": [
        "MachineLearning",
        "artificial",
        "MyNewSubreddit"
      ]
    }
  }
}
```
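
A consumer script can read the same file back to see which feeds are active (a sketch shown against an inline sample rather than the real `config/sources.json`):

```javascript
// Sketch: list enabled RSS feeds from a sources config.
// In practice, replace sampleConfig with:
//   JSON.parse(fs.readFileSync('config/sources.json', 'utf8'))
const sampleConfig = {
  rssFeeds: [
    { name: 'My AI Blog', url: 'https://example.com/feed.xml', category: 'ai-news', enabled: true },
    { name: 'Old Feed', url: 'https://example.com/old.xml', category: 'ai-news', enabled: false },
  ],
};

function enabledFeeds(config) {
  return (config.rssFeeds || []).filter((f) => f.enabled).map((f) => f.name);
}

console.log(enabledFeeds(sampleConfig)); // prints [ 'My AI Blog' ]
```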

---

## 🎯 Next Steps to Build

### To Make This Tool Complete:

1. **Simple Query CLI** (Priority)
   ```bash
   # Note the `--` so npm forwards the flags to the script
   npm run query -- --trending
   npm run query -- --topic=GPT
   npm run query -- --fresh=6h
   ```

2. **MCP Server** (For AI Agents)
   - Expose scraping tools
   - Expose query tools
   - Standard interface for agents

3. **Better Query Interface**
   - Filter by score
   - Filter by topic
   - Filter by age
   - Sort options

4. **Documentation for Agents**
   - How to query data
   - Data structure reference
   - Example workflows
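
The query CLI proposed in item 1 could start as little more than flag parsing over the queue (entirely hypothetical, since the command does not exist yet; flag names follow the proposal above):

```javascript
// Hypothetical sketch of the proposed `npm run query` CLI (not built yet).

// Parse flags like --trending, --topic=GPT, --fresh=6h.
function parseQueryArgs(argv) {
  const opts = { trending: false, topic: null, freshHours: null };
  for (const arg of argv) {
    if (arg === '--trending') opts.trending = true;
    else if (arg.startsWith('--topic=')) opts.topic = arg.slice('--topic='.length);
    else if (arg.startsWith('--fresh=')) opts.freshHours = parseInt(arg.slice('--fresh='.length), 10);
  }
  return opts;
}

// Apply the parsed options to an array of queue items.
function applyQuery(items, opts, now = Date.now()) {
  let out = items;
  if (opts.topic) {
    out = out.filter((i) => i.title.toLowerCase().includes(opts.topic.toLowerCase()));
  }
  if (opts.freshHours != null) {
    out = out.filter((i) => now - Date.parse(i.pubDate) <= opts.freshHours * 3600 * 1000);
  }
  if (opts.trending) {
    out = [...out].sort((a, b) => (b.metadata?.score ?? 0) - (a.metadata?.score ?? 0));
  }
  return out;
}
```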

---

## 📊 What the Tool Provides

### Input (What You Give)
- ⚙️ Configuration (sources, filters)
- 📝 List of sources to scrape

### Output (What You Get)
- 📦 JSON files with structured data
- 🎯 Scored by engagement
- 🏷️ Categorized by source
- ⏰ Timestamped for freshness

### The Middle (What the Tool Does)
- 🔍 Scrapes multiple sources
- 🔄 Deduplicates content
- 📊 Scores by engagement
- 💾 Saves as JSON

### NOT Included (By Design)
- ❌ Content writing (AI agents do this)
- ❌ Publishing (humans do this)
- ❌ Scheduling (run on-demand)

---

## 🎯 Real-World Use Cases

### Use Case 1: Daily Social Media
```
1. Run: npm run fetch
2. Review: Check top 5 stories
3. Ask AI: "Write a post about story #3"
4. Post: Manual publish to LinkedIn
```

### Use Case 2: Newsletter
```
1. Run: npm run fetch
2. Query: Get all stories from last week
3. Ask AI: "Create a roundup newsletter"
4. Send: Manual email to list
```

### Use Case 3: Research
```
1. Run: npm run fetch
2. Query: Find all GPT-related stories
3. Ask AI: "Summarize GPT developments"
4. Use: Inform strategy or clients
```

---

## 🚀 Quick Start

```bash
# 1. Install
cd /home/vankhoa/projects/social-automation
npm install

# 2. Configure (optional)
nano config/sources.json

# 3. Scrape
npm run fetch

# 4. View data
ls data/queue/
cat data/queue/*.json | jq

# 5. Use with an AI agent, e.g.:
# "Read the scraped data and create a post about..."
```

---

## 📝 Summary

**This tool = Data scraper + organizer**

**What it does:**
- ✅ Scrapes content from multiple sources
- ✅ Organizes and scores it
- ✅ Saves as structured JSON
- ✅ Ready for AI agents to consume

**What you do:**
- Run the scraper when needed
- Feed data to AI agents
- Let agents create content
- Review and publish manually

**Simple, focused, effective.**

---

**Status:** ✅ Scraping Works | 🔨 Query Interface To Build
**Focus:** Research Tool, Not Publishing Platform
**Perfect for:** AI agents that need fresh, organized content data