@aikeytake/social-automation 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
# Social Automation — Content Research Tool

Content aggregation tool that scrapes AI news from multiple sources and stores structured JSON for AI agents to consume.

## What It Does

- Scrapes **17 RSS feeds** (TechCrunch, OpenAI, Anthropic, Claude Blog, Google AI, DeepMind, Hugging Face, arXiv, and more)
- Scrapes **Reddit** (7 AI subreddits, top posts with 100+ upvotes)
- Scrapes **Hacker News** (AI-related stories with 50+ points)
- Scrapes **LinkedIn KOL posts** via BrightData SERP (top 20 KOLs from your list)
- Outputs a `trending.json` with the top 20 ranked items
- Saves everything as structured JSON for AI agents

## Quick Start

```bash
cd /home/vankhoa/projects/social-automation
npm install
npm run scrape
```

## The Only Command You Need

```bash
npm run scrape
```

Output is saved to `data/YYYY-MM-DD/`:

| File | Contents |
|------|----------|
| `all.json` | All items from all sources combined |
| `trending.json` | Top 20 items ranked by engagement score |
| `rss.json` | All RSS feed items |
| `reddit.json` | All Reddit posts |
| `hackernews.json` | All Hacker News stories |
| `linkedin.json` | LinkedIn KOL posts via BrightData |
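The weights behind the `trending.json` ranking live in `config/sources.json` under `trendAnalysis` (`engagementWeight: 0.6`, `recencyWeight: 0.4`). As a rough sketch of how such a blend can be computed (the formula and helper below are illustrative assumptions, not the package's actual scoring code):

```javascript
// Hypothetical weighted trend score: 60% engagement, 40% recency.
// The weights mirror trendAnalysis in config/sources.json; the formula
// itself is an assumption, not the scraper's real implementation.
const WEIGHTS = { engagement: 0.6, recency: 0.4 };

function trendScore(item, now = Date.now()) {
  // Engagement: log-scale upvotes so a single viral post doesn't dominate.
  const upvotes = item.engagement?.upvotes ?? 0;
  const engagement = Math.min(Math.log10(1 + upvotes) / 4, 1); // ~1.0 at 10k upvotes

  // Recency: linear decay to 0 over 48 hours (the configured maxAgeHours).
  const ageHours = (now - new Date(item.pubDate).getTime()) / 3.6e6;
  const recency = Math.max(0, 1 - ageHours / 48);

  return Math.round(100 * (WEIGHTS.engagement * engagement + WEIGHTS.recency * recency));
}
```

A fresh, highly upvoted item scores near 100; anything past the recency window keeps only its engagement share.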
## Project Structure

```
social-automation/
├── src/
│   ├── fetchers/
│   │   ├── rss.js          # 17 RSS feeds
│   │   ├── reddit.js       # 7 AI subreddits
│   │   ├── hackernews.js   # HN top stories
│   │   └── linkedin.js     # LinkedIn KOL posts via BrightData SERP
│   ├── utils/
│   │   └── logger.js
│   ├── cli.js
│   └── index.js            # Main scraper
├── config/
│   └── sources.json        # All source configuration
├── data/
│   └── YYYY-MM-DD/         # Daily scraped output
├── .env                    # API keys
└── package.json
```

## Configuration

### Environment Variables (`.env`)

Already configured. Key variables:

```bash
BRIGHTDATA_API_KEY=...        # Used for LinkedIn KOL scraping
BRIGHTDATA_ZONE=mcp_unlocker  # BrightData zone
ANTHROPIC_API_KEY=...         # Claude API (for future AI processing)
```
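Missing keys are easier to diagnose when they are caught at startup. A minimal fail-fast sketch (an assumed helper, not code that ships in `src/`):

```javascript
// Hypothetical startup guard: verify required environment variables exist
// before any scraping begins. The names come from the .env section above.
const REQUIRED_ENV = ["BRIGHTDATA_API_KEY", "BRIGHTDATA_ZONE"];

function checkEnv(env = process.env) {
  const missing = REQUIRED_ENV.filter((key) => !env[key] || env[key].trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return true;
}
```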

### Sources (`config/sources.json`)

**RSS Feeds (17 sources):**
- TechCrunch AI, The Gradient, MIT Technology Review AI
- OpenAI Blog, Anthropic Blog, **Claude Blog**
- Google AI Blog, DeepMind Blog, Hugging Face Blog
- Meta Engineering, Netflix Tech Blog, AWS ML Blog
- Microsoft AI Blog, NVIDIA Blog, LinkedIn Engineering
- arXiv AI (cs.AI), arXiv Machine Learning (cs.LG)

**Reddit:** MachineLearning, artificial, ArtificialIntelligence, deeplearning, OpenAI, LocalLLaMA, singularity

**Hacker News:** keyword-filtered (AI, LLM, GPT, Anthropic, etc.), 50+ points

**LinkedIn:** top 20 KOLs from `workspace/marketing/linkedin_kol_clean.json`, scraped via BrightData SERP

### Adding an RSS Feed

Edit `config/sources.json`:

```json
{
  "rssFeeds": [
    {
      "name": "My Blog",
      "url": "https://example.com/feed.xml",
      "category": "ai-news",
      "enabled": true
    }
  ]
}
```
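Before enabling a new feed, it helps to sanity-check the entry's shape. A small validator sketch (an assumed helper; the required fields mirror the entry format above):

```javascript
// Hypothetical validator for an rssFeeds entry in config/sources.json.
// Checks the fields the config format uses: name, url, category, enabled.
function validateFeedEntry(entry) {
  const errors = [];
  if (typeof entry.name !== "string" || entry.name.length === 0) errors.push("name must be a non-empty string");
  try {
    const url = new URL(entry.url);
    if (!["http:", "https:"].includes(url.protocol)) errors.push("url must be http(s)");
  } catch {
    errors.push("url is not a valid URL");
  }
  if (typeof entry.category !== "string") errors.push("category must be a string");
  if (typeof entry.enabled !== "boolean") errors.push("enabled must be a boolean");
  return errors;
}
```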

### Adjusting LinkedIn KOL Limit

Edit `config/sources.json`:

```json
{
  "linkedin": {
    "limit": 20
  }
}
```

## Reading the Data

```bash
# View today's trending items
cat data/$(date +%Y-%m-%d)/trending.json | jq '.items[] | {rank, title, score}'

# View all items from a specific source
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.source == "reddit")]'

# Search by keyword
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.title | contains("GPT"))]'

# View LinkedIn KOL posts
cat data/$(date +%Y-%m-%d)/linkedin.json | jq '.items[]'
```
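Agents that prefer Node over `jq` can apply the same queries in code. A sketch over the parsed `{ items: [...] }` shape (the sample data and helper names are illustrative):

```javascript
// Hypothetical Node equivalents of the jq queries above, operating on the
// parsed { items: [...] } shape of all.json / trending.json.
function bySource(items, source) {
  return items.filter((item) => item.source === source);
}

function byKeyword(items, keyword) {
  return items.filter((item) => item.title.includes(keyword));
}

// Against a real file this would be driven by something like:
//   const { items } = JSON.parse(fs.readFileSync(`data/${today}/all.json`, "utf8"));
```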

## Using with AI Agents

Point the agent at today's data folder:

```
Read data/$(date +%Y-%m-%d)/trending.json and create a LinkedIn post about the top trending AI story.
```

Or for deeper research:

```
Read data/$(date +%Y-%m-%d)/all.json and summarize the most important AI developments from the last 24 hours.
```

## Browser-Based Sources (Twitter/X & LinkedIn Browser)

Two sources use a real Chrome browser via Playwright to scrape without an API: **Twitter/X** and **LinkedIn Browser**. They share the same browser profile, stored at `data/playwright-profile/`.

### One-Time Setup

Run the setup script once to log in and save the browser session:

```bash
npm run setup:twitter
```

This opens a real Chrome window. **Log in to both X and LinkedIn** in that window (they share the same profile). Once you're logged in to both, close the window — the session is saved automatically.

> ⚠️ Use a **dedicated scraping account**, not your personal account. Sessions last several weeks. Re-run `npm run setup:twitter` when you see auth errors.

---

### Twitter / X

**Enable in `config/sources.json`:**

```json
"trendingSources": {
  "twitter": {
    "enabled": true,
    "accounts": ["AndrewYNg", "ylecun", "OpenAI", "AnthropicAI", "karpathy"],
    "minLikes": 100,
    "maxTweetsPerAccount": 5,
    "maxAgeHours": 24,
    "delayBetweenAccountsMs": 3000
  }
}
```

**Config options:**

| Key | Description | Default |
|-----|-------------|---------|
| `accounts` | X handles to scrape (without `@`) | `[]` |
| `minLikes` | Skip tweets below this like count | `0` |
| `maxTweetsPerAccount` | Max tweets to fetch per account | `10` |
| `maxAgeHours` | Only include tweets from the last N hours | `24` |
| `delayBetweenAccountsMs` | Base delay between accounts (ms) | `3000` |

**Run:**

```bash
npm run test:twitter   # isolated test, prints results, no files written
npm run scrape         # full pipeline
```

**How it works:**
- Visits the X home feed first, then searches for each account via the search box
- Clicks the matching result to navigate to the profile
- Scrolls the timeline and extracts the top N tweets
- Applies a random 20–30s delay between accounts to avoid rate limiting
- Account visit order is randomised each run
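The shuffled visit order and jittered delay described above can be sketched as follows (assumed helpers, not the fetcher's actual code):

```javascript
// Hypothetical sketch of the anti-rate-limit behaviour described above:
// shuffle the account order, then wait a random 20-30 seconds per account.
function shuffle(list) {
  // Fisher-Yates shuffle on a copy, so the config array stays untouched.
  const out = [...list];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

function jitterMs(minMs = 20_000, maxMs = 30_000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// The scrape loop would then look roughly like:
// for (const account of shuffle(config.accounts)) {
//   await scrapeAccount(account);
//   await sleep(jitterMs());
// }
```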

---

### LinkedIn Browser

Scrapes posts from LinkedIn profiles by navigating directly to each profile's recent-activity page.

**Enable in `config/sources.json`:**

```json
"linkedin_browser": {
  "enabled": true,
  "accounts": ["julienchaumond", "another-slug"],
  "maxPostsPerAccount": 5,
  "maxAgeHours": 48,
  "delayBetweenAccountsMs": 10000
}
```

The `accounts` value is the LinkedIn profile slug — the part after `linkedin.com/in/`.

**Config options:**

| Key | Description | Default |
|-----|-------------|---------|
| `accounts` | LinkedIn profile slugs to scrape | `[]` |
| `maxPostsPerAccount` | Max posts to fetch per account | `5` |
| `maxAgeHours` | Only include posts from the last N hours | `48` |
| `delayBetweenAccountsMs` | Base delay between accounts (ms) | `10000` |

**Run:**

```bash
npm run test:linkedin   # isolated test, prints results, no files written
npm run scrape          # full pipeline
```

**How it works:**
- Navigates directly to `linkedin.com/in/{slug}/recent-activity/all/`
- Scrolls to load posts; extracts text, reactions, comments, and timestamp
- Constructs the post URL from LinkedIn's `data-urn` attribute
- Account visit order is randomised each run
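The `data-urn` step can be sketched as below. The `urn:li:activity:<id>` format and the `/feed/update/` URL shape are assumptions about LinkedIn's current markup rather than a documented API, so treat this as illustrative:

```javascript
// Hypothetical reconstruction of a post URL from a data-urn attribute.
// Assumes URNs of the form "urn:li:activity:<numeric id>", which LinkedIn's
// feed markup commonly uses; this shape is not guaranteed to be stable.
function postUrlFromUrn(urn) {
  if (!/^urn:li:activity:\d+$/.test(urn)) return null;
  return `https://www.linkedin.com/feed/update/${urn}/`;
}
```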

---

### Output Files

| File | Source |
|------|--------|
| `data/YYYY-MM-DD/twitter.json` | Twitter/X posts |
| `data/YYYY-MM-DD/linkedin_browser.json` | LinkedIn browser posts |

Both sources feed into `all.json` and `trending.json` automatically.

---

## Troubleshooting

**LinkedIn returns 0 items:**
- Check the logs for BrightData errors: `cat logs/*.log | grep -i linkedin`
- Confirm the KOL file exists: `ls /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json`
- The BrightData zone `mcp_unlocker` must exist in your BrightData account

**RSS feed fails:**
- Some feeds go down temporarily; the scraper skips them and continues
- Check the logs in `logs/` for specific feed errors

**No data for today:**
```bash
# Run the scraper
npm run scrape

# Check that the data folder was created
ls data/$(date +%Y-%m-%d)/
```
package/config/sources.json ADDED
{
  "rssFeeds": [
    {
      "name": "TechCrunch AI",
      "url": "https://techcrunch.com/category/artificial-intelligence/feed/",
      "category": "ai-news",
      "enabled": true
    },
    {
      "name": "The Gradient",
      "url": "https://thegradient.pub/rss/",
      "category": "ai-research",
      "enabled": true
    },
    {
      "name": "MIT Technology Review AI",
      "url": "https://www.technologyreview.com/feed/",
      "category": "tech-news",
      "enabled": true
    },
    {
      "name": "OpenAI Blog",
      "url": "https://openai.com/blog/rss.xml",
      "category": "company-news",
      "enabled": true
    },
    {
      "name": "Anthropic Blog",
      "url": "https://raw.githubusercontent.com/Olshansk/rss-feeds/main/feeds/feed_anthropic.xml",
      "category": "company-news",
      "enabled": true
    },
    {
      "name": "Claude Blog",
      "url": "https://raw.githubusercontent.com/Olshansk/rss-feeds/main/feeds/feed_claude.xml",
      "category": "company-news",
      "enabled": true
    },
    {
      "name": "Google AI Blog",
      "url": "https://blog.google/technology/ai/rss/",
      "category": "company-news",
      "enabled": true
    },
    {
      "name": "DeepMind Blog",
      "url": "https://deepmind.google/blog/rss.xml",
      "category": "research",
      "enabled": true
    },
    {
      "name": "Hugging Face Blog",
      "url": "https://huggingface.co/blog/feed.xml",
      "category": "ml-frameworks",
      "enabled": true
    },
    {
      "name": "Meta Engineering",
      "url": "https://engineering.fb.com/feed/",
      "category": "company-engineering",
      "enabled": true
    },
    {
      "name": "Netflix Tech Blog",
      "url": "https://medium.com/feed/netflix-techblog",
      "category": "company-engineering",
      "enabled": true
    },
    {
      "name": "AWS Machine Learning Blog",
      "url": "https://aws.amazon.com/blogs/machine-learning/feed/",
      "category": "cloud-ai",
      "enabled": true
    },
    {
      "name": "Microsoft AI Blog",
      "url": "https://blogs.microsoft.com/ai/feed/",
      "category": "company-news",
      "enabled": true
    },
    {
      "name": "NVIDIA Technical Blog",
      "url": "https://blogs.nvidia.com/feed/",
      "category": "company-engineering",
      "enabled": true
    },
    {
      "name": "LinkedIn Engineering",
      "url": "https://engineering.linkedin.com/blog.rss",
      "category": "company-engineering",
      "enabled": true
    },
    {
      "name": "arXiv AI",
      "url": "https://rss.arxiv.org/rss/cs.AI",
      "category": "research-papers",
      "enabled": true
    },
    {
      "name": "arXiv Machine Learning",
      "url": "https://rss.arxiv.org/rss/cs.LG",
      "category": "research-papers",
      "enabled": true
    }
  ],
  "linkedin_browser": {
    "enabled": true,
    "accounts": ["julienchaumond"],
    "maxPostsPerAccount": 5,
    "maxAgeHours": 48,
    "delayBetweenAccountsMs": 10000
  },
  "apiSources": [
    {
      "id": "goodailist",
      "name": "Good AI List",
      "enabled": true,
      "weight": 0.5,
      "request": {
        "url": "https://goodailist.com/api/repos",
        "method": "GET",
        "params": { "page": 1, "limit": 100, "sort": "star_1d", "order": "desc" }
      },
      "response": {
        "itemsPath": "repos"
      },
      "mapping": {
        "title": "repo",
        "link": "https://github.com/{repo}",
        "summary": "description",
        "content": "description",
        "author": { "field": "repo", "split": "/", "index": 0 },
        "pubDate": "created_at",
        "category": "category",
        "tags": { "field": "keywords", "split": "," },
        "engagement.upvotes": "star_1d",
        "metadata.stars": "stars",
        "metadata.star_7d": "star_7d",
        "metadata.forks": "forks",
        "metadata.language": "language"
      }
    },
    {
      "id": "producthunt",
      "name": "Product Hunt",
      "enabled": true,
      "auth": {
        "type": "oauth2_client_credentials",
        "tokenUrl": "https://api.producthunt.com/v2/oauth/token",
        "clientIdEnv": "PRODUCT_HUNT_API_KEY",
        "clientSecretEnv": "PRODUCT_HUNT_API_SECRET"
      },
      "request": {
        "url": "https://api.producthunt.com/v2/api/graphql",
        "method": "POST",
        "graphql": {
          "query": "query($first: Int!, $order: PostsOrder!, $postedAfter: DateTime!) { posts(first: $first, order: $order, postedAfter: $postedAfter) { edges { node { name tagline description url website votesCount commentsCount createdAt topics { edges { node { name } } } user { name } } } } }",
          "variables": { "first": 30, "order": "VOTES" }
        },
        "computedVariables": {
          "postedAfter": { "type": "daysAgo", "days": 7 }
        }
      },
      "response": {
        "itemsPath": "data.posts.edges",
        "itemUnwrap": "node"
      },
      "filter": {
        "field": "votesCount",
        "min": 50
      },
      "mapping": {
        "title": "name",
        "link": "url",
        "summary": "tagline",
        "content": ["tagline", "description"],
        "author": { "path": "user.name" },
        "pubDate": "createdAt",
        "category": { "path": "topics.edges", "map": "node.name", "index": 0 },
        "tags": { "path": "topics.edges", "map": "node.name" },
        "engagement.upvotes": "votesCount",
        "engagement.comments": "commentsCount",
        "metadata.website": "website"
      }
    }
  ],
  "linkedin": {
    "profilesFile": "/home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json",
    "enabled": true,
    "batchSize": 8,
    "budgetPerRun": 25,
    "checkIntervalHours": 24,
    "timeRange": "w",
    "resultsPerBatch": 10,
    "enrichContent": true,
    "enrichConcurrency": 5
  },
  "youtube": {
    "channels": [
      {
        "name": "Andrej Karpathy",
        "channelId": "UC之以A5_BH8q-8v6Fn4qF5A",
        "enabled": false
      },
      {
        "name": "Yannic Kilcher",
        "channelId": "UC媒介ucH6r6tiKnM2LTC1cw",
        "enabled": false
      }
    ],
    "enabled": false
  },
  "keywords": {
    "primary": [
      "artificial intelligence",
      "machine learning",
      "deep learning",
      "LLM",
      "GPT",
      "Claude",
      "transformer",
      "neural network",
      "AGI",
      "AI research"
    ],
    "secondary": [
      "computer vision",
      "NLP",
      "reinforcement learning",
      "diffusion model",
      "multimodal",
      "fine-tuning",
      "RAG",
      "agent",
      "LangChain",
      "vector database"
    ]
  },
  "filtering": {
    "minEngagementScore": 10,
    "maxAgeHours": 48,
    "deduplicationWindow": 72
  },
  "trendingSources": {
    "reddit": {
      "enabled": true,
      "subreddits": [
        "MachineLearning",
        "artificial",
        "ArtificialIntelligence",
        "deeplearning",
        "OpenAI",
        "LocalLLaMA",
        "singularity"
      ],
      "minScore": 100,
      "maxAge": "24h"
    },
    "hackernews": {
      "enabled": true,
      "keywords": [
        "AI",
        "artificial intelligence",
        "machine learning",
        "deep learning",
        "GPT",
        "LLM",
        "OpenAI",
        "Anthropic",
        "Google AI",
        "neural network"
      ],
      "minPoints": 50
    },
    "twitter": {
      "enabled": false,
      "accounts": [
        "AndrewYNg",
        "ylecun",
        "OpenAI",
        "AnthropicAI",
        "GoogleAI"
      ],
      "minLikes": 100,
      "maxTweetsPerAccount": 5,
      "maxAgeHours": 24,
      "delayBetweenAccountsMs": 3000
    }
  },
  "trendAnalysis": {
    "minTrendScore": 70,
    "sourceDiversity": true,
    "engagementWeight": 0.6,
    "recencyWeight": 0.4
  }
}
package/package.json ADDED
{
  "name": "@aikeytake/social-automation",
  "version": "2.0.0",
  "description": "Content research and aggregation tool for AI agents",
  "main": "src/index.js",
  "type": "module",
  "scripts": {
    "start": "node src/index.js scrape",
    "scrape": "node src/index.js scrape",
    "query": "node src/query.js",
    "queue": "node src/cli.js queue",
    "test": "node src/test.js",
    "setup:twitter": "node src/setup/twitter-login.js",
    "test:twitter": "node src/test/twitter.js",
    "test:linkedin": "node src/test/linkedin.js"
  },
  "keywords": [
    "social-media",
    "content-aggregation",
    "ai",
    "research-tool",
    "agent-tool"
  ],
  "author": "aikeytake",
  "license": "MIT",
  "dependencies": {
    "@supabase/supabase-js": "^2.47.0",
    "axios": "^1.7.9",
    "cheerio": "^1.0.0",
    "dotenv": "^16.4.7",
    "playwright": "^1.58.2",
    "rss-parser": "^3.13.0"
  },
  "devDependencies": {
    "@types/node": "^22.13.5"
  }
}