npm - reelrecon - Versions diffs - 1.2.0 → 1.2.1 - Mend

reelrecon 1.2.0 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CLAUDE.md +1 -1
package/README.md +37 -12
package/ig_transcriber/pipeline.py +102 -16
package/mcp_server.py +2 -1
package/package.json +3 -3

package/CLAUDE.md CHANGED

@@ -131,6 +131,6 @@ Human-readable error details are also written to stderr.
 - Public profiles only.
 - Local audio uploads bypass Instagram entirely.
-- Instagram may rate-limit anonymous requests.
+- Instagram may rate-limit anonymous requests. Public reels work anonymously; if Instagram login-walls a public link, set `REELRECON_COOKIES_FILE` (cookies.txt export) or `REELRECON_COOKIES_FROM_BROWSER=chrome` to use your own session.
 - The wrapper prefers Python 3.11 when available to avoid `yt-dlp` Python 3.9 deprecation noise.
 - The wrapper prefers the repo-local `.venv` first when present.

package/README.md CHANGED

@@ -1,26 +1,26 @@
 <div align="center">
-# 🎬 ReelRecon
-### Reel reconnaissance for AI agents.
+<img src="https://raw.githubusercontent.com/4nw3rprod/ReelRecon/main/assets/hero.png" alt="ReelRecon — Reel reconnaissance for AI agents. Works with Claude, ChatGPT, Gemini, Hermes, OpenClaw, and any MCP-capable agent." width="820"/>
 **Transcribe and decode any public Instagram profile — hooks, CTAs, and script patterns — locally and for free.**
-**Give Claude, ChatGPT, Gemini, Hermes, OpenClaw — or any MCP-capable agent — the power to watch Instagram for you.**
-[![Python](https://img.shields.io/badge/python-3.11+-3776AB?logo=python&logoColor=white)](https://www.python.org/)
+[![npm](https://img.shields.io/npm/v/reelrecon?logo=npm&color=CB3837)](https://www.npmjs.com/package/reelrecon)
+[![Python](https://img.shields.io/badge/python-3.10+-3776AB?logo=python&logoColor=white)](https://www.python.org/)
 [![Whisper](https://img.shields.io/badge/transcription-OpenAI%20Whisper-74aa9c?logo=openai&logoColor=white)](https://github.com/openai/whisper)
 [![MCP](https://img.shields.io/badge/protocol-MCP%20native-8A2BE2)](https://modelcontextprotocol.io/)
-[![Agents](https://img.shields.io/badge/works%20with-Claude%20·%20ChatGPT%20·%20Gemini%20·%20Hermes%20·%20OpenClaw-blueviolet)](#-drop-it-into-your-agent-stack)
-[![Price](https://img.shields.io/badge/price-free-success)](#)
+[![License](https://img.shields.io/badge/license-MIT-success)](LICENSE)
 [![Privacy](https://img.shields.io/badge/runs-locally-orange)](#)
+```bash
+npx -y reelrecon
+```
 *Your agent can already write scripts. Now it can study the competition first:*
 *"Transcribe @competitor's latest 10 Reels and break down their hook formulas" — one tool call away.*
 [🤖 Agent Setup](#-drop-it-into-your-agent-stack) · [🚀 Quick Start](#-quick-start) · [🔍 Use Cases](#-what-your-agent-can-do-with-it) · [🧰 Tool Reference](#-mcp-tool-reference) · [🖥️ Web UI](#️-the-dashboard-for-humans)
-<img src="screen.png" alt="ReelRecon dashboard" width="850"/>
+<img src="https://raw.githubusercontent.com/4nw3rprod/ReelRecon/main/screen.png" alt="ReelRecon dashboard" width="850"/>
 </div>
@@ -28,12 +28,14 @@
 ## 🎯 Why this exists
-LLMs can't watch video. Agentic frameworks can browse, code, and write — but a Reel is a black box to them. **ReelRecon** closes that gap with a local, free, MCP-native pipeline:
+Today, analyzing a competitor's Reels means either paying a per-minute transcription SaaS, or uploading videos one by one to a multimodal model and burning tokens while it watches. **ReelRecon is the third option: free, open source, and local** — an MCP-native pipeline built for the part that actually matters for content strategy, the spoken word:
 1. Your agent calls one tool with a **public Instagram profile URL**.
-2. The server grabs the **latest 10 videos**, extracts audio, and transcribes every word with **OpenAI Whisper** — locally, no per-minute API fees.
+2. The server grabs the **latest 10 videos**, extracts audio, and transcribes every word with **OpenAI Whisper** — locally. No subscriptions, no per-minute fees, no tokens spent on video frames.
 3. The agent gets back **structured JSON**: full transcripts plus mined hooks, CTAs, sentiment, keyword clusters, title ideas, and a cross-video strategy overview.
+ReelRecon doesn't analyze visuals — scripts, hooks, and CTAs live in the audio, and that's what it mines for patterns and content ideas. It pairs perfectly with video-capable models: triage all ten Reels here for free in minutes, then send only the one or two that matter to a multimodal model for full visual breakdown.
 Built agent-tough: structured errors instead of exceptions, progress notifications, job queueing with hard timeouts, context-window-friendly response trimming, and a `check_health` tool so your agent can self-diagnose a broken install instead of hallucinating around it.
 ## 🤖 Drop it into your agent stack
@@ -56,7 +58,27 @@ npx -y reelrecon transcribe "https://www.instagram.com/<username>/" --json
 > Already have Python + deps? Set `REELRECON_PYTHON=/path/to/python` to skip provisioning and use your own environment.
 >
-> Package not on npm yet in your region/registry? Run it straight from GitHub — same launcher: `npx -y github:4nw3rprod/IG-Content-Transcriber`
+> Package not on npm yet in your region/registry? Run it straight from GitHub — same launcher: `npx -y github:4nw3rprod/ReelRecon`
+### 🔄 Upgrading
+- **npx users:** pin `reelrecon@latest` in your config (as in the snippets below) and every server start runs the newest published version. If plain `npx -y reelrecon` keeps serving you a stale cached copy, run `npx -y reelrecon@latest` once or clear the cache with `npm cache clean --force`.
+- **GitHub-direct:** `npx -y github:4nw3rprod/ReelRecon` always runs the latest `main` — no npm release needed.
+- **Local clone:** `git pull`. That's it — the private Python env in `~/.reelrecon` is reused automatically and only reinstalls when `requirements.txt` changes.
+If Instagram login-walled you on a public reel, upgrade to **v1.2.1+** and (optionally) hand the server your own session in the MCP config:
+```json
+{
+  "mcpServers": {
+    "reelrecon": {
+      "command": "npx",
+      "args": ["-y", "reelrecon@latest"],
+      "env": { "REELRECON_COOKIES_FILE": "/absolute/path/to/cookies.txt" }
+    }
+  }
+}
+```
 | Agent / Framework | Integration |
 |---|---|
@@ -300,6 +322,8 @@ All optional, via environment variables:
 | `REELRECON_MAX_CONCURRENT_JOBS` | `1` | Parallel transcription jobs (MCP) |
 | `REELRECON_MAX_UPLOAD_BYTES` | 2 GiB | Max local audio file size (MCP) |
 | `REELRECON_EXTRA_MODELS` | — | Comma-separated extra Whisper model names to allow |
+| `REELRECON_COOKIES_FILE` | — | Path to a `cookies.txt` export of your own Instagram session (fallback when Instagram login-walls anonymous access) |
+| `REELRECON_COOKIES_FROM_BROWSER` | — | Read your session straight from a browser, e.g. `chrome` or `firefox:ProfileName` |
 | `REELRECON_HTTP_TIMEOUT_SECONDS` | `30` | Instagram/Groq/yt-dlp socket timeout |
 | `REELRECON_FETCH_RETRIES` | `3` | Instagram profile fetch attempts (with backoff) |
@@ -320,6 +344,7 @@ The MCP server and pipeline helpers ship with a lightweight suite (no Whisper/to
 - **Public profiles only** — private accounts are detected and refused.
 - Instagram may rate-limit anonymous requests; the tool retries with backoff, but if it's blocked, wait and rerun.
+- **Hitting Instagram's login wall on a public reel?** Public videos normally work anonymously (the same access you get after dismissing the login popup in a browser). If Instagram keeps refusing, supply your own logged-in session: set `REELRECON_COOKIES_FILE` to a `cookies.txt` export, or `REELRECON_COOKIES_FROM_BROWSER=chrome`. Your session, your account, your responsibility — keep it to research-scale use.
 - Whisper models are cached after first load; already-transcribed videos are reused on reruns.
 - Everything runs locally. The only network calls are to Instagram/video hosts, and (optionally) GroqCloud with your key.
 - Agent-facing docs live in [`CLAUDE.md`](CLAUDE.md) — most MCP-aware coding agents pick it up automatically.

package/ig_transcriber/pipeline.py CHANGED

@@ -13,6 +13,7 @@ from dataclasses import dataclass
 from datetime import datetime, timezone
 from functools import lru_cache
 from hashlib import sha1
+from http.cookiejar import MozillaCookieJar
 from pathlib import Path
 from typing import Any, Callable, Dict, Iterable, Optional
 from urllib.error import HTTPError, URLError
@@ -23,10 +24,18 @@ warnings.filterwarnings("ignore", message="urllib3 v2 only supports OpenSSL 1.1.
 warnings.filterwarnings("ignore", message="Support for Python version 3.9 has been deprecated.*")
-def _env_int(name: str, default: int, *, minimum: int = 0) -> int:
+def _env_str(name: str) -> Optional[str]:
     # REELRECON_* is the primary prefix; the legacy IG_TRANSCRIBER_* prefix
     # remains supported so existing setups keep working after the rename.
-    raw = os.environ.get(f"REELRECON_{name}", os.environ.get(f"IG_TRANSCRIBER_{name}", default))
+    for key in (f"REELRECON_{name}", f"IG_TRANSCRIBER_{name}"):
+        value = os.environ.get(key)
+        if value and value.strip():
+            return value.strip()
+    return None
+def _env_int(name: str, default: int, *, minimum: int = 0) -> int:
+    raw = _env_str(name) or default
     try:
         return max(int(raw), minimum)
     except (TypeError, ValueError):
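
Note on the hunk above: the new `_env_str` helper skips blank values, so a `REELRECON_*` variable set to an empty string now falls through to the legacy `IG_TRANSCRIBER_*` name instead of shadowing it. A minimal standalone sketch of that lookup follows. The `environ` parameter is an assumption added here so the behaviour can be exercised without touching the real process environment; it is not part of the package.

```python
import os
from typing import Mapping, Optional


def env_str(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-blank value among the new and legacy prefixes."""
    # `environ` is a hypothetical injection point for testing; the package
    # code in the diff reads os.environ directly.
    for key in (f"REELRECON_{name}", f"IG_TRANSCRIBER_{name}"):
        value = environ.get(key)
        if value and value.strip():
            return value.strip()
    return None


# The primary prefix wins, and surrounding whitespace is trimmed.
primary = env_str("FETCH_RETRIES", {"REELRECON_FETCH_RETRIES": " 5 "})

# A blank primary value falls through to the legacy prefix.
legacy = env_str(
    "FETCH_RETRIES",
    {"REELRECON_FETCH_RETRIES": "  ", "IG_TRANSCRIBER_FETCH_RETRIES": "4"},
)
```

Because `_env_int` now computes `_env_str(name) or default`, an unset or blank variable resolves to the default before the `int(...)` conversion is attempted.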
@@ -269,21 +278,35 @@ def detect_input_kind(input_url: str) -> tuple[str, str]:
     return "video", input_url
+def _instagram_cookie_header() -> Optional[str]:
+    cookies_file, _ = cookie_settings()
+    if not cookies_file:
+        return None
+    jar = MozillaCookieJar()
+    try:
+        jar.load(cookies_file, ignore_discard=True, ignore_expires=True)
+    except Exception:
+        return None
+    pairs = [f"{cookie.name}={cookie.value}" for cookie in jar if "instagram.com" in (cookie.domain or "")]
+    return "; ".join(pairs) if pairs else None
 def fetch_profile(username: str, canonical_url: str) -> Dict[str, Any]:
     api_url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"
+    cookie_header = _instagram_cookie_header()
     payload: Optional[Dict[str, Any]] = None
     last_error: Optional[PipelineError] = None
     for attempt in range(1, FETCH_RETRY_ATTEMPTS + 1):
-        request = Request(
-            api_url,
-            headers={
-                "User-Agent": "Mozilla/5.0",
-                "x-ig-app-id": INSTAGRAM_APP_ID,
-                "Referer": canonical_url,
-                "Accept": "application/json",
-            },
-        )
+        headers = {
+            "User-Agent": "Mozilla/5.0",
+            "x-ig-app-id": INSTAGRAM_APP_ID,
+            "Referer": canonical_url,
+            "Accept": "application/json",
+        }
+        if cookie_header:
+            headers["Cookie"] = cookie_header
+        request = Request(api_url, headers=headers)
         try:
             with urlopen(request, timeout=DEFAULT_TIMEOUT_SECONDS) as response:
                 payload = json.load(response)
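
Note on the hunk above: `_instagram_cookie_header` turns a Netscape-format `cookies.txt` into a single `Cookie` request header, keeping only cookies whose domain contains `instagram.com`. The sketch below reproduces that idea in isolation. The sample file contents, cookie values, and the `cookie_header` name are made up for illustration, and the sketch catches `OSError` (which covers `http.cookiejar.LoadError`) where the diff catches `Exception`.

```python
from http.cookiejar import MozillaCookieJar
from typing import Optional

# Hypothetical cookies.txt content: tab-separated fields are
# domain, include-subdomains flag, path, secure flag, expiry, name, value.
SAMPLE_COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    ".instagram.com\tTRUE\t/\tTRUE\t2147483647\tsessionid\tabc123\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tother\tzzz\n"
)


def cookie_header(cookies_path: str, domain: str = "instagram.com") -> Optional[str]:
    """Build a Cookie header from the entries that belong to `domain`."""
    jar = MozillaCookieJar()
    try:
        jar.load(cookies_path, ignore_discard=True, ignore_expires=True)
    except OSError:
        # Missing or malformed file: behave like anonymous access.
        return None
    pairs = [f"{c.name}={c.value}" for c in jar if domain in (c.domain or "")]
    return "; ".join(pairs) if pairs else None
```

Writing `SAMPLE_COOKIES_TXT` to a temporary file and passing its path to `cookie_header` should yield only the `sessionid` pair, since the `.example.com` entry is filtered out. One consequence of the substring check is that any domain containing `instagram.com` matches, including subdomains such as `.www.instagram.com`.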
@@ -294,7 +317,9 @@ def fetch_profile(username: str, canonical_url: str) -> Dict[str, Any]:
                 raise PipelineError(f"Instagram profile not found: {canonical_url}") from exc
             if exc.code in {401, 403}:
                 raise PipelineError(
-                    "Instagram blocked the profile lookup. This pipeline currently supports public profiles only."
+                    "Instagram blocked the anonymous profile lookup. Only public profiles are supported. "
+                    "If this profile is public and the block persists, supply your own logged-in session via "
+                    "REELRECON_COOKIES_FILE (a cookies.txt export) and retry."
                 ) from exc
             if exc.code == 429:
                 last_error = PipelineError(
@@ -405,8 +430,56 @@ def collect_instagram_profile_videos(canonical_url: str) -> list[VideoCandidate]
     ]
+def cookie_settings() -> tuple[Optional[str], Optional[str]]:
+    """Optional own-session cookies for when Instagram hard-walls anonymous access.
+    Public reels normally work anonymously (same as playing the video after
+    dismissing the login popup in a browser). When Instagram rate-limits or
+    login-walls anonymous requests, users can supply their own logged-in
+    session: REELRECON_COOKIES_FILE (a Netscape cookies.txt export) or
+    REELRECON_COOKIES_FROM_BROWSER (e.g. "chrome", "firefox:ProfileName").
+    """
+    cookies_file = _env_str("COOKIES_FILE")
+    if cookies_file:
+        path = Path(cookies_file).expanduser()
+        if not path.is_file():
+            raise PipelineError(
+                f"REELRECON_COOKIES_FILE points to a missing file: {path}. "
+                "Export a cookies.txt from your logged-in browser (e.g. the 'Get cookies.txt' extension) "
+                "or unset the variable to use anonymous access."
+            )
+        cookies_file = str(path)
+    return cookies_file, _env_str("COOKIES_FROM_BROWSER")
+_LOGIN_WALL_MARKERS = (
+    "login required",
+    "log in",
+    "login",
+    "rate-limit",
+    "rate limit",
+    "requested content is not available",
+    "checkpoint required",
+    "checkpoint_required",
+)
+def _download_error(exc: Exception, target_url: str, action: str) -> PipelineError:
+    text = str(exc).strip()
+    lowered = text.lower()
+    if any(marker in lowered for marker in _LOGIN_WALL_MARKERS):
+        return PipelineError(
+            f"Instagram refused anonymous access while {action} {target_url}: {text} — "
+            "this usually means the video is private/removed, or Instagram is rate-limiting anonymous requests. "
+            "If the link plays in a private browser window (after dismissing the login popup), wait a few minutes and retry, "
+            "or supply your own logged-in session: set REELRECON_COOKIES_FILE to a cookies.txt export, or "
+            "REELRECON_COOKIES_FROM_BROWSER=chrome (also: firefox, edge, safari, brave)."
+        )
+    return PipelineError(f"Failed while {action} {target_url}: {text}")
 def _yt_dlp_base_options() -> Dict[str, Any]:
-    return {
+    options: Dict[str, Any] = {
         "quiet": True,
         "no_warnings": True,
         "noprogress": True,
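
Note on the hunk above: `_download_error` decides whether a failure is a login wall by lowercasing the exception text and looking for any of the `_LOGIN_WALL_MARKERS` substrings. The sketch below isolates that check using the same marker tuple as the diff. The function name and the sample error strings are invented for illustration and are not real yt-dlp output.

```python
LOGIN_WALL_MARKERS = (
    "login required",
    "log in",
    "login",
    "rate-limit",
    "rate limit",
    "requested content is not available",
    "checkpoint required",
    "checkpoint_required",
)


def looks_like_login_wall(message: str) -> bool:
    """Case-insensitive substring match against the marker list."""
    lowered = message.strip().lower()
    return any(marker in lowered for marker in LOGIN_WALL_MARKERS)


# Hypothetical error strings, for illustration only.
walled = looks_like_login_wall("ERROR: Login required to view this reel")
not_found = looks_like_login_wall("HTTP Error 404: Not Found")
broad_match = looks_like_login_wall("Timed out fetching https://example.com/login-help")
```

Since matching is by plain substring, the bare `"login"` marker already covers `"login required"`, and it also matches unrelated text that merely contains the word, as the third sample shows. In that case the user would see the cookie guidance for an error that is not actually a login wall, which seems a tolerable trade-off for a hint-style message.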
@@ -415,15 +488,26 @@ def _yt_dlp_base_options() -> Dict[str, Any]:
         "fragment_retries": 3,
         "extractor_retries": 2,
     }
+    cookies_file, cookies_browser = cookie_settings()
+    if cookies_file:
+        options["cookiefile"] = cookies_file
+    elif cookies_browser:
+        options["cookiesfrombrowser"] = tuple(
+            part.strip() for part in cookies_browser.split(":") if part.strip()
+        )
+    return options
 def _yt_dlp_extract_info(target_url: str) -> Dict[str, Any]:
     YoutubeDL = _import_yt_dlp()
+    options = _yt_dlp_base_options()
     try:
-        with YoutubeDL(_yt_dlp_base_options()) as ydl:
+        with YoutubeDL(options) as ydl:
             info = ydl.extract_info(target_url, download=False)
+    except PipelineError:
+        raise
     except Exception as exc:
-        raise PipelineError(f"Failed to inspect video URL: {exc}") from exc
+        raise _download_error(exc, target_url, "inspecting") from exc
     if not isinstance(info, dict):
         raise PipelineError(f"Could not extract video information from URL: {target_url}")
     return info
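
Note on the hunk above: `_yt_dlp_base_options` gives the cookie file precedence over the browser setting, and it turns a value such as `firefox:ProfileName` into the tuple that yt-dlp's `cookiesfrombrowser` option expects by splitting on `:`. A minimal sketch of both decisions, with hypothetical function names and no yt-dlp import:

```python
from typing import Any, Dict, Optional, Tuple


def parse_browser_spec(spec: str) -> Tuple[str, ...]:
    """Split 'browser:profile' on ':' and drop empty or blank parts."""
    return tuple(part.strip() for part in spec.split(":") if part.strip())


def cookie_options(cookies_file: Optional[str], cookies_browser: Optional[str]) -> Dict[str, Any]:
    """Return the yt-dlp option keys the diff sets; the file wins if both are given."""
    options: Dict[str, Any] = {}
    if cookies_file:
        options["cookiefile"] = cookies_file
    elif cookies_browser:
        options["cookiesfrombrowser"] = parse_browser_spec(cookies_browser)
    return options


only_browser = cookie_options(None, "firefox:ProfileName")
both_set = cookie_options("/tmp/cookies.txt", "chrome")
windows_path = parse_browser_spec("firefox:C:\\profiles\\work")
```

Two things fall out of this. When both variables are set, the browser value is silently ignored. And because every `:` is a separator, a profile given as a Windows path with a drive letter is split into extra tuple elements, which yt-dlp would read as later positional fields rather than as part of the profile path; a profile name avoids the problem.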
@@ -827,8 +911,10 @@ def download_audio(candidate: VideoCandidate, run_dir: Path) -> Path:
     try:
         with YoutubeDL(options) as ydl:
             info = ydl.extract_info(candidate.video_url, download=True)
+    except PipelineError:
+        raise
     except Exception as exc:
-        raise PipelineError(f"Failed to download audio for {candidate.video_url}: {exc}") from exc
+        raise _download_error(exc, candidate.video_url, "downloading audio for") from exc
     if not isinstance(info, dict) or not info.get("id"):
         raise PipelineError(f"yt-dlp did not return download metadata for {candidate.video_url}")

package/mcp_server.py CHANGED

@@ -18,7 +18,7 @@ from mcp.server.fastmcp import Context, FastMCP
 from mcp.types import ToolAnnotations
 SERVER_NAME = "ReelRecon"
-SERVER_VERSION = "1.2.0"
+SERVER_VERSION = "1.2.1"
 logger = logging.getLogger("reelrecon.mcp")
@@ -936,6 +936,7 @@ def build_server(*, host: str, port: int, debug: bool) -> FastMCP:
             "output_root_writable": output_root_writable,
             "saved_batches": len(_recent_manifest_paths(MAX_LIST_LIMIT)),
             "groq_configured": bool(os.environ.get("GROQ_API_KEY")),
+            "instagram_cookies_configured": bool(_env("COOKIES_FILE") or _env("COOKIES_FROM_BROWSER")),
             "jobs": {
                 "active": _active_jobs,
                 "abandoned_after_timeout": _abandoned_jobs,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "reelrecon",
-  "version": "1.2.0",
+  "version": "1.2.1",
   "description": "Reel reconnaissance for AI agents — transcribe and decode public Instagram profiles via MCP. Whisper-powered, free, runs locally.",
   "bin": {
     "reelrecon": "bin/reelrecon.js"
@@ -20,9 +20,9 @@
   },
   "repository": {
     "type": "git",
-    "url": "git+https://github.com/4nw3rprod/IG-Content-Transcriber.git"
+    "url": "git+https://github.com/4nw3rprod/ReelRecon.git"
   },
-  "homepage": "https://github.com/4nw3rprod/IG-Content-Transcriber#readme",
+  "homepage": "https://github.com/4nw3rprod/ReelRecon#readme",
   "keywords": [
     "mcp",
     "mcp-server",