PyPI - rogue-live-redteam - Versions diffs - 1.0.0__tar.gz - Mend

rogue-live-redteam 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (322)

rogue_live_redteam-1.0.0/.gitignore ADDED

@@ -0,0 +1,237 @@
+# secrets
+.env
+.env.local
+.env.*.local
+.neon_url.local
+.env.bak*
+# python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+venv/
+.venv/
+env/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.coverage
+.coverage.*
+coverage.xml
+htmlcov/
+# uv / poetry caches (lockfiles ARE committed)
+.uv-cache/
+.poetry-cache/
+# node / next.js
+node_modules/
+.next/
+out/
+.vercel/
+*.tsbuildinfo
+# db / data
+postgres-data/
+pgdata/
+*.db
+*.sqlite
+*.sqlite3
+dump.sql
+# editor / OS
+.DS_Store
+.idea/
+.vscode/
+*.swp
+*.swo
+# bright data / llm cost logs (regenerated per run)
+bright_data_cost_log.csv
+llm_cost_log.csv
+*.log
+# build artifacts
+dist/
+build/
+*.egg-info/
+# runtime-generated state (bandit arm stats, dataset exports, threat-brief outputs)
+# Tracked at the directory level via data/.gitkeep; actual contents stay local.
+data/*
+!data/.gitkeep
+# Bandit arm stats — harmless (no secrets/PII); published so /api/bandit/stats
+# renders on a fresh clone + the Render build can COPY it.
+!data/discovery_bandit.json
+# Per-breach-type judge calibration SUMMARIES — Surface-1 RedlineGuard + the per-rule judge read
+# these at runtime (data/calibration/<breach_type>_report.json) to report REAL precision; without
+# them prod degrades to "uncalibrated". Small summary stats only (precision/recall/agreement CIs +
+# gate + one summary line) — NO eval cases / transcripts / PII. Eval corpora, kappa worksheets,
+# benchmark reports (over_block/strongreject/wildguard), and backups stay local.
+!data/calibration/
+data/calibration/*
+!data/calibration/information_disclosure_report.json
+!data/calibration/unauthorized_action_report.json
+!data/calibration/fabricated_sensitive_value_report.json
+# P2 external-benchmark result reports (back the JBB / WildGuard / StrongREJECT numbers)
+!data/calibration/jbb_judge_report_v3.json
+!data/calibration/wildguard_report.json
+!data/calibration/wildguard_report_harmful.json
+!data/calibration/strongreject_report.json
+# Released derived results behind the papers (derived only — see RESPONSIBLE_RELEASE.md)
+!data/research/
+data/research/*
+!data/research/reproducibility_gap_results.json
+!data/research/reextracted_claims.json
+!data/research/reproducibility_gap_pairs.csv
+!data/research/coverage_validity_results.json
+!data/research/scheduler_results.json
+!data/research/skill_leak_curve_2026-06-13_REDO.log
+!data/research/skill_leak_curve_2026-06-13_DIAGNOSIS.md
+# LeakHub session capture — Playwright storage_state JSON. Effectively your
+# signed-in session; never commit. See scripts/_capture_leakhub_storage.py.
+leakhub_storage_state.json
+# notebooks / scratch
+.ipynb_checkpoints/
+scratch/
+tmp/
+# vendor docs archive — local reference only, not for committing
+website/
+# research-paper code archive — same pattern as website/ above.
+# Holds shallow clones of PAIR / Crescendo / AutoDAN / PAP reference repos
+# per ROGUE_PLAN.md §10.7 implementation checklist. Research code is not
+# licensed for redistribution in our repo; canonical sources stay on GitHub.
+papers/
+# Local-only agent instructions
+CLAUDE.md
+mani.py
+# AI session files (local-only)
+frontend/AGENTS.md
+# local test creds + demo flow (plaintext throwaway key — kept out of git history)
+TESTING.md
+frontend/CLAUDE.md
+tasks/
+# Internal planning / notes — not for the public repo
+ROGUE_PLAN.md
+glossary.md
+answers.md
+new_methods.md
+assets/deck_slide_lines.md
+# rogue v2 spec — internal planning, local-only
+docs/v2/
+# Moat-building method notes (SERP queries, BD harvest recipe, harvest scripts) —
+# the corpus DATA stays committed (tests/runtime load it), but the prose that
+# documents HOW it was harvested is internal-only. Nothing in code/CI reads these.
+tests/fixtures/memory/HARVEST_NOTES.md
+tests/fixtures/oversight/HARVEST_NOTES.md
+# Dev-scratch scripts — not part of the runnable project
+scripts/_*
+# Unpublished paper teaser figure generator (the sibling research fig scripts are tracked)
+scripts/research/p3_teaser.py
+# Stray root npm files (accidental `npm install react-markdown` at repo root;
+# real frontend deps live in frontend/package.json)
+/package.json
+/package-lock.json
+# Internal marketing/positioning notes — not part of the public repo
+/marketing_claims.md
+/monetization_venues.md
+# Internal go-to-market + demo-production playbook (trailer/filming notes,
+# creative briefs, sitemap, positioning) — production process, not product.
+docs/marketing/
+# Founder's internal outbound/sales index (cold-email skeleton, prospect list,
+# pre-send checklist) — never public.
+docs/outbound_package.md
+# Personal / non-product files (must never reach the auto-deploying repo)
+*_CV.md
+skills-lock.json
+.agents/
+startup/
+video/
+# Content-marketing / demo-video tooling — not part of the red-team product
+scripts/youtube_research.py
+scripts/assemble_video.sh
+scripts/build_trailer_clips.sh
+scripts/build_trailer_cut.sh
+scripts/make_trailer_captions.py
+scripts/trailer/
+assets/trailer/
+# Local working artifacts + review UIs — not for the public repo
+.claude/
+/19_leakage_labels.json
+/19_net_effect_labels.json
+/oversight_decisions.json
+/leakage_label.html
+/net_effect_label.html
+/oversight_review.html
+# Local-only research/working docs (WIP; not for the public repo)
+docs/research/adaptive_orchestration_systems.md
+docs/research/scheduler_allocation_study.md
+docs/research/adaptive_orchestration_paper.md
+docs/research/paper_figures.md
+docs/research/figs/
+scripts/paper_figs.py
+scripts/export_paper_data.py
+docs/research/RESEARCH_TODO.md
+# arXiv/workshop submission packages — papers live on arXiv; the repo links to
+# them (see PAPERS.md) rather than shipping the LaTeX sources + internal plan.
+docs/research/publishing/
+# contingent roadmap (build-if-trigger), not implemented system
+docs/3b_v2_renderer_design.md
+.vercel
+.env*
+# ...but .env.example files ARE tracked (documentation, no secrets).
+!.env.example
+!frontend/.env.example
+# Demo/video working media — large; excluded from deploy via .vercelignore, kept local.
+assets/*.mp4
+assets/*.mp3
+assets/*.pdf
+assets/captures/
+assets/music/
+assets/rogue-*.png
+assets/0531.mp4
+# Surface-2 oversight: the harvested-page provenance CACHE is bulky (~5.7M) + reproducible from
+# source_refs — keep it local. The answer-key corpus + notes + the lightweight index ARE committed
+# (the corpus is the moat; the CI lint + tests load it).
+tests/fixtures/oversight/_raw/*
+!tests/fixtures/oversight/_raw/_index.json
+# Surface-3 skill-pool harvested-page provenance CACHE — bulky + reproducible from source_refs; local.
+# The skill pool + canary ground-truth + manifest ARE committed (the moat; tests/lint load them).
+tests/fixtures/memory/_raw/*
+!tests/fixtures/memory/_raw/_index.json
+# Surface-3 held-out-task harvested-page cache — bulky + reproducible; local. Tasks + manifest committed.
+tests/fixtures/memory/_raw_tasks/*
+!tests/fixtures/memory/_raw_tasks/_index.json
+# §08 judge-calibration labelable case fixtures — committed so a labeler's labels tie to a fixed set
+# (the harvester's Groq capture is non-deterministic and can't be regenerated identically).
+!data/calibration/leakage_label_cases.json
+!data/calibration/net_effect_label_cases.json
+!data/calibration/net_effect_synthetic_cases.json
+!data/calibration/net_effect_report.json
+# Brand/marketing process docs (map the private docs/marketing moat) — local only
+PIPELINE_WEBSITE_TO_BRAND.txt
+LESSONS_PROMO_VIDEO_AND_HIGGSFIELD.txt
+PIPELINE_MASTER_PRODUCT_TO_LAUNCH.txt

rogue_live_redteam-1.0.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Soren Obounou Nguia
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

rogue_live_redteam-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,304 @@
+Metadata-Version: 2.4
+Name: rogue-live-redteam
+Version: 1.0.0
+Summary: Continuous open-web LLM red-team: harvests live jailbreaks from 15+ open-web sources, reproduces them against your model x system-prompt x tools, and serves results over its own MCP server.
+Project-URL: Homepage, https://rogue-eosin.vercel.app
+Project-URL: Repository, https://github.com/nguiaSoren/ROGUE
+Author: Soren Obounou Nguia
+License: MIT
+License-File: LICENSE
+Keywords: ai-safety,bright-data,jailbreak,llm,mcp,prompt-injection,red-team,security
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Security
+Requires-Python: <3.12,>=3.11
+Requires-Dist: alembic>=1.13
+Requires-Dist: anthropic>=0.34
+Requires-Dist: datasets>=4.8.5
+Requires-Dist: fastapi<1,>=0.115
+Requires-Dist: httpx>=0.27
+Requires-Dist: mcp>=1.0
+Requires-Dist: openai>=1.40
+Requires-Dist: pgvector>=0.3
+Requires-Dist: pillow>=10.4
+Requires-Dist: playwright>=1.60.0
+Requires-Dist: psycopg[binary]>=3.2
+Requires-Dist: pydantic<3,>=2.7
+Requires-Dist: pypdf>=4.3
+Requires-Dist: python-dotenv>=1.0
+Requires-Dist: reportlab>=4.5.1
+Requires-Dist: sentry-sdk[fastapi]>=2.0
+Requires-Dist: slowapi>=0.1.9
+Requires-Dist: sqlalchemy<3,>=2.0
+Requires-Dist: tenacity>=8.5
+Requires-Dist: ulid-py>=1.1
+Requires-Dist: uvicorn[standard]>=0.30
+Provides-Extra: crawl4ai
+Requires-Dist: crawl4ai>=0.4.0; extra == 'crawl4ai'
+Provides-Extra: dev
+Requires-Dist: mypy>=1.10; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff>=0.5; extra == 'dev'
+Description-Content-Type: text/markdown
+<p align="center">
+  <img src="assets/brand/png/logo-stacked.png" alt="ROGUE" width="300">
+</p>
+<h1 align="center">ROGUE — Red-team every way a high-stakes AI agent can fail</h1>
+<p align="center"><b><i>The Red-Team That Never Sleeps.</i></b></p>
+<p align="center"><sub>Powered end-to-end by 5 Bright Data products · built for the Bright Data real-time AI-agents hackathon (results pending)</sub></p>
+ROGUE measures **every place a high-stakes AI agent can go wrong** — whether the **model** can be broken, whether the **human oversight** around it is meaningful, and whether the **knowledge it accumulates** is safe — each against an independent, continuously-refreshed standard, with a reproducible **signed** record. And it closes the loop: it doesn't just find the break, it **generates and verifies the fix** (you own the runtime — ROGUE never sits in your request path). The continuous open-web harvest behind the model surface runs on just **$0.05–$0.30 of Bright Data** a day.
+> ### 🥇 The first continuous open-web red-team you can query over MCP.
+> ROGUE harvests new jailbreaks **through Bright Data's MCP**, reproduces each one against **your** config, and serves the results **back through its own MCP server** — so you can ask Claude / Cursor *"which live attacks breach my config?"* from your editor. A two-way MCP loop — harvest *and* distribution — that no other red-team tool closes.
+[![Demo](https://img.shields.io/badge/demo-live-brightgreen)](https://rogue-eosin.vercel.app)
+[![Trailer](https://img.shields.io/badge/%E2%96%B6%20trailer-watch-red)](https://youtu.be/pVOQYJvMC6w)
+[![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20dataset-gated-yellow)](https://huggingface.co/datasets/soren19/rogue-attacks-2026-05)
+[![Research](https://img.shields.io/badge/research-papers-blueviolet)](PAPERS.md)
+[![License](https://img.shields.io/badge/license-MIT-lightgrey)](LICENSE)
+[![Python](https://img.shields.io/badge/python-3.11-blue)](pyproject.toml)
+## See it live
+- **Dashboard:** https://rogue-eosin.vercel.app — live, deployed.
+- **Trailer:** [watch the 45-second trailer on YouTube](https://youtu.be/pVOQYJvMC6w) (preview below).
+- **Dataset:** [358 attack primitives across 15 families](https://huggingface.co/datasets/soren19/rogue-attacks-2026-05), MIT-licensed and access-gated (defensive-research-only terms — see [`RESPONSIBLE_RELEASE.md`](RESPONSIBLE_RELEASE.md)).
+- **In Slack:** point a Slack incoming webhook at ROGUE and the daily threat brief + every new HIGH/CRITICAL breach post straight to your workspace (the platform integration also files findings to Jira). ROGUE comes to where your team already triages.
+https://github.com/user-attachments/assets/355df07c-71a1-44e1-8146-e59d93187d24
+## Why ROGUE
+Other LLM red-teams run a *fixed* attack set you have to keep updating. ROGUE is the only one that does all of this together:
+- **Harvests live, every day** — new jailbreaks and prompt-injections pulled from 15+ open-web sources (via all 5 Bright Data products), so your report is never older than yesterday.
+- **Reproduces against *your* exact config** — your model × system-prompt × tools, not a generic safety benchmark.
+- **Is queryable over MCP, both ways** — it *harvests* through MCP and *serves* results through its own MCP server, so you can ask "what breaches a model like mine?" from inside Cursor or Claude. No other red-team closes that loop.
+- **Measures three surfaces, signed** — the model, the human approval gate, and the shared skill-pool — each scored against an independent answer key and emitted as a tamper-evident attestation.
+- **Runs on the LLM you choose** — the judge and extraction models are configurable (`JUDGE_MODEL`), any provider or a local model (Ollama via `OPENAI_BASE_URL`); not locked to one vendor.
+Each ingredient exists somewhere; **no competitor does the whole combination** — that's what makes ROGUE a continuous, queryable, multi-surface red-team rather than a one-off scan.
+## Use it in 30 seconds
+### Query ROGUE from your IDE — hosted MCP, zero setup
+The MCP server is mounted into the live API, so there is nothing to clone or run:
+```
+https://rogue-private.onrender.com/mcp/
+```
+The [dashboard home](https://rogue-eosin.vercel.app) has one-click **Add to Cursor** / **Add to VS Code** buttons; for Claude Desktop, add it as a custom connector. It exposes ~19 tools — read-only corpus/breach queries plus scan / report / benchmark actions. Full tool list + local install: [MCP integration](#mcp-integration) below.
+### Submit an endpoint, get a report — hosted API
+`POST /v1/scans` with a target → ROGUE queues it for the same scan engine behind the dashboard and MCP, returning a scored report as **JSON, HTML, or a CISO-ready PDF** on completion. The hosted `/v1` API is **live and key-authorized today** (private beta), but the background worker that drains the scan queue isn't deployed yet, so a queued scan does not complete on the host. For a graded report today, run it locally (below) or point the SDK at your own target — the identical engine, the identical report.
+### Run it locally
+```bash
+git clone https://github.com/nguiaSoren/ROGUE && cd ROGUE
+cp .env.example .env          # add your keys
+docker compose up -d && uv sync --extra dev
+alembic upgrade head && python scripts/ops/seed_demo_data.py
+uvicorn rogue.api.main:app --reload
+```
+### Scan your own model — the SDK
+After cloning, run a **full scan offline with no API key** (a mocked target + judge, end to end → an HTML report):
+```bash
+pip install -e .                                       # the `rogue` SDK + CLI
+PYTHONPATH=src python3 examples/sdk_quickstart.py       # runs a scan, writes a report — no key
+```
+Against a real target it's three lines (plus a judge key — ROGUE grades every response; see [`docs/SDK.md`](docs/SDK.md)):
+```python
+from rogue import Client
+client = Client(endpoint="https://api.company.com/v1", api_key="sk-...")   # or Client(provider="openai")
+report = client.scan(pack="aggressive", budget=10.0)
+print(report.summary()); report.to_html("scan.html")
+```
+*(The import name is `rogue`, but `pip install rogue` does not install this project: the PyPI distribution is named `rogue-live-redteam`. You can also install editable from this repo as above.)*
+## Integrations
+ROGUE meets your team where it already works:
+| Surface | Status | What you get |
+|---|---|---|
+| **Your IDE** — MCP | ✅ **Available now** · keyless | One config block in Claude Desktop / Cursor / Windsurf / VS Code; the editor's agent queries the live threat DB on the spot. Add an account to launch full scans without leaving your work. `https://rogue-private.onrender.com/mcp/` |
+| **Your chat & tracker** — Slack + Jira | ✅ Slack alerts now · ⏳ auto-fan-out rolling out | Point a Slack incoming webhook (`SLACK_WEBHOOK_URL`) at ROGUE and the daily threat brief + new CRITICAL/HIGH breaches post to your workspace automatically — **works today**. Or connect Slack + Jira as per-org integrations (Fernet-encrypted creds) and file findings via the MCP action tools (`send_slack_alert` / `create_jira_ticket`); automatic fan-out on every scan completion is rolling out with the hosted worker. [Setup](docs/platform/integrations/slack-github-jira.md) |
+| **API & SDK** — REST `/v1` + Python | ✅ live · ⏳ hosted scans rolling out | The `/v1` REST API + OpenAPI spec are live and key-authorized at `https://rogue-private.onrender.com/v1`. The **Python SDK runs real scans today** against your own target (`from rogue import Client`; `pip install -e .` — see [`docs/SDK.md`](docs/SDK.md)). *Hosted* scan execution (a `POST /v1/scans` that completes server-side) is rolling out. |
+| **Security tooling** — SOAR / SIEM | 🔜 **Coming soon** | Splunk / Palo Alto Cortex connectors to pipe findings into your existing security stack. On the roadmap, not available today. |
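Slack incoming webhooks accept an HTTP POST whose JSON body carries the message in a `text` field. Below is a minimal sketch of the HIGH/CRITICAL alert path described above. It assumes a hypothetical `Breach` record with `attack_id`, `config`, and `severity` fields; ROGUE's actual breach schema and message format may differ.

```python
from __future__ import annotations

import json
import urllib.request
from dataclasses import dataclass


# Hypothetical record shape; ROGUE's real breach schema may differ.
@dataclass
class Breach:
    attack_id: str
    config: str
    severity: str  # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"


ALERT_SEVERITIES = {"HIGH", "CRITICAL"}


def build_slack_payload(breaches: list[Breach]) -> dict | None:
    """Keep only HIGH/CRITICAL breaches and format one message; None if nothing qualifies."""
    alerts = [b for b in breaches if b.severity.upper() in ALERT_SEVERITIES]
    if not alerts:
        return None
    lines = [f"*{b.severity.upper()}* {b.attack_id} breached `{b.config}`" for b in alerts]
    return {"text": f"{len(alerts)} new breach(es)\n" + "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the JSON payload to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Usage: build the payload from a scan's breaches and, when it is not `None`, call `post_to_slack(os.environ["SLACK_WEBHOOK_URL"], payload)`. Filtering before posting keeps low-severity noise out of the channel.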
+## What ROGUE does
+Five-layer pipeline: **Harvest → Extract → Dedupe → Reproduce → Diff.**
+1. **Harvest** — 19 open-web sources fetched via 5 Bright Data products.
+2. **Extract** — an LLM agent structures each fetched document into an `AttackPrimitive`.
+3. **Dedupe** — pgvector cosine similarity clusters near-duplicate attacks.
+4. **Reproduce** — each canonical primitive runs against your `DeploymentConfig` × 5 trials.
+5. **Diff** — a separate judge model verdicts each trial; the daily diff ships to Slack, MCP, and the dashboard.
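The Dedupe step relies on pgvector cosine similarity. Below is a minimal in-memory sketch of the same idea: greedy clustering of precomputed embeddings against each cluster's canonical member. The 0.9 threshold and the function names are hypothetical, not ROGUE's settings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity. pgvector's `<=>` operator returns the cosine distance, 1 - cosine."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_dedupe(embeddings: dict[str, list[float]], threshold: float = 0.9) -> dict[str, list[str]]:
    """Map each canonical primitive id to the ids folded into it.

    Each embedding joins the first existing cluster whose canonical member is
    at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: dict[str, list[str]] = {}
    for pid, vec in embeddings.items():
        for canon in clusters:
            if cosine(vec, embeddings[canon]) >= threshold:
                clusters[canon].append(pid)
                break
        else:
            clusters[pid] = [pid]
    return clusters
```

Inside Postgres, the equivalent nearest-canonical lookup would be an `ORDER BY embedding <=> :query LIMIT 1` query against the pgvector column, followed by the same threshold check.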
+> **New to the codebase?** [`docs/PROJECT_STRUCTURE.md`](docs/PROJECT_STRUCTURE.md) maps every directory to its pipeline layer and the architecture doc that explains it.
+## What ROGUE red-teams
+ROGUE measures **every place a high-stakes AI agent can go wrong** — whether the agent can be **broken**, whether the **human oversight** around it is meaningful, and whether the **knowledge it accumulates** is safe — each against an independent, continuously-refreshed standard, and each backed by a result rather than a claim:
+- **The model.** Does a live jailbreak or prompt-injection break *your* deployment? The daily breach matrix replays open-web attacks against your model × system-prompt × tools, graded by a [human-calibrated judge](docs/judge-calibration.md). Finding: most *claimed* jailbreaks don't even reproduce — [Claimed Potency Does Not Predict Reproduction](PAPERS.md).
+- **The human gate.** When a person "approves" an AI action, does that approval mean anything? ROGUE measures a reviewer's **false-approve rate** against an independent answer key — the rubber-stamping failure mode regulators now care about ([oversight](PAPERS.md)).
+- **The agent's memory.** Does a shared agent skill-pool leak one user's secrets to the next? ROGUE plants canaries in scrubbed skills and measures recovery — 85% leaked on a weak model despite an explicit never-reveal instruction ([Scrubbing Is Not Containment](PAPERS.md)).
+…and it **closes the loop (assurance-native remediation).** Finding a breach is half the job. ROGUE *generates* a verified mitigation — a system-prompt patch, a tool-permission scope, distilled fine-tuning data — and **re-tests it against the same live corpus to prove it actually closed the breach without over-blocking** (measured with the same calibrated judge). ROGUE generates and verifies the fix; **you own the runtime — it never sits in your request path.**
+One engine and one independent standard, applied the same way on every surface: fire inputs at an AI decision-maker, capture what it does, score it against the standard, and emit a reproducible signed record.
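The memory-surface check described above (plant canaries in scrubbed skills, then measure recovery) can be sketched as follows. The sketch generates unguessable canary strings and counts how many resurface verbatim in agent outputs. The helper names and the `CANARY-` format are hypothetical; ROGUE's real canary ground truth lives in its committed fixtures.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Return a random token that is very unlikely to appear in any output by chance."""
    return f"{prefix}-{secrets.token_hex(8)}"


def leak_rate(canaries: list[str], outputs: list[str]) -> float:
    """Fraction of planted canaries recovered verbatim in at least one agent output."""
    if not canaries:
        return 0.0
    leaked = sum(1 for c in canaries if any(c in out for out in outputs))
    return leaked / len(canaries)
```

Exact substring matching misses paraphrased or encoded leaks, so read the result as a lower bound on leakage.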
+## Research
+ROGUE's findings are written up as papers and posts — **[PAPERS.md](PAPERS.md)** is the index, and each entry links to its preprint plus the code and data *in this repo* that reproduces it.
+- **Allocation Is a Capability-Growth Mechanism** — in a self-growing red-team, evaluation *allocation* is a capability lever, not an efficiency layer (8 of 20 starved candidates graduate vs 0 of 20; Fisher *p* = 0.003). · *arXiv `cs.CR`×`cs.LG` — preprint posting soon*
+- **Consummation-Gated Breach Judges** — one gate template ("engagement ≠ breach; consummation = breach") calibrates breach judges across classes, validated against human labels four ways. · *arXiv `cs.CR`×`cs.CL` — preprint posting soon*
+- **Claimed Potency Does Not Predict Reproduction** — most open-web jailbreaks don't survive as working carriers in deployment context, and a source's claimed rate carries no usable signal (Spearman −0.10). · *arXiv `cs.CR` (lead paper) — preprint posting soon*
+- **Scrubbing Is Not Containment** — canary leakage from shared agent skill pools tracks *alignment*, not model size. · *workshop paper + Hugging Face blog — posting soon*
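You can sanity-check the allocation result's reported Fisher *p* = 0.003 for 8 of 20 vs 0 of 20 with a dependency-free two-sided Fisher exact test. This is a generic sketch of the standard test, not the paper's analysis code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the first row holds x of the col1 successes.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


p = fisher_exact_two_sided(8, 12, 0, 20)  # 8/20 graduate vs 0/20
print(f"{p:.4f}")  # 0.0033
```

Only the two most extreme tables (8 vs 0 and 0 vs 8) are as unlikely as the observed one, so *p* is twice C(20,8)/C(40,8), which rounds to 0.003.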
+## Deep dives
+The mechanics behind the pipeline, each on its own page:
+- **Bright Data integration.** Five BD products end-to-end, plus a self-tuning ε-greedy SERP bandit that allocates the daily harvest budget by yield (novel primitives per dollar) at $0.05–$0.30 per harvest. → [docs/bright-data.md](docs/bright-data.md)
+- **Multimodal red-team.** Refused text jailbreaks become real images and audio via deterministic black-box renderers, climbing an autonomous escalation ladder that stops at the first breach; Bright Data sources real carrier images to composite onto. → [docs/multimodal.md](docs/multimodal.md)
+- **Self-growing attack repertoire.** ROGUE harvests reusable *techniques*, not just payloads — classifying, routing, and graduating / retiring / resurrecting them on live breach evidence, with a governed renderer registry and grammar-driven planning (the planner-willingness finding: 22% → 100% by changing only the planner). → [docs/self-growing-repertoire.md](docs/self-growing-repertoire.md)
+- **Judge calibration.** Every breach number is an LLM verdict, so the judge is validated against independent human labels four ways — in-distribution FP 2.56%, WildGuardTest harm 88.5%, StrongREJECT −26% inflation, JBB **91.0%** human agreement (top of field, reproducible from `data/calibration/`), up from a 70.3% v1 judge after a diagnosed recalibration. → [docs/judge-calibration.md](docs/judge-calibration.md)
+- **Benchmark — coverage over time.** Frozen AdvBench / JBB goal sets run through ROGUE's own graduated ladder against a fixed target, to answer "is this month's ROGUE better than last month's?" (honest caveat: still N=1, pre-recalibration). → [docs/benchmark.md](docs/benchmark.md)
+- **Dashboard tour.** A 5-second pitch and a 5-minute deep-dive: cinematic home, `/feed` war room (attacks replayed as ATTACKER → MODEL → JUDGE), `/matrix` breach heatmap, `/brief` threat brief. → [docs/dashboard.md](docs/dashboard.md)
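The Bright Data deep dive describes an ε-greedy SERP bandit that allocates the harvest budget by yield (novel primitives per dollar). Below is a minimal sketch of that allocation rule. It assumes each arm is a query or source with running totals of novel primitives and spend; the arm names, ε = 0.1, and the classes are illustrative, not ROGUE's implementation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Arm:
    novel: int = 0      # novel primitives this arm has produced so far
    spend: float = 0.0  # dollars spent on this arm so far

    @property
    def yield_per_dollar(self) -> float:
        # Unplayed arms score +inf, so each one is tried at least once.
        return self.novel / self.spend if self.spend > 0 else float("inf")


@dataclass
class EpsilonGreedyBandit:
    arms: dict[str, Arm]
    epsilon: float = 0.1
    rng: random.Random = field(default_factory=random.Random)

    def choose(self) -> str:
        """With probability epsilon pick a random arm, else the best yield per dollar."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.arms))
        return max(self.arms, key=lambda name: self.arms[name].yield_per_dollar)

    def update(self, name: str, novel: int, cost: float) -> None:
        """Credit an arm with the novel primitives (post-dedupe) and dollars one pull produced."""
        arm = self.arms[name]
        arm.novel += novel
        arm.spend += cost
```

With `epsilon=0.0` the choice is purely greedy. The default 0.1 spends roughly one pull in ten on a random arm, so a source whose yield improves can still win budget back.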
+## Capabilities
+- 15-family attack taxonomy (OWASP LLM Top 10 + MITRE ATLAS aligned) — see [`docs/taxonomy.md`](docs/taxonomy.md).
+- 14-slot payload-template vocabulary for cross-deployment reproduction.
+- 19-source open-web harvest list — see [`docs/sources.md`](docs/sources.md).
+- 8-model target panel (GPT-5.4 Nano, Claude Haiku 4.5, Llama-3.1-8B, Mistral Small, Gemini 3.1 Flash-Lite, Claude Opus 4.8, + two audio targets) — cheap-tier models per lab, an open-weight reliability anchor, a frontier reference, and audio endpoints for multimodal coverage.
+- Judge-model verdict pipeline (REFUSED / EVADED / PARTIAL_BREACH / FULL_BREACH), human-validated four ways — see [Judge calibration](docs/judge-calibration.md).
+- Daily threat brief (markdown + JSON) + Slack webhook.
+- ROGUE-as-MCP-server: query the attack DB from Claude Desktop / Cursor / Windsurf.
+- True multimodal red-team and a self-growing technique repertoire (see [Deep dives](#deep-dives)).
+- External benchmark layer against frozen AdvBench / JailbreakBench goal sets.
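The sketch below models the four verdict labels and the kind of judge-vs-human agreement check used to validate them. It collapses verdicts to a binary breach label, treating PARTIAL_BREACH as a breach; that mapping is an assumption for illustration, and ROGUE's calibration reports define their own.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    REFUSED = "REFUSED"
    EVADED = "EVADED"
    PARTIAL_BREACH = "PARTIAL_BREACH"
    FULL_BREACH = "FULL_BREACH"


def is_breach(v: Verdict) -> bool:
    # Assumption: both breach grades count as a breach for agreement purposes.
    return v in (Verdict.PARTIAL_BREACH, Verdict.FULL_BREACH)


def cohens_kappa(judge: list[bool], human: list[bool]) -> float:
    """Chance-corrected agreement between two equal-length binary label sequences."""
    n = len(judge)
    if n == 0 or n != len(human):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(j == h for j, h in zip(judge, human)) / n
    pj, ph = Counter(judge), Counter(human)
    expected = sum((pj[label] / n) * (ph[label] / n) for label in (True, False))
    # Degenerate case: both raters used a single identical label throughout.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)
```

Raw percent agreement can look high when most trials are refusals. Kappa subtracts the agreement two raters would reach by chance given their label rates.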
+## Roadmap
+- **Expand source coverage** — deeper Web Scraper API integration brings the next ~100 open-web sources online.
+- **Customer SDK** — a drop-in SDK that lands ROGUE verdicts in the workflows teams already run (private beta; SOAR/SIEM connectors planned).
+- **Break bandit** — a second, contextual Thompson-sampling bandit that learns *how to break* (which escalation strategy to try first per attack-family × target); the control surface and reward log are already built and instrumented in prod.
+- **Enterprise** — RBAC, audit logs, and compliance reporting for teams that need them.
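The planned break bandit is described as a contextual Thompson-sampling bandit over escalation strategies per attack-family × target. Below is a minimal Beta-Bernoulli sketch of that idea, with one posterior per (context, strategy) pair. The strategy and context names are hypothetical, and this is not the control surface already instrumented in prod.

```python
from __future__ import annotations

import random
from collections import defaultdict


class ContextualThompson:
    """Beta-Bernoulli Thompson sampling with an independent posterior per (context, strategy)."""

    def __init__(self, strategies: list[str], seed: int | None = None):
        self.strategies = strategies
        self.rng = random.Random(seed)
        # [alpha, beta] per (context, strategy); Beta(1, 1) is a uniform prior.
        self.params: dict[tuple, list[float]] = defaultdict(lambda: [1.0, 1.0])

    def choose(self, context: tuple[str, str]) -> str:
        """Draw a breach probability for each strategy and pick the highest draw."""
        draws = {s: self.rng.betavariate(*self.params[(context, s)]) for s in self.strategies}
        return max(draws, key=draws.get)

    def update(self, context: tuple[str, str], strategy: str, breached: bool) -> None:
        """Count a breach as a success (alpha) and a refusal as a failure (beta)."""
        alpha_beta = self.params[(context, strategy)]
        alpha_beta[0 if breached else 1] += 1
```

Strategies that keep breaching a given family × target are drawn first more often, while weaker ones still win an occasional draw. That is how Thompson sampling keeps exploring without a fixed ε.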
+---
+# Run it yourself
+*Everything below is for builders — connecting ROGUE to your tools, running it locally, or driving the pipeline.*
+## Architecture
+See [`docs/architecture.md`](docs/architecture.md) for the five-layer pipeline diagram and the locked stack table.
+## MCP integration
+ROGUE exposes its threat-intelligence database as a **producer-side MCP server** — Claude Desktop / Cursor / Windsurf users query the live breach matrix from inside their IDE.
+**Hosted (recommended, zero setup).** The server is mounted into the live API at `https://rogue-private.onrender.com/mcp/`. Use the **Add to Cursor / Add to VS Code** buttons on the [dashboard home](https://rogue-eosin.vercel.app), or add it as a custom connector in Claude Desktop (Settings → Customize → add a custom connector → paste the URL). The hosted server exposes the read-only query tools **and** the action tools (validate / scan / report / benchmark + Level-3 workflow tools) — ~19 in all.
+**Local (against your own DB), one command:**
+```bash
+uv run python scripts/ops/install_mcp.py                  # Claude Desktop (default)
+uv run python scripts/ops/install_mcp.py --client cursor  # or: cursor / windsurf
+```
+This detects the client's config path, merges in the `rogue` server entry pointing at your checkout (preserving every other key), and backs up the old file first. It's idempotent; `--dry-run` previews, `--uninstall` removes. Then restart the client. Requires a populated DB (run `harvest_once.py` + `reproduce_once.py` at least once); the deployed build reads the live Neon DB.
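The install behaviour described above is to merge one server entry, keep every other key, back up first, and stay idempotent. Below is a minimal sketch of that behaviour. Claude Desktop and Cursor both read server entries from a top-level `mcpServers` object. The exact `command`/`args` that the real `install_mcp.py` writes are not shown in this README, so the entry used here is a hypothetical placeholder.

```python
import json
import shutil
from pathlib import Path


def install_server(config_path: Path, name: str, entry: dict, dry_run: bool = False) -> bool:
    """Merge `entry` under mcpServers[name]; return True if the file would change."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    if servers.get(name) == entry:
        return False  # already installed: idempotent no-op
    servers[name] = entry  # every other key in the file is left untouched
    if not dry_run:
        if config_path.exists():
            # Back up the previous file next to it before overwriting.
            shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config_path.parent.mkdir(parents=True, exist_ok=True)
        config_path.write_text(json.dumps(config, indent=2) + "\n")
    return True
```

A hypothetical entry could be `{"command": "uv", "args": ["run", "python", "-m", "rogue.mcp_server.server"]}`, which reuses the module path from the transport example.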
+**Read-only query tools:** `query_attacks`, `query_diff`, `query_threat_brief`, `query_breaches_for_config`, `query_attack_detail`, `query_worst_attacks`. After connecting, ask Claude *"What new attacks broke our customer-support config in the last 24 hours?"* and it will call `query_diff` + `query_breaches_for_config` and summarize.
+**Transport.** Stdio by default (the Claude Desktop path). For remote clients, serve over HTTP:
+```bash
+ROGUE_MCP_TRANSPORT=streamable-http uv run python -m rogue.mcp_server.server
+# serves http://127.0.0.1:8001/mcp  (ROGUE_MCP_HOST / ROGUE_MCP_PORT override the bind)
+```
+## Pipeline CLI reference
+The two `$`-billed driver scripts spend Bright Data + LLM credit and write the live DB — run them deliberately. All flags are optional.
+<details><summary><b><code>harvest_once.py</code> — harvest → extract → dedup → persist</b></summary>
+```bash
+uv run python scripts/harvest/harvest_once.py --since 1d
+```
+| Flag | Default | What it does |
+|---|---|---|
+| `--since` | `1d` | Harvest window (`1d`, `14d`, `6h`). |
+| `--x-handles` | off | Comma-separated X handles to scrape this run (X is off by default — BD's profile scraper is slow). |
+| `--database-url` | `$DATABASE_URL` | Target SQLAlchemy URL. |
+| `--extraction-model` | Claude Haiku 4.5 | Provider-prefixed extraction model (prompt-cached). |
+| `--embedding-model` | `text-embedding-3-small` | Embedding model for dedup. |
+Env toggles: `EXTRACTION_CONCURRENCY` · `HARVEST_INGEST_IMAGES=0` · `HARVEST_FOLLOW_LINKS=0`. For a single known-fresh URL, use `scripts/harvest/harvest_url.py --url "https://x.com/.../status/<id>"`.
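The `--since` window takes values such as `1d`, `14d`, and `6h`. Below is a minimal parser for that format. It assumes only `h` (hours) and `d` (days) suffixes are valid; the real script may accept more units.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

_WINDOW = re.compile(r"^(\d+)([hd])$")


def parse_since(value: str) -> timedelta:
    """Turn '6h' / '1d' / '14d' into a timedelta; raise ValueError for anything else."""
    match = _WINDOW.match(value.strip().lower())
    if not match:
        raise ValueError(f"bad --since value {value!r}; expected e.g. 6h or 14d")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(hours=amount) if unit == "h" else timedelta(days=amount)


def window_start(value: str, now: datetime | None = None) -> datetime:
    """UTC cutoff: only documents newer than this timestamp fall inside the harvest window."""
    return (now or datetime.now(timezone.utc)) - parse_since(value)
```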
+</details>
+<details><summary><b><code>reproduce_once.py</code> — render → target panel → judge → persist</b></summary>
+```bash
+uv run python scripts/reproduce/reproduce_once.py --primitive-limit 50 --judge-batch
+```
+| Flag | Default | What it does |
+|---|---|---|
+| `--primitive-limit N` | all | Cap how many primitives are reproduced (top-N by `reproducibility_score`). |
+| `--only-unreproduced` | off | Reproduce only primitives with no `breach_results` yet. |
+| `--primitive-ids A,B,…` | — | Reproduce exactly the named primitives (overrides other filters). |
+| `--n-trials N` | 5 | Trials per (primitive × config) — powers the bootstrap CI. |
+| `--multimodal-only` | off | Only image/audio primitives, rendered as real media. |
+| `--persona NAME` | off | PAP persona wrap (the B side of the A/B). |
+| `--escalate` | off | Inline auto-ladder for panel-wide refusals (costly; bound with `--escalate-max-spend`). |
+| `--candidate-quota N` | 0 | Reserve N guaranteed harvested-candidate attempts before early-stop (scheduler policy). |
+| `--judge-batch` | off | Grade via the Anthropic Batch API (50% off + caching; baseline-only). |
+`scripts/reproduce/candidate_quota_ab.py` runs the candidate-quota A/B (the empirical baseline for the break-bandit).
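According to the table above, `--n-trials` (default 5) powers a bootstrap CI. Below is a minimal percentile-bootstrap sketch over the per-trial breach outcomes of one (primitive × config) cell. The 2,000 resamples, the 95% level, and the fixed seed are illustrative assumptions, not ROGUE's settings.

```python
import random


def bootstrap_ci(outcomes: list[int], n_boot: int = 2000, level: float = 0.95,
                 seed: int = 0) -> tuple[float, float, float]:
    """Return (breach_rate, low, high) via a percentile bootstrap over trial outcomes (1 = breach)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    rng = random.Random(seed)
    n = len(outcomes)
    rate = sum(outcomes) / n
    # Resample the trials with replacement and record each resample's breach rate.
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    alpha = (1 - level) / 2
    low = stats[int(alpha * (n_boot - 1))]
    high = stats[int((1 - alpha) * (n_boot - 1))]
    return rate, low, high
```

With only five trials per cell the interval is necessarily wide, so it says more than the point estimate alone. For very small n, a Wilson score interval is a common closed-form alternative.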
+</details>
+## Repository layout
+```
+src/rogue/     # Python package (schemas, harvest, extract, dedupe, reproduce, diff, mcp_server, db, api)
+docs/          # architecture, schemas, taxonomy, sources, budget + the deep-dive pages
+tests/         # schema round-trip tests + golden fixtures
+scripts/       # harvest_once.py, reproduce_once.py, calibration/, ops/
+frontend/      # Next.js dashboard
+```
+## Built by
+Benaja Soren Obounou Lekogo Nguia — AI Systems Engineer; previously Grand-Prize winner at Yonsei University for LLM security tooling (GPTFuzz optimization), adversarial-ML research at AIM Intelligence (HWARANG red-team series).
+> "I built ROGUE solo in 6 days because Bright Data abstracted away 5 different anti-bot stacks I'd otherwise have spent weeks on. The MCP Server plus pre-built Reddit / X scrapers turned a 6-week project into a 6-day project."
+>
+> — Benaja Soren Obounou Lekogo Nguia
+## License
+MIT. See [`LICENSE`](LICENSE).