PyPI - bad-research - Versions diffs - 0.1.0__tar.gz - Mend

bad-research 0.1.0__tar.gz

Files changed (338)

bad_research-0.1.0/.gitignore ADDED

@@ -0,0 +1,25 @@
+__pycache__/
+*.py[cod]
+.venv/
+venv/
+*.egg-info/
+dist/
+build/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.coverage
+htmlcov/
+.DS_Store
+# dev-only build artifacts — not shipped (kept on disk for dev, out of the prod repo)
+docs/plans/
+docs/investigation/
+docs/INTERFACES.md
+docs/INTERFACES_KEYLESS.md
+docs/KEYLESS_REBUILD_PLAN_OUTLINE.md
+docs/SPEC.md
+docs/enhancements/
+# research vault — per-run output/scratch (eval + real runs), never shipped
+research/

bad_research-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Bad Research
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

bad_research-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,143 @@
+Metadata-Version: 2.4
+Name: bad-research
+Version: 0.1.0
+Summary: michael jackson bad
+Project-URL: Homepage, https://github.com/LeventySeven/badresearch
+Project-URL: Repository, https://github.com/LeventySeven/badresearch
+Project-URL: Issues, https://github.com/LeventySeven/badresearch/issues
+Author: Bad Research
+License-Expression: MIT
+License-File: LICENSE
+Keywords: agent,claude,cli,deep-research,llm,rag,research,retrieval
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Python: <3.14,>=3.11
+Requires-Dist: anthropic>=0.40
+Requires-Dist: beautifulsoup4>=4.12
+Requires-Dist: crawl4ai>=0.4
+Requires-Dist: dateparser>=1.2
+Requires-Dist: ddgs>=9.14
+Requires-Dist: feedparser>=6.0
+Requires-Dist: httpx>=0.27
+Requires-Dist: jinja2>=3.1
+Requires-Dist: langdetect>=1.0.9
+Requires-Dist: lxml>=5.0
+Requires-Dist: platformdirs>=4.0
+Requires-Dist: pydantic>=2.0
+Requires-Dist: pymupdf4llm>=0.0.17
+Requires-Dist: pymupdf>=1.24
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: rank-bm25>=0.2
+Requires-Dist: rapidfuzz>=3.0
+Requires-Dist: rich>=13.0
+Requires-Dist: snowballstemmer>=2.2
+Requires-Dist: trafilatura>=1.8
+Requires-Dist: tree-sitter-language-pack>=0.7
+Requires-Dist: tree-sitter>=0.23
+Requires-Dist: typer>=0.9.0
+Provides-Extra: all
+Requires-Dist: lancedb>=0.13; extra == 'all'
+Requires-Dist: mcp>=1.6; extra == 'all'
+Requires-Dist: playwright>=1.40; extra == 'all'
+Requires-Dist: pyarrow>=15.0; extra == 'all'
+Requires-Dist: sentence-transformers>=3.0; extra == 'all'
+Requires-Dist: torch>=2.0; extra == 'all'
+Provides-Extra: browse
+Requires-Dist: playwright>=1.40; extra == 'browse'
+Provides-Extra: dev
+Requires-Dist: mcp>=1.6; extra == 'dev'
+Requires-Dist: mypy>=1.8; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest-cov>=4.1; extra == 'dev'
+Requires-Dist: pytest>=7.4; extra == 'dev'
+Requires-Dist: respx>=0.21; extra == 'dev'
+Requires-Dist: ruff>=0.3; extra == 'dev'
+Provides-Extra: local
+Requires-Dist: lancedb>=0.13; extra == 'local'
+Requires-Dist: pyarrow>=15.0; extra == 'local'
+Requires-Dist: sentence-transformers>=3.0; extra == 'local'
+Requires-Dist: torch>=2.0; extra == 'local'
+Provides-Extra: mcp
+Requires-Dist: mcp>=1.6; extra == 'mcp'
+Description-Content-Type: text/markdown
+<p align="center">
+  <img src="assets/banner.png" alt="BAD — michael jackson bad" width="520">
+</p>
+<h1 align="center">Bad Research</h1>
+<p align="center"><em>michael jackson bad</em></p>
+A **keyless** deep-research agent that runs as a Claude Code skill — a
+fork-and-enhance of [hyperresearch](https://github.com/jordan-gibbs/hyperresearch).
+It searches wide, filters garbage, grounds every claim to a source, and needs
+**zero API keys**: the Claude Code host model supplies all inference, exactly like
+hyperresearch. Optional local CLIs and a `[local]` neural extra are enhancements,
+never requirements.
+## Install
+Bad Research is a small CLI that registers itself as a Claude Code skill. No API keys. Requires Python 3.11–3.13.
+```bash
+# Install the CLI with either pipx or uv (pick one)
+pipx install bad-research
+# or: uv tool install bad-research
+# Register the /bad-research skill into ~/.claude
+bad install
+# Verify
+bad doctor
+```
+`bad install` writes the entry skill to `~/.claude/skills/bad-research/`; the per-step
+skills install lazily on first use. For a project-local install instead of global, run
+`bad install --project` inside the project. `bad doctor` shows what's wired (host model,
+keyless search/browse, the optional external CLIs it can drive, the `[local]` neural stack).
+## Use it in Claude Code
+After `bad install`, open Claude Code in any project and either:
+- **Invoke it directly** — type the slash command with your question:
+  ```
+  /bad-research Is open-source AI more dangerous than closed-source for national security?
+  ```
+- **Let Claude trigger it** — just ask a research-shaped question (*"write me a cited report
+  comparing vector databases"*, *"literature review on GLP-1 drugs"*) and Claude loads the
+  skill automatically.
+It scales to the question: a simple lookup gets a fast cited answer in minutes; a broad or
+contested one runs the full adversarially-reviewed pipeline (~1.5–2.5 h). The final report
+and every fetched source land in a vault under `./research/` that compounds across sessions.
+> Want the latest unreleased build? Install from source: `pipx install git+https://github.com/LeventySeven/badresearch.git`
+## What it does
+A tier-adaptive pipeline turns a question into an audited, fully-cited report, and
+every fetched source lands in a persistent, searchable vault that compounds across
+sessions. Keyless by design:
+- **Search** — the host `WebSearch` tool + DuckDuckGo + 7 scholarly APIs, fused and reranked by the host model.
+- **Content** — a native fetch-and-clean pipeline (readability → markdown → optional LLM clean), SSRF-guarded.
+- **Browse** — an agentic observe → act → extract loop driven by a local, keyless headless browser.
+- **Retrieve** — SQLite FTS5/BM25 by default (no model required), with an optional local neural lane.
+- **Ground** — every report sentence is checked against its source; uncited claims are blocked.
+## How it works & where the patterns came from
+Bad Research takes hyperresearch as its base and enhances each stage with patterns
+drawn from the best deep-research systems — Perplexity, Gemini, Firecrawl, Stagehand,
+AgentQL, and others — reimplemented to run **keyless** on the host model. The full
+write-up, stage by stage with provenance, is in
+[**docs/HOW_IT_WORKS.md**](docs/HOW_IT_WORKS.md).
+MIT licensed.

bad_research-0.1.0/README.md ADDED

@@ -0,0 +1,75 @@
+<p align="center">
+  <img src="assets/banner.png" alt="BAD — michael jackson bad" width="520">
+</p>
+<h1 align="center">Bad Research</h1>
+<p align="center"><em>michael jackson bad</em></p>
+A **keyless** deep-research agent that runs as a Claude Code skill — a
+fork-and-enhance of [hyperresearch](https://github.com/jordan-gibbs/hyperresearch).
+It searches wide, filters garbage, grounds every claim to a source, and needs
+**zero API keys**: the Claude Code host model supplies all inference, exactly like
+hyperresearch. Optional local CLIs and a `[local]` neural extra are enhancements,
+never requirements.
+## Install
+Bad Research is a small CLI that registers itself as a Claude Code skill. No API keys. Requires Python 3.11–3.13.
+```bash
+# Install the CLI with either pipx or uv (pick one)
+pipx install bad-research
+# or: uv tool install bad-research
+# Register the /bad-research skill into ~/.claude
+bad install
+# Verify
+bad doctor
+```
+`bad install` writes the entry skill to `~/.claude/skills/bad-research/`; the per-step
+skills install lazily on first use. For a project-local install instead of global, run
+`bad install --project` inside the project. `bad doctor` shows what's wired (host model,
+keyless search/browse, the optional external CLIs it can drive, the `[local]` neural stack).
+## Use it in Claude Code
+After `bad install`, open Claude Code in any project and either:
+- **Invoke it directly** — type the slash command with your question:
+  ```
+  /bad-research Is open-source AI more dangerous than closed-source for national security?
+  ```
+- **Let Claude trigger it** — just ask a research-shaped question (*"write me a cited report
+  comparing vector databases"*, *"literature review on GLP-1 drugs"*) and Claude loads the
+  skill automatically.
+It scales to the question: a simple lookup gets a fast cited answer in minutes; a broad or
+contested one runs the full adversarially-reviewed pipeline (~1.5–2.5 h). The final report
+and every fetched source land in a vault under `./research/` that compounds across sessions.
+> Want the latest unreleased build? Install from source: `pipx install git+https://github.com/LeventySeven/badresearch.git`
+## What it does
+A tier-adaptive pipeline turns a question into an audited, fully-cited report, and
+every fetched source lands in a persistent, searchable vault that compounds across
+sessions. Keyless by design:
+- **Search** — the host `WebSearch` tool + DuckDuckGo + 7 scholarly APIs, fused and reranked by the host model.
+- **Content** — a native fetch-and-clean pipeline (readability → markdown → optional LLM clean), SSRF-guarded.
+- **Browse** — an agentic observe → act → extract loop driven by a local, keyless headless browser.
+- **Retrieve** — SQLite FTS5/BM25 by default (no model required), with an optional local neural lane.
+- **Ground** — every report sentence is checked against its source; uncited claims are blocked.
+## How it works & where the patterns came from
+Bad Research takes hyperresearch as its base and enhances each stage with patterns
+drawn from the best deep-research systems — Perplexity, Gemini, Firecrawl, Stagehand,
+AgentQL, and others — reimplemented to run **keyless** on the host model. The full
+write-up, stage by stage with provenance, is in
+[**docs/HOW_IT_WORKS.md**](docs/HOW_IT_WORKS.md).
+MIT licensed.

bad_research-0.1.0/assets/banner.png ADDED

Binary file

bad_research-0.1.0/docs/HOW_IT_WORKS.md ADDED

@@ -0,0 +1,142 @@
+# How Bad Research was built — and where the patterns came from
+Bad Research is a **fork-and-enhance of [hyperresearch](https://github.com/jordan-gibbs/hyperresearch)**.
+Hyperresearch gave us the foundation: a tier-adaptive, ~16-stage research pipeline
+driven as a Claude Code skill, with a persistent markdown + SQLite vault that
+compounds knowledge across sessions. We kept that whole spine and enhanced each
+stage with the best pattern we could find for it — the approaches that the leading
+deep-research and web-agent systems are known for — and reimplemented every one of
+them to run **keyless**, on the Claude Code host model, with no third-party API key.
+This document is the honest tour: each stage, the pattern it borrows, and who
+pioneered that pattern. Nothing here needs an API key; where a pattern was
+originally a paid product, we adopted the *idea* and rebuilt it on the host model
+and open tooling.
+---
+## The build approach
+1. **Start from hyperresearch.** Its pipeline, vault, skill packaging, and grounding
+   gate are the base. We did not rewrite what already worked.
+2. **Enhance stage by stage.** For each stage — search, content, browse, retrieval,
+   reranking, grounding, the control loop — we took the strongest known pattern and
+   wired it in behind a clean seam.
+3. **Keyless by design.** Every enhancement is implemented on the host model (via a
+   single `LLMProvider` seam) plus open-source libraries and optional local CLIs.
+   No vendor key is ever required to install or run; `bad doctor` proves it.
+4. **Built with reviews.** Each stage was implemented and then independently
+   reviewed for correctness, security (e.g. SSRF), and faithfulness to the pattern
+   before it landed.
+---
+## The stages and their provenance
+### Search — wide recall, then a relevance-gated loop
+*Pattern from: Perplexity.* Perplexity's deep search popularised the loop of
+casting a wide net across many sources and then **re-querying until the results are
+actually good enough** rather than answering from the first page. We implement that
+as a "retrieve-until-good" loop: a relevance gate (default 0.70) and a minimum
+pass-fraction (0.30) decide whether to expand and search again, up to a small round
+cap. Recall comes from the host `WebSearch` tool + DuckDuckGo (`ddgs`), with an
+optional self-hosted SearXNG if you have one.
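Reviewer note (not part of the package): the "retrieve-until-good" loop described above can be read roughly as follows. This is a minimal sketch under stated assumptions, not the package's implementation. The thresholds 0.70 and 0.30 come from the text; the reading that the gate passes when at least 30% of results score 0.70 or higher, the callables `search_fn`, `score_fn`, `expand_fn`, and the round cap of 3 are all hypothetical.

```python
from typing import Callable, List, Tuple

RELEVANCE_GATE = 0.70      # per-result score threshold (from the text)
MIN_PASS_FRACTION = 0.30   # share of results that must clear the gate (from the text)


def good_enough(scores: List[float]) -> bool:
    """Return True when enough results clear the relevance gate."""
    if not scores:
        return False
    passing = sum(1 for s in scores if s >= RELEVANCE_GATE)
    return passing / len(scores) >= MIN_PASS_FRACTION


def retrieve_until_good(
    query: str,
    search_fn: Callable[[str], List[str]],    # hypothetical search backend
    score_fn: Callable[[str, str], float],    # hypothetical relevance scorer
    expand_fn: Callable[[str], str],          # hypothetical query expander
    max_rounds: int = 3,  # assumed; the text only says "a small round cap"
) -> Tuple[List[str], int]:
    """Search, score, and re-query with an expanded query until the gate passes."""
    results: List[str] = []
    for round_no in range(1, max_rounds + 1):
        results = search_fn(query)
        scores = [score_fn(query, r) for r in results]
        if good_enough(scores):
            return results, round_no
        query = expand_fn(query)
    # Round cap reached: return the last batch rather than looping forever.
    return results, max_rounds
```

The function returns both the final results and the round on which it stopped, which makes the cap easy to observe in tests.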
+### Scholarly verticals — go to the primary sources
+*Pattern from: the open scholarly ecosystem.* For research-grade questions, general
+web search isn't enough, so we route to the primary academic APIs directly —
+**arXiv, OpenAlex, Crossref, Semantic Scholar, Europe PMC, PubMed, and Wikipedia**.
+All are free and keyless; an intent classifier sends a query to the right ones
+(medical → PubMed/Europe PMC, academic → OpenAlex/arXiv, etc.).
+### Rank fusion — merge many ranked lists fairly
+*Pattern from: Reciprocal Rank Fusion (a standard IR technique, used by systems like
+Exa).* When several sources each return their own ranked list, we combine them with
+**RRF (k = 60)** so no single source dominates and consensus results rise to the top.
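Reviewer note (not part of the package): Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every list it appears in. The sketch below shows the standard formula with k = 60 as stated above; the function name `rrf_fuse` and the tie-break on the string form of the id are illustrative choices, not the package's API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(ranked_lists: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    Each document gets 1 / (k + rank) from every list that contains it,
    with ranks starting at 1. Higher fused score ranks first.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties deterministically by id text.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"], ["b", "a", "d"]])
print(fused)
```

Because only ranks are used, sources with incomparable raw scores (a web search engine and a scholarly API, say) can be merged without any score normalisation.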
+### Reranking — a cross-encoder pass for precision
+*Pattern from: Cohere Rerank.* Cohere popularised dropping a cross-encoder reranker
+in front of the final results to sharply improve precision. We adopt the pattern but
+keep it **keyless**: the **host model itself** scores each candidate against the
+query with one frozen rubric prompt. The host model stands in for a cross-encoder you
+already have, which we rate at or above rerank-API quality, at no extra cost. (An optional
+`[local]` extra adds an offline `ms-marco-MiniLM` cross-encoder for users who want it.)
+### Content extraction — clean signal out of messy HTML
+*Pattern from: Firecrawl.* Firecrawl is known for turning arbitrary pages into clean,
+model-ready markdown by stripping boilerplate (nav, ads, cookie banners) before
+conversion. We rebuilt that natively: a readability/pruning pass strips chrome,
+HTML→markdown conversion preserves citations and structure, PDFs go through a PDF
+text extractor, and a final optional LLM-clean pass tidies what's left — all with a
+strict anti-prompt-injection preamble so page content is always treated as data.
+Every fetch is **SSRF-guarded** (private-IP/metadata-endpoint denylist, re-validated
+on each redirect).
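Reviewer note (not part of the package): a private-IP and metadata-endpoint denylist of the kind described above might look like the following. It is a sketch using only the standard library; the names `is_url_allowed` and `redirect_chain_allowed` and the contents of `METADATA_HOSTS` are assumptions. It checks IP literals and known hostnames only. A real guard would also have to resolve DNS names and check every resolved address.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit

# Assumed denylist of cloud metadata hostnames; extend as needed.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def is_blocked_ip(host: str) -> bool:
    """Return True if host is an IP literal in a non-public range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; DNS resolution is out of scope here
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_url_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host is not on the denylist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host or host.lower() in METADATA_HOSTS:
        return False
    return not is_blocked_ip(host)


def redirect_chain_allowed(urls: Iterable[str]) -> bool:
    """Re-validate every hop, mirroring the 'on each redirect' rule."""
    return all(is_url_allowed(u) for u in urls)
```

Validating each hop matters because a public URL can redirect to an internal address after the first check has passed.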
+### Agentic browse — observe, act, extract
+*Pattern from: Stagehand / Browserbase.* The modern web-agent pattern is a loop of
+**observe** the page's accessibility tree → **act** (click/type/navigate) → **extract**
+structured data, with the model choosing actions against stable element references.
+We use Stagehand's well-known observe/act/extract prompts as the loop's brain — but
+instead of a paid cloud browser, we drive **[vercel's `agent-browser`](https://github.com/vercel-labs/agent-browser)**,
+a local, keyless headless-Chrome CLI (with `lightpanda` as a fast optional engine).
+The model only ever acts on element references that exist in the live page snapshot,
+so it can't be steered onto a hallucinated element.
+### Element querying — ask the page in a query language
+*Pattern from: AgentQL.* AgentQL's idea is a small declarative query language for
+locating page elements by role/intent rather than brittle CSS selectors. We ported a
+parser for that query style so the browse layer can resolve elements the same way —
+again, purely local, no service.
+### Retrieval — hybrid lexical + (optional) semantic
+*Pattern from: Perplexity-style hybrid retrieval.* The robust pattern is to blend
+keyword and semantic recall rather than rely on either alone. Our **default is
+keyless and model-free**: SQLite **FTS5/BM25** lexical recall, three-tier rank
+fusion, and a lexical semantic cache (0.85 overlap) — fast, deterministic, and it
+runs anywhere. If you install the `[local]` extra, a dense vector lane (a local
+`bge` embedder + **LanceDB** ANN, the pattern LanceDB is built for) is used
+automatically on large corpora, fused with BM25 via RRF.
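Reviewer note (not part of the package): the text does not say how the cache's "0.85 overlap" is measured. The sketch below assumes Jaccard overlap on lowercased word tokens, which is one plausible model-free reading. The class `LexicalCache` and its methods are hypothetical.

```python
import re
from typing import Dict, Optional, Set

CACHE_OVERLAP = 0.85  # threshold from the text; the metric itself is assumed


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class LexicalCache:
    """Return a cached value when a new query overlaps a stored one enough."""

    def __init__(self, threshold: float = CACHE_OVERLAP) -> None:
        self.threshold = threshold
        self._entries: Dict[str, str] = {}

    def put(self, query: str, value: str) -> None:
        self._entries[query] = value

    def get(self, query: str) -> Optional[str]:
        q = tokens(query)
        best_key, best = None, 0.0
        for key in self._entries:
            score = jaccard(q, tokens(key))
            if score > best:
                best_key, best = key, score
        if best_key is not None and best >= self.threshold:
            return self._entries[best_key]
        return None
```

A token-set comparison needs no embedding model, which fits the "keyless and model-free" default, at the cost of missing paraphrases that share few words.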
+### Grounding & no-hallucination — cite or don't say it
+*Pattern from: Gemini's grounding & recitation guarantees.* Gemini is known for
+binding generated claims back to retrieved evidence and guarding against verbatim
+recitation. We enforce both deterministically: every report sentence is checked
+against its cited source (exact-match → local NLI → a host-model gate), **uncited
+factual claims are blocked**, and a recitation gate flags any sentence that copies a
+source too closely (a 12-word verbatim run or >50% overlap) — with a carve-out only
+for genuine, attributed direct quotes.
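Reviewer note (not part of the package): the recitation gate above names two triggers, a 12-word verbatim run and more than 50% overlap. The sketch below implements both on word tokens. The run check is a standard longest-common-contiguous-run computation. The overlap definition (share of sentence words that also occur in the source) is an assumption, as are all function names and the `is_attributed_quote` flag standing in for the quote carve-out.

```python
import re
from typing import List

MAX_VERBATIM_RUN = 12  # words, from the text
MAX_OVERLAP = 0.50     # fraction, from the text


def words(text: str) -> List[str]:
    """Lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_common_run(a: List[str], b: List[str]) -> int:
    """Length of the longest contiguous word run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def overlap_fraction(sentence_words: List[str], source_words: List[str]) -> float:
    """Share of sentence words that also appear in the source (assumed metric)."""
    if not sentence_words:
        return 0.0
    source_set = set(source_words)
    hits = sum(1 for w in sentence_words if w in source_set)
    return hits / len(sentence_words)


def flags_recitation(sentence: str, source: str, is_attributed_quote: bool = False) -> bool:
    """Flag a sentence that copies its source too closely."""
    if is_attributed_quote:
        return False  # carve-out for genuine, attributed direct quotes
    s, src = words(sentence), words(source)
    return (
        longest_common_run(s, src) >= MAX_VERBATIM_RUN
        or overlap_fraction(s, src) > MAX_OVERLAP
    )
```

A plain word-overlap fraction is sensitive to common function words, so a production gate would likely filter stop words or compare n-grams instead.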
+### Reasoning-effort dial — spend compute where it matters
+*Pattern from: OpenAI's reasoning-effort control.* We expose a `--reasoning-effort`
+continuum (minimal → low → medium → high) that maps to route, model tier, fetch
+budget, and a token ceiling, with a defined degrade order so the system spends more
+only when the question warrants it.
+### Confidence-band hedging — say how sure it is
+*Pattern from: calibrated-uncertainty practice in research assistants.* The final
+report's claims carry a confidence band derived from grounding scores, so
+low-confidence statements are hedged rather than asserted flatly.
+---
+## What's keyless vs. optional
+| Capability | Default (keyless, no setup) | Optional enhancement |
+|---|---|---|
+| Inference | Claude Code host model | — |
+| Web search | host `WebSearch` + DuckDuckGo + 7 scholarly APIs | self-hosted SearXNG |
+| Reranking | host-model cross-encoder | `[local]` `ms-marco-MiniLM` |
+| Retrieval | SQLite FTS5/BM25 | `[local]` `bge` + LanceDB dense lane |
+| Content render | native httpx + readability | `crawl4ai` JS render (bundled) |
+| Browse | — | `agent-browser` / `lightpanda` CLIs |
+| Media transcripts | — | `yt-dlp` CLI |
+Everything in the left column works the moment you `pip install bad-research`. The
+right column is detected at runtime (`bad doctor` shows the status) and degrades
+gracefully when absent — it never blocks a run.
+---
+*Built on the shoulders of [hyperresearch](https://github.com/jordan-gibbs/hyperresearch),
+with patterns from Perplexity, Gemini, Cohere, Firecrawl, Stagehand/Browserbase,
+AgentQL, LanceDB, and the open scholarly web — all reimplemented keyless.*

bad_research-0.1.0/golden-eval-report.json ADDED

@@ -0,0 +1,179 @@
+{
+  "pass_rate": 1.0,
+  "total": 8,
+  "components": {
+    "decompose": 1.0,
+    "retrieval": 1.0,
+    "synthesis": 1.0
+  },
+  "cases": [
+    {
+      "id": "01_causal_light",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "02_comparison",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "03_multidomain_full",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "04_contested_argumentative",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "05_definitional",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "06_recency_temporal",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "07_breadth_list",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    },
+    {
+      "id": "08_numeric_precise",
+      "passed": true,
+      "verdict": {
+        "rails": {
+          "factual": "pass",
+          "citation": "pass",
+          "completeness": "pass",
+          "source_quality": "pass",
+          "efficiency": "pass"
+        },
+        "pass_rate": 1.0,
+        "passed": true,
+        "rationale": "deterministic offline rubric"
+      },
+      "components": {
+        "decompose": true,
+        "retrieval": true,
+        "synthesis": true
+      }
+    }
+  ]
+}
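Reviewer note (not part of the package): the top-level `pass_rate`, `total`, and `components` fields in this report appear to be aggregates of the per-case entries. The sketch below recomputes them so a reader can check a report for internal consistency. It assumes each aggregate is a simple fraction of cases, which matches the values shown but is not confirmed by the source. The function names are hypothetical.

```python
from typing import Any, Dict


def recompute(report: Dict[str, Any]) -> Dict[str, Any]:
    """Recompute the top-level aggregates from the per-case entries."""
    cases = report["cases"]
    total = len(cases)
    if total == 0:
        return {"pass_rate": 0.0, "total": 0, "components": {}}
    passed = sum(1 for c in cases if c["passed"])
    names = sorted({name for c in cases for name in c["components"]})
    components = {
        name: sum(1 for c in cases if c["components"].get(name)) / total
        for name in names
    }
    return {"pass_rate": passed / total, "total": total, "components": components}


def is_consistent(report: Dict[str, Any]) -> bool:
    """Return True when the stored aggregates match the recomputed ones."""
    fresh = recompute(report)
    return all(report.get(key) == fresh[key] for key in ("pass_rate", "total", "components"))
```

To check the file itself, load it with `json.load` and pass the resulting dict to `is_consistent`.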