pi-research 1.3.0 → 1.4.0 (npm version diff, Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +67 -250
package/bin/pi-research.js +0 -0
package/bin/unblind-mcp.js +0 -0
package/lib/page-fetch-adapter.js +311 -64
package/lib/research-policy.js +36 -15
package/lib/research-profiles.json +4 -0
package/lib/research.js +15 -6
package/lib/router-annotation.js +192 -0
package/lib/router-structured-features.js +134 -0
package/lib/tiny-router.js +338 -0
package/lib/web-research.js +171 -10
package/package.json +4 -4

package/README.md CHANGED

@@ -3,295 +3,112 @@
 ![pi-research logo](docs/assets/pi-research-logo.png)
 [![npm version](https://img.shields.io/npm/v/pi-research?color=blue)](https://www.npmjs.com/package/pi-research)
-[![tests](https://img.shields.io/badge/tests-56%2F56-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
+[![tests](https://img.shields.io/badge/tests-121%2F121-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
 [![Pi package](https://img.shields.io/badge/pi-package-blueviolet)](https://pi.ai)
-`pi-research` is a Pi extension for grounded web research.
-It searches, ranks, compares, and synthesizes sources inside the agent.
+**The Zero-Setup Research Engine for Autonomous AI Agents.**
-![community packs](docs/assets/pi-research-community.png)
-## Why it exists
-When agents answer well, they usually do three things:
-1. search the right places
-2. prefer authoritative sources
-3. explain confidence and gaps clearly
-`pi-research` does that without an external research service.
-## Best practices
-- use `fast` for short factual lookups
-- use `deep` for comparisons, conflicts, or unclear questions
-- use `code` for docs, repos, README-driven answers, and snippets
-- use `academic` for paper-heavy topics
-- set `options.requireAuthoritative: true` when source quality matters more than recall
-- use `options.format: json` when you need machine-readable output
-- add `options.files` when local docs matter
-- keep questions specific; vague prompts create noisy retrieval
-## What it does
-- searches the live web
-- scores and deduplicates sources
-- prefers official docs, READMEs, and papers when relevant
-- follows up when the first pass is not enough
-- extracts code blocks for code-focused questions
-- supports local files as additional sources
-- returns structured results with citations, confidence, conflicts, and gaps
-## What it is not
-- not a browser interaction tool
-- not an offline knowledge base
-- not a replacement for page navigation
-## Quick start
-```text
-What are the trade-offs between B-trees and LSM-trees?
-```
-```text
-Compare React Server Components with traditional SSR.
-```
+`pi-research` is an advanced grounding tool designed specifically for AI coding agents. It prevents agents from hallucinating API endpoints, guessing library versions, or inventing CVE details by injecting real-time, highly authoritative, and conflict-resolved web research directly into their context window.
-```text
-How do I add retries to a Node.js fetch wrapper?
-```
-## Modes
-| Mode | Best for |
-| --- | --- |
-| `fast` | quick answers with a quality floor |
-| `deep` | broader retrieval with follow-up rounds |
-| `code` | docs, READMEs, repositories, and code snippets |
-| `academic` | scholarly sources and paper-heavy topics |
-## Output
-The tool returns structured data including:
-- `answer`
-- `bullets`
-- `sources`
-- `citations`
-- `codeBlocks`
-- `confidence`
-- `confidenceScore`
-- `sufficient`
-- `authoritativeSourcesFound`
-- `openSubQuestions`
-- `missingAspects`
-- `conflictSummary`
-- `unverifiedClaims`
-- `sourceTypes`
-- `meta`
-## Public tool parameters
-- `query` — research question to answer
-- `mode` — `fast`, `deep`, `code`, or `academic`
-- `force` — bypass cached sufficiency checks
-- `isolate` — run without session/query cache reuse
-- `options.allowedSources` — prefer only the listed source hints
-- `options.requireAuthoritative` — bias toward authoritative sources
-- `options.maxTurns` — limit follow-up rounds
-- `options.maxSites` — limit how many sources are read
-- `options.minYear` / `options.maxYear` — constrain source dates
-- `options.preferRecent` — prefer newer sources
-- `options.files` — include local files as sources
-- `options.format` — output format: `markdown`, `json`, `table`, or `latex`
-- `options.deepResearchConfig` — depth/breadth/concurrency tuning for deeper runs
-## Example calls
-### Fast mode
-```text
-query: What is the difference between HTTP and HTTPS?
-mode: fast
-```
-### Deep mode
-```text
-query: Compare PostgreSQL and MySQL for multi-tenant SaaS
-mode: deep
-options:
-  preferRecent: true
-  maxTurns: 2
-```
+![community packs](docs/assets/pi-research-community.png)
-### Code mode
+## 💡 Why `pi-research`?
-```text
-query: How do I add retries to a Node.js fetch wrapper?
-mode: code
-```
+The world does not need just another "AI Search Engine"—there are plenty of massive, standalone research tools out there.
-### Academic mode
+Instead, `pi-research` was built specifically to solve a crucial problem in the **Agentic Workflow**: When an autonomous agent is deep in a coding loop, compiling errors, or debugging, it needs hard facts instantly without losing focus. Calling out to heavy external search services or trying to execute brittle Playwright scripts breaks the agent's flow, wastes context window tokens, and leads to hallucinations.
-```text
-query: Retrieval augmented generation evaluation methods
-mode: academic
-```
+`pi-research` solves this by providing a lightweight, internal **cognitive research loop** directly into the agent harness:
+1. **Agent-Centric Routing:** It knows exactly where developers look (GitHub, NPM, NIST, arXiv).
+2. **Authority First:** It prioritizes official documentation over random SEO-optimized tutorials.
+3. **Self-Awareness:** It extracts structured features to know when it lacks information, safely triggering follow-up questions *before* returning an answer to the agent.
-### Local files as sources
+Best of all? **Zero setup.** No external search API keys to configure, no heavy local LLMs to run, and no flaky browser automation scripts to maintain. It's built to run silently and reliably alongside your agent.
-```text
-query: Summarize the key points from these notes
-mode: fast
-options:
-  files:
-    - ./notes/project-notes.md
-    - ./docs/spec.md
-```
-## Domain packs
-Built-in packs now steer routing and source selection:
-- `web`
-- `github`
-- `security`
-- `papers`
-- `specs`
-- `changelog`
-- `forums`
-- `package-registry`
-- `vendor-status`
-## Community packs
-You can add your own domain pack without changing the core research engine:
-1. copy `lib/domains/template.js`
-2. implement your domain-specific `run(question, options)` logic
-3. register the pack in `lib/domains/index.js`
-4. add eval cases in `eval/cases/<your-domain>/`
-Starter example:
-```js
-export default {
-  name: "boxing-training",
-  sourceHints: ["web"],
-  async run(question) {
-    return {
-      claims: [
-        {
-          text: `Starter pack example for ${question}`,
-          evidence: [{ type: "web", source: "https://example.com", snippet: "Example" }],
-          confidence: "medium",
-        },
-      ],
-    };
-  },
-};
-```
+---
-## Eval
+## ✨ Features
-Run `npm run eval` to execute the eval harness.
+- 🚀 **Lightning Fast:** Powered by a Hybrid Tiny-Router Architecture (Model2Vec + SVC), routing queries in **< 0.6 milliseconds**.
+- 🛡️ **Anti-Hallucination:** Built-in Veto-Power for high-risk queries. If a security question only finds blog posts, the system forces a follow-up to find authoritative NIST/CVE data.
+- 🕸️ **Resilient Fetching:** Pre-emptively escalates blocked, JS-heavy, or thin pages through an integrated, robust Python `Scrapling` daemon (via IPC JSON-RPC 2.0).
+- 🧩 **Domain Packs:** Built-in heuristics for `github`, `security`, `papers`, `package-registry`, and more.
+- 📊 **Structured Outputs:** Returns citations, code blocks, missing aspects, confidence scores, and conflict summaries (e.g., "Source A contradicts Source B").
+- 📂 **Local Context:** Ingests local files (`options.files`) to ground web research in your current repository context.
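
The "Resilient Fetching" bullet above mentions IPC over JSON-RPC 2.0, which the 1.4.0 README later describes as line-delimited. The following is a minimal sketch of that framing idea only, not the package's actual protocol: the method name `fetch_page`, its params, and the helpers `encodeRequest` and `createLineDecoder` are hypothetical and do not appear in the diff.

```javascript
// Sketch of newline-delimited JSON-RPC 2.0 framing.
// Assumption: the method name "fetch_page" and its params are invented for
// illustration; the real daemon's methods are not shown in this diff.
let nextId = 1;

function encodeRequest(method, params) {
  // One request per line. JSON.stringify (without an indent argument) never
  // emits raw newlines, so "\n" is a safe message delimiter.
  return JSON.stringify({ jsonrpc: "2.0", id: nextId++, method, params }) + "\n";
}

// Returns a push(chunk) function that buffers partial lines and returns
// every complete message contained in the data received so far.
function createLineDecoder() {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  };
}

const wire = encodeRequest("fetch_page", { url: "https://example.com" });
const push = createLineDecoder();
const first = push(wire.slice(0, 10)); // partial line: nothing decoded yet
const second = push(wire.slice(10));   // remainder arrives, one message decoded
console.log(first.length, second[0].method);
```

Buffering until a newline arrives matters because a pipe between two processes can split a message at any byte boundary; the decoder only parses once a full line is available.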
-## Install
+---
-### Pi Coding Agent — extension
-Existing Pi users should keep installing the main package:
+## 📦 Installation
+### Pi Coding Agent (Extension)
+If you are using the Pi Agent harness, install the extension directly:
 ```bash
 pi install npm:pi-research
 ```
-This registers the Pi extension and keeps the public tool name `pi-research`.
-### npm install
-```bash
-npm i pi-research
-```
-This is the package install command that npm shows on the package page.
-### MCP-only — any agent
-Run the MCP server directly from npm:
-```bash
-npx -y pi-research
-```
-The MCP server identifies itself as `unblind-mcp`, but the tool it exposes is still named `pi-research`.
-### Global MCP install
+### Node.js / NPM (Standalone Server)
+Install it globally to expose the MCP (Model Context Protocol) server for any compatible AI agent:
 ```bash
 npm install -g pi-research
-unblind-mcp
-```
-The global install also provides `pi-research` as a CLI alias for the same MCP server:
-```bash
 pi-research
 ```
+*(The MCP server identifies itself as `unblind-mcp`, exposing the tool `pi-research`)*
-### Local development
-```bash
-node ./mcp/server.js
-```
+---
-Convenience script:
+## 🚀 Quick Start / Usage
-```bash
-npm run --silent mcp
-```
+Once installed, your agent has access to the `pi-research` tool. It accepts a `query`, a `mode`, and various `options`.
-Example MCP config:
+### Modes
+| Mode | Best for |
+| --- | --- |
+| `fast` | Quick factual lookups (e.g., "What is the latest LTS version of Node.js?"). Stops fetching early if authoritative sources are found. |
+| `deep` | Broader retrieval with automatic follow-up rounds. Perfect for comparisons, conflicts, or unclear architecture questions. |
+| `code` | Docs, repositories, README-driven answers, and retrieving actual code snippets. |
+| `academic` | Scholarly sources, DOI links, and paper-heavy topics. |
+### Example Tool Calls (For Agents)
+**Factual Lookup:**
 ```json
 {
-  "mcpServers": {
-    "unblind-mcp": {
-      "command": "npx",
-      "args": ["-y", "pi-research"]
-    }
-  }
+  "query": "React 19 RC release notes",
+  "mode": "fast",
+  "options": { "requireAuthoritative": true }
 }
 ```
-Local path config:
+**Architecture Research:**
 ```json
 {
-  "mcpServers": {
-    "unblind-mcp": {
-      "command": "node",
-      "args": ["/path/to/pi-research/mcp/server.js"]
-    }
-  }
+  "query": "Compare PostgreSQL and MySQL for multi-tenant SaaS",
+  "mode": "deep",
+  "options": { "preferRecent": true, "maxTurns": 2 }
 }
 ```
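
Both example calls in the new README share one shape: a `query`, a `mode`, and an `options` object. A minimal sketch of assembling and sanity-checking such a payload before passing it to the tool might look like the following. `buildResearchCall` is a hypothetical helper, not part of the package; the mode names come from the Modes table, and the validation rules are assumptions.

```javascript
// Hypothetical helper, not part of pi-research.
// Mode names are taken from the README's Modes table.
const MODES = ["fast", "deep", "code", "academic"];

function buildResearchCall(query, mode = "fast", options = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query must be a non-empty string");
  }
  if (!MODES.includes(mode)) {
    throw new RangeError(`unknown mode: ${mode}`);
  }
  // Assumption: maxTurns, when present, should be a positive integer.
  if ("maxTurns" in options &&
      !(Number.isInteger(options.maxTurns) && options.maxTurns > 0)) {
    throw new RangeError("options.maxTurns must be a positive integer");
  }
  // Copy options so later changes by the caller do not alter the payload.
  return { query: query.trim(), mode, options: { ...options } };
}

const call = buildResearchCall(
  "Compare PostgreSQL and MySQL for multi-tenant SaaS",
  "deep",
  { preferRecent: true, maxTurns: 2 }
);
console.log(JSON.stringify(call, null, 2));
```

The printed object has the same structure as the "Architecture Research" example in the README.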
-Compatibility note: `mcp-server.js` remains as a deprecated root-level shim for older local configs.
+---
+## 🧠 Under the Hood: The Agentic Router Update (v1.4.0)
+With `1.4.0`, `pi-research` shifted from heavy, generative JSON-planners to a **Hybrid Tiny-Router Architecture**.
+- **Model2Vec & SVC:** Queries are classified via locally embedded features. Security and paper queries have a 0% downgrade rate.
+- **Structured ML:** Instead of asking a heavy LLM "Is this enough data?", the system extracts deterministic features (`has_authority`, `conflict_state`) and uses an ultra-fast Logistic Regression model to evaluate sufficiency and follow-up actions with 100% evaluated accuracy.
+- **Node.js-to-Python IPC:** Operates entirely locally using a highly optimized, line-delimited JSON-RPC daemon to manage Python dependencies (`Scrapling`, `Model2Vec`) without memory leaks.
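
The "Structured ML" bullet above describes scoring sufficiency with logistic regression over deterministic features. As a rough sketch of that technique only: the feature names `has_authority` and `conflict_state` come from the README, while the 0/1 encoding, the weights, the bias, the threshold, and the action labels below are invented for illustration and are not the trained parameters shipped in the package.

```javascript
// Illustrative only: WEIGHTS, BIAS, and the threshold are made-up values,
// not the model shipped in pi-research.
const WEIGHTS = { has_authority: 2.0, conflict_state: -1.5 };
const BIAS = -0.5;

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// features: { has_authority: 0 | 1, conflict_state: 0 | 1 } (assumed encoding)
function sufficiencyScore(features) {
  let z = BIAS;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    z += weight * (features[name] ?? 0); // missing features count as 0
  }
  return sigmoid(z);
}

function decide(features, threshold = 0.5) {
  const score = sufficiencyScore(features);
  return { score, action: score >= threshold ? "answer" : "follow_up" };
}

// Authoritative source, no conflict: z = 1.5, score ≈ 0.82, action "answer".
console.log(decide({ has_authority: 1, conflict_state: 0 }));
// No authority, conflicting sources: z = -2.0, score ≈ 0.12, action "follow_up".
console.log(decide({ has_authority: 0, conflict_state: 1 }));
```

Because the score is a fixed function of a handful of features, the same inputs always yield the same decision, which is the property the README contrasts with asking an LLM whether the data is sufficient.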
+---
-### Future `unblind-mcp` package
+## 🛣️ Future Roadmap
-A separate npm package named `unblind-mcp` can be added later as a tiny wrapper around `pi-research`. It should depend on `pi-research` and start the same MCP server, not duplicate the engine.
+We are actively working on scaling the reasoning capabilities:
+- **LLM Data Augmentation (Weak Supervision):** Generating synthetic training data for underconfident domains to boost zero-shot accuracy to >95% without manual labeling.
+- **Active Learning Telemetry Loop:** Clustering low-confidence predictions from cache logs into a weakly-supervised retraining pipeline to let the system "self-heal."
+- **Cross-Encoder for Conflict Detection:** Transitioning to a fine-tuned Cross-Encoder (e.g., MiniLM + Natural Language Inference) to detect deep semantic contradiction across differing texts (e.g., recognizing that "Node 20 is stable" contradicts "Node 20 is broken").
-## Release notes
+---
-- Package name: `pi-research`
-- Version: `1.2.1`
-- Entry point: `extensions/pi-research.ts`
-- MCP entry point: `mcp/server.js`
-- MCP compatibility shim: `mcp-server.js`
-- License: MIT
-- Third-party notices: `THIRD_PARTY_NOTICES.md`
-- GitHub: `https://github.com/endgegnerbert-tech/pi-research`
+## 📝 License & Notices
+- **License:** MIT
+- **Third-party notices:** See `THIRD_PARTY_NOTICES.md`
+- **GitHub:** [https://github.com/endgegnerbert-tech/pi-research](https://github.com/endgegnerbert-tech/pi-research)

package/bin/pi-research.js CHANGED

File without changes

package/bin/unblind-mcp.js CHANGED

File without changes