pi-research 1.3.0 → 1.4.0

package/README.md CHANGED
@@ -3,295 +3,112 @@
3
3
  ![pi-research logo](docs/assets/pi-research-logo.png)
4
4
 
5
5
  [![npm version](https://img.shields.io/npm/v/pi-research?color=blue)](https://www.npmjs.com/package/pi-research)
6
- [![tests](https://img.shields.io/badge/tests-56%2F56-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
6
+ [![tests](https://img.shields.io/badge/tests-121%2F121-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
7
7
  [![Pi package](https://img.shields.io/badge/pi-package-blueviolet)](https://pi.ai)
8
8
 
9
- `pi-research` is a Pi extension for grounded web research.
10
- It searches, ranks, compares, and synthesizes sources inside the agent.
9
+ **The Zero-Setup Research Engine for Autonomous AI Agents.**
11
10
 
12
- ![community packs](docs/assets/pi-research-community.png)
13
-
14
- ## Why it exists
15
-
16
- When agents answer well, they usually do three things:
17
-
18
- 1. search the right places
19
- 2. prefer authoritative sources
20
- 3. explain confidence and gaps clearly
21
-
22
- `pi-research` does that without an external research service.
23
-
24
- ## Best practices
25
-
26
- - use `fast` for short factual lookups
27
- - use `deep` for comparisons, conflicts, or unclear questions
28
- - use `code` for docs, repos, README-driven answers, and snippets
29
- - use `academic` for paper-heavy topics
30
- - set `options.requireAuthoritative: true` when source quality matters more than recall
31
- - use `options.format: json` when you need machine-readable output
32
- - add `options.files` when local docs matter
33
- - keep questions specific; vague prompts create noisy retrieval
34
-
35
- ## What it does
36
-
37
- - searches the live web
38
- - scores and deduplicates sources
39
- - prefers official docs, READMEs, and papers when relevant
40
- - follows up when the first pass is not enough
41
- - extracts code blocks for code-focused questions
42
- - supports local files as additional sources
43
- - returns structured results with citations, confidence, conflicts, and gaps
44
-
45
- ## What it is not
46
-
47
- - not a browser interaction tool
48
- - not an offline knowledge base
49
- - not a replacement for page navigation
50
-
51
- ## Quick start
52
-
53
- ```text
54
- What are the trade-offs between B-trees and LSM-trees?
55
- ```
56
-
57
- ```text
58
- Compare React Server Components with traditional SSR.
59
- ```
11
+ `pi-research` is an advanced grounding tool designed specifically for AI coding agents. It prevents agents from hallucinating API endpoints, guessing library versions, or inventing CVE details by injecting real-time, highly authoritative, and conflict-resolved web research directly into their context window.
60
12
 
61
- ```text
62
- How do I add retries to a Node.js fetch wrapper?
63
- ```
64
-
65
- ## Modes
66
-
67
- | Mode | Best for |
68
- | --- | --- |
69
- | `fast` | quick answers with a quality floor |
70
- | `deep` | broader retrieval with follow-up rounds |
71
- | `code` | docs, READMEs, repositories, and code snippets |
72
- | `academic` | scholarly sources and paper-heavy topics |
73
-
74
- ## Output
75
-
76
- The tool returns structured data including:
77
-
78
- - `answer`
79
- - `bullets`
80
- - `sources`
81
- - `citations`
82
- - `codeBlocks`
83
- - `confidence`
84
- - `confidenceScore`
85
- - `sufficient`
86
- - `authoritativeSourcesFound`
87
- - `openSubQuestions`
88
- - `missingAspects`
89
- - `conflictSummary`
90
- - `unverifiedClaims`
91
- - `sourceTypes`
92
- - `meta`
93
-
94
- ## Public tool parameters
95
-
96
- - `query` — research question to answer
97
- - `mode` — `fast`, `deep`, `code`, or `academic`
98
- - `force` — bypass cached sufficiency checks
99
- - `isolate` — run without session/query cache reuse
100
- - `options.allowedSources` — prefer only the listed source hints
101
- - `options.requireAuthoritative` — bias toward authoritative sources
102
- - `options.maxTurns` — limit follow-up rounds
103
- - `options.maxSites` — limit how many sources are read
104
- - `options.minYear` / `options.maxYear` — constrain source dates
105
- - `options.preferRecent` — prefer newer sources
106
- - `options.files` — include local files as sources
107
- - `options.format` — output format: `markdown`, `json`, `table`, or `latex`
108
- - `options.deepResearchConfig` — depth/breadth/concurrency tuning for deeper runs
109
-
110
- ## Example calls
111
-
112
- ### Fast mode
113
-
114
- ```text
115
- query: What is the difference between HTTP and HTTPS?
116
- mode: fast
117
- ```
118
-
119
- ### Deep mode
120
-
121
- ```text
122
- query: Compare PostgreSQL and MySQL for multi-tenant SaaS
123
- mode: deep
124
- options:
125
- preferRecent: true
126
- maxTurns: 2
127
- ```
13
+ ![community packs](docs/assets/pi-research-community.png)
128
14
 
129
- ### Code mode
15
+ ## 💡 Why `pi-research`?
130
16
 
131
- ```text
132
- query: How do I add retries to a Node.js fetch wrapper?
133
- mode: code
134
- ```
17
+ The world does not need just another "AI Search Engine"—there are plenty of massive, standalone research tools out there.
135
18
 
136
- ### Academic mode
19
+ Instead, `pi-research` was built specifically to solve a crucial problem in the **Agentic Workflow**: when an autonomous agent is deep in a coding loop, fixing compile errors, or debugging, it needs hard facts instantly without losing focus. Calling out to heavy external search services or running brittle Playwright scripts breaks the agent's flow, wastes context-window tokens, and leads to hallucinations.
137
20
 
138
- ```text
139
- query: Retrieval augmented generation evaluation methods
140
- mode: academic
141
- ```
21
+ `pi-research` solves this by building a lightweight, internal **cognitive research loop** directly into the agent harness:
22
+ 1. **Agent-Centric Routing:** It knows exactly where developers look (GitHub, NPM, NIST, arXiv).
23
+ 2. **Authority First:** It prioritizes official documentation over random SEO-optimized tutorials.
24
+ 3. **Self-Awareness:** It extracts structured features to know when it lacks information, safely triggering follow-up questions *before* returning an answer to the agent.
142
25
 
143
- ### Local files as sources
26
+ Best of all? **Zero setup.** No external search API keys to configure, no heavy local LLMs to run, and no flaky browser automation scripts to maintain. It's built to run silently and reliably alongside your agent.
144
27
 
145
- ```text
146
- query: Summarize the key points from these notes
147
- mode: fast
148
- options:
149
- files:
150
- - ./notes/project-notes.md
151
- - ./docs/spec.md
152
- ```
153
-
154
- ## Domain packs
155
-
156
- Built-in packs now steer routing and source selection:
157
-
158
- - `web`
159
- - `github`
160
- - `security`
161
- - `papers`
162
- - `specs`
163
- - `changelog`
164
- - `forums`
165
- - `package-registry`
166
- - `vendor-status`
167
-
168
- ## Community packs
169
-
170
- You can add your own domain pack without changing the core research engine:
171
-
172
- 1. copy `lib/domains/template.js`
173
- 2. implement your domain-specific `run(question, options)` logic
174
- 3. register the pack in `lib/domains/index.js`
175
- 4. add eval cases in `eval/cases/<your-domain>/`
176
-
177
- Starter example:
178
-
179
- ```js
180
- export default {
181
- name: "boxing-training",
182
- sourceHints: ["web"],
183
- async run(question) {
184
- return {
185
- claims: [
186
- {
187
- text: `Starter pack example for ${question}`,
188
- evidence: [{ type: "web", source: "https://example.com", snippet: "Example" }],
189
- confidence: "medium",
190
- },
191
- ],
192
- };
193
- },
194
- };
195
- ```
28
+ ---
196
29
 
197
- ## Eval
30
+ ## ✨ Features
198
31
 
199
- Run `npm run eval` to execute the eval harness.
32
+ - 🚀 **Lightning Fast:** Powered by a Hybrid Tiny-Router Architecture (Model2Vec + SVC), routing queries in **< 0.6 milliseconds**.
33
+ - 🛡️ **Anti-Hallucination:** Built-in Veto-Power for high-risk queries. If a security question only finds blog posts, the system forces a follow-up to find authoritative NIST/CVE data.
34
+ - 🕸️ **Resilient Fetching:** Pre-emptively escalates blocked, JS-heavy, or thin pages through an integrated, robust Python `Scrapling` daemon (via IPC JSON-RPC 2.0).
35
+ - 🧩 **Domain Packs:** Built-in heuristics for `github`, `security`, `papers`, `package-registry`, and more.
36
+ - 📊 **Structured Outputs:** Returns citations, code blocks, missing aspects, confidence scores, and conflict summaries (e.g., "Source A contradicts Source B").
37
+ - 📂 **Local Context:** Ingests local files (`options.files`) to ground web research in your current repository context.
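The veto rule described in the feature list can be pictured with a small sketch. This is not the package's implementation: the function names, the domain set, and the host list below are all hypothetical, chosen only to illustrate "high-risk query with no authoritative source forces a follow-up".

```javascript
// Hypothetical sketch of a veto rule; none of these names come from pi-research.
const HIGH_RISK_DOMAINS = new Set(["security"]);

// Assumed list of authoritative hosts for security questions (illustrative only).
const AUTHORITATIVE_HOSTS = ["nvd.nist.gov", "cve.org"];

// True when the URL's host equals, or is a subdomain of, an authoritative host.
function isAuthoritative(url) {
  const host = new URL(url).hostname;
  return AUTHORITATIVE_HOSTS.some((h) => host === h || host.endsWith("." + h));
}

// Decide whether to answer now or force another retrieval round.
function decideNextAction(domain, sourceUrls) {
  const hasAuthority = sourceUrls.some(isAuthoritative);
  if (HIGH_RISK_DOMAINS.has(domain) && !hasAuthority) {
    return { action: "follow_up", reason: "no authoritative source for a high-risk query" };
  }
  return { action: "answer", reason: hasAuthority ? "authoritative source found" : "low-risk query" };
}

console.log(decideNextAction("security", ["https://example.com/blog/some-cve-writeup"]));
```

With only a blog post as a source, the security query is sent back for a follow-up round; the same sources on a low-risk domain would be allowed to answer.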
200
38
 
201
- ## Install
39
+ ---
202
40
 
203
- ### Pi Coding Agent — extension
204
-
205
- Existing Pi users should keep installing the main package:
41
+ ## 📦 Installation
206
42
 
43
+ ### Pi Coding Agent (Extension)
44
+ If you are using the Pi Agent harness, install the extension directly:
207
45
  ```bash
208
46
  pi install npm:pi-research
209
47
  ```
210
48
 
211
- This registers the Pi extension and keeps the public tool name `pi-research`.
212
-
213
- ### npm install
214
-
215
- ```bash
216
- npm i pi-research
217
- ```
218
-
219
- This is the package install command that npm shows on the package page.
220
-
221
- ### MCP-only — any agent
222
-
223
- Run the MCP server directly from npm:
224
-
225
- ```bash
226
- npx -y pi-research
227
- ```
228
-
229
- The MCP server identifies itself as `unblind-mcp`, but the tool it exposes is still named `pi-research`.
230
-
231
- ### Global MCP install
232
-
49
+ ### Node.js / NPM (Standalone Server)
50
+ Install it globally to expose the MCP (Model Context Protocol) server for any compatible AI agent:
233
51
  ```bash
234
52
  npm install -g pi-research
235
- unblind-mcp
236
- ```
237
-
238
- The global install also provides `pi-research` as a CLI alias for the same MCP server:
239
-
240
- ```bash
241
53
  pi-research
242
54
  ```
55
+ *(The MCP server identifies itself as `unblind-mcp`, exposing the tool `pi-research`)*
243
56
 
244
- ### Local development
245
-
246
- ```bash
247
- node ./mcp/server.js
248
- ```
57
+ ---
249
58
 
250
- Convenience script:
59
+ ## 🚀 Quick Start / Usage
251
60
 
252
- ```bash
253
- npm run --silent mcp
254
- ```
61
+ Once installed, your agent has access to the `pi-research` tool. It accepts a `query`, a `mode`, and various `options`.
255
62
 
256
- Example MCP config:
63
+ ### Modes
64
+ | Mode | Best for |
65
+ | --- | --- |
66
+ | `fast` | Quick factual lookups (e.g., "What is the latest LTS version of Node.js?"). Stops fetching early if authoritative sources are found. |
67
+ | `deep` | Broader retrieval with automatic follow-up rounds. Perfect for comparisons, conflicts, or unclear architecture questions. |
68
+ | `code` | Docs, repositories, README-driven answers, and retrieving actual code snippets. |
69
+ | `academic` | Scholarly sources, DOI links, and paper-heavy topics. |
257
70
 
71
+ ### Example Tool Calls (For Agents)
72
+ **Factual Lookup:**
258
73
  ```json
259
74
  {
260
- "mcpServers": {
261
- "unblind-mcp": {
262
- "command": "npx",
263
- "args": ["-y", "pi-research"]
264
- }
265
- }
75
+ "query": "React 19 RC release notes",
76
+ "mode": "fast",
77
+ "options": { "requireAuthoritative": true }
266
78
  }
267
79
  ```
268
80
 
269
- Local path config:
270
-
81
+ **Architecture Research:**
271
82
  ```json
272
83
  {
273
- "mcpServers": {
274
- "unblind-mcp": {
275
- "command": "node",
276
- "args": ["/path/to/pi-research/mcp/server.js"]
277
- }
278
- }
84
+ "query": "Compare PostgreSQL and MySQL for multi-tenant SaaS",
85
+ "mode": "deep",
86
+ "options": { "preferRecent": true, "maxTurns": 2 }
279
87
  }
280
88
  ```
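An agent harness may want to check a call before sending it. The sketch below assumes only what the README states (a `query`, one of the four modes, and an `options` object); the helper `buildResearchCall` itself is hypothetical and not part of the package.

```javascript
// Modes listed in the README's mode table.
const MODES = ["fast", "deep", "code", "academic"];

// Hypothetical helper: validates the arguments and returns a tool-call payload.
function buildResearchCall(query, mode, options = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query must be a non-empty string");
  }
  if (!MODES.includes(mode)) {
    throw new RangeError(`unknown mode: ${mode}`);
  }
  // Copy options so later mutation by the caller does not change the payload.
  return { query: query.trim(), mode, options: { ...options } };
}

const call = buildResearchCall(
  "How do I add retries to a Node.js fetch wrapper?",
  "code",
  { requireAuthoritative: true }
);
console.log(JSON.stringify(call, null, 2));
```

Rejecting an unknown mode locally keeps a typo from costing a full tool round-trip.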
281
89
 
282
- Compatibility note: `mcp-server.js` remains as a deprecated root-level shim for older local configs.
90
+ ---
91
+
92
+ ## 🧠 Under the Hood: The Agentic Router Update (v1.4.0)
93
+
94
+ With `1.4.0`, `pi-research` shifted from heavy, generative JSON-planners to a **Hybrid Tiny-Router Architecture**.
95
+
96
+ - **Model2Vec & SVC:** Queries are classified via locally embedded features. Security and paper queries have a 0% downgrade rate.
97
+ - **Structured ML:** Instead of asking a heavy LLM "Is this enough data?", the system extracts deterministic features (`has_authority`, `conflict_state`) and uses a fast logistic regression model to evaluate sufficiency and follow-up actions, with a reported 100% accuracy on its evaluation set.
98
+ - **Node.js-to-Python IPC:** Operates entirely locally using a highly optimized, line-delimited JSON-RPC daemon to manage Python dependencies (`Scrapling`, `Model2Vec`) without memory leaks.
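Line-delimited JSON-RPC 2.0 framing, as named in the IPC bullet, can be sketched in a few lines. This is a generic illustration of the framing technique, not the package's daemon code; the method name `fetch_page` and both helper functions are hypothetical.

```javascript
// Encode one JSON-RPC 2.0 request as a single newline-terminated line.
function encodeRequest(id, method, params) {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}

// Returns a push(chunk) function that buffers partial input and
// yields only the messages whose terminating newline has arrived.
function createLineDecoder() {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // trailing piece is an incomplete line (or "")
    return lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
  };
}

// Usage: a response split across two stdout chunks is reassembled.
const decode = createLineDecoder();
const first = decode('{"jsonrpc":"2.0","id":1,"res');
const second = decode('ult":"ok"}\n');
console.log(encodeRequest(1, "fetch_page", { url: "https://example.com" }).trim());
console.log(first.length, second[0].result);
```

Because each message occupies exactly one line, the reader never needs a length header; it only has to hold back the final, unterminated fragment until more data arrives.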
99
+
100
+ ---
283
101
 
284
- ### Future `unblind-mcp` package
102
+ ## 🛣️ Future Roadmap
285
103
 
286
- A separate npm package named `unblind-mcp` can be added later as a tiny wrapper around `pi-research`. It should depend on `pi-research` and start the same MCP server, not duplicate the engine.
104
+ We are actively working on scaling the reasoning capabilities:
105
+ - **LLM Data Augmentation (Weak Supervision):** Generating synthetic training data for underconfident domains to boost zero-shot accuracy to >95% without manual labeling.
106
+ - **Active Learning Telemetry Loop:** Clustering low-confidence predictions from cache logs into a weakly-supervised retraining pipeline to let the system "self-heal."
107
+ - **Cross-Encoder for Conflict Detection:** Transitioning to a fine-tuned Cross-Encoder (e.g., MiniLM + Natural Language Inference) to detect deep semantic contradiction across differing texts (e.g., recognizing that "Node 20 is stable" contradicts "Node 20 is broken").
287
108
 
288
- ## Release notes
109
+ ---
289
110
 
290
- - Package name: `pi-research`
291
- - Version: `1.2.1`
292
- - Entry point: `extensions/pi-research.ts`
293
- - MCP entry point: `mcp/server.js`
294
- - MCP compatibility shim: `mcp-server.js`
295
- - License: MIT
296
- - Third-party notices: `THIRD_PARTY_NOTICES.md`
297
- - GitHub: `https://github.com/endgegnerbert-tech/pi-research`
111
+ ## 📝 License & Notices
112
+ - **License:** MIT
113
+ - **Third-party notices:** See `THIRD_PARTY_NOTICES.md`
114
+ - **GitHub:** [https://github.com/endgegnerbert-tech/pi-research](https://github.com/endgegnerbert-tech/pi-research)
File without changes