pi-research 1.3.1 → 1.4.1
- package/README.md +67 -250
- package/lib/page-fetch-adapter.js +311 -64
- package/lib/research-policy.js +36 -15
- package/lib/research-profiles.json +4 -0
- package/lib/research.js +15 -6
- package/lib/router-annotation.js +192 -0
- package/lib/router-structured-features.js +134 -0
- package/lib/tiny-router.js +338 -0
- package/lib/web-research.js +171 -10
- package/ml/models/conflict-structured/feature-names.json +22 -0
- package/ml/models/conflict-structured/meta.json +5 -0
- package/ml/models/conflict-structured/model.joblib +0 -0
- package/ml/models/domain/metrics.json +16 -0
- package/ml/models/domain/model.joblib +0 -0
- package/ml/models/domain-lr/metrics.json +16 -0
- package/ml/models/domain-lr/model.joblib +0 -0
- package/ml/models/followup/meta.json +3 -0
- package/ml/models/followup/model.joblib +0 -0
- package/ml/models/sufficiency-structured/feature-names.json +22 -0
- package/ml/models/sufficiency-structured/meta.json +5 -0
- package/ml/models/sufficiency-structured/model.joblib +0 -0
- package/ml/router/README.md +106 -0
- package/ml/router/__pycache__/features.cpython-314.pyc +0 -0
- package/ml/router/benchmark_latency.py +81 -0
- package/ml/router/daemon.py +140 -0
- package/ml/router/embed_model2vec.py +48 -0
- package/ml/router/evaluate_domain.py +67 -0
- package/ml/router/features.py +60 -0
- package/ml/router/requirements.txt +5 -0
- package/ml/router/train_classifier.py +57 -0
- package/ml/router/train_domain_classifier.py +209 -0
- package/ml/router/train_structured_baseline.py +174 -0
- package/package.json +5 -4
package/README.md
CHANGED
@@ -3,295 +3,112 @@
 
 
 [](https://www.npmjs.com/package/pi-research)
-[](https://github.com/endgegnerbert-tech/pi-research)
 [](https://pi.ai)
 
-
-It searches, ranks, compares, and synthesizes sources inside the agent.
+**The Zero-Setup Research Engine for Autonomous AI Agents.**
 
-
-
-## Why it exists
-
-When agents answer well, they usually do three things:
-
-1. search the right places
-2. prefer authoritative sources
-3. explain confidence and gaps clearly
-
-`pi-research` does that without an external research service.
-
-## Best practices
-
-- use `fast` for short factual lookups
-- use `deep` for comparisons, conflicts, or unclear questions
-- use `code` for docs, repos, README-driven answers, and snippets
-- use `academic` for paper-heavy topics
-- set `options.requireAuthoritative: true` when source quality matters more than recall
-- use `options.format: json` when you need machine-readable output
-- add `options.files` when local docs matter
-- keep questions specific; vague prompts create noisy retrieval
-
-## What it does
-
-- searches the live web
-- scores and deduplicates sources
-- prefers official docs, READMEs, and papers when relevant
-- follows up when the first pass is not enough
-- extracts code blocks for code-focused questions
-- supports local files as additional sources
-- returns structured results with citations, confidence, conflicts, and gaps
-
-## What it is not
-
-- not a browser interaction tool
-- not an offline knowledge base
-- not a replacement for page navigation
-
-## Quick start
-
-```text
-What are the trade-offs between B-trees and LSM-trees?
-```
-
-```text
-Compare React Server Components with traditional SSR.
-```
+`pi-research` is an advanced grounding tool designed specifically for AI coding agents. It prevents agents from hallucinating API endpoints, guessing library versions, or inventing CVE details by injecting real-time, highly authoritative, and conflict-resolved web research directly into their context window.
 
-
-How do I add retries to a Node.js fetch wrapper?
-```
-
-## Modes
-
-| Mode | Best for |
-| --- | --- |
-| `fast` | quick answers with a quality floor |
-| `deep` | broader retrieval with follow-up rounds |
-| `code` | docs, READMEs, repositories, and code snippets |
-| `academic` | scholarly sources and paper-heavy topics |
-
-## Output
-
-The tool returns structured data including:
-
-- `answer`
-- `bullets`
-- `sources`
-- `citations`
-- `codeBlocks`
-- `confidence`
-- `confidenceScore`
-- `sufficient`
-- `authoritativeSourcesFound`
-- `openSubQuestions`
-- `missingAspects`
-- `conflictSummary`
-- `unverifiedClaims`
-- `sourceTypes`
-- `meta`
-
-## Public tool parameters
-
-- `query` - research question to answer
-- `mode` - `fast`, `deep`, `code`, or `academic`
-- `force` - bypass cached sufficiency checks
-- `isolate` - run without session/query cache reuse
-- `options.allowedSources` - prefer only the listed source hints
-- `options.requireAuthoritative` - bias toward authoritative sources
-- `options.maxTurns` - limit follow-up rounds
-- `options.maxSites` - limit how many sources are read
-- `options.minYear` / `options.maxYear` - constrain source dates
-- `options.preferRecent` - prefer newer sources
-- `options.files` - include local files as sources
-- `options.format` - output format: `markdown`, `json`, `table`, or `latex`
-- `options.deepResearchConfig` - depth/breadth/concurrency tuning for deeper runs
-
-## Example calls
-
-### Fast mode
-
-```text
-query: What is the difference between HTTP and HTTPS?
-mode: fast
-```
-
-### Deep mode
-
-```text
-query: Compare PostgreSQL and MySQL for multi-tenant SaaS
-mode: deep
-options:
-preferRecent: true
-maxTurns: 2
-```
+
 
-
+## 💡 Why `pi-research`?
 
-
-query: How do I add retries to a Node.js fetch wrapper?
-mode: code
-```
+The world does not need just another "AI Search Engine" - there are plenty of massive, standalone research tools out there.
 
-
+Instead, `pi-research` was built specifically to solve a crucial problem in the **Agentic Workflow**: When an autonomous agent is deep in a coding loop, compiling errors, or debugging, it needs hard facts instantly without losing focus. Calling out to heavy external search services or trying to execute brittle Playwright scripts breaks the agent's flow, wastes context window tokens, and leads to hallucinations.
 
-
-
-
-
+`pi-research` solves this by providing a lightweight, internal **cognitive research loop** directly into the agent harness:
+1. **Agent-Centric Routing:** It knows exactly where developers look (GitHub, NPM, NIST, arXiv).
+2. **Authority First:** It prioritizes official documentation over random SEO-optimized tutorials.
+3. **Self-Awareness:** It extracts structured features to know when it lacks information, safely triggering follow-up questions *before* returning an answer to the agent.
 
-
+Best of all? **Zero setup.** No external search API keys to configure, no heavy local LLMs to run, and no flaky browser automation scripts to maintain. It's built to run silently and reliably alongside your agent.
 
-
-query: Summarize the key points from these notes
-mode: fast
-options:
-files:
-- ./notes/project-notes.md
-- ./docs/spec.md
-```
-
-## Domain packs
-
-Built-in packs now steer routing and source selection:
-
-- `web`
-- `github`
-- `security`
-- `papers`
-- `specs`
-- `changelog`
-- `forums`
-- `package-registry`
-- `vendor-status`
-
-## Community packs
-
-You can add your own domain pack without changing the core research engine:
-
-1. copy `lib/domains/template.js`
-2. implement your domain-specific `run(question, options)` logic
-3. register the pack in `lib/domains/index.js`
-4. add eval cases in `eval/cases/<your-domain>/`
-
-Starter example:
-
-```js
-export default {
-name: "boxing-training",
-sourceHints: ["web"],
-async run(question) {
-return {
-claims: [
-{
-text: `Starter pack example for ${question}`,
-evidence: [{ type: "web", source: "https://example.com", snippet: "Example" }],
-confidence: "medium",
-},
-],
-};
-},
-};
-```
+---
 
-##
+## ✨ Features
 
-
+- **Lightning Fast:** Powered by a Hybrid Tiny-Router Architecture (Model2Vec + SVC), routing queries in **< 0.6 milliseconds**.
+- 🛡️ **Anti-Hallucination:** Built-in Veto-Power for high-risk queries. If a security question only finds blog posts, the system forces a follow-up to find authoritative NIST/CVE data.
+- 🕸️ **Resilient Fetching:** Pre-emptively escalates blocked, JS-heavy, or thin pages through an integrated, robust Python `Scrapling` daemon (via IPC JSON-RPC 2.0).
+- 🧩 **Domain Packs:** Built-in heuristics for `github`, `security`, `papers`, `package-registry`, and more.
+- **Structured Outputs:** Returns citations, code blocks, missing aspects, confidence scores, and conflict summaries (e.g., "Source A contradicts Source B").
+- **Local Context:** Ingests local files (`options.files`) to ground web research in your current repository context.
 
-
+---
 
-
-
-Existing Pi users should keep installing the main package:
+## 📦 Installation
 
+### Pi Coding Agent (Extension)
+If you are using the Pi Agent harness, install the extension directly:
 ```bash
 pi install npm:pi-research
 ```
 
-
-
-### npm install
-
-```bash
-npm i pi-research
-```
-
-This is the package install command that npm shows on the package page.
-
-### MCP-only - any agent
-
-Run the MCP server directly from npm:
-
-```bash
-npx -y pi-research
-```
-
-The MCP server identifies itself as `unblind-mcp`, but the tool it exposes is still named `pi-research`.
-
-### Global MCP install
-
+### Node.js / NPM (Standalone Server)
+Install it globally to expose the MCP (Model Context Protocol) server for any compatible AI agent:
 ```bash
 npm install -g pi-research
-unblind-mcp
-```
-
-The global install also provides `pi-research` as a CLI alias for the same MCP server:
-
-```bash
 pi-research
 ```
+*(The MCP server identifies itself as `unblind-mcp`, exposing the tool `pi-research`)*
 
-
-
-```bash
-node ./mcp/server.js
-```
+---
 
-
+## Quick Start / Usage
 
-
-npm run --silent mcp
-```
+Once installed, your agent has access to the `pi-research` tool. It accepts a `query`, a `mode`, and various `options`.
 
-
+### Modes
+| Mode | Best for |
+| --- | --- |
+| `fast` | Quick factual lookups (e.g., "What is the latest LTS version of Node.js?"). Stops fetching early if authoritative sources are found. |
+| `deep` | Broader retrieval with automatic follow-up rounds. Perfect for comparisons, conflicts, or unclear architecture questions. |
+| `code` | Docs, repositories, README-driven answers, and retrieving actual code snippets. |
+| `academic` | Scholarly sources, DOI links, and paper-heavy topics. |
 
+### Example Tool Calls (For Agents)
+**Factual Lookup:**
 ```json
 {
-"
-
-
-"args": ["-y", "pi-research"]
-}
-}
+"query": "React 19 RC release notes",
+"mode": "fast",
+"options": { "requireAuthoritative": true }
 }
 ```
 
-
-
+**Architecture Research:**
 ```json
 {
-"
-
-
-"args": ["/path/to/pi-research/mcp/server.js"]
-}
-}
+"query": "Compare PostgreSQL and MySQL for multi-tenant SaaS",
+"mode": "deep",
+"options": { "preferRecent": true, "maxTurns": 2 }
 }
 ```
 
-
+---
+
+## 🧠 Under the Hood: The Agentic Router Update (v1.4.0)
+
+With `1.4.0`, `pi-research` shifted from heavy, generative JSON-planners to a **Hybrid Tiny-Router Architecture**.
+
+- **Model2Vec & SVC:** Queries are classified via locally embedded features. Security and paper queries have a 0% downgrade rate.
+- **Structured ML:** Instead of asking a heavy LLM "Is this enough data?", the system extracts deterministic features (`has_authority`, `conflict_state`) and uses an ultra-fast Logistic Regression model to evaluate sufficiency and follow-up actions with 100% evaluated accuracy.
+- **Node.js-to-Python IPC:** Operates entirely locally using a highly optimized, line-delimited JSON-RPC daemon to manage Python dependencies (`Scrapling`, `Model2Vec`) without memory leaks.
+
+---
 
-
+## 🛣️ Future Roadmap
 
-
+We are actively working on scaling the reasoning capabilities:
+- **LLM Data Augmentation (Weak Supervision):** Generating synthetic training data for underconfident domains to boost zero-shot accuracy to >95% without manual labeling.
+- **Active Learning Telemetry Loop:** Clustering low-confidence predictions from cache logs into a weakly-supervised retraining pipeline to let the system "self-heal."
+- **Cross-Encoder for Conflict Detection:** Transitioning to a fine-tuned Cross-Encoder (e.g., MiniLM + Natural Language Inference) to detect deep semantic contradiction across differing texts (e.g., recognizing that "Node 20 is stable" contradicts "Node 20 is broken").
 
-
+---
 
-
--
--
--
-- MCP compatibility shim: `mcp-server.js`
-- License: MIT
-- Third-party notices: `THIRD_PARTY_NOTICES.md`
-- GitHub: `https://github.com/endgegnerbert-tech/pi-research`
+## License & Notices
+- **License:** MIT
+- **Third-party notices:** See `THIRD_PARTY_NOTICES.md`
+- **GitHub:** [https://github.com/endgegnerbert-tech/pi-research](https://github.com/endgegnerbert-tech/pi-research)
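
Note on the new README's "Node.js-to-Python IPC" bullet: the README describes a line-delimited JSON-RPC 2.0 channel between Node.js and a Python daemon but does not show the wire format. The sketch below illustrates what line-delimited JSON-RPC 2.0 framing generally looks like, with one JSON object per newline-terminated line. It is not taken from `package/ml/router/daemon.py`. The method name `fetch_page`, its `url` parameter, and the helper names `encode_request` and `decode_response` are hypothetical and chosen only for illustration.

```python
import json


def encode_request(method, params, request_id):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,  # hypothetical method name, not the package's API
        "params": params,
    }
    # json.dumps escapes newlines inside string values, so the whole
    # message is guaranteed to occupy exactly one line on the wire.
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_response(line):
    """Parse one response line; return (id, result) or raise on an error reply."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["id"], message["result"]


# Illustrative round trip using a hand-written reply line.
request_line = encode_request("fetch_page", {"url": "https://example.com"}, 1)
reply_id, result = decode_response('{"jsonrpc":"2.0","id":1,"result":{"status":"ok"}}\n')
```

With this framing a reader on either side can split the stream on newlines and match each reply to its request through the `id` field, which is what lets a long-lived daemon serve many calls over a single pipe.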