npm - pi-research - Versions diffs - 1.0.2 → 1.1.2 - Mend

pi-research 1.0.2 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

package/README.md +102 -63
package/THIRD_PARTY_NOTICES.md +17 -0
package/index.js +13 -9
package/lib/domains/changelog.js +10 -0
package/lib/domains/forums.js +9 -0
package/lib/domains/github.js +9 -0
package/lib/domains/index.js +46 -0
package/lib/domains/package-registry.js +11 -0
package/lib/domains/papers.js +11 -0
package/lib/domains/security.js +11 -0
package/lib/domains/specs.js +11 -0
package/lib/domains/template.js +26 -0
package/lib/domains/vendor-status.js +10 -0
package/lib/domains/web.js +7 -0
package/lib/eval/case-loader.js +13 -0
package/lib/eval/runner.js +8 -0
package/lib/page-fetch-adapter.js +180 -0
package/lib/research-evidence.js +21 -0
package/lib/research-intent.js +20 -0
package/lib/research-output.js +7 -0
package/lib/research.js +44 -5
package/lib/types.js +2 -0
package/lib/web-research.js +57 -15
package/package.json +7 -4

package/README.md CHANGED

@@ -1,22 +1,36 @@
 # pi-research
+![pi-research logo](docs/assets/pi-research-logo.png)
 [![npm version](https://img.shields.io/npm/v/pi-research?color=blue)](https://www.npmjs.com/package/pi-research)
-[![tests](https://img.shields.io/badge/tests-33%2F33-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
+[![tests](https://img.shields.io/badge/tests-56%2F56-brightgreen)](https://github.com/endgegnerbert-tech/pi-research)
 [![Pi package](https://img.shields.io/badge/pi-package-blueviolet)](https://pi.ai)
-`pi-research` is a Pi extension for fast, local-first web research inside the agent.
+`pi-research` is a Pi extension for grounded web research.
+It searches, ranks, compares, and synthesizes sources inside the agent.
-It searches the live web, ranks sources, reads the most relevant pages, and synthesizes a grounded answer with citations.
-It does **not** require an external research API or API key, and it is not a browser automation tool.
+![community packs](docs/assets/pi-research-community.png)
 ## Why it exists
-Agents usually need two things to answer well:
+When agents answer well, they usually do three things:
+1. search the right places
+2. prefer authoritative sources
+3. explain confidence and gaps clearly
+`pi-research` does that without an external research service.
-1. a way to search the web efficiently
-2. a way to turn sources into a usable answer
+## Best practices
-`pi-research` does both inside Pi, so the agent can research topics without relying on a separate hosted research service.
+- use `fast` for short factual lookups
+- use `deep` for comparisons, conflicts, or unclear questions
+- use `code` for docs, repos, README-driven answers, and snippets
+- use `academic` for paper-heavy topics
+- set `options.requireAuthoritative: true` when source quality matters more than recall
+- use `options.format: json` when you need machine-readable output
+- add `options.files` when local docs matter
+- keep questions specific; vague prompts create noisy retrieval
 ## What it does
@@ -26,7 +40,7 @@ Agents usually need two things to answer well:
 - follows up when the first pass is not enough
 - extracts code blocks for code-focused questions
 - supports local files as additional sources
-- returns a structured result with citations and confidence metadata
+- returns structured results with citations, confidence, conflicts, and gaps
 ## What it is not
@@ -34,22 +48,6 @@ Agents usually need two things to answer well:
 - not an offline knowledge base
 - not a replacement for page navigation
-## Install
-### For Pi
-```bash
-pi install npm:pi-research
-```
-### For npm-based workflows
-```bash
-npm install pi-research
-```
-GitHub repository: https://github.com/endgegnerbert-tech/pi-research
 ## Quick start
 ```text
@@ -57,11 +55,11 @@ What are the trade-offs between B-trees and LSM-trees?
 ```
 ```text
-Show me the best way to add health checks to Docker Compose.
+Compare React Server Components with traditional SSR.
 ```
 ```text
-Compare React Server Components with traditional SSR.
+How do I add retries to a Node.js fetch wrapper?
 ```
 ## Modes
@@ -73,6 +71,26 @@ Compare React Server Components with traditional SSR.
 | `code` | docs, READMEs, repositories, and code snippets |
 | `academic` | scholarly sources and paper-heavy topics |
+## Output
+The tool returns structured data including:
+- `answer`
+- `bullets`
+- `sources`
+- `citations`
+- `codeBlocks`
+- `confidence`
+- `confidenceScore`
+- `sufficient`
+- `authoritativeSourcesFound`
+- `openSubQuestions`
+- `missingAspects`
+- `conflictSummary`
+- `unverifiedClaims`
+- `sourceTypes`
+- `meta`
 ## Public tool parameters
 - `query` — research question to answer
@@ -133,51 +151,72 @@ options:
     - ./docs/spec.md
 ```
-## Output
+## Domain packs
+Built-in packs now steer routing and source selection:
+- `web`
+- `github`
+- `security`
+- `papers`
+- `specs`
+- `changelog`
+- `forums`
+- `package-registry`
+- `vendor-status`
+## Community packs
+You can add your own domain pack without changing the core research engine:
+1. copy `lib/domains/template.js`
+2. implement your domain-specific `run(question, options)` logic
+3. register the pack in `lib/domains/index.js`
+4. add eval cases in `eval/cases/<your-domain>/`
+Starter example:
+```js
+export default {
+  name: "boxing-training",
+  sourceHints: ["web"],
+  async run(question) {
+    return {
+      claims: [
+        {
+          text: `Starter pack example for ${question}`,
+          evidence: [{ type: "web", source: "https://example.com", snippet: "Example" }],
+          confidence: "medium",
+        },
+      ],
+    };
+  },
+};
+```
-The tool returns structured data including:
+## Eval
-- `answer`
-- `bullets`
-- `sources`
-- `citations`
-- `codeBlocks`
-- `confidence`
-- `confidenceScore`
-- `sufficient`
-- `authoritativeSourcesFound`
-- `openSubQuestions`
-- `missingAspects`
-- `conflictSummary`
-- `unverifiedClaims`
-- `sourceTypes`
-- `meta`
+Run `npm run eval` to execute the eval harness.
-## How it works
+## Install
-- **query-isolated caching**: repeated identical research can be skipped when the previous result was already sufficient
-- **source scoring**: official docs, READMEs, papers, and local files are preferred over weak sources
-- **follow-up planning**: unclear or conflicting results trigger another round of research
-- **conflict detection**: opposing claims are surfaced explicitly
-- **fact checking**: unsupported answer sentences are marked as unverified
-- **local source input**: files can be added directly to the research context
+### For Pi
+```bash
+pi install npm:pi-research
+```
-## Limitations
+### For npm-based workflows
-- it still depends on live web access for web research
-- it does not browse pages like a human user
-- it is not fully offline unless you only use local files
-- it is not a browser interaction tool
+```bash
+npm install pi-research
+```
-## Package info
+## Release notes
 - Package name: `pi-research`
+- Version: `1.1.1`
 - Entry point: `extensions/pi-research.ts`
-- Tool name: `pi-research`
 - License: MIT
-## Release notes
-- Pi install: `pi install npm:pi-research`
-- npm install: `npm install pi-research`
+- Third-party notices: `THIRD_PARTY_NOTICES.md`
 - GitHub: `https://github.com/endgegnerbert-tech/pi-research`
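The best-practice list and the `index.js` schema together suggest argument shapes like the ones below. This is a sketch only. The parameter names (`query`, `mode`, `options`) come from the schema, and the option names (`requireAuthoritative`, `format`, `files`) come from the README. The queries are made up. `checkResearchArgs` is a hypothetical helper that only mirrors the schema's `mode` union. It treats a missing `mode` as `"fast"`, the default declared in the schema. The extension also has its own `defaultMode(query)`, which may choose differently.

```javascript
// Hypothetical argument objects for the pi-research tool.
// Parameter names come from the tool schema in index.js; values are illustrative.
const MODES = ["fast", "deep", "code", "academic"];

const quickLookup = { query: "What is the default port of PostgreSQL?", mode: "fast" };

const comparison = {
  query: "Compare React Server Components with traditional SSR.",
  mode: "deep",
  options: {
    requireAuthoritative: true, // favor source quality over recall
    format: "json",             // machine-readable output
    files: ["./docs/spec.md"],  // local docs as additional sources
  },
};

// Hypothetical check mirroring the schema: query is required, and mode is
// optional with "fast" as the declared default.
function checkResearchArgs(args) {
  if (typeof args.query !== "string" || !args.query.trim()) {
    return { ok: false, error: "query is required" };
  }
  const mode = args.mode ?? "fast";
  if (!MODES.includes(mode)) return { ok: false, error: `unknown mode: ${mode}` };
  return { ok: true, mode };
}

console.log(checkResearchArgs(quickLookup)); // { ok: true, mode: 'fast' }
console.log(checkResearchArgs(comparison));  // { ok: true, mode: 'deep' }
```

A specific `query` plus an explicit `mode` keeps retrieval focused. Put `options` only on calls where source quality, output shape, or local files actually matter.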

package/THIRD_PARTY_NOTICES.md ADDED

@@ -0,0 +1,17 @@
+# Third-Party Notices
+## Scrapling
+This project includes ideas and/or adapted implementation details from Scrapling.
+Copyright (c) 2024, Karim shoair
+BSD 3-Clause License
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

package/index.js CHANGED

@@ -7,7 +7,7 @@ import { runWebResearch } from "./lib/web-research.js";
 const RESEARCH_STATE = new Map();
 function buildWebResearchGuidance() {
-  return "Use pi-research for web search and research. Prefer fast mode for simple questions, deep mode for comparisons or ambiguous cases, and code/academic modes when source type matters.";
+  return "Use pi-research for current facts, docs, best practices, comparisons, and citations. Search if unsure.";
 }
 function defaultMode(query) {
@@ -104,15 +104,19 @@ export default function webResearchExtension(pi) {
   pi.registerTool({
     name: "pi-research",
-    label: "Pi Research",
-    description: "Search and research the web.",
-    promptSnippet: "Use this for web research when needed.",
-    promptGuidelines: ["Use pi-research for search, source ranking, and summarization."],
+    label: "Web Research",
+    description: "Live sources, ranking, and cited answers.",
+    promptSnippet: "Use for current or uncertain answers with citations.",
+    promptGuidelines: [
+      "Use for current facts, docs, best practices, comparisons, and verification.",
+      "Search instead of guessing.",
+      "Pick fast, deep, code, or academic mode as needed.",
+    ],
     parameters: Type.Object({
-      query: Type.String({ description: "Research question to answer from the web" }),
-      mode: Type.Optional(Type.Union([Type.Literal("fast"), Type.Literal("deep"), Type.Literal("code"), Type.Literal("academic")], { description: "Research mode", default: "fast" })),
-      force: Type.Optional(Type.Boolean({ description: "Bypass sufficiency gating and cached answers for this call" })),
-      isolate: Type.Optional(Type.Boolean({ description: "Run this query in isolation without session/query cache reuse" })),
+      query: Type.String({ description: "Live web question" }),
+      mode: Type.Optional(Type.Union([Type.Literal("fast"), Type.Literal("deep"), Type.Literal("code"), Type.Literal("academic")], { description: "Mode", default: "fast" })),
+      force: Type.Optional(Type.Boolean({ description: "Ignore cache" })),
+      isolate: Type.Optional(Type.Boolean({ description: "No cache reuse" })),
       options: Type.Optional(Type.Object({
         allowedSources: Type.Optional(Type.Array(Type.String())),
         maxTurns: Type.Optional(Type.Number()),

package/lib/domains/changelog.js ADDED

@@ -0,0 +1,10 @@
+export default {
+  name: "changelog",
+  sourceHints: ["changelog", "release notes", "releases"],
+  allowedSources: ["github.com", "docs.", "release notes"],
+  queryHints: ["release notes", "changelog", "site:github.com/releases"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "changelog" };
+  },
+};

package/lib/domains/forums.js ADDED

@@ -0,0 +1,9 @@
+export default {
+  name: "forums",
+  sourceHints: ["stackoverflow", "discourse", "reddit"],
+  allowedSources: ["stackoverflow.com", "discourse", "reddit.com"],
+  queryHints: ["site:stackoverflow.com", "discourse", "site:reddit.com"],
+  async run() {
+    return { name: "forums" };
+  },
+};

package/lib/domains/github.js ADDED

@@ -0,0 +1,9 @@
+export default {
+  name: "github",
+  sourceHints: ["issues", "discussions", "pull requests", "readme"],
+  allowedSources: ["github.com"],
+  queryHints: ["site:github.com", "issues", "discussions", "readme"],
+  async run() {
+    return { name: "github" };
+  },
+};

package/lib/domains/index.js ADDED

@@ -0,0 +1,46 @@
+import web from "./web.js";
+import github from "./github.js";
+import forums from "./forums.js";
+import security from "./security.js";
+import packageRegistry from "./package-registry.js";
+import changelog from "./changelog.js";
+import papers from "./papers.js";
+import specs from "./specs.js";
+import vendorStatus from "./vendor-status.js";
+const PACKS = {
+  web,
+  github,
+  forums,
+  security,
+  "package-registry": packageRegistry,
+  changelog,
+  papers,
+  specs,
+  "vendor-status": vendorStatus,
+};
+const DOMAIN_NAMES = ["web", "github", "security", "papers", "specs", "changelog", "forums", "package-registry", "vendor-status"];
+export function listDomainPacks() {
+  return [...DOMAIN_NAMES];
+}
+export function getDomainPack(name = "web") {
+  return PACKS[name] || web;
+}
+import { classifyQuestionDomain } from "../research-intent.js";
+export function resolveDomainConfig(questionOrDomain = "web") {
+  const name = PACKS[questionOrDomain] ? questionOrDomain : classifyQuestionDomain(questionOrDomain);
+  const pack = PACKS[name] || PACKS.web;
+  return {
+    domain: name,
+    allowedSources: pack.allowedSources || [],
+    allowedSourceTypes: pack.allowedSourceTypes || [],
+    queryHints: pack.queryHints || [],
+    requireAuthoritative: Boolean(pack.requireAuthoritative),
+    format: pack.format || "markdown",
+  };
+}
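Here is how `resolveDomainConfig` turns either a pack name or a free-text question into a config. This self-contained sketch reuses the function body from the diff. The pack table and classifier are trimmed stand-ins. The `boxing-training` pack is the hypothetical community pack from the README starter example, and the sample questions are made up.

```javascript
// Self-contained sketch of the pack lookup in lib/domains/index.js.
// The packs and the classifier are trimmed stand-ins, not the real modules.
const web = { name: "web", sourceHints: ["official docs", "readme", "overview"] };
const papers = {
  name: "papers",
  allowedSources: ["arxiv.org", "semanticscholar.org", "doi.org"],
  allowedSourceTypes: ["paper"],
  queryHints: ["site:arxiv.org"],
  requireAuthoritative: true,
};
// Hypothetical community pack, registered the same way as the built-ins.
const boxingTraining = { name: "boxing-training", sourceHints: ["web"] };

const PACKS = { web, papers, "boxing-training": boxingTraining };

// Stand-in for classifyQuestionDomain from research-intent.js.
function classifyQuestionDomain(question) {
  return /(arxiv|paper|study)/.test(String(question).toLowerCase()) ? "papers" : "web";
}

function resolveDomainConfig(questionOrDomain = "web") {
  // A registered pack name is used directly; anything else is classified as a question.
  const name = PACKS[questionOrDomain] ? questionOrDomain : classifyQuestionDomain(questionOrDomain);
  const pack = PACKS[name] || PACKS.web;
  // Missing pack fields fall back to empty lists, false, and "markdown".
  return {
    domain: name,
    allowedSources: pack.allowedSources || [],
    allowedSourceTypes: pack.allowedSourceTypes || [],
    queryHints: pack.queryHints || [],
    requireAuthoritative: Boolean(pack.requireAuthoritative),
    format: pack.format || "markdown",
  };
}

console.log(resolveDomainConfig("papers").requireAuthoritative);    // true
console.log(resolveDomainConfig("Find a study on sleep").domain);   // papers
console.log(resolveDomainConfig("boxing-training").allowedSources); // []
```

In the shipped module, `listDomainPacks()` reads the separate `DOMAIN_NAMES` array. A new pack added only to `PACKS` would therefore resolve but not be listed.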

package/lib/domains/package-registry.js ADDED

@@ -0,0 +1,11 @@
+export default {
+  name: "package-registry",
+  sourceHints: ["npm", "pypi", "cargo", "maven"],
+  allowedSources: ["npmjs.com", "pypi.org", "crates.io", "mvnrepository.com"],
+  allowedSourceTypes: ["official_doc", "github_readme"],
+  queryHints: ["site:npmjs.com", "site:pypi.org", "site:crates.io", "site:mvnrepository.com"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "package-registry" };
+  },
+};

package/lib/domains/papers.js ADDED

@@ -0,0 +1,11 @@
+export default {
+  name: "papers",
+  sourceHints: ["arxiv", "semanticscholar", "doi"],
+  allowedSources: ["arxiv.org", "semanticscholar.org", "doi.org", "pubmed.ncbi.nlm.nih.gov"],
+  allowedSourceTypes: ["paper"],
+  queryHints: ["site:arxiv.org", "site:semanticscholar.org", "site:doi.org"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "papers" };
+  },
+};

package/lib/domains/security.js ADDED

@@ -0,0 +1,11 @@
+export default {
+  name: "security",
+  sourceHints: ["cve", "advisory", "security bulletin"],
+  allowedSources: ["nvd.nist.gov", "cisa.gov", "mitre.org", "ubuntu.com", "redhat.com", "debian.org", "suse.com"],
+  allowedSourceTypes: ["official_doc", "paper"],
+  queryHints: ["nvd", "cisa", "mitre", "advisory", "cve"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "security" };
+  },
+};

package/lib/domains/specs.js ADDED

@@ -0,0 +1,11 @@
+export default {
+  name: "specs",
+  sourceHints: ["rfc", "spec", "standard"],
+  allowedSources: ["rfc-editor.org", "datatracker.ietf.org", "w3.org"],
+  allowedSourceTypes: ["official_doc"],
+  queryHints: ["site:rfc-editor.org", "site:datatracker.ietf.org", "RFC"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "specs" };
+  },
+};

package/lib/domains/template.js ADDED

@@ -0,0 +1,26 @@
+export default {
+  name: "template",
+  description: "Minimal domain pack example for pi-research",
+  sourceHints: ["web"],
+  queryHints: ["site:example.com"],
+  async run(question, options) {
+    return {
+      claims: [
+        {
+          text: `This is a minimal example for a domain pack: ${question}`,
+          evidence: [
+            {
+              type: "web",
+              source: "https://example.com",
+              snippet: "Minimal example",
+            },
+          ],
+          confidence: "medium",
+          confidenceDescription: "Just an example",
+        },
+      ],
+      evidenceSummary: "Starter example only.",
+      sourceTypes: ["other"],
+    };
+  },
+};
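The template and the built-in packs share a small set of fields: `name`, `sourceHints`, `queryHints`, `allowedSources`, `allowedSourceTypes`, `requireAuthoritative`, and `run(question, options)`. Before registering a community pack, a contributor might want a quick shape check. `validateDomainPack` below is a hypothetical helper written for illustration. pi-research does not ship it, and which fields are truly required is an assumption here.

```javascript
// Hypothetical shape check for a community domain pack. Field names follow
// template.js and the built-in packs; pi-research itself does not ship this helper.
const ARRAY_FIELDS = ["sourceHints", "queryHints", "allowedSources", "allowedSourceTypes"];

function validateDomainPack(pack) {
  if (!pack || typeof pack !== "object") return ["pack must be an object"];
  const problems = [];
  if (typeof pack.name !== "string" || !pack.name.trim()) {
    problems.push("name must be a non-empty string");
  }
  // Optional list fields must be arrays of strings when present.
  for (const field of ARRAY_FIELDS) {
    const value = pack[field];
    if (value !== undefined && !(Array.isArray(value) && value.every((item) => typeof item === "string"))) {
      problems.push(`${field} must be an array of strings`);
    }
  }
  if (pack.requireAuthoritative !== undefined && typeof pack.requireAuthoritative !== "boolean") {
    problems.push("requireAuthoritative must be a boolean");
  }
  if (typeof pack.run !== "function") problems.push("run(question, options) must be a function");
  return problems;
}

// Starter pack modeled on the README example.
const starter = {
  name: "boxing-training",
  sourceHints: ["web"],
  queryHints: ["site:example.com"],
  async run(question) {
    return { claims: [{ text: `Starter pack example for ${question}`, evidence: [], confidence: "medium" }] };
  },
};

console.log(validateDomainPack(starter)); // []
console.log(validateDomainPack({ name: "", sourceHints: "web" }));
```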

package/lib/domains/vendor-status.js ADDED

@@ -0,0 +1,10 @@
+export default {
+  name: "vendor-status",
+  sourceHints: ["status", "incident", "outage"],
+  allowedSources: ["status", "statuspage.io", "status.github.com"],
+  queryHints: ["status page", "incident", "outage"],
+  requireAuthoritative: true,
+  async run() {
+    return { name: "vendor-status" };
+  },
+};

package/lib/domains/web.js ADDED

@@ -0,0 +1,7 @@
+export default {
+  name: "web",
+  sourceHints: ["official docs", "readme", "overview"],
+  async run() {
+    return { name: "web" };
+  },
+};

package/lib/eval/case-loader.js ADDED

@@ -0,0 +1,13 @@
+import { readdirSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+export function loadEvalCases(domain) {
+  const dir = join(process.cwd(), "eval", "cases", domain);
+  try {
+    return readdirSync(dir)
+      .filter((file) => file.endsWith(".json"))
+      .map((file) => JSON.parse(readFileSync(join(dir, file), "utf8")));
+  } catch {
+    return [];
+  }
+}

package/lib/eval/runner.js ADDED

@@ -0,0 +1,8 @@
+import { loadEvalCases } from "./case-loader.js";
+export async function runEvalSuite({ domain }) {
+  const cases = loadEvalCases(domain);
+  const passed = cases.filter((item) => item.expectedDomain === domain).length;
+  const total = cases.length;
+  return { total, passed, passRate: total ? passed / total : 0 };
+}
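`runEvalSuite` loads `eval/cases/<domain>/*.json` relative to the working directory. It counts a case as passed when its `expectedDomain` equals the suite's domain. The first function below reproduces that scoring on in-memory cases. The second is a hypothetical stricter variant that asks a classifier to agree with the label. Its `question` field is an assumption, not a documented case format, and the classifier is a stand-in.

```javascript
// In-memory sketch of runEvalSuite from lib/eval/runner.js: cases are passed
// directly instead of being read from eval/cases/<domain>/*.json.
function scoreCases(cases, domain) {
  const passed = cases.filter((item) => item.expectedDomain === domain).length;
  const total = cases.length;
  return { total, passed, passRate: total ? passed / total : 0 };
}

// Hypothetical variant: the "question" field is assumed. A case passes only
// when the classifier's output matches the labeled domain.
function scoreCasesWithClassifier(cases, classify) {
  const passed = cases.filter((item) => classify(item.question) === item.expectedDomain).length;
  const total = cases.length;
  return { total, passed, passRate: total ? passed / total : 0 };
}

// Made-up cases for illustration.
const cases = [
  { question: "Is CVE-2021-44228 exploitable?", expectedDomain: "security" },
  { question: "Latest CISA advisory for OpenSSL", expectedDomain: "security" },
  { question: "Is GitHub having an outage?", expectedDomain: "vendor-status" },
];

// Stand-in classifier; the real one is classifyQuestionDomain in research-intent.js.
const classify = (q) => (/(cve|advisory)/i.test(q) ? "security" : "web");

console.log(scoreCases(cases, "security"));
console.log(scoreCasesWithClassifier(cases, classify));
```

The shipped runner compares labels with the suite name only. A mislabeled case therefore lowers its pass rate, but a regression in question classification would not.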

package/lib/page-fetch-adapter.js ADDED

@@ -0,0 +1,180 @@
+import { spawn } from "node:child_process";
+import { fileURLToPath } from "node:url";
+import path from "node:path";
+const SCRAPLING_ROOT = fileURLToPath(new URL("../Scrapling", import.meta.url));
+const BLOCKED_PATTERNS = [
+  /cloudflare/i,
+  /turnstile/i,
+  /captcha/i,
+  /please enable cookies/i,
+  /bot detection/i,
+  /verify you are human/i,
+  /security check/i,
+];
+const DYNAMIC_PATTERNS = [
+  /__next_data__/i,
+  /__nuxt__/i,
+  /data-reactroot/i,
+  /hydrat/i,
+  /window\.__INITIAL_STATE__/i,
+  /id=["']app["']/i,
+  /id=["']root["']/i,
+];
+function stripHtml(value) {
+  return String(value || "")
+    .replace(/<script[\s\S]*?<\/script>/gi, " ")
+    .replace(/<style[\s\S]*?<\/style>/gi, " ")
+    .replace(/<noscript[\s\S]*?<\/noscript>/gi, " ")
+    .replace(/<[^>]+>/g, " ")
+    .replace(/&nbsp;/g, " ")
+    .replace(/\s+/g, " ")
+    .trim();
+}
+export function assessPageAttempt({ status = 200, body = "", contentType = "", url = "" } = {}) {
+  const text = String(body || "");
+  const plain = stripHtml(text);
+  const lower = `${text}\n${url}`.toLowerCase();
+  const antiBotSignal = BLOCKED_PATTERNS.some((pattern) => pattern.test(lower));
+  const blocked = status === 403 || status === 429 || (antiBotSignal && plain.length < 1000);
+  const dynamic = !blocked && (DYNAMIC_PATTERNS.some((pattern) => pattern.test(lower)) || (text.includes("<script") && plain.length < 400));
+  const weak = blocked || plain.length < 300 || (!/text\/(html|plain)/i.test(contentType) && plain.length < 500);
+  return {
+    blocked,
+    dynamic,
+    weak,
+    mode: blocked ? "stealthy" : dynamic ? "dynamic" : "async",
+    plainLength: plain.length,
+  };
+}
+export function chooseScraplingMode(input) {
+  return assessPageAttempt(input).mode;
+}
+function pythonScript() {
+  return String.raw`
+import asyncio
+import json
+import os
+import sys
+root = sys.argv[1]
+mode = sys.argv[2]
+url = sys.argv[3]
+payload = json.loads(sys.argv[4])
+sys.path.insert(0, root)
+async def main():
+    from scrapling.fetchers import AsyncFetcher, DynamicFetcher, StealthyFetcher
+    timeout = payload.get("timeout")
+    kwargs = {}
+    if timeout:
+        kwargs["timeout"] = timeout
+    if mode == "async":
+        response = await AsyncFetcher.get(url, **kwargs)
+    elif mode == "dynamic":
+        response = DynamicFetcher.fetch(url, **kwargs)
+    else:
+        response = StealthyFetcher.fetch(url, **kwargs)
+    headers = {}
+    raw_headers = getattr(response, "headers", None)
+    if hasattr(raw_headers, "items"):
+        headers = dict(raw_headers.items())
+    else:
+        try:
+            headers = dict(raw_headers or {})
+        except Exception:
+            headers = {}
+    body = getattr(response, "body", None)
+    if body is None:
+        candidate = getattr(response, "text", None)
+        body = candidate() if callable(candidate) else candidate
+    if isinstance(body, bytes):
+        body = body.decode("utf-8", "replace")
+    elif not isinstance(body, str):
+        body = str(body or "")
+    out = {
+        "ok": True,
+        "url": getattr(response, "url", url),
+        "status": getattr(response, "status", 200),
+        "contentType": headers.get("content-type", ""),
+        "body": body,
+        "headers": headers,
+    }
+    print(json.dumps(out))
+try:
+    asyncio.run(main())
+except Exception as exc:
+    print(json.dumps({"ok": False, "error": str(exc), "type": exc.__class__.__name__}))
+    sys.exit(1)
+`;
+}
+export async function fetchWithScrapling(url, mode, signal, config = {}) {
+  if (!mode) return null;
+  return await new Promise((resolve) => {
+    const child = spawn(process.env.PYTHON || "python3", ["-c", pythonScript(), SCRAPLING_ROOT, mode, url, JSON.stringify({ timeout: config.pageTimeoutMs || 30000 })], {
+      env: {
+        ...process.env,
+        PYTHONPATH: [SCRAPLING_ROOT, process.env.PYTHONPATH].filter(Boolean).join(path.delimiter),
+      },
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+    let stdout = "";
+    let stderr = "";
+    child.stdout.on("data", (chunk) => {
+      stdout += chunk;
+    });
+    child.stderr.on("data", (chunk) => {
+      stderr += chunk;
+    });
+    const finish = (value) => {
+      if (!signal) return resolve(value);
+      if (signal.aborted) return resolve(null);
+      return resolve(value);
+    };
+    child.on("error", () => finish(null));
+    child.on("close", (code) => {
+      if (code !== 0) return finish(null);
+      try {
+        const parsed = JSON.parse(stdout.trim() || "{}");
+        if (!parsed.ok) return finish(null);
+        return finish(parsed);
+      } catch {
+        if (stderr) return finish(null);
+        return finish(null);
+      }
+    });
+    if (signal) {
+      const abort = () => {
+        child.kill("SIGKILL");
+        finish(null);
+      };
+      if (signal.aborted) abort();
+      else signal.addEventListener("abort", abort, { once: true });
+    }
+  });
+}
+export const pageFetchAdapter = {
+  assessPageAttempt,
+  chooseScraplingMode,
+  fetchWithScrapling,
+};
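The escalation rule in `assessPageAttempt` can be exercised without Python or Scrapling. A 403/429, or an anti-bot marker on a page with little visible text, selects `stealthy`. Framework markers, or a short page that depends on `<script>`, select `dynamic`. Everything else stays on plain `async` fetching. The sketch copies the function with shortened pattern lists and a simpler `stripHtml`, and the sample responses are made up.

```javascript
// Trimmed copy of assessPageAttempt from lib/page-fetch-adapter.js
// (fewer patterns, simplified stripHtml).
const BLOCKED_PATTERNS = [/cloudflare/i, /captcha/i, /verify you are human/i];
const DYNAMIC_PATTERNS = [/__next_data__/i, /id=["']root["']/i];

function stripHtml(value) {
  return String(value || "")
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function assessPageAttempt({ status = 200, body = "", contentType = "", url = "" } = {}) {
  const text = String(body || "");
  const plain = stripHtml(text);
  const lower = `${text}\n${url}`.toLowerCase();
  const antiBotSignal = BLOCKED_PATTERNS.some((p) => p.test(lower));
  // Blocked: explicit 403/429, or an anti-bot marker on a page with little visible text.
  const blocked = status === 403 || status === 429 || (antiBotSignal && plain.length < 1000);
  // Dynamic: framework markers, or a short page that relies on <script>.
  const dynamic = !blocked && (DYNAMIC_PATTERNS.some((p) => p.test(lower)) || (text.includes("<script") && plain.length < 400));
  const weak = blocked || plain.length < 300 || (!/text\/(html|plain)/i.test(contentType) && plain.length < 500);
  return { blocked, dynamic, weak, mode: blocked ? "stealthy" : dynamic ? "dynamic" : "async", plainLength: plain.length };
}

// Made-up responses.
const article = `<html><body><p>${"Readable documentation text. ".repeat(40)}</p></body></html>`;
const spa = `<div id="root"></div><script>window.boot()</script>`;
const challenge = `<title>Just a moment...</title><p>Verify you are human</p>`;

console.log(assessPageAttempt({ body: article, contentType: "text/html" }).mode);   // async
console.log(assessPageAttempt({ body: spa, contentType: "text/html" }).mode);       // dynamic
console.log(assessPageAttempt({ body: challenge, contentType: "text/html" }).mode); // stealthy
console.log(assessPageAttempt({ status: 429 }).mode);                                // stealthy
```

In the package, the chosen mode is passed to `fetchWithScrapling`. That function spawns `python3` (or `$PYTHON`) with the bundled `Scrapling` directory on `PYTHONPATH`. A non-zero exit, unparsable output, or an aborted signal resolves to `null` rather than throwing.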

package/lib/research-evidence.js ADDED

@@ -0,0 +1,21 @@
+export function createEvidence(evidence = {}) {
+  return {
+    type: evidence.type || "web",
+    source: evidence.source || "",
+    snippet: evidence.snippet || "",
+  };
+}
+export function createClaim(claim = {}) {
+  return {
+    text: claim.text || "",
+    confidence: claim.confidence || "low",
+    evidence: Array.isArray(claim.evidence) ? claim.evidence.map(createEvidence) : [],
+  };
+}
+export function explainConfidence(confidence = "low", evidenceCount = 0) {
+  if (confidence === "high" && evidenceCount >= 2) return "Multiple sources support this claim.";
+  if (confidence === "medium") return "Some supporting evidence was found.";
+  return "Limited supporting evidence was found.";
+}
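The evidence helpers normalize raw pack output into a fixed claim shape. The sketch below copies them and feeds in a claim shaped like `template.js` output (the values are made up). `createClaim` keeps only `text`, `confidence`, and `evidence`, so a field such as `confidenceDescription` is dropped. `explainConfidence` reports `"high"` as multi-source only when at least two evidence items exist; otherwise it falls through to the "limited" message.

```javascript
// Copies of createEvidence, createClaim, and explainConfidence from lib/research-evidence.js.
function createEvidence(evidence = {}) {
  return { type: evidence.type || "web", source: evidence.source || "", snippet: evidence.snippet || "" };
}

function createClaim(claim = {}) {
  return {
    text: claim.text || "",
    confidence: claim.confidence || "low",
    evidence: Array.isArray(claim.evidence) ? claim.evidence.map(createEvidence) : [],
  };
}

function explainConfidence(confidence = "low", evidenceCount = 0) {
  if (confidence === "high" && evidenceCount >= 2) return "Multiple sources support this claim.";
  if (confidence === "medium") return "Some supporting evidence was found.";
  return "Limited supporting evidence was found.";
}

// Raw claim shaped like template.js output (values made up).
const raw = {
  text: "Example claim",
  confidence: "high",
  confidenceDescription: "Just an example",
  evidence: [{ source: "https://example.com" }],
};

const claim = createClaim(raw);
console.log(claim.evidence[0].type); // web
console.log(explainConfidence(claim.confidence, claim.evidence.length)); // Limited supporting evidence was found.
```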

package/lib/research-intent.js ADDED

@@ -0,0 +1,20 @@
+function text(value) {
+  return String(value || "").toLowerCase();
+}
+export function classifyQuestionDomain(question) {
+  const q = text(question);
+  if (/(cve-|cve\b|advisory|security|vulnerability|exploit)/.test(q)) return "security";
+  if (/(status page|status|outage|incident)/.test(q)) return "vendor-status";
+  if (/(changelog|release notes?|releases?|version history)/.test(q)) return "changelog";
+  if (/(github|issue|issues|pull request|repo\b|repository\b|discussions?)/.test(q)) return "github";
+  if (/(arxiv|paper|papers|study|(?<!pi-)research|scientific|scholar)/.test(q)) return "papers";
+  if (/(rfc|spec|specification|standard|standards)/.test(q)) return "specs";
+  if (/(stackoverflow|stack overflow|discourse|reddit|forum|forums)/.test(q)) return "forums";
+  if (/(npm|pypi|cargo|maven|package registry|package|library)/.test(q)) return "package-registry";
+  return "web";
+}
+export function normalizeResearchMode(input = {}, fallback = "fast") {
+  return input && typeof input === "object" && input.mode ? input.mode : fallback;
+}
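`classifyQuestionDomain` returns the first domain whose pattern matches. Most alternatives are plain substrings rather than whole words, so both rule order and pattern breadth shape the result. The copy below runs a few made-up questions through it:
- "status page" wins over "npm" because the vendor-status rule comes first.
- "inspect" matches the specs rule through the substring `spec`.
- The `(?<!pi-)` lookbehind keeps "pi-research" out of the papers domain.

```javascript
// Copy of classifyQuestionDomain from lib/research-intent.js.
function text(value) {
  return String(value || "").toLowerCase();
}

function classifyQuestionDomain(question) {
  const q = text(question);
  // Rules are checked in order; the first match wins.
  if (/(cve-|cve\b|advisory|security|vulnerability|exploit)/.test(q)) return "security";
  if (/(status page|status|outage|incident)/.test(q)) return "vendor-status";
  if (/(changelog|release notes?|releases?|version history)/.test(q)) return "changelog";
  if (/(github|issue|issues|pull request|repo\b|repository\b|discussions?)/.test(q)) return "github";
  if (/(arxiv|paper|papers|study|(?<!pi-)research|scientific|scholar)/.test(q)) return "papers";
  if (/(rfc|spec|specification|standard|standards)/.test(q)) return "specs";
  if (/(stackoverflow|stack overflow|discourse|reddit|forum|forums)/.test(q)) return "forums";
  if (/(npm|pypi|cargo|maven|package registry|package|library)/.test(q)) return "package-registry";
  return "web";
}

// Made-up questions.
const samples = [
  "Is there a CVE for lodash?",                  // security
  "Is the npm status page showing an incident?", // vendor-status (checked before package-registry)
  "Node.js 22 release notes",                    // changelog
  "What does RFC 9110 say about HEAD?",          // specs
  "How does pi-research rank sources?",          // web (lookbehind skips "pi-research")
  "How do I inspect Docker containers?",         // specs ("inspect" contains "spec")
  "Which library handles retries?",              // package-registry
];

for (const question of samples) console.log(`${classifyQuestionDomain(question)}\t${question}`);
```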

package/lib/research-output.js ADDED

@@ -0,0 +1,7 @@
+export function resolveOutputFormat(input = {}, fallback = "markdown") {
+  return input && typeof input === "object" && input.format ? input.format : fallback;
+}
+export function shouldRequireAuthoritativeSources(input = {}, fallback = false) {
+  return Boolean(input && typeof input === "object" && input.requireAuthoritative) || Boolean(fallback);
+}
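These two helpers decide output format and source strictness from the caller's options, with a fallback. The copies below show the precedence. `format` falls back to `"markdown"` when absent or when the input is not an object. `requireAuthoritative` is ORed with the fallback, so an explicit `false` cannot relax a requirement that arrives through the fallback. The pack default value used here is a hypothetical stand-in (the papers pack does set `requireAuthoritative: true`).

```javascript
// Copies of the two helpers in lib/research-output.js.
function resolveOutputFormat(input = {}, fallback = "markdown") {
  return input && typeof input === "object" && input.format ? input.format : fallback;
}

function shouldRequireAuthoritativeSources(input = {}, fallback = false) {
  // The option and the fallback are ORed: either one can turn the requirement on.
  return Boolean(input && typeof input === "object" && input.requireAuthoritative) || Boolean(fallback);
}

// Hypothetical pack default passed as the fallback.
const packRequiresAuthoritative = true;

console.log(resolveOutputFormat({ format: "json" })); // json
console.log(resolveOutputFormat(null));               // markdown
console.log(shouldRequireAuthoritativeSources({ requireAuthoritative: true }));                              // true
console.log(shouldRequireAuthoritativeSources({ requireAuthoritative: false }, packRequiresAuthoritative)); // true
```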

package/lib/research.js CHANGED

@@ -386,6 +386,25 @@ export function rankFetchedPages(pages, query, limit = pages.length, config = {}
   return [...pages].sort((a, b) => scoreFetchedPage(b, query, config) - scoreFetchedPage(a, query, config)).slice(0, limit);
 }
+export function detectClaimConflicts(claims = []) {
+  const texts = claims.map((claim) => String(claim?.text || claim || "").toLowerCase());
+  const hasPositive = texts.some((text) => /\b(supported|works|available|recommended|yes|stable|compatible)\b/.test(text));
+  const hasNegative = texts.some((text) => /\b(not supported|unsupported|does not|no support|broken|incompatible|removed)\b/.test(text));
+  return {
+    detected: hasPositive && hasNegative,
+    conflictSummary: hasPositive && hasNegative ? "Claims conflict." : "",
+  };
+}
+export function detectCoverageGaps(input = {}) {
+  const claims = Array.isArray(input.claims) ? input.claims : [];
+  const authoritativeSourcesFound = claims.some((claim) => Array.isArray(claim?.evidence) && claim.evidence.length > 0);
+  return {
+    detected: !authoritativeSourcesFound,
+    missingAspects: authoritativeSourcesFound ? [] : ["authoritative sources"],
+  };
+}
 export function detectConflictSignals(pages) {
   if (!Array.isArray(pages) || pages.length < 2) {
     return { detected: false, reason: null, conflictSummary: "", conflictingSourcePairs: [] };
@@ -592,15 +611,17 @@ export function extractCodeBlocks(text) {
 export function evaluateSufficiency(input, legacyPages, legacyConflictDetected = false) {
   const payload = typeof input === "string"
     ? { query: input, sources: legacyPages || [], conflictDetected: legacyConflictDetected }
-    : { query: input?.query || "", sources: input?.sources || [], conflictDetected: Boolean(input?.conflictDetected), confidence: input?.confidence, minSources: input?.minSources };
+    : { query: input?.query || "", sources: input?.sources || [], claims: input?.claims || [], conflictDetected: Boolean(input?.conflictDetected), confidence: input?.confidence, minSources: input?.minSources };
   const scoredSources = payload.sources.map((page) => scoreSourceEntry(page, payload.query || ""));
   const authoritativeCount = scoredSources.filter((scored) => Boolean(scored.authoritative)).length;
   const authoritativeSourcesFound = authoritativeCount > 0;
   const conflict = detectConflictSignals(payload.sources);
-  const conflictDetected = payload.conflictDetected || conflict.detected;
+  const claimConflict = detectClaimConflicts(payload.claims);
+  const coverage = detectCoverageGaps(payload);
+  const conflictDetected = payload.conflictDetected || conflict.detected || claimConflict.detected;
   const missingAspects = [];
-  if (!authoritativeSourcesFound) missingAspects.push("authoritative sources");
+  if (!authoritativeSourcesFound || coverage.detected) missingAspects.push("authoritative sources");
   if (conflictDetected) missingAspects.push("conflict resolution");
   if (!payload.sources.length) missingAspects.push("readable sources");
@@ -654,6 +675,16 @@ export function compactResearchPayload(payload) {
           ...(typeof source.local === "boolean" ? { local: source.local } : {}),
         }))
       : [],
+    claims: Array.isArray(payload.claims) ? payload.claims.slice(0, 8).map((claim) => ({
+      text: claim.text,
+      confidence: claim.confidence,
+      evidence: Array.isArray(claim.evidence) ? claim.evidence.slice(0, 5).map((evidence) => ({
+        type: evidence.type,
+        source: evidence.source,
+        snippet: evidence.snippet,
+      })) : [],
+    })) : [],
+    evidenceSummary: payload.evidenceSummary || "",
     sourceTypes: Array.isArray(payload.sourceTypes) ? payload.sourceTypes.slice(0, 8) : [],
     unverifiedClaims: Array.isArray(payload.unverifiedClaims) ? payload.unverifiedClaims.slice(0, 8) : [],
     meta: payload.meta && typeof payload.meta === "object" ? payload.meta : undefined,
@@ -675,12 +706,20 @@ export function extractPageSnapshot(html, url) {
   return { title, url, text: stripTags(body), codeBlocks: extractCodeBlocks(html) };
 }
-export function formatResearchResponse({ answer, bullets, sources, confidence }) {
+export function formatResearchResponse({ answer, bullets, sources, confidence, format = "markdown" }) {
+  const list = Array.isArray(sources) ? sources : [];
+  if (format === "json") {
+    return JSON.stringify({ answer: String(answer || "").trim(), bullets: bullets || [], confidence: confidence || "", sources: list });
+  }
+  if (format === "table") {
+    const rows = list.map((source, index) => `| ${index + 1} | ${source.title} | ${source.url} |`).join("\n");
+    return ["| # | Title | URL |", "|---|---|---|", rows].filter(Boolean).join("\n").trim();
+  }
   const parts = ["## Answer", "", String(answer || "").trim(), "", "## Key points"];
   for (const bullet of bullets || []) parts.push(`- ${bullet}`);
   if (confidence) parts.push("", "## Confidence", "", confidence);
   parts.push("", "## Sources");
-  (sources || []).forEach((source, index) => {
+  list.forEach((source, index) => {
     const freshness = source.freshness ? ` (${source.freshness})` : "";
     const meta = [];
     if (source.sourceType) meta.push(source.sourceType);
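The new claim-level checks in `research.js` are keyword heuristics, and two consequences of their regexes are worth seeing on concrete input. First, `\bsupported\b` also matches inside "not supported", so a single negative claim sets both flags and is reported as a conflict. Second, `detectCoverageGaps` treats any claim with at least one evidence item as authoritative coverage, whatever the source. The sketch copies both functions; the claims are made up.

```javascript
// Copies of detectClaimConflicts and detectCoverageGaps from lib/research.js.
function detectClaimConflicts(claims = []) {
  const texts = claims.map((claim) => String(claim?.text || claim || "").toLowerCase());
  const hasPositive = texts.some((text) => /\b(supported|works|available|recommended|yes|stable|compatible)\b/.test(text));
  const hasNegative = texts.some((text) => /\b(not supported|unsupported|does not|no support|broken|incompatible|removed)\b/.test(text));
  return {
    detected: hasPositive && hasNegative,
    conflictSummary: hasPositive && hasNegative ? "Claims conflict." : "",
  };
}

function detectCoverageGaps(input = {}) {
  const claims = Array.isArray(input.claims) ? input.claims : [];
  // Any claim with at least one evidence item counts as coverage.
  const authoritativeSourcesFound = claims.some((claim) => Array.isArray(claim?.evidence) && claim.evidence.length > 0);
  return {
    detected: !authoritativeSourcesFound,
    missingAspects: authoritativeSourcesFound ? [] : ["authoritative sources"],
  };
}

// Made-up claims.
const contradictory = ["Feature X is supported on Linux", "Feature X is not supported on Linux"];
const singleNegative = ["Feature X is not supported on Windows"];
const agreeing = ["Works on Linux", "Works on macOS"];

console.log(detectClaimConflicts(contradictory).detected);  // true
console.log(detectClaimConflicts(singleNegative).detected); // true
console.log(detectClaimConflicts(agreeing).detected);       // false
console.log(detectCoverageGaps({ claims: [{ text: "x", evidence: [{ source: "https://example.com" }] }] }).detected); // false
console.log(detectCoverageGaps({ claims: [] }).missingAspects); // [ 'authoritative sources' ]
```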

package/lib/types.js CHANGED

@@ -36,6 +36,8 @@ export function createResearchResult(result = {}) {
     bullets: Array.isArray(result.bullets) ? result.bullets : [],
     citations: Array.isArray(result.citations) ? result.citations : [],
     sources: Array.isArray(result.sources) ? result.sources.map(createResearchSource) : [],
+    claims: Array.isArray(result.claims) ? result.claims : [],
+    evidenceSummary: result.evidenceSummary || "",
     codeBlocks: Array.isArray(result.codeBlocks) ? result.codeBlocks : [],
     sufficient: Boolean(result.sufficient),
     missingAspects: Array.isArray(result.missingAspects) ? result.missingAspects : [],

package/lib/web-research.js CHANGED

@@ -5,6 +5,8 @@ import { complete } from "@mariozechner/pi-ai";
 import profiles from "./research-profiles.json" with { type: "json" };
 import { createResearchResult } from "./types.js";
+import { resolveDomainConfig } from "./domains/index.js";
+import { classifyQuestionDomain } from "./research-intent.js";
 import {
   buildConfidenceSummary,
   buildDeepQueries,
@@ -33,6 +35,8 @@ import {
   scoreSourceEntry,
   selectRelevantChunks,
 } from "./research.js";
+import { pageFetchAdapter } from "./page-fetch-adapter.js";
+import { resolveOutputFormat, shouldRequireAuthoritativeSources } from "./research-output.js";
 import { planResearch } from "./planner.js";
 import {
   clearResearchMemory,
@@ -79,15 +83,18 @@ export function resolveResearchConfig(input = "fast") {
   const options = normalizeResearchOptions(input);
   const base = profiles[options.mode] || profiles.fast;
   const deep = options.deepResearchConfig || {};
+  const domainConfig = resolveDomainConfig(options.domain || "web");
   return {
     ...base,
+    ...domainConfig,
     ...options,
     mode: base.mode,
     maxTurns: options.maxTurns ?? (deep.depth ? Math.max(base.maxTurns || 1, deep.depth) : (base.maxTurns || 1)),
     maxQueries: options.maxQueries ?? (deep.breadth ? Math.max(base.maxQueries || 2, deep.breadth * (deep.depth || 1)) : (base.maxQueries || 2)),
     maxPages: options.maxSites ?? options.maxPages ?? base.maxPages,
-    allowedSourceTypes: options.allowedSourceTypes ?? base.allowedSourceTypes,
+    allowedSourceTypes: options.allowedSourceTypes ?? (Array.isArray(domainConfig.allowedSourceTypes) && domainConfig.allowedSourceTypes.length ? domainConfig.allowedSourceTypes : base.allowedSourceTypes),
+    allowedSources: options.allowedSources ?? (Array.isArray(domainConfig.allowedSources) && domainConfig.allowedSources.length ? domainConfig.allowedSources : base.allowedSources),
     searchProvider: options.searchProvider ?? base.searchProvider,
     concurrentQueries: deep.concurrency ?? options.concurrentQueries ?? 3,
     depth: deep.depth ?? 1,
@@ -101,7 +108,10 @@ export function resolveResearchConfig(input = "fast") {
     files: Array.isArray(options.files) ? options.files : [],
     isolate: Boolean(options.isolate || process.env.RESEARCH_ISOLATE === "1"),
     force: Boolean(options.force),
-    format: options.format ?? "markdown",
+    format: resolveOutputFormat(options, domainConfig.format || "markdown"),
+    queryHints: Array.isArray(domainConfig.queryHints) ? domainConfig.queryHints : [],
+    requireAuthoritative: Boolean(options.requireAuthoritative ?? domainConfig.requireAuthoritative),
+    domain: domainConfig.domain,
   };
 }
@@ -150,8 +160,11 @@ async function completeWithResearchModel(ctx, signal, prompt, reasoningEffort =
 export async function buildQueries(query, mode = "fast", ctx, signal) {
   const config = getResearchConfig(mode);
+  const hintedQueries = Array.isArray(config.queryHints) && config.queryHints.length
+    ? config.queryHints.map((hint) => `${query} ${hint}`)
+    : [];
   if (config.mode === "code") {
-    return planResearch(query, "code").subqueries.slice(0, config.maxQueries);
+    return [...new Set([...planResearch(query, "code").subqueries, ...hintedQueries])].slice(0, config.maxQueries);
   }
   if (config.mode === "deep" || config.mode === "academic") {
     const prompt = [
@@ -165,15 +178,15 @@ export async function buildQueries(query, mode = "fast", ctx, signal) {
     try {
       const text = await completeWithResearchModel(ctx, signal, prompt, "low");
-      if (text) return parseDeepQueryPlan(text, query, config.maxQueries);
+      if (text) return [...new Set([...parseDeepQueryPlan(text, query, config.maxQueries), ...hintedQueries])].slice(0, config.maxQueries);
     } catch {
       // fall through
     }
-    return buildDeepQueries(query, config.maxQueries);
+    return [...new Set([...buildDeepQueries(query, config.maxQueries), ...hintedQueries])].slice(0, config.maxQueries);
   }
-  return buildFastQueries(query, config.maxQueries);
+  return [...new Set([...buildFastQueries(query, config.maxQueries), ...hintedQueries])].slice(0, config.maxQueries);
 }
 function withTimeoutSignal(signal, timeoutMs) {
@@ -334,7 +347,11 @@ function shouldSkipUrl(url) {
 }
 function shouldUseJinaFirst(url) {
-  return /(^|\.)medium\.com$|(^|\.)dev\.to$|(^|\.)substack\.com$/i.test(new URL(url).hostname);
+  try {
+    return /(^|\.)medium\.com$|(^|\.)dev\.to$|(^|\.)substack\.com$/i.test(new URL(url).hostname);
+  } catch {
+    return false;
+  }
 }
 function pageFromText(title, url, text, config, extra = {}) {
@@ -366,6 +383,7 @@ function withinTimeframe(page, config) {
 export async function fetchPageSource(url, signal, config = getResearchConfig()) {
   if (shouldSkipUrl(url)) return null;
+  const adapter = config.fetchAdapter || pageFetchAdapter;
   const cacheKey = `${normalizeUrl(url)}::${config.pageTextLimit}::${JSON.stringify({
     preferRecent: config.preferRecent || false,
     minYear: config.minYear || "",
@@ -374,9 +392,12 @@ export async function fetchPageSource(url, signal, config = getResearchConfig())
   })}`;
   const cached = config.isolate ? null : getCacheValue(pageCache, cacheKey);
   if (cached) return cached;
   if (shouldUseJinaFirst(url)) {
     const first = await fetchJinaPageSource(url, signal, config);
-    return config.isolate ? first : setCacheValue(pageCache, cacheKey, first, PAGE_CACHE_TTL_MS);
+    if (first && withinTimeframe(first, config)) {
+      return config.isolate ? first : setCacheValue(pageCache, cacheKey, first, PAGE_CACHE_TTL_MS);
+    }
   }
   try {
@@ -390,12 +411,31 @@ export async function fetchPageSource(url, signal, config = getResearchConfig())
     const body = await response.text();
     const snapshot = extractPageSnapshot(body, response.url || url);
-    const page = pageFromText(snapshot.title, snapshot.url, snapshot.text, config, {
+    let page = pageFromText(snapshot.title, snapshot.url, snapshot.text, config, {
       publishDate: extractPublishDate(body),
       sourceType: classifySourceType(snapshot.url, snapshot.title),
       codeBlocks: snapshot.codeBlocks,
     });
+    const assessment = adapter.assessPageAttempt?.({
+      status: response.status ?? 200,
+      body,
+      contentType,
+      url: response.url || url,
+    });
+    if ((!page && assessment?.weak) || assessment?.dynamic || assessment?.blocked) {
+      const scrapling = await adapter.fetchWithScrapling?.(url, assessment.mode, signal, config);
+      if (scrapling?.body) {
+        const scraplingSnapshot = extractPageSnapshot(scrapling.body, scrapling.url || url);
+        page = pageFromText(scraplingSnapshot.title, scraplingSnapshot.url, scraplingSnapshot.text, config, {
+          publishDate: extractPublishDate(scrapling.body),
+          sourceType: classifySourceType(scraplingSnapshot.url, scraplingSnapshot.title),
+          codeBlocks: scraplingSnapshot.codeBlocks,
+        });
+      }
+    }
     const resolved = page || await fetchJinaPageSource(url, signal, config);
     const finalPage = resolved && withinTimeframe(resolved, config) ? resolved : null;
     return config.isolate ? finalPage : setCacheValue(pageCache, cacheKey, finalPage, PAGE_CACHE_TTL_MS);
@@ -499,8 +539,8 @@ function planSubqueries(rootQuery, currentQuery, config, sufficiency) {
   return [...new Set(queries.filter(Boolean))].slice(0, Math.max(1, config.breadth || 2));
 }
-function formatResultText(result) {
-  return formatResearchResponse({ answer: result.answer, bullets: result.bullets, sources: result.sources, confidence: result.confidence });
+function formatResultText(result, format) {
+  return formatResearchResponse({ answer: result.answer, bullets: result.bullets, sources: result.sources, confidence: result.confidence, format });
 }
 function modeCacheKey(query, config) {
@@ -520,7 +560,8 @@ function modeCacheKey(query, config) {
 }
 export async function runWebResearch(query, ctx, signal, onUpdate, mode = "fast") {
-  const config = getResearchConfig(mode);
+  const domain = classifyQuestionDomain(query);
+  const config = getResearchConfig(typeof mode === "object" ? { ...mode, domain } : { mode, domain });
   const cacheKey = modeCacheKey(query, config);
   if (!config.isolate && !config.force) {
@@ -546,7 +587,7 @@ export async function runWebResearch(query, ctx, signal, onUpdate, mode = "fast"
   let conflictSummary = "";
   let conflictingSourcePairs = [];
   let sufficiency = { sufficient: false, confidenceScore: 0.1, missingAspects: [], openSubQuestions: [] };
-  let currentQueries = await buildQueries(query, config.mode, ctx, signal);
+  let currentQueries = await buildQueries(query, config, ctx, signal);
   subqueries = [...currentQueries];
   const localPages = await readLocalFiles(config.files || [], config);
@@ -665,7 +706,7 @@ export async function runWebResearch(query, ctx, signal, onUpdate, mode = "fast"
     citations: synthesis.citations || [],
     sources,
     codeBlocks,
-    sufficient: sufficiency.sufficient && unverifiedRatio <= 0.2,
+    sufficient: sufficiency.sufficient && unverifiedRatio <= 0.2 && (!shouldRequireAuthoritativeSources(config) || sufficiency.authoritativeSourcesFound),
     missingAspects: sufficiency.missingAspects,
     openSubQuestions,
     conflictSummary: conflictSummary || sufficiency.conflictSummary || "",
@@ -698,6 +739,7 @@ export async function runWebResearch(query, ctx, signal, onUpdate, mode = "fast"
     sources: normalizedResult.sources,
     sourceTypes,
     codeBlocks: normalizedResult.codeBlocks,
+    format: config.format,
     confidence,
     meta: normalizedResult.meta,
     confidenceScore: sufficiency.confidenceScore,
@@ -707,7 +749,7 @@ export async function runWebResearch(query, ctx, signal, onUpdate, mode = "fast"
     openSubQuestions: normalizedResult.openSubQuestions,
     missingAspects: normalizedResult.missingAspects,
     unverifiedClaims: normalizedResult.unverifiedClaims,
-    contentText: formatResultText({ answer: normalizedResult.answer, bullets: normalizedResult.bullets, sources: normalizedResult.sources, confidence }),
+    contentText: formatResultText({ answer: normalizedResult.answer, bullets: normalizedResult.bullets, sources: normalizedResult.sources, confidence }, config.format),
   };
   setResearchMemory(cacheKey, result);

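Each branch of `buildQueries` now appends the domain's `queryHints` to the base queries, removes duplicates with a `Set`, and caps the list at `maxQueries`. The sketch below isolates that merge step. `mergeHintedQueries` is a hypothetical name, and the hint strings are made-up examples, not the values defined in `lib/domains/*.js`. Because the hinted queries are appended after the base list, they survive the final `slice` only when the de-duplicated base list is shorter than the cap.

```javascript
// Sketch of the query-hint merge used in each branch of buildQueries.
// queryHints is assumed to be an array of suffix strings.
function mergeHintedQueries(baseQueries, query, queryHints, maxQueries) {
  const hinted = Array.isArray(queryHints) && queryHints.length
    ? queryHints.map((hint) => `${query} ${hint}`)
    : [];
  // A Set keeps insertion order, so base queries keep priority and
  // duplicates are dropped before the cap is applied.
  return [...new Set([...baseQueries, ...hinted])].slice(0, maxQueries);
}

const query = "node fetch timeout";
// The base list is shorter than the cap, so the hints fill the remaining slots.
console.log(mergeHintedQueries([query], query, ["changelog", "release notes"], 3));
// The base list already fills the cap, so the hint is sliced away.
console.log(mergeHintedQueries([query, `${query} docs`, `${query} example`], query, ["changelog"], 3));
```

If hints must always get a slot, one option would be to cap the base list at `maxQueries - hinted.length` before merging. The diff does not do this.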
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-research",
-  "version": "1.0.2",
+  "version": "1.1.2",
   "private": false,
   "type": "module",
   "description": "Pi extension for web research.",
@@ -11,6 +11,7 @@
     "index.js",
     "lib",
     "README.md",
+    "THIRD_PARTY_NOTICES.md",
     "package.json"
   ],
   "repository": {
@@ -25,11 +26,13 @@
     "pi-package"
   ],
   "scripts": {
-    "test": "node --test"
+    "test": "node --test",
+    "eval": "node --test test/eval-runner.test.js"
   },
   "dependencies": {
-    "@mariozechner/pi-ai": "^0.69.0",
-    "typebox": "^1.1.32"
+    "@mariozechner/pi-ai": "*",
+    "pi-research": "^1.0.2",
+    "typebox": "*"
   },
   "peerDependencies": {
     "@mariozechner/pi-ai": "*",