@scira/cli 0.1.0
- package/LICENSE +21 -0
- package/README.md +128 -0
- package/dist/agent/research-agent.js +253 -0
- package/dist/agent/skills.js +265 -0
- package/dist/agent/tools.js +429 -0
- package/dist/agent/tools.test.js +27 -0
- package/dist/cli/commands/init.js +370 -0
- package/dist/cli/index.js +445 -0
- package/dist/cli/shell/shell.js +76 -0
- package/dist/cli/shell/tui.js +11 -0
- package/dist/config/env-store.js +47 -0
- package/dist/config/load-config.js +58 -0
- package/dist/export/formatters.js +37 -0
- package/dist/providers/llm/gateway.js +64 -0
- package/dist/providers/llm/huggingface.js +33 -0
- package/dist/providers/llm/models.js +97 -0
- package/dist/providers/llm/readiness.js +50 -0
- package/dist/providers/llm/registry.js +56 -0
- package/dist/storage/jsonl.js +29 -0
- package/dist/storage/jsonl.test.js +38 -0
- package/dist/storage/run-store.js +134 -0
- package/dist/storage/run-store.test.js +65 -0
- package/dist/tools/chrome-devtools-mcp.js +61 -0
- package/dist/tools/file-tools.js +128 -0
- package/dist/tools/mcp-bridge.js +118 -0
- package/dist/tools/mcp-oauth.js +276 -0
- package/dist/tools/open-url.js +99 -0
- package/dist/tools/search-web.js +153 -0
- package/dist/types/index.js +91 -0
- package/dist/types/schema.test.js +60 -0
- package/dist/ui/ink/SciraApp.js +274 -0
- package/dist/ui/ink/components/effects.js +44 -0
- package/dist/ui/ink/components/home-screen.js +69 -0
- package/dist/ui/ink/components/overlays.js +111 -0
- package/dist/ui/ink/constants.js +56 -0
- package/dist/ui/ink/hooks/use-agent-turn.js +186 -0
- package/dist/ui/ink/hooks/use-feed-lines.js +186 -0
- package/dist/ui/ink/hooks/use-feed.js +69 -0
- package/dist/ui/ink/hooks/use-keyboard.js +315 -0
- package/dist/ui/ink/hooks/use-mouse.js +31 -0
- package/dist/ui/ink/hooks/use-session.js +103 -0
- package/dist/ui/ink/hooks/use-settings.js +155 -0
- package/dist/ui/ink/hooks/use-submit.js +366 -0
- package/dist/ui/ink/hooks/use-suggestions.js +91 -0
- package/dist/ui/ink/lib/file-mentions.js +71 -0
- package/dist/ui/ink/lib/markdown.js +245 -0
- package/dist/ui/ink/lib/utils.js +224 -0
- package/dist/ui/ink/session-manager.js +160 -0
- package/dist/ui/ink/types.js +1 -0
- package/dist/utils/ids.js +15 -0
- package/dist/utils/markdown-joiner.js +249 -0
- package/dist/watch/runner.js +65 -0
- package/package.json +74 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Zaid Mukaddam
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,128 @@
+# Scira CLI
+
+Terminal-native AI research and coding agent. Ask a question, get a grounded report with cited sources and verified claims — all stored locally and inspectable.
+
+## Install
+
+```bash
+npm install -g @scira/cli
+```
+
+Requires **Node.js ≥ 20**. Run the interactive setup:
+
+```bash
+scira init
+```
+
+This walks you through API keys and configuration. Keys go in `~/.scira/.env` so they work from any directory.
+
+Check your setup:
+
+```bash
+scira doctor
+```
+
+## Quickstart
+
+```bash
+# Interactive TUI (home screen with session history)
+scira
+
+# Headless run — writes a report to .scira/runs/<id>/report.md
+scira "compare browser automation tools in 2025"
+
+# Interactive TUI for a specific question
+scira new "history of the Silk Road" --tui
+
+# Classic readline shell for a specific question
+scira new "history of the Silk Road" --shell
+```
+
+## Setup
+
+Put your API keys in `~/.scira/.env` (loaded automatically from any working directory):
+
+```bash
+mkdir -p ~/.scira && cp .env.example ~/.scira/.env
+# then edit ~/.scira/.env
+```
+
+## Commands
+
+| Command | Description |
+|---|---|
+| `scira init` | Interactive setup for API keys and configuration |
+| `scira [question]` | Open TUI home, or run headlessly if a question is given |
+| `scira new <question>` | Start a run; add `--tui` or `--shell` to open interactive UI |
+| `scira resume <run-id>` | Resume a run; add `--tui` or `--shell` to specify UI |
+| `scira list` | List all runs |
+| `scira show <run-id>` | Print run status (sources, claims, report state) |
+| `scira run <run-id>` | Re-run the research agent on an existing run |
+| `scira verify <run-id>` | Print the claim verification report |
+| `scira export <run-id>` | Export report (md, json, or csv) with `--format` and `--output` |
+| `scira mcp list` | List configured MCP servers |
+| `scira mcp add <transport> <name> <target>` | Add an MCP server (stdio, sse, or http) |
+| `scira mcp oauth <name>` | Run OAuth PKCE flow for an MCP server |
+| `scira mcp enable <name>` | Enable an MCP server |
+| `scira mcp disable <name>` | Disable an MCP server |
+| `scira mcp remove <name>` | Remove an MCP server from config |
+| `scira watch <goal>` | Monitor a topic on a schedule with diffing |
+| `scira models [--provider <p>]` | List available AI Gateway models |
+| `scira config` | Print the resolved config |
+| `scira doctor` | Check credentials and environment |
+
+## Configuration
+
+Config merges `~/.scira/config.json` (global) with `.scira/config.json` (project). All fields are optional.
+
+```json
+{
+  "model": "deepseek/deepseek-v4-flash",
+  "approvalMode": "suggest",
+  "runDirectory": ".scira/runs",
+  "maxSources": 20,
+  "citationPolicy": "strict",
+  "search": {
+    "provider": "exa",
+    "maxResults": 8,
+    "includeDomains": [],
+    "excludeDomains": []
+  }
+}
+```
+
+| Field | Default | Description |
+|---|---|---|
+| `model` | `deepseek/deepseek-v4-flash` | AI Gateway model ID |
+| `approvalMode` | `suggest` | `manual`, `suggest`, or `auto` tool approval |
+| `runDirectory` | `.scira/runs` | Local directory where run data is stored |
+| `maxSources` | `20` | Max sources the agent may gather per run |
+| `citationPolicy` | `strict` | `strict` (all claims cited) or `balanced` |
+| `search.provider` | `exa` | `exa`, `firecrawl`, or `parallel` |
+| `search.maxResults` | `8` | Max results per search query |
+
+## Environment Variables
+
+| Variable | Required | Purpose |
+|---|---|---|
+| `AI_GATEWAY_API_KEY` | Yes | Vercel AI Gateway — all model calls |
+| `EXA_API_KEY` | With Exa | Web search via Exa |
+| `FIRECRAWL_API_KEY` | With Firecrawl | Web scraping via Firecrawl |
+
+## Run Directory
+
+Each run writes to `.scira/runs/<run-id>/`:
+
+```
+goal.txt       original question
+plan.md        agent's research plan
+notes.md       incremental findings
+sources.jsonl  sources gathered (id, url, title, snapshot path)
+claims.jsonl   claims extracted and verified
+report.md      final report
+convo.json     full conversation + feed (for TUI resume)
+```
+
+## License
+
+MIT
|
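The README above states that the config merges `~/.scira/config.json` (global) with `.scira/config.json` (project) and that every field is optional. The sketch below shows one way such a layered merge could behave, assuming project values override global values and the nested `search` object is merged key by key. It is not taken from `dist/config/load-config.js`: the names `DEFAULTS` and `mergeConfig` are hypothetical, the model ID `example/global-model` is a placeholder, and the default values are copied from the README's configuration table.

```javascript
// Defaults copied from the README's configuration table.
const DEFAULTS = {
  model: "deepseek/deepseek-v4-flash",
  approvalMode: "suggest",
  runDirectory: ".scira/runs",
  maxSources: 20,
  citationPolicy: "strict",
  search: { provider: "exa", maxResults: 8 }
};

// Hypothetical helper: later layers win, and `search` is merged key by key.
// Both the precedence and the merge depth are assumptions for illustration.
function mergeConfig(globalConfig = {}, projectConfig = {}) {
  return {
    ...DEFAULTS,
    ...globalConfig,
    ...projectConfig,
    search: {
      ...DEFAULTS.search,
      ...(globalConfig.search ?? {}),
      ...(projectConfig.search ?? {})
    }
  };
}

// Example layers: a global file that changes the model and result count,
// and a project file that relaxes citations and switches the search provider.
const resolved = mergeConfig(
  { model: "example/global-model", search: { maxResults: 5 } },
  { citationPolicy: "balanced", search: { provider: "firecrawl" } }
);
console.log(JSON.stringify(resolved, null, 2));
```

Under these assumptions, fields that neither file sets (such as `maxSources`) fall through to the documented defaults. `scira config` prints the config the CLI actually resolves.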
package/dist/agent/research-agent.js
ADDED
|
@@ -0,0 +1,253 @@
+import { createInterface } from "node:readline/promises";
+import { stdin, stdout } from "node:process";
+import { ToolLoopAgent, isLoopFinished } from "ai";
+import ora from "ora";
+import { getLanguageModel, requireLlmKeys } from "../providers/llm/registry.js";
+import { createResearchTools, createOneShotTools, createCodingTools } from "./tools.js";
+import { SKILL_CATALOG } from "./skills.js";
+import { createMcpBridge } from "../tools/mcp-bridge.js";
+function instructions(goal, config, workspacePath) {
+    const now = new Date();
+    const temporalContext = now.toLocaleDateString("en-US", {
+        weekday: "long",
+        year: "numeric",
+        month: "long",
+        day: "numeric"
+    });
+    const citationRule = config.citationPolicy === "strict"
+        ? "Citation policy (strict): every non-trivial statement in report.md MUST cite a source ID. Do not include any uncited claims — move anything you cannot cite to an Open Questions section."
+        : "Citation policy (balanced): cite source IDs for all major claims; minor background context may be uncited but must not be overstated.";
+    const codingSection = workspacePath ? `
+
+CODING CAPABILITIES:
+You also have workspace-aware coding tools to build, modify, and debug code:
+- readWorkspaceFile: Read any file in the workspace
+- writeWorkspaceFile: Create or overwrite files (requires approval)
+- editWorkspaceFile: Make surgical edits by replacing exact strings (requires approval)
+- listWorkspaceDir: List files and directories
+- grepWorkspace: Search for patterns across the codebase
+- runWorkspaceCommand: Execute shell commands like builds, tests, installs (requires approval)
+
+Workspace: ${workspacePath}
+
+When the task involves code:
+- Use grepWorkspace and readWorkspaceFile to understand existing code structure
+- Use editWorkspaceFile for precise changes, writeWorkspaceFile for new files
+- Run tests/builds with runWorkspaceCommand to verify changes
+- Research APIs, libraries, or error messages with webSearch + readUrl when needed
+- Match existing code style and patterns
+
+You can seamlessly combine research and coding - e.g., research how to implement a feature, then implement it, or debug an issue by researching the error and fixing the code.` : "";
+    return `You are Scira AI CLI, made by Zaid Mukaddam, an autonomous research ${workspacePath ? "and coding " : ""}agent operating inside a single run directory on the user's machine.
+
+Your goal:
+${goal}
+
+Temporal context:
+Today is ${temporalContext}. Treat dates relative to this date: distinguish past, current, and future events; verify date-sensitive claims with sources instead of relying on model memory.
+
+${citationRule}
+Gather at most ${config.maxSources} high-quality sources — prefer depth and primary sources over volume.
+
+You have shell, file, search, skill${config.files ? ", and local files" : ""}${workspacePath ? ", and workspace coding" : ""} tools. Work like an engineer running a research harness:${config.files ? `\n\nLocal files directory (${config.files.dir}):\nUse listFiles / searchFiles to enumerate documents, getFile to read them, and fileExists to confirm a file is present. Prefer local files as primary sources before falling back to the web. moveFile and deleteFile require user approval.` : ""}${codingSection}
+0. Bootstrap: these built-in research skills are available — pull the relevant ones with readSkill before you begin. This is mandatory — skills contain concrete tactics for search, source quality, claim verification, and report writing.
+${SKILL_CATALOG}
+1. Plan: write a short plan.md outlining your approach (use the research-plan skill as a template).
+2. Gather: use webSearch with 3-5 parallel query variations to find real, citable sources, then readUrl to read the most relevant ones. Record findings in notes.md as you go. Never invent sources or URLs.
+3. Extract claims: after reading each source, use createClaim to record significant findings. Assign a short ID like claim_001, set confidence, and link source IDs.
+4. Verify: once all claims are recorded, use verifyClaim to update each claim's status (verified / weak / contradicted / needs_review). Be honest — flag weak or vendor-only evidence.
+5. Record sources: write all sources you actually used to sources.jsonl (include the snapshotPath reported by readUrl for each one) — STRICT JSONL rules: one compact JSON object per line, no literal newlines inside string values, no trailing commas. Use writeFile to write the entire file at once.
+6. Synthesize: write a clear, well-structured report.md grounded in verified claims (use the report-structure skill for the section layout). Cite source IDs inline. Mark vendor/marketing claims as such.
+7. Finish: when report.md is complete and accurate, stop and give the user a 2-4 sentence summary of what you found.
+
+Rules:
+- Prefer primary sources. Cross-check important claims across multiple sources.
+- Keep files inside the run directory (paths are relative to it).
+- Be terse in your narration between tool calls — say what you're doing and why in one line.
+- Do not claim something is done before you have actually written report.md.
+- Re-read a skill with readSkill any time you are uncertain how to proceed.`;
+}
+function devtoolsInstructionsBlock(toolNames) {
+    if (toolNames.length === 0)
+        return "";
+    return `
+
+Browser tools (Chrome DevTools MCP)
+You have access to a real Chromium browser via Chrome DevTools MCP. The available tool names are:
+${toolNames.map((n) => ` - ${n}`).join("\n")}
+
+DevTools is not only for debugging. Treat it as a research evidence tool for rendered, interactive, current, or runtime-dependent web sources. The built-in skill "browser-research" explains the workflow; read it with readSkill before using DevTools in a full research run.
+
+These tools drive a live browser session and roughly cover four capabilities (exact names depend on the MCP server):
+- Navigation & input: open URLs, click, type, fill forms, scroll, wait for selectors, navigate history.
+- DOM & content: read the rendered DOM, accessibility tree, computed styles, and text content of elements.
+- Console & network: list console messages, errors, and network requests/responses (status, headers, timing, payloads).
+- Performance & diagnostics: capture screenshots, run performance traces / Core Web Vitals, and inspect runtime state.
+
+When to use them (in priority order):
+1. The research question asks what a live page currently shows: pricing, product availability, UI copy, feature lists, status pages, rankings, maps, app-store pages, dashboards, search pages, or docs portals.
+2. The page is JS-heavy, gated, paginated, tabbed, filtered, infinite-scroll, or \`readUrl\` returns empty/garbled/incomplete text.
+3. The claim depends on runtime behavior — redirects, loaded API payloads, console errors, network calls, client-rendered data, layout, screenshots, or performance.
+4. You need to verify something only visible after interaction: clicking tabs, expanding accordions, selecting filters, scrolling, entering public search terms, or opening a modal.
+5. The user explicitly asks to inspect Chrome/browser/devtools/live page/screenshot/console/network/rendered page behavior.
+
+When NOT to use them:
+- For static articles, papers, docs, blog posts, or anything \`webSearch\` + \`readUrl\` already handles cleanly.
+- To "browse around" without a concrete claim or hypothesis to validate — these tools are slow and expensive.
+
+Rules for browser tools:
+- State the hypothesis or claim you are validating in one line before calling a browser tool.
+- Prefer the smallest sequence of calls that resolves the question; close/navigate away when done.
+- Record browser observations in notes.md with URL, access time, observed text/state, and interaction steps.
+- Treat DOM/screenshot/console/network output as evidence for what the page showed at access time: cite the URL you observed it on, and record findings as regular claims with sourceIds pointing to that URL.
+- Browser observations are primary evidence for page state but not independent corroboration; cross-check important factual claims with separate sources.
+- Never paste secrets or credentials into the browser.`;
+}
+export async function createResearchAgent(runPath, goal, config, onApprovalRequired, workspacePath) {
+    requireLlmKeys(config);
+    const bridge = await createMcpBridge(config);
+    const researchTools = createResearchTools(runPath, config, onApprovalRequired);
+    const codingTools = workspacePath ? createCodingTools(workspacePath, config, onApprovalRequired) : {};
+    const tools = { ...researchTools, ...codingTools, ...bridge.tools };
+    const agent = new ToolLoopAgent({
+        model: getLanguageModel(config),
+        instructions: instructions(goal, config, workspacePath) + devtoolsInstructionsBlock(bridge.toolNames),
+        tools,
+        stopWhen: isLoopFinished()
+    });
+    return { agent, close: bridge.close };
+}
+function oneShotInstructions(goal, hasDevtools) {
+    const now = new Date();
+    const temporalContext = now.toLocaleDateString("en-US", {
+        weekday: "long",
+        year: "numeric",
+        month: "long",
+        day: "numeric"
+    });
+    const browserHint = hasDevtools
+        ? `
+
+Browser-tool routing (IMPORTANT):
+- If the user explicitly mentions "chrome", "browser", "devtools", "live page", "render", "screenshot", "console", or "network", you MUST use the devtools_* tools instead of webSearch/readUrl. Open the relevant URL with devtools_navigate_page (or devtools_new_page), then take a snapshot/screenshot or read console/network as needed.
+- Use devtools_* for research questions about what a live/current/interactive page shows: pricing pages, product pages, status pages, app-store pages, docs portals, rankings, maps, client-rendered dashboards, or pages with tabs/filters/infinite scroll.
+- Also use devtools_* when readUrl already failed or returned empty/garbled/incomplete text on a JS-heavy page.
+- Otherwise, default to webSearch + readUrl as below.`
+        : "";
+    return `You are Scira in quick one-shot mode. Your job is to either answer the user's question directly OR escalate to the full research harness.
+
+Temporal context:
+Today is ${temporalContext}. Treat dates relative to this date — don't rely on model memory for date-sensitive facts.
+
+Built-in research skills available in full research mode:
+${SKILL_CATALOG}
+
+Question:
+${goal}
+
+Step 1 — Decide the depth required:
+- If the user asks for "research," "deep dive," "analysis," "comparison," "history," or anything that would benefit from >3 sources, structured claims, and a written report → you MUST call requestFullResearch first. Do NOT try to answer it yourself.
+- If the user asks for research that depends on rendered/live/interactive web evidence and DevTools are available, prefer requestFullResearch so the full agent can read the browser-research skill, record observations, and create claims.
+- If the user asks a simple, narrow, or factual question (e.g. "what is the capital of France?", "what time is it in Tokyo?") → answer directly with 1-2 web searches.
+- When in doubt, escalate.${browserHint}
+
+Step 2 — If you decide to answer directly:
+- Default path: use webSearch (2-3 query variations) to find relevant, recent sources, then readUrl to read the best 1-2.
+- Browser path (only if the routing rules above triggered): use the devtools_* tools to drive a real Chromium session, then summarize what you observed (cite the URL you visited).
+- Synthesize a clear, direct answer in a few short paragraphs. Cite sources inline as [title](url). Never invent sources or URLs.
+- Do NOT write files, create claims, or produce a formal report — just answer in chat.`;
+}
+export async function createOneShotAgent(runPath, goal, config, onApprovalRequired, onEscalate) {
+    requireLlmKeys(config);
+    const bridge = await createMcpBridge(config);
+    const tools = { ...createOneShotTools(runPath, config, onApprovalRequired, onEscalate), ...bridge.tools };
+    const agent = new ToolLoopAgent({
+        model: getLanguageModel(config),
+        instructions: oneShotInstructions(goal, bridge.toolNames.length > 0) + devtoolsInstructionsBlock(bridge.toolNames),
+        tools,
+        stopWhen: isLoopFinished()
+    });
+    return { agent, close: bridge.close };
+}
+/**
+ * Run the research agent headlessly, streaming a compact timeline to stdout.
+ */
+export async function runResearchAgent(runPath, goal, config, workspacePath) {
+    const spinner = ora({ stream: stdout });
+    const onApprovalRequired = async (toolName, description) => {
+        spinner.stop();
+        console.error(`\n⚠ ${toolName} needs approval`);
+        console.error("-".repeat(60));
+        console.error(description.slice(0, 800));
+        console.error("-".repeat(60));
+        const rl = createInterface({ input: stdin, output: stdout });
+        const answer = await rl.question("\nApprove? [y/N] ");
+        rl.close();
+        const approved = answer.trim().toLowerCase() === "y";
+        if (approved)
+            spinner.start();
+        return approved;
+    };
+    const bundle = await createResearchAgent(runPath, goal, config, onApprovalRequired, workspacePath);
+    try {
+        const result = await bundle.agent.stream({ prompt: goal });
+        for await (const part of result.fullStream) {
+            if (part.type === "tool-call") {
+                spinner.start(`${CODING_ICONS[part.toolName] ?? TOOL_ICONS[part.toolName] ?? "•"} ${part.toolName} ${summarize(part.input)}`);
+            }
+            else if (part.type === "tool-result") {
+                spinner.succeed(`${part.toolName}`);
+            }
+            else if (part.type === "tool-error") {
+                spinner.fail(`${part.toolName} ${String(part.error).slice(0, 80)}`);
+            }
+            else if (part.type === "reasoning-delta") {
+                spinner.stop();
+                // dim the model's reasoning so it's visually distinct from the answer
+                process.stdout.write(`\x1b[2m${part.text}\x1b[22m`);
+            }
+            else if (part.type === "text-delta") {
+                spinner.stop();
+                process.stdout.write(part.text);
+            }
+            else if (part.type === "error") {
+                spinner.fail(String(part.error));
+            }
+        }
+        spinner.stop();
+        process.stdout.write("\n");
+    }
+    finally {
+        await bundle.close();
+    }
+}
+const TOOL_ICONS = {
+    bash: "⌘",
+    writeFile: "✎",
+    editFile: "✎",
+    readFile: "▤",
+    createClaim: "◎",
+    verifyClaim: "✓",
+    webSearch: "⌕",
+    readUrl: "↗",
+    listSkills: "★",
+    readSkill: "★",
+    listFiles: "▤",
+    searchFiles: "⌕",
+    getFile: "▤",
+    fileExists: "▤",
+    moveFile: "✎",
+    deleteFile: "✗"
+};
+function summarize(input) {
+    const obj = (input ?? {});
+    return String(obj.command ?? obj.query ?? obj.url ?? obj.path ?? obj.key ?? obj.pattern ?? obj.source ?? "").slice(0, 100);
+}
+const CODING_ICONS = {
+    readWorkspaceFile: "▤",
+    writeWorkspaceFile: "✎",
+    editWorkspaceFile: "✎",
+    listWorkspaceDir: "▤",
+    grepWorkspace: "⌕",
+    runWorkspaceCommand: "⌘"
+};
|
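Step 5 of the agent instructions in `research-agent.js` above requires `sources.jsonl` to follow strict JSONL rules: one compact JSON object per line, no literal newlines inside string values, and no trailing commas. A minimal sketch of a writer and reader that satisfy those rules follows. The helper names `toJsonl` and `parseJsonl` are hypothetical and this is not the code shipped in `dist/storage/jsonl.js`. The sample record uses made-up values, with field names following the README's description of `sources.jsonl` (id, url, title, snapshot path).

```javascript
// Serialize records as JSONL. JSON.stringify without an indent argument emits
// one compact object per call and escapes newlines inside strings as \n,
// so every record stays on a single physical line.
function toJsonl(records) {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

// Parse JSONL text back into objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Illustrative record only; the ID format and paths are assumptions.
const sources = [
  {
    id: "src_001",
    url: "https://example.com/article",
    title: "A title with\nan embedded newline",
    snapshotPath: "snapshots/src_001.md"
  }
];

const text = toJsonl(sources);
console.log(text);
console.log(parseJsonl(text)[0].title === sources[0].title);
```

Writing the whole file from one `toJsonl` call mirrors the instruction to use writeFile once for the entire file, rather than appending fragments that could leave a partial line behind.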
package/dist/agent/skills.js
ADDED
|
@@ -0,0 +1,265 @@
|
|
|
1
|
+
export const SKILLS = [
|
|
2
|
+
{
|
|
3
|
+
name: "research-plan",
|
|
4
|
+
summary: "How to structure a research session into discovery, deep-dive, and synthesis phases",
|
|
5
|
+
content: `# research-plan
|
|
6
|
+
|
|
7
|
+
## Standard research phases
|
|
8
|
+
|
|
9
|
+
### Phase 1 — Discovery (10-15 % of budget)
|
|
10
|
+
- Write plan.md with your research questions and initial approach.
|
|
11
|
+
- Run broad searches to map the landscape.
|
|
12
|
+
- Identify 8-12 candidate sources across quality tiers.
|
|
13
|
+
- Note key researchers, institutions, and technical terms for later queries.
|
|
14
|
+
|
|
15
|
+
### Phase 2 — Deep Dive (60-70 % of budget)
|
|
16
|
+
- Read the 5-8 most relevant sources.
|
|
17
|
+
- Call createClaim immediately after reading each source — don't batch at the end.
|
|
18
|
+
- Keep notes.md updated with key insights and emerging patterns.
|
|
19
|
+
- Discover additional sources through the references sections of sources you've read.
|
|
20
|
+
|
|
21
|
+
### Phase 3 — Synthesis (20-30 % of budget)
|
|
22
|
+
- Call verifyClaim for every recorded claim.
|
|
23
|
+
- Identify gaps: which key questions lack verified answers?
|
|
24
|
+
- Write report.md following the report-structure skill.
|
|
25
|
+
- Final sanity check: every statistic and specific claim in the report has a cited source ID.
|
|
26
|
+
|
|
27
|
+
## plan.md starter template
|
|
28
|
+
\`\`\`
|
|
29
|
+
# Research Plan: <topic>
|
|
30
|
+
|
|
31
|
+
## Goal
|
|
32
|
+
<one sentence>
|
|
33
|
+
|
|
34
|
+
## Research questions
|
|
35
|
+
1.
|
|
36
|
+
2.
|
|
37
|
+
3.
|
|
38
|
+
|
|
39
|
+
## Approach
|
|
40
|
+
Phase 1: <broad searches + source identification>
|
|
41
|
+
Phase 2: <deep reading + claim recording>
|
|
42
|
+
Phase 3: <verification + report synthesis>
|
|
43
|
+
|
|
44
|
+
## Status
|
|
45
|
+
[ ] Phase 1 [ ] Phase 2 [ ] Phase 3
|
|
46
|
+
\`\`\`
|
|
47
|
+
`
|
|
48
|
+
},
|
|
49
|
+
{
|
|
50
|
+
name: "search-strategy",
|
|
51
|
+
summary: "How to formulate effective search queries, iterate on failures, and exploit parallel fetch",
|
|
52
|
+
content: `# search-strategy
|
|
53
|
+
|
|
54
|
+
## Core principle
|
|
55
|
+
Start broad, narrow by iteration. Never assume the first query is optimal.
|
|
56
|
+
|
|
57
|
+
## Use parallel queries (always)
|
|
58
|
+
webSearch accepts a \`queries\` array — always pass 3-5 variations for a single lookup.
|
|
59
|
+
This runs them in parallel and costs the same latency as one query.
|
|
60
|
+
|
|
61
|
+
Bad: { queries: ["dolphin intelligence"] }
|
|
62
|
+
Good: { queries: ["dolphin intelligence research", "cetacean cognition studies 2023 2024",
|
|
63
|
+
"bottlenose dolphin problem solving experiments", "dolphin self-awareness evidence"] }
|
|
64
|
+
|
|
65
|
+
## Retry ladder when results are poor
|
|
66
|
+
1. Remove qualifiers (drop year, drop adjectives).
|
|
67
|
+
2. Swap synonyms: "intelligence" → "cognition" → "learning ability" → "executive function".
|
|
68
|
+
3. Add domain targeting: append "site:pubmed.ncbi.nlm.nih.gov" or "site:arxiv.org".
|
|
69
|
+
4. Search for the researcher or paper title fragment directly.
|
|
70
|
+
|
|
71
|
+
## Source discovery via chaining
|
|
72
|
+
- Wikipedia article "References" sections are free bibliographies — mine them.
|
|
73
|
+
- After finding a key paper, search for its author name to find related work.
|
|
74
|
+
- Search "<topic> systematic review" or "<topic> meta-analysis" for aggregated evidence.
|
|
75
|
+
|
|
76
|
+
## Know when to stop
|
|
77
|
+
If 3 different query formulations on the same subtopic return no useful results, record a
|
|
78
|
+
knowledge gap note in notes.md and move on rather than looping.
|
|
79
|
+
`
|
|
80
|
+
},
    {
        name: "source-quality",
        summary: "Tiers for source credibility; how to assign claim confidence and spot red flags",
        content: `# source-quality

## Tier 1 — High confidence (confidence: "high")
- Peer-reviewed journals: PubMed / PMC, arXiv, Nature, Science, PLOS, Cell, PNAS
- Government databases: NIH, NOAA, NASA, CDC, WHO (.gov, .int)
- University research pages and institutional preprints (.edu)
Use confidence "high" when 2+ independent Tier 1 sources agree.

## Tier 2 — Medium confidence (confidence: "medium")
- Major news organizations with science desks (Reuters, BBC, NYT, AP)
- Wikipedia — use for orientation and bibliography mining, never as a citable source
- Reputable non-profits, professional associations
Use confidence "medium"; always seek corroboration for critical claims.

## Tier 3 — Low / Vendor (confidence: "low", status: "weak")
- Company blogs, product pages, PR releases, vendor whitepapers
- Social media, Reddit, forums, user-generated wikis
- Uncredited, undated, or anonymously authored content
Flag these explicitly as vendor/marketing in the report.

## Red flags
- "Studies show" or "research suggests" without a named citation.
- Statistics that only appear on a single vendor domain.
- Press releases that describe a study — always trace back to the original paper.
- Circular citations: A cites B which cites A — find the actual primary source.
`
    },
    {
        name: "claim-verification",
        summary: "Protocol for verifying claims: multi-source corroboration, status rules, and common traps",
        content: `# claim-verification

## Steps for each major claim
1. Identify the original source (journal, study name, institution, year).
2. Search with different terminology to find independent corroboration.
3. Actively search for contradicting evidence: "<claim topic> criticism",
   "<claim topic> limitations", "<claim topic> contradicted".

## Status decision table
| Evidence                                        | Status          |
|-------------------------------------------------|-----------------|
| 2+ independent Tier 1 sources agree             | verified        |
| Single Tier 1 source, no contradiction found    | needs_review    |
| Multiple sources disagree substantially         | contradicted    |
| Only vendor / marketing sources                 | weak            |
| No primary source found after 3 search attempts | weak            |

## Batching strategy
Record claims with createClaim as you read each source.
Run all verifyClaim calls together at the end of Phase 2, once the full evidence
picture is assembled — this avoids premature status assignment.

## Common traps
- Circular citations: trace back to the named primary study, not a news summary.
- Wikipedia summaries often chain to a single study — read the actual paper.
- "Preliminary study" and "pilot study" findings warrant confidence "low" regardless of source tier.
- Quantitative claims (percentages, effect sizes) need the exact study, not a secondary summary.
`
    },
    {
        name: "report-structure",
        summary: "Recommended section order, inline citation style, and prose rules for report.md",
        content: `# report-structure

## Section order

### Executive Summary (3-5 sentences)
Lead with the strongest conclusions. State what is definitively known and what remains uncertain.
Do not save the conclusion for the end.

### Key Findings
Bullet list. Each bullet has an inline source citation: [src_1] or [src_2, src_4].
Maximum one claim per bullet.

### Detailed Analysis
One subsection per major theme (200-300 words each).
Cite inline. Use "evidence suggests" or "appears to" for medium-confidence claims.
Use direct declarative statements only for verified claims.

### Methodology
Which searches were run, how many sources evaluated, which were excluded and why.
Name source tiers used.

### Open Questions / Limitations
Explicitly list what is NOT well-established.
Flag vendor-only evidence here.
This section adds credibility — do not omit it.

### References
One line per source:
- [src_1] Title — https://url

## Style rules
- Findings: present tense ("dolphins demonstrate self-recognition…")
- Study descriptions: past tense ("Reiss & Marino (2001) found…")
- No invented statistics. If a number can't be traced to a primary source, omit it or flag it.
- Avoid hedge-stacking: "may possibly suggest" → "suggests"
- Bold key terms on first use.
- Aim for 800-1500 words for a standard run; longer only if the topic genuinely requires it.
`
    },
    {
        name: "browser-research",
        summary: "How to use Chrome DevTools MCP for research on rendered, interactive, JS-heavy, or runtime-dependent sources",
        content: `# browser-research

Use Chrome DevTools MCP as a research instrument when normal fetch/read tools cannot capture what a real user sees.

## When to use browser tools
- JS-heavy sites where readUrl returns empty, boilerplate, paywall shells, or missing article content.
- Product/pricing pages, dashboards, docs portals, app stores, maps, search pages, or sites with tabs/filters/infinite scroll.
- Claims about what a page currently shows: wording, feature availability, pricing tiers, UI flows, redirects, embedded data, dates, or generated content.
- Runtime evidence: network calls, API payloads, console errors, status codes, screenshots, layout/performance observations.
- Verification of screenshots or visual claims that cannot be proven from plain text.

## Research workflow
1. Start with webSearch to identify candidate URLs and readUrl for static pages.
2. Escalate a candidate URL to DevTools when the rendered page is the primary evidence or readUrl is incomplete.
3. Navigate to the URL, wait for content to settle, then capture a snapshot/text view before interacting.
4. If needed, click tabs, expand sections, apply filters, scroll, or inspect network calls.
5. Record a concise note in notes.md: URL, observation, action taken, and why browser evidence was needed.
6. Create claims from browser observations just like source text. Use the observed URL as the source and mention "browser-observed" in notes.

## Evidence quality
- Browser observations are primary evidence for "what the page showed at time of access."
- They are not independent corroboration. Cross-check important factual claims with separate sources.
- Prefer official pages for product/pricing/status claims, but flag vendor-only evidence as weak unless corroborated.
- For volatile pages, include access date/time in notes.md and report.md.

## Tool discipline
- State the exact hypothesis before using DevTools.
- Use the shortest reliable path: navigate, snapshot, maybe network/console/screenshot, then stop.
- Do not browse aimlessly. Every browser action should resolve a research question.
- Never enter secrets, private data, or credentials.
`
    },
    {
        name: "tool-efficiency",
        summary: "Tool batching patterns, retry tactics, file hygiene, and common anti-patterns to avoid",
        content: `# tool-efficiency

## webSearch
- Always pass 3-5 query variations in the queries array (parallel fetch, same latency as one).
- Use short noun phrases, not full sentences.
- Set retries: 2 for flaky searches.

## readUrl
- Target the specific article or paper page, not the site homepage.
- PubMed: use the /articles/PMC… URL for full text where available.
- If a URL 404s or times out, try: search for the title, or try an alternate domain.
- Do not re-read a URL you already read in this session.

## writeFile / bash
- Before writing, check what exists: bash \`cat filename 2>/dev/null | head -10\`
- Write complete files in one call; never partial appends to JSONL.
- sources.jsonl strict rules: one compact JSON per line, no literal newlines in strings,
  no trailing commas. Write the entire file at once using writeFile.

## createClaim / verifyClaim
- Call createClaim right after reading each source, not in a batch at the end.
- Use a consistent ID scheme: claim_001, claim_002, … or topic-prefixed: claim_cognition_001.
- Run all verifyClaim calls together after all claims are recorded.

## Narration discipline
- One line before each tool call: say what you're doing and why.
- Do not repeat a tool call that already succeeded in this session.
- Check notes.md with bash \`cat notes.md\` before re-researching a topic.

## Anti-patterns
- ✗ Searching the same query multiple times without changing terms.
- ✗ Reading a homepage hoping it contains the article text.
- ✗ Writing report.md before claims are verified.
- ✗ Omitting sources.jsonl or writing it with multiline string values.
`
    }
];
export const SKILL_NAMES = SKILLS.map(s => s.name);
/** Markdown bullet list of skill names + summaries, embedded directly in the agent system prompt. */
export const SKILL_CATALOG = SKILLS.map(s => `- ${s.name}: ${s.summary}`).join("\n");
export function getSkill(name) {
    return SKILLS.find(s => s.name === name);
}
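
The exports above can be exercised with a small stand-alone sketch. It assumes a shortened stand-in `SKILLS` array (the real content strings are much longer) and drops the `export` keywords so it runs as a plain script. The helper `loadSkillOrExplain` is hypothetical and not part of the package; it only illustrates that `getSkill` returns `undefined` for an unknown name, because that is what `Array.prototype.find` does when nothing matches.

```javascript
// Stand-in data: two shortened entries shaped like the SKILLS objects above.
// The summary and content strings are placeholders, not the package's real text.
const SKILLS = [
    { name: "source-quality", summary: "Tiers for source credibility", content: "# source-quality\n" },
    { name: "tool-efficiency", summary: "Tool batching patterns", content: "# tool-efficiency\n" },
];

// Same derivations as in the file above, minus the export keyword.
const SKILL_NAMES = SKILLS.map(s => s.name);
const SKILL_CATALOG = SKILLS.map(s => `- ${s.name}: ${s.summary}`).join("\n");
function getSkill(name) {
    return SKILLS.find(s => s.name === name);
}

// Hypothetical helper (an assumption for illustration): guard against the
// undefined result before reading .content, and list the valid names instead.
function loadSkillOrExplain(name) {
    const skill = getSkill(name);
    if (skill === undefined) {
        return `Unknown skill "${name}". Available: ${SKILL_NAMES.join(", ")}`;
    }
    return skill.content;
}

console.log(SKILL_CATALOG);
console.log(loadSkillOrExplain("no-such-skill"));
```

A caller that passes a model-supplied skill name would probably want a guard of this kind, since a misspelled name otherwise surfaces as a `TypeError` on `.content` rather than as a readable message.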