ai-discovery-manager-cli 0.1.0
- package/README.md +252 -0
- package/dist/chat.js +827 -0
- package/dist/checkpoints.js +134 -0
- package/dist/cli.js +796 -0
- package/dist/doctor.js +148 -0
- package/dist/hypothesisSchema.js +106 -0
- package/dist/jsonOutput.js +65 -0
- package/dist/mcpManager.js +196 -0
- package/dist/models.js +117 -0
- package/dist/safety.js +152 -0
- package/dist/specialistContracts.js +355 -0
- package/dist/workspaceTools.js +166 -0
- package/package.json +32 -0
package/README.md
ADDED
@@ -0,0 +1,252 @@
# AI Discovery Manager CLI

Codex-style research CLI built on the OpenAI Agents SDK **manager pattern**: one trusted host agent owns the final answer and calls bounded specialist agents exposed as tools. The host process owns local filesystem writes for generated artifacts, chat exports, and optional workspace writes.

## Features

### Workflow commands

A manager agent orchestrates specialists and produces a Markdown research artifact. Each command targets a different section, except `run`, which drives the full pipeline:

| Command | What it produces |
| --- | --- |
| `run` | Full manager-orchestrated PhD thesis workflow (calls every relevant specialist). |
| `thesis` | A complete PhD thesis draft (title, abstract, intro, lit review, methods, results, discussion, conclusion, references). |
| `literature-review` | A cited PhD-level literature review grouped by theme, method, evidence strength, and open questions. |
| `hypothesis` | A structured YAML research hypothesis covering evidence, mechanism, predictions, test plan, confounders, feasibility, evaluation, uncertainty, and status. |
| `abstract` | A concise thesis abstract covering problem, gap, method, evidence, contribution, and implications. |
| `discussion` | A discussion section with implications, limitations, counterarguments, threats to validity, and future work. |
| `experiment` | A designed-and-run experiment analyzed with Code Interpreter (stats, simulations, generated tables). |
| `conclusion` | A conclusion synthesizing the question, contribution, evidence, limitations, and next research. |
| `chat` | Interactive REPL for workspace Q&A, specialist slash commands, and assistant-output exports. |
| `doctor` | Local readiness checks for API key, workspace, models, vector stores, MCP support, and built dist files. |

### Specialists (manager tools)

The manager calls these bounded specialist agents as tools. Each gets only the hosted tools its contract requests:

| Specialist | Tool name | Hosted tools |
| --- | --- | --- |
| Literature Review | `generate_literature_review` | web search, File Search |
| Hypothesis | `generate_hypothesis` | web search, File Search |
| Abstract | `generate_abstract` | web search, File Search |
| Discussion | `generate_discussion` | web search, File Search |
| Experiment | `run_experiment_and_analysis` | Code Interpreter, File Search, web search |
| Conclusion | `generate_conclusion` | web search, File Search |
| Thesis Writer | `generate_phd_thesis` | web search, File Search |

### Hosted-tool gating

- **Web search** attaches to specialists that request it unless `--no-web-search` is set.
- **OpenAI File Search** attaches only when at least one vector store ID is configured (`--vector-store-id`, `--vector-store-ids`, or `OPENAI_VECTOR_STORE_IDS`).
- **Code Interpreter** attaches to the experiment specialist for quantitative analysis, simulations, statistics, and generated tables. Chat also exposes Code Interpreter so `/experiment` can use the same experiment contract.
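
The three gating rules above can be read as a filter over the tools a specialist's contract requests. The following is a minimal sketch under that reading; `HostedTool`, `GatingOptions`, and `resolveHostedTools` are hypothetical names for illustration, not identifiers taken from the package.

```typescript
// Hypothetical types: the package's real contract shape is not shown in this README.
type HostedTool = "web_search" | "file_search" | "code_interpreter";

interface GatingOptions {
  webSearch: boolean; // false when --no-web-search is set
  vectorStoreIds: string[]; // from --vector-store-id(s) or OPENAI_VECTOR_STORE_IDS
}

// Returns the subset of requested hosted tools that the current options allow.
function resolveHostedTools(requested: HostedTool[], options: GatingOptions): HostedTool[] {
  return requested.filter((tool) => {
    if (tool === "web_search") {
      return options.webSearch;
    }
    if (tool === "file_search") {
      return options.vectorStoreIds.length > 0;
    }
    // code_interpreter: attached whenever the contract requests it.
    return true;
  });
}
```

With web search disabled and no vector stores configured, an experiment-style contract would keep only Code Interpreter under these assumptions.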

### Sandboxed workspace tools

When workspace filesystem access is enabled (default; disable with `--no-workspace-fs`), specialists and the chat agent get local tools:

- `list_workspace` - list files/subdirectories (entries truncate at 500).
- `read_workspace_file` - read UTF-8 text files (capped at ~256 KiB; binary files refused).
- `write_workspace_file` - write UTF-8 text files (max 1 MiB) - **only** when `--workspace-write` is set (off by default).

All tool paths are resolved inside the workspace root; `..` and absolute-path escapes are rejected. Chat `/save` and `/flash-save` are host-side export commands, not model tools, and also constrain output paths inside the workspace.
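
One common way to enforce this kind of containment is to resolve the requested path against the root and then check the relative path back from the root. The sketch below assumes that approach; `resolveInsideWorkspace` is a hypothetical helper, and it does not follow symlinks, which a real sandbox would also need to consider.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolves `requested` against the workspace root and
// throws if the result would land outside that root.
function resolveInsideWorkspace(workspaceRoot: string, requested: string): string {
  const root = path.resolve(workspaceRoot);
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path escapes the workspace: ${requested}`);
  }
  return resolved;
}
```

Because the check runs on the resolved path, `notes/../a.md` is accepted while `../a.md` and absolute paths outside the root are rejected.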

Model-facing workspace reads also refuse default ignored paths such as `.env*`, `.git`, `node_modules`, lockfiles, and secret/key-like filenames before file content is sent to the model.
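
A deny check of this kind might look like the sketch below. The concrete rules are assumptions: the README names only the categories, so the lockfile names, the secret-like pattern, and the function name `isIgnoredForModel` are illustrative guesses rather than the package's actual list.

```typescript
// Assumed rules, loosely following the categories named above.
const IGNORED_SEGMENTS = new Set([".git", "node_modules"]);
const IGNORED_FILENAMES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml"]);
const SECRET_LIKE = /(secret|credential|\.pem$|\.key$|^id_rsa)/i;

// Returns true when a workspace-relative path should not be sent to the model.
function isIgnoredForModel(relativePath: string): boolean {
  const segments = relativePath.split(/[\\/]+/).filter((segment) => segment.length > 0);
  const fileName = segments[segments.length - 1] ?? "";
  if (segments.some((segment) => IGNORED_SEGMENTS.has(segment))) {
    return true;
  }
  if (fileName.startsWith(".env")) {
    return true; // covers .env, .env.local, and similar
  }
  if (IGNORED_FILENAMES.has(fileName)) {
    return true;
  }
  return SECRET_LIKE.test(fileName);
}
```

A substring pattern like this one over-blocks (a file named `secretary.md` would be refused), which is usually the safer failure mode for a pre-send filter.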

### Interactive chat

`ai-discovery chat --workspace <path>` opens a manager-free REPL. Conversation state carries across turns, and the agent can read/list the workspace itself. Specialist slash commands reuse the same shared specialist contracts as the manager CLI, so chat and workflow output stay aligned.

Slash commands:

| Command | Action |
| --- | --- |
| `/read <path>` | Load a workspace text file into the conversation, then ask about it. |
| `/list [<path>]` | List workspace files (default: workspace root). |
| `/save <path.text\|path.txt\|path.pdf>` | Save assistant output history only, excluding user inputs. `/flash-save` is an alias. |
| `/literature-review <topic>` | Generate a literature review using the same specialist contract as the CLI workflow. |
| `/hypothesis <question>` | Generate a structured YAML research hypothesis using the hypothesis schema. |
| `/abstract <topic>` | Generate an abstract using the same specialist contract as the CLI workflow. |
| `/discussion <topic>` | Generate a discussion using the same specialist contract as the CLI workflow. |
| `/experiment <topic/spec>` | Design, run, and analyze an experiment using the same specialist contract as the CLI workflow. |
| `/conclusion <topic>` | Generate a conclusion using the same specialist contract as the CLI workflow. |
| `/model [name\|number]` | Show or switch the chat model (text-only allowlist). |
| `/models` | List the allowed text models. |
| `/safety [1-5]` | Show or set the local safety preflight level for the session. |
| `/mcp <subcommand>` | Manage session-only stdio MCP servers (`connect` / `status` / `tools` / `disconnect` / `help`). |
| `/recursive [on\|off\|status\|<iterations>]` | Toggle bounded self-review/revision for subsequent assistant replies. |
| `/reset` | Clear conversation history, loaded files, and assistant output history used by `/save`. |
| `/help` | Show chat help. |
| `/exit`, `/quit` | Leave the chat. |

`/read` shares the same sandbox, 256 KiB cap, and binary guard as the agent's read tool.

Best-effort keyboard shortcuts on interactive TTYs: **Ctrl+S** saves assistant output history to a default `.ai-discovery/chat-output-<timestamp>.text` path inside the workspace, and **Ctrl+M** shows MCP status/help. Many terminals deliver Ctrl+M as Enter, so `/mcp` remains the reliable command.

### Text-only model allowlist

Every model input (`--model`, `--manager-model`, `--specialist-model`, `OPENAI_MODEL`, and chat `/model`) is validated against a curated text-only allowlist. Display aliases such as `GPT-5.5 Pro` and `GPT 5.4 mini` are normalized; unknown models are rejected with the allowed list. The allowed IDs are:

`gpt-5.5`, `gpt-5.5-pro`, `gpt-5.4`, `gpt-5.4-pro`, `gpt-5.4-mini`, `gpt-5.4-nano`.
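
Alias normalization against this list could be sketched as follows. The allowed IDs come from the list above; the normalization rule (trim, lowercase, spaces and underscores to hyphens) and the name `normalizeModel` are assumptions chosen so that the two aliases mentioned above resolve.

```typescript
const ALLOWED_MODELS = [
  "gpt-5.5",
  "gpt-5.5-pro",
  "gpt-5.4",
  "gpt-5.4-pro",
  "gpt-5.4-mini",
  "gpt-5.4-nano",
] as const;

type AllowedModel = (typeof ALLOWED_MODELS)[number];

// Assumed rule: "GPT-5.5 Pro" -> "gpt-5.5-pro", "GPT 5.4 mini" -> "gpt-5.4-mini".
function normalizeModel(input: string): AllowedModel {
  const candidate = input.trim().toLowerCase().replace(/[\s_]+/g, "-");
  const match = ALLOWED_MODELS.find((id) => id === candidate);
  if (match === undefined) {
    // Reject unknown models and include the allowed list in the message.
    throw new Error(`Unknown model "${input}". Allowed: ${ALLOWED_MODELS.join(", ")}`);
  }
  return match;
}
```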

### Safety levels and local preflight

A local, on-device safety preflight runs **before** any OpenAI API call (and before `--dry-run` prints), so disallowed prompts fail without leaving the machine. Levels run 1-5 (default `3`, configurable via `--safety-level` or `AI_DISCOVERY_SAFETY_LEVEL`, and per-session via chat `/safety`):

- **Levels 1-5** all block biological/chemical mass-hazard prompts (weaponization and dangerous-pathogen / chemical-agent synthesis).
- **Level 5** additionally blocks jailbreak, prompt-injection, secret-exfiltration, and policy-evasion ("ignore your system/tool policy") attempts.

The preflight is a coarse first gate that complements, rather than replaces, the agents' built-in "never fabricate / no procedural physical-world harm" instructions.
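
The level resolution described above (flag over environment variable over the default of 3) can be sketched like this. The function names and the category labels are hypothetical, and throwing on an out-of-range value is an assumption about error handling, not documented behavior.

```typescript
const DEFAULT_SAFETY_LEVEL = 3;

// Accepts only integers from 1 to 5; anything else is treated as a usage error.
function parseSafetyLevel(raw: string, source: string): number {
  const level = Number(raw);
  if (!Number.isInteger(level) || level < 1 || level > 5) {
    throw new Error(`${source} must be an integer from 1 to 5, got "${raw}"`);
  }
  return level;
}

// Precedence: --safety-level, then AI_DISCOVERY_SAFETY_LEVEL, then the default.
function resolveSafetyLevel(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): number {
  if (flag !== undefined) {
    return parseSafetyLevel(flag, "--safety-level");
  }
  const fromEnv = env["AI_DISCOVERY_SAFETY_LEVEL"];
  if (fromEnv !== undefined && fromEnv !== "") {
    return parseSafetyLevel(fromEnv, "AI_DISCOVERY_SAFETY_LEVEL");
  }
  return DEFAULT_SAFETY_LEVEL;
}

// Category labels are illustrative; only the level-5 distinction is documented.
function preflightCategories(level: number): string[] {
  const categories = ["bio_chem_mass_hazard"];
  if (level === 5) {
    categories.push("jailbreak", "prompt_injection", "secret_exfiltration", "policy_evasion");
  }
  return categories;
}
```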

### Session MCP servers

Chat can attach user-started **stdio MCP servers** ("science MCPs") for the current session only; there is no persisted MCP config file and nothing is autoloaded. Manage them with `/mcp`:

```text
/mcp connect <name> [--cwd <path>] [--env KEY=value | --env KEY]... -- <command> [args...]
/mcp status
/mcp tools [name]
/mcp disconnect <name>
/mcp help
```

`--env KEY` (no value) forwards `KEY` from the current environment. **Env values are never printed or saved; only key names are shown.** MCP tools are exposed to the assistant prefixed by server name so two servers cannot collide.
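
To make the `connect` grammar concrete, here is a minimal parser sketch over already-tokenized arguments. Everything in it is illustrative: `McpConnectRequest`, `parseMcpConnect`, `describeEnv`, and `qualifiedToolName` are hypothetical names, the `__` separator in prefixed tool names is an assumption, and `hostEnv` stands in for the process environment.

```typescript
interface McpConnectRequest {
  name: string;
  cwd?: string;
  env: Record<string, string>;
  command: string;
  args: string[];
}

// Parses the tokens that follow "/mcp connect".
function parseMcpConnect(
  tokens: string[],
  hostEnv: Record<string, string | undefined>,
): McpConnectRequest {
  const separator = tokens.indexOf("--");
  if (separator === -1) {
    throw new Error('Expected "--" before the server command');
  }
  const [name, ...options] = tokens.slice(0, separator);
  const [command, ...args] = tokens.slice(separator + 1);
  if (name === undefined || name.startsWith("--")) {
    throw new Error("Missing server name");
  }
  if (command === undefined) {
    throw new Error("Missing server command");
  }

  const request: McpConnectRequest = { name, env: {}, command, args };
  for (let i = 0; i < options.length; i += 2) {
    const flag = options[i];
    const value = options[i + 1];
    if (value === undefined) {
      throw new Error(`Missing value for ${flag}`);
    }
    if (flag === "--cwd") {
      request.cwd = value;
    } else if (flag === "--env") {
      const eq = value.indexOf("=");
      const key = eq === -1 ? value : value.slice(0, eq);
      // "--env KEY" forwards the host value; "--env KEY=value" uses the literal.
      const resolved = eq === -1 ? hostEnv[key] : value.slice(eq + 1);
      if (resolved === undefined) {
        throw new Error(`Environment variable ${key} is not set`);
      }
      request.env[key] = resolved;
    } else {
      throw new Error(`Unknown option ${flag}`);
    }
  }
  return request;
}

// Only key names are returned for display; values stay inside the request.
function describeEnv(request: McpConnectRequest): string[] {
  return Object.keys(request.env).sort();
}

// Prefixing by server name keeps same-named tools on two servers distinct.
function qualifiedToolName(server: string, tool: string): string {
  return `${server}__${tool}`;
}
```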

### Chat output saving

Use `/save <path>` or `/flash-save <path>` inside `ai-discovery chat` to export only assistant output from the current chat session. User prompts, slash commands, and loaded file contents are not written to the export. Text exports support `.text` and `.txt`; PDF exports support `.pdf` and are converted from the plain text history.

Examples:

```text
/save notes/session-output.text
/save notes/session-output.pdf
/flash-save latest-output.txt
```

Saved text groups replies as `--- Assistant output N ---`. Export paths must stay inside the configured workspace.
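
As a rough illustration of the export format, the sketch below builds the text body from assistant replies and picks an export kind from the file extension. The header string comes from the description above; the blank line between groups, the trimming, and the names `formatAssistantExport` and `exportKindFor` are assumptions.

```typescript
// Takes assistant replies only; prompts and loaded files are never passed in.
function formatAssistantExport(replies: string[]): string {
  return replies
    .map((reply, index) => `--- Assistant output ${index + 1} ---\n${reply.trim()}\n`)
    .join("\n");
}

type ExportKind = "text" | "pdf";

// Maps a target path to an export kind based on the supported extensions.
function exportKindFor(targetPath: string): ExportKind {
  const lower = targetPath.toLowerCase();
  if (lower.endsWith(".text") || lower.endsWith(".txt")) {
    return "text";
  }
  if (lower.endsWith(".pdf")) {
    return "pdf";
  }
  throw new Error(`Unsupported export extension: ${targetPath}`);
}
```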

### Streaming

Streaming is on by default (`--stream`):

- **stdout** - the manager's final Markdown, streamed token-by-token. This is also what gets saved to the artifact.
- **stderr** - `[stream]` / `[specialist:<name>]` progress: output deltas plus `tool_called` / `handoff_*` / `tool_approval_requested` events.

Use `--no-stream` to wait for the complete result before printing.
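
The stdout/stderr split can be pictured as a small router. This is only a sketch: `StreamEvent` is a simplified, hypothetical event shape (the SDK's real stream events are richer), and the sinks are plain callbacks so the routing can be exercised without a terminal.

```typescript
// Hypothetical, simplified event shape for illustration.
type StreamEvent =
  | { kind: "final_delta"; text: string }
  | { kind: "progress"; scope: string; message: string };

interface Sinks {
  stdout: (chunk: string) => void;
  stderr: (chunk: string) => void;
}

// Final Markdown goes to stdout; bracketed progress lines go to stderr.
function routeStreamEvent(event: StreamEvent, sinks: Sinks): void {
  if (event.kind === "final_delta") {
    sinks.stdout(event.text);
  } else {
    sinks.stderr(`[${event.scope}] ${event.message}\n`);
  }
}
```

Keeping progress on stderr means that redirecting stdout to a file captures only the final Markdown.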

### Safety posture

- Sensitive trace payloads are disabled everywhere (`traceIncludeSensitiveData: false`).
- Hard citation policy: every external claim needs a real inline citation (author, year, venue, working URL/DOI) from actual search results - never fabricated. Unverifiable claims are dropped or labeled.
- Workspace is sandboxed and writes are off by default.
- A local safety preflight (levels 1-5) blocks disallowed prompts before any API call. See "Safety levels and local preflight".
- MCP servers are session-only; env values are never printed or persisted (only key names are shown).
- The CLI never asks for secrets in prompts.

## Setup

Node >= 22 is required.

```powershell
npm.cmd install
npm.cmd run build
```

### Environment variables

| Variable | Purpose |
| --- | --- |
| `OPENAI_API_KEY` | Required for any non-`--dry-run` invocation. |
| `OPENAI_MODEL` | Overrides the default model (`gpt-5.5`); must be in the text-only allowlist. |
| `OPENAI_VECTOR_STORE_IDS` | Comma-separated default vector store IDs for File Search. |
| `AI_DISCOVERY_SAFETY_LEVEL` | Default safety preflight level `1-5` (overridden by `--safety-level`; default `3`). |
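
As an illustration of how the comma-separated `OPENAI_VECTOR_STORE_IDS` default might combine with the vector-store flags, consider the sketch below. It assumes that any flag-supplied IDs replace the environment default rather than merge with it; the README does not state which, and `resolveVectorStoreIds` is a hypothetical name.

```typescript
// Splits a comma-separated list, trimming whitespace and dropping empty entries.
function splitIds(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Assumption: IDs given via flags take over; the env value is only a default.
function resolveVectorStoreIds(
  repeatedFlag: string[],
  commaFlag: string | undefined,
  env: Record<string, string | undefined>,
): string[] {
  const fromFlags = [...repeatedFlag.map((id) => id.trim()), ...splitIds(commaFlag)].filter(
    (id) => id.length > 0,
  );
  const ids = fromFlags.length > 0 ? fromFlags : splitIds(env["OPENAI_VECTOR_STORE_IDS"]);
  // Drop duplicates while keeping first-seen order.
  return ids.filter((id, index) => ids.indexOf(id) === index);
}
```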

## Usage

```powershell
node dist/cli.js run --topic "Robust AI discovery workflows for scientific research" --workspace . --out artifacts
node dist/cli.js literature-review --topic "AI agents for laboratory planning" --vector-store-id vs_...
node dist/cli.js hypothesis --topic "Can retrieval-grounded agent debates improve hypothesis novelty screening?"
node dist/cli.js experiment --topic "Simulation-based hypothesis screening" --experiment-spec "Compare two synthetic baselines and analyze uncertainty"
node dist/cli.js run --topic "..." --manager-model "GPT-5.5 Pro" --specialist-model gpt-5.4-mini --safety-level 5
node dist/cli.js chat --workspace ./papers --safety-level 4
node dist/cli.js doctor --workspace . --json
node dist/cli.js run --resume <run-id>
```

Example chat commands:

```text
/literature-review AI agents for laboratory planning
/hypothesis Can retrieval-grounded agent debates improve novelty screening?
/abstract Robust AI discovery workflows for scientific research
/discussion Limitations and counterarguments for AI co-scientist systems
/experiment Compare two synthetic baselines and analyze uncertainty
/conclusion AI agents for scientific discovery
/models
/model gpt-5.5-pro
/safety 5
/mcp connect arxiv --env ARXIV_TOKEN -- npx -y @example/arxiv-mcp-server
/mcp tools arxiv
/recursive on 3
/save outputs/session-output.pdf
```

Run from TypeScript source without building via `npm.cmd run dev -- <command> --topic "..."`.

### Options

| Flag | Description |
| --- | --- |
| `--topic, -t <text>` | Research topic or user request (required except for `chat`). |
| `--workspace, -w <path>` | Workspace root for the sandboxed file tools (default: cwd). |
| `--out, -o <path>` | Output directory for the final Markdown (default: `artifacts`). |
| `--model <model>` | Model for both manager and specialists (default: `OPENAI_MODEL` or `gpt-5.5`). Validated against the text-only allowlist. |
| `--manager-model <model>` | Override the manager model only (same allowlist). |
| `--specialist-model <model>` | Override the specialist models only (same allowlist). |
| `--vector-store-id <id>` | Add an OpenAI vector store for File Search; repeatable. |
| `--vector-store-ids <ids>` | Comma-separated OpenAI vector store IDs. |
| `--experiment-spec <text>` | Extra experiment design/analysis requirements. |
| `--max-turns <number>` | Max manager turns (default: 24). |
| `--safety-level <1-5>` | Local safety preflight level (default: 3, env `AI_DISCOVERY_SAFETY_LEVEL`). |
| `--no-web-search` | Disable web search tools. |
| `--no-workspace-fs` | Disable workspace filesystem tools (read/list/write). |
| `--workspace-write` | Allow specialists/chat to write files into the workspace (off by default). |
| `--stream` / `--no-stream` | Stream live output (default) or wait for the final result. |
| `--dry-run` | Print the resolved workflow as JSON without calling the API. |
| `--json` | Emit machine-readable JSON. Live run/chat output is newline-delimited JSON events. |
| `--resume <id>` | Resume a saved workflow checkpoint from `.ai-discovery/runs/<id>`. |
| `--help, -h` | Show usage. |

Artifacts are written to `<out>/<command>-<topic-slug>.md`.
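
The naming pattern above could be produced along these lines. The slug rule is an assumption (lowercase, runs of non-alphanumerics collapsed to a hyphen, capped at 60 characters, with a fallback of `untitled`); the README gives only the `<out>/<command>-<topic-slug>.md` pattern, and both function names are hypothetical.

```typescript
// Assumed slug rule; the package's actual slugging may differ.
function slugify(topic: string, maxLength = 60): string {
  const slug = topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // Re-trim a trailing hyphen in case the length cap cut mid-separator.
  return slug.slice(0, maxLength).replace(/-+$/g, "") || "untitled";
}

// Builds "<out>/<command>-<topic-slug>.md" with forward slashes.
function artifactPath(outDir: string, command: string, topic: string): string {
  return `${outDir}/${command}-${slugify(topic)}.md`;
}
```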

### Dry run

For a no-network / no-API configuration check (the only command that runs without `OPENAI_API_KEY`):

```powershell
npm.cmd run dry-run
```

This prints the resolved workflow JSON: command, models, the text-only `availableModels` allowlist, `safetyLevel`/`safetyPolicy`, workspace access, web-search state, vector stores, and the hosted/workspace tools each specialist would receive. The `chat` dry-run additionally lists the slash commands, keyboard shortcuts, and `mcpServers` (session-only).

### JSON, doctor, and resumable runs

Use `--json` when another agent or script needs stable machine-readable output:

```powershell
node dist/cli.js doctor --workspace . --json
node dist/cli.js run --topic "..." --json
node dist/cli.js chat --workspace . --json
```

Live workflow and chat commands emit newline-delimited JSON events such as `run_started`, `manager_output_delta`, `artifact_written`, `run_completed`, `chat_started`, `assistant_output_delta`, `assistant_output`, and `error`. Completion events include artifact paths, checkpoint paths, citations found in the final text, SDK token usage when reported, and a cost object with `amount: null` because model-specific pricing is not bundled.
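
A consumer of this output might parse it line by line, as in the sketch below. The event names are taken from the paragraph above, but the field names are assumptions: the README does not say which key carries the event name or the text delta, so `type` and `delta` here are placeholders to adjust against real output.

```typescript
// Assumed envelope: every event has a string `type`; other fields are unknown.
interface CliEvent {
  type: string;
  [key: string]: unknown;
}

// Parses newline-delimited JSON, skipping blank lines.
function parseEvents(ndjson: string): CliEvent[] {
  const events: CliEvent[] = [];
  for (const line of ndjson.split(/\r?\n/)) {
    if (line.trim() === "") {
      continue;
    }
    const parsed: unknown = JSON.parse(line);
    if (
      typeof parsed !== "object" ||
      parsed === null ||
      typeof (parsed as { type?: unknown }).type !== "string"
    ) {
      throw new Error(`Not a CLI event: ${line}`);
    }
    events.push(parsed as CliEvent);
  }
  return events;
}

// Reassembles streamed manager text from delta events (assumed `delta` field).
function collectManagerOutput(events: CliEvent[]): string {
  return events
    .filter((event) => event.type === "manager_output_delta")
    .map((event) => (typeof event["delta"] === "string" ? event["delta"] : ""))
    .join("");
}
```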

Every non-chat workflow creates a checkpoint under `.ai-discovery/runs/<id>/` with `options.json`, `prompt.md`, `partial.md`, `final.md`, `metadata.json`, and `result.json`. Pressing Ctrl+C during a streamed run saves the current partial before exiting. Resume with:

```powershell
node dist/cli.js run --resume <id>
```
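
When scripting around checkpoints, it can help to see which of the listed files a run directory actually contains. The sketch below is a small helper for that; the file names come from the paragraph above, while `checkpointDir`, `missingCheckpointFiles`, and the `baseDir` parameter (the directory that holds `.ai-discovery`) are assumptions for illustration.

```typescript
const CHECKPOINT_FILES = [
  "options.json",
  "prompt.md",
  "partial.md",
  "final.md",
  "metadata.json",
  "result.json",
] as const;

// Builds the checkpoint directory path for a run id under an assumed base directory.
function checkpointDir(baseDir: string, runId: string): string {
  return `${baseDir}/.ai-discovery/runs/${runId}`;
}

// Given the file names found in a run directory, lists the expected ones that are absent.
function missingCheckpointFiles(present: string[]): string[] {
  return CHECKPOINT_FILES.filter((name) => present.indexOf(name) === -1);
}
```

The `present` list could come from a directory listing of `checkpointDir(...)`; the helper itself stays free of filesystem calls so it is easy to test.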

`doctor` does not call the OpenAI API. It checks local Node version, API-key presence, workspace existence/writability, model allowlist values, vector-store ID shape, MCP stdio availability, and whether `dist/cli.js` exists.