npm - @researai/deepscientist - Versions diffs - 1.5.0 → 1.5.1 - Mend

@researai/deepscientist 1.5.0 → 1.5.1


Files changed (163)

package/docs/en/07_MEMORY_AND_MCP.md ADDED

@@ -0,0 +1,253 @@
+# 07 Memory and MCP: Built-in MCP and Memory Protocol
+This note defines the intended meaning of the three built-in DeepScientist MCP namespaces:
+- `memory`
+- `artifact`
+- `bash_exec`
+The goal is simple:
+- `artifact` drives the quest
+- `memory` reduces rediscovery cost
+- `bash_exec` runs durable shell work
+## 1. When to use which MCP
+Use `memory` when the output should help a later turn remember something reusable:
+- paper notes
+- failure patterns
+- debugging lessons
+- selected idea rationale
+- stable evaluation caveats
+For ideation specifically:
+- review prior quest idea cards before proposing a new idea
+- review prior experiment outcomes and failure patterns before broad new literature expansion
+- use old idea and experiment memory as references and constraints
+- do not silently treat a past idea line as the current active idea unless it is explicitly selected again
+Use `artifact` when the output changes or reports quest state:
+- idea creation or revision
+- branch/worktree transitions
+- experiment records
+- analysis campaign records
+- progress or milestone updates
+- decisions and approvals
+- connector-facing interaction state
+Use `bash_exec` when a command should stay durable and inspectable:
+- training runs
+- long evaluations
+- monitored scripts
+- commands that may need `read`, `list`, or `kill` later
+## 2. Memory tool semantics
+### `memory.list_recent(...)`
+Purpose:
+- recover local context quickly
+- rebuild state after pause/restart
+Use it:
+- at turn start
+- when resuming a stopped quest
+- before choosing which specific cards to inspect
+Do not use it:
+- as the only basis for a route decision
+- as a replacement for targeted search
+Example:
+```text
+memory.list_recent(scope="quest", limit=5, kind="knowledge")
+```
+### `memory.search(...)`
+Purpose:
+- targeted retrieval before repeating work
+Use it:
+- before broad literature search
+- before another retry on a recurring failure
+- before choosing or revising an idea
+- before asking the user something that durable memory may already answer
+Recommended search patterns:
+- paper discovery:
+  - `kind="papers"`
+- route rationale:
+  - `kind="decisions"`
+- recurring bug or confounder:
+  - `kind="episodes"`
+- stable reusable rule:
+  - `kind="knowledge"`
+Examples:
+```text
+memory.search(query="imagenet official split baseline", scope="both", kind="papers", limit=6)
+memory.search(query="metric wiring mismatch adapter", scope="quest", kind="episodes", limit=5)
+memory.search(query="adapter baseline novelty", scope="both", kind="ideas", limit=6)
+```
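The recommended patterns above amount to a lookup from retrieval intent to `kind`. The following is a minimal sketch, assuming a hypothetical helper `build_search_args` that is not part of DeepScientist. Only the argument names (`query`, `scope`, `kind`, `limit`) come from the examples; the intent labels and defaults are illustrative.

```python
# Hypothetical helper: maps a retrieval intent to the `kind` recommended above.
# Only the argument names (query, scope, kind, limit) come from the doc's
# examples; the intent labels and the function itself are assumptions.
KIND_BY_INTENT = {
    "paper_discovery": "papers",
    "route_rationale": "decisions",
    "recurring_bug": "episodes",
    "stable_rule": "knowledge",
}


def build_search_args(intent, query, scope="quest", limit=5):
    """Return keyword arguments suitable for a memory.search(...) call."""
    if intent not in KIND_BY_INTENT:
        raise ValueError(f"unknown intent: {intent!r}")
    return {
        "query": query,
        "scope": scope,
        "kind": KIND_BY_INTENT[intent],
        "limit": limit,
    }


args = build_search_args("recurring_bug", "metric wiring mismatch adapter")
print(args)
```

Keeping the mapping in one table makes it harder to search with the wrong `kind` when a failure recurs.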
+### `memory.read(...)`
+Purpose:
+- inspect one specific card after retrieval
+Use it:
+- after `list_recent` or `search` returned a card that clearly matters now
+Do not use it:
+- on dozens of cards in one turn
+Example:
+```text
+memory.read(path="~/DeepScientist/quests/q-xxxx/memory/knowledge/metric-contract.md")
+```
+### `memory.write(...)`
+Purpose:
+- persist a durable reusable finding
+Use it after:
+- a useful paper reading result
+- a non-trivial debugging episode
+- a stable evaluation rule
+- a selected or rejected idea with reason
+Do not use it for:
+- generic chat summaries
+- temporary progress pings
+- information that is better captured as an artifact record
+Suggested body shape:
+1. context
+2. action or observation
+3. outcome
+4. interpretation
+5. boundaries
+6. evidence paths
+7. retrieval hint for future turns
+Example:
+```md
+---
+id: knowledge-1234abcd
+type: knowledge
+title: Metric comparison is valid only under the official validation split
+quest_id: q-xxxx
+scope: quest
+tags:
+  - stage:baseline
+  - topic:metric-contract
+stage: baseline
+confidence: high
+evidence_paths:
+  - artifacts/baselines/verification_report.md
+retrieval_hints:
+  - baseline comparison
+  - metric contract
+updated_at: 2026-03-11T18:00:00+00:00
+---
+Context: baseline verification on the official benchmark setup.
+Observation: numbers matched only when the official validation split was used.
+Interpretation: any comparison against this baseline is invalid under custom splits.
+Boundary: this rule is benchmark-specific and should not be promoted globally unless it recurs across quests.
+```
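The card shape above (front matter followed by labelled body lines) can be assembled mechanically. This is a sketch only: `render_card` is a hypothetical helper, not a DeepScientist API, and it performs no YAML quoting, so values containing `: ` or leading special characters would need escaping in real use.

```python
# Illustrative only: renders a memory card in the front-matter-plus-body shape
# shown above. `render_card` is a hypothetical helper, not a DeepScientist API.
def render_card(meta, sections):
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            # Lists become indented sequences, as in the example card.
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    # Body sections are emitted in the order given (context, observation, ...).
    lines.extend(f"{label}: {text}" for label, text in sections.items())
    return "\n".join(lines) + "\n"


card = render_card(
    {
        "type": "knowledge",
        "scope": "quest",
        "tags": ["stage:baseline", "topic:metric-contract"],
    },
    {
        "Context": "baseline verification on the official benchmark setup.",
        "Boundary": "benchmark-specific; do not promote globally yet.",
    },
)
print(card)
```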
+### `memory.promote_to_global(...)`
+Purpose:
+- copy a proven reusable quest-local lesson into global memory
+Use it only when:
+- the lesson is not just repo-specific noise
+- it has become stable
+- another quest would likely benefit
+## 3. Artifact versus memory
+Write both only when they serve different roles.
+Example:
+- main experiment finished:
+  - `artifact.record_main_experiment(...)` stores the official run record
+  - `memory.write(kind="knowledge", ...)` is optional and should capture only the reusable lesson, such as a stable metric caveat or debugging rule
+Do not replace an experiment artifact with a memory card.
+Do not replace a reusable lesson with a progress artifact.
+## 4. Bash exec usage
+Use `bash_exec` for monitored commands:
+```text
+bash_exec.bash_exec(command="python train.py --config configs/main.yaml", mode="detach", workdir="<quest workspace>")
+```
+Then inspect:
+```text
+bash_exec.bash_exec(mode="list", status="running")
+bash_exec.bash_exec(mode="read", id="<bash_id>")
+```
+Use `kill` only when the quest truly needs to stop the session.
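The detach, list, and read cycle above has a rough standard-library analogy. The sketch below is not how `bash_exec` is implemented; it only illustrates why a detached command with a log file stays inspectable. All function names are hypothetical.

```python
# Stdlib analogy only; this is NOT the bash_exec implementation.
import subprocess
import sys
import tempfile
from pathlib import Path


def start_detached(argv, log_path):
    """Start a command without waiting for it; output goes to a log file."""
    with open(log_path, "w") as log:
        # The child process keeps its own handle to the log file.
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)


def is_running(proc):
    """Rough counterpart of listing sessions with status 'running'."""
    return proc.poll() is None


def read_log(log_path):
    """Rough counterpart of reading a session's output."""
    return Path(log_path).read_text()


log_path = Path(tempfile.mkdtemp()) / "run.log"
proc = start_detached([sys.executable, "-c", "print('epoch 1 done')"], log_path)
proc.wait(timeout=30)  # a real caller would poll later instead of blocking
print(is_running(proc), read_log(log_path).strip())
```

Stopping the process early would correspond to `proc.terminate()`, which matches the advice to reserve `kill` for cases where the session really must stop.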
+## 5. Prompt-level expectations
+The agent should normally follow this discipline:
+1. `memory.list_recent(...)` at turn start or resume
+2. `memory.search(...)` before repeated work
+3. `memory.read(...)` on the few selected cards
+4. `artifact` for quest state changes and reports
+5. `bash_exec` for durable shell work
+6. `memory.write(...)` only after a real durable finding appears
+## 6. UI expectation
+In the Studio trace at `/projects/{id}`:
+- `memory.*` calls should render as structured cards, not opaque raw JSON
+- the card should show:
+  - operation type
+  - scope
+  - kind
+  - title or query
+  - matching items or saved card summary
+If the agent is not calling memory at all, the problem is prompt/skill behavior.
+If the agent is calling memory but Studio still looks like raw logs, the problem is UI rendering.

package/docs/en/08_FIGURE_STYLE_GUIDE.md ADDED

@@ -0,0 +1,97 @@
+# 08 Figure Style Guide: Figure and Plot Style
+This page defines the default DeepScientist figure language for experiment summaries, analysis campaigns, and paper-facing plots.
+## Core rule
+Prefer restrained, evidence-first figures.
+- connector milestone charts should be quick to read
+- paper figures should be clean enough to survive PDF export and review
+- both should use the fixed Morandi palette family from the prompt / stage skills
+## Fixed Morandi palette
+- `mist-stone`: `#F3EEE8`, `#D8D1C7`, `#8A9199`
+- `sage-clay`: `#E7E1D6`, `#B7A99A`, `#7F8F84`
+- `dust-rose`: `#F2E9E6`, `#D8C3BC`, `#B88C8C`
+- `fog-blue`: `#DCE5E8`, `#A9BCC4`, `#6F8894`
+- `olive-paper`: `#E6E1D3`, `#B8B095`, `#7C7A5C`
+- `lavender-ash`: `#E8E3EA`, `#B9AFC2`, `#7D7486`
+Recommended use:
+- main method vs baseline: `sage-clay` + `mist-stone`
+- ablations: `mist-stone` + `fog-blue` + `dust-rose`
+- uncertainty / sensitivity: `mist-stone` + `olive-paper`
+- appendix / supplementary: `mist-stone` + `lavender-ash`
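For plotting scripts, the palette and its recommended pairings can be kept in one lookup table. This is a sketch: the hex values and pairings are copied from the lists above, while the use-case keys, the `colors_for` helper, and the reading of each triple as running from light to dark are assumptions.

```python
# Hex values and pairings are copied from the style guide above.
# The dictionary keys for use cases and the helper are illustrative.
PALETTES = {
    "mist-stone": ("#F3EEE8", "#D8D1C7", "#8A9199"),
    "sage-clay": ("#E7E1D6", "#B7A99A", "#7F8F84"),
    "dust-rose": ("#F2E9E6", "#D8C3BC", "#B88C8C"),
    "fog-blue": ("#DCE5E8", "#A9BCC4", "#6F8894"),
    "olive-paper": ("#E6E1D3", "#B8B095", "#7C7A5C"),
    "lavender-ash": ("#E8E3EA", "#B9AFC2", "#7D7486"),
}

RECOMMENDED = {
    "main_vs_baseline": ("sage-clay", "mist-stone"),
    "ablations": ("mist-stone", "fog-blue", "dust-rose"),
    "uncertainty": ("mist-stone", "olive-paper"),
    "appendix": ("mist-stone", "lavender-ash"),
}


def colors_for(use_case, shade=2):
    """Return one hex color per recommended family.

    `shade` indexes into each triple; 2 is assumed to be the darkest tone.
    """
    return [PALETTES[name][shade] for name in RECOMMENDED[use_case]]


print(colors_for("ablations"))
```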
+## Chart selection
+Choose chart type by the research question:
+- line chart: trends over epochs, steps, budget, scaling, or ordered settings
+- bar chart: a small number of categorical comparisons with a common zero baseline
+- dot / point-range chart: comparisons where confidence intervals matter
+- box / violin / histogram: real distribution questions only
+- heatmap: only when matrix structure is genuinely the result
+Do not use a crowded dashboard-style layout for one simple claim.
+## Color semantics
+- ordered magnitude -> sequential muted palette
+- signed delta around zero or a reference -> diverging muted palette with a neutral midpoint
+- categories -> discrete palette only
+Avoid rainbow / jet / HSV-like colormaps.
+## Export rules
+- connector milestone chart: usually `png`
+- paper figure: `pdf` or `svg` plus one `png` preview
+- avoid rasterizing line art or text when vector export is possible
+- keep a white or near-white background
+- keep grids light
+- keep legends compact or use direct labeling when clearer
+- ensure text remains readable after likely journal scaling
+- prefer paper-like sizes inspired by common journal layouts:
+  - single-column: about `89 mm` wide
+  - double-column: about `183 mm` wide
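Plotting libraries such as Matplotlib take figure sizes in inches, so the millimetre widths above need converting. A minimal sketch follows; the `figure_size_inches` helper and the default aspect ratio of 0.62 are assumptions, not part of the style guide.

```python
MM_PER_INCH = 25.4

# Widths taken from the export rules above.
COLUMN_WIDTH_MM = {"single": 89.0, "double": 183.0}


def figure_size_inches(column, aspect=0.62):
    """Return (width, height) in inches for a column layout.

    `aspect` is height divided by width; the default is an assumed value.
    """
    width = COLUMN_WIDTH_MM[column] / MM_PER_INCH
    return (round(width, 2), round(width * aspect, 2))


print(figure_size_inches("single"))
print(figure_size_inches("double"))
```

The returned tuple can be passed as a figure size wherever the plotting library expects inches.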
+## Mandatory review workflow
+Do not mark a meaningful figure as final immediately after rendering it.
+For milestone charts, paper figures, and appendix figures:
+1. render the first version
+2. inspect the actual exported figure
+3. fix spacing, labels, legend placement, visual hierarchy, or color choices if needed
+4. re-export the final version
+Treat “rendered and visually checked” as the minimum completion condition.
+## Minimal review checklist
+Before treating a figure as done, verify:
+- the visual encoding matches the research question
+- labels, units, and baselines are explicit
+- colors mean the same thing across related figures
+- the source data path is known
+- the generating script path is known
+- the figure can be regenerated from durable files
+- the figure stays readable after realistic down-scaling
+- the main message is obvious within a quick scan
+- the legend does not block the data
+## Reference basis
+This policy was aligned with:
+- PLOS Computational Biology, “Ten Simple Rules for Better Figures”: `https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833`
+- Graphics Principles: `https://graphicsprinciples.github.io/`
+- Nature author formatting guidance: `https://www.nature.com/nature/for-authors/formatting-guide`
+- Matplotlib colormap guidance: `https://matplotlib.org/stable/users/explain/colors/colormaps.html`
+- Datawrapper accessibility guidance: `https://academy.datawrapper.de/article/206-how-we-make-sure-our-charts-maps-and-tables-are-accessible`

package/docs/en/09_DOCTOR.md ADDED

@@ -0,0 +1,108 @@
+# 09 `ds doctor`: Repair Startup and Environment Problems
+Use `ds doctor` when DeepScientist does not start cleanly after installation.
+## Recommended flow
+1. Install DeepScientist and Codex:
+   ```bash
+   npm install -g @openai/codex @researai/deepscientist
+   ```
+2. Try to start DeepScientist:
+   ```bash
+   ds
+   ```
+3. If startup fails or looks unhealthy, run:
+   ```bash
+   ds doctor
+   ```
+4. Read the checks from top to bottom and fix the failed items first.
+5. Run `ds doctor` again until all checks are healthy, then run `ds`.
+## What `ds doctor` checks
+- local Python runtime health
+- whether `~/DeepScientist` exists and is writable
+- whether `git` is installed and configured
+- whether required config files are valid
+- whether `codex` is still the configured runnable runner for the current release
+- whether the Codex CLI can be found and passes a startup probe
+- whether an optional local `pdflatex` runtime is available for paper PDF compilation
+- whether the web and TUI bundles exist
+- whether the configured web port is free or is already held by the correct daemon
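A few of these checks can be approximated with the standard library when `ds doctor` itself is unavailable. This is a sketch under stated assumptions: the function names are hypothetical, the logic is not taken from the DeepScientist source, and the port check only detects whether the port is free, not whether the correct daemon holds it.

```python
# Illustrative approximations; not the actual `ds doctor` implementation.
import os
import shutil
import socket
from pathlib import Path


def port_is_free(port, host="127.0.0.1"):
    """True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def home_is_writable(home):
    """True if the directory exists and the current user can write to it."""
    return Path(home).is_dir() and os.access(home, os.W_OK)


def tool_on_path(name):
    """True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


checks = {
    "home writable": home_is_writable(Path.home() / "DeepScientist"),
    "git installed": tool_on_path("git"),
    "codex installed": tool_on_path("codex"),
    "pdflatex available (optional)": tool_on_path("pdflatex"),
    "port 20999 free": port_is_free(20999),
}
for name, ok in checks.items():
    print(f"{'ok  ' if ok else 'FAIL'}  {name}")
```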
+## Common fixes
+### Codex is missing
+Run:
+```bash
+npm install -g @openai/codex
+```
+### Codex is installed but not logged in
+Run:
+```bash
+codex
+```
+Finish login once, then rerun `ds doctor`.
+### Local paper PDF compilation is unavailable
+DeepScientist can compile papers without a full TeX Live install if you add a lightweight TinyTeX runtime:
+```bash
+ds latex install-runtime
+```
+If you prefer a system package instead, install a distribution that provides `pdflatex` and `bibtex`.
+### Port `20999` is busy
+If the port is held by your own managed daemon, stop it:
+```bash
+ds --stop
+```
+Then run `ds` again.
+If another service already uses the port, change `ui.port` in:
+```text
+~/DeepScientist/config/config.yaml
+```
+### Git user identity is missing
+Run:
+```bash
+git config --global user.name "Your Name"
+git config --global user.email "you@example.com"
+```
+### Claude was enabled by mistake
+Current open-source releases keep `claude` as a TODO/reserved slot only.
+Set it back to disabled in:
+```text
+~/DeepScientist/config/runners.yaml
+```
+## Notes
+- `ds docker` is kept as a compatibility alias, but the official command is `ds doctor`.
+- The normal browser URL is `http://127.0.0.1:20999`.

package/docs/en/90_ARCHITECTURE.md ADDED

@@ -0,0 +1,245 @@
+# 90 Architecture: Maintainer Architecture Reference
+This document is the maintainer-facing architecture reference for the current open-source repository.
+It describes the implementation that actually exists in this checkout. When code and docs diverge, fix the docs in the same change.
+## Goals
+DeepScientist is a small, local-first research operating system with these stable constraints:
+- the authoritative runtime is Python
+- `npm` is the install and launch path
+- one quest equals one Git repository
+- durable state stays in files plus Git
+- the public built-in MCP surface stays limited to `memory`, `artifact`, and `bash_exec`
+- the workflow remains prompt-led and skill-led instead of stage-scheduler-heavy
+## Top-Level Layout
+Important repository areas:
+- `bin/`
+  - npm launcher entrypoint
+- `src/deepscientist/`
+  - authoritative runtime, daemon, CLI, registries, quest state, connectors
+- `src/prompts/`
+  - system prompt source
+- `src/skills/`
+  - first-party stage skills
+- `src/ui/`
+  - web workspace
+- `src/tui/`
+  - local TUI
+- `docs/`
+  - user docs and maintainer docs
+- `tests/`
+  - runtime, API, prompt, connector, packaging, and UI contract tests
+## Launch Chain
+The normal user path is:
+1. `npm install -g @researai/deepscientist`
+2. run `ds`
+3. `bin/ds.js` ensures the local Python runtime is ready under `~/DeepScientist/runtime/venv`
+4. the launcher starts the Python daemon
+5. the daemon serves the web workspace and shared API surface
+6. the TUI and web UI both consume the same daemon contracts
+`bin/ds.js` should stay thin. Product behavior belongs in Python services, prompts, and skills.
+## Runtime Home
+The default runtime home is `~/DeepScientist/`.
+Key directories:
+- `runtime/`
+  - launcher-managed runtime state
+  - Python venv
+  - built bundle helpers
+  - managed local tool installs under `runtime/tools/`
+- `config/`
+  - YAML configuration and baseline registry data
+- `memory/`
+  - global memory cards
+- `quests/`
+  - one Git repository per quest
+- `logs/`
+  - daemon and runtime logs
+- `cache/`
+  - reusable caches such as synced skills
+## Quest Model
+Each quest lives under `~/DeepScientist/quests/<quest_id>/` and is its own Git repository.
+Important quest state lives in:
+- `quest.yaml`
+- `brief.md`
+- `plan.md`
+- `status.md`
+- `SUMMARY.md`
+- `.ds/runtime_state.json`
+- `.ds/user_message_queue.json`
+- `.ds/events.jsonl`
+- `.ds/interaction_journal.jsonl`
+The quest layout contract is defined in `src/deepscientist/quest/layout.py`. If it changes, update quest services, daemon handlers, UI/TUI consumers, and tests together.
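When debugging a quest directory, it can help to see which of the state files listed above are present. The check below is a sketch: the file names are copied from the list, but the helper is hypothetical, is not the contract in `src/deepscientist/quest/layout.py`, and assumes every file is expected to exist, which may not hold for files created lazily.

```python
from pathlib import Path

# File names copied from the list above; the check itself is illustrative.
QUEST_STATE_FILES = (
    "quest.yaml",
    "brief.md",
    "plan.md",
    "status.md",
    "SUMMARY.md",
    ".ds/runtime_state.json",
    ".ds/user_message_queue.json",
    ".ds/events.jsonl",
    ".ds/interaction_journal.jsonl",
)


def missing_state_files(quest_root):
    """Return the relative paths that are not present under quest_root."""
    root = Path(quest_root)
    return [rel for rel in QUEST_STATE_FILES if not (root / rel).is_file()]


# `q-xxxx` is the placeholder quest id used elsewhere in these docs.
quest_root = Path.home() / "DeepScientist" / "quests" / "q-xxxx"
print(missing_state_files(quest_root))
```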
+## Core Runtime Boundaries
+### CLI
+- file: `src/deepscientist/cli.py`
+- responsibility: thin Python command surface over quest, config, doctor, and runtime services
+### Daemon
+- files:
+  - `src/deepscientist/daemon/app.py`
+  - `src/deepscientist/daemon/api/router.py`
+  - `src/deepscientist/daemon/api/handlers.py`
+- responsibility:
+  - serve the local web workspace
+  - expose shared API routes
+  - coordinate quest turn execution, inbox delivery, connectors, and run control
+### Quest
+- files under `src/deepscientist/quest/`
+- responsibility:
+  - create quests
+  - maintain quest snapshots
+  - persist runtime state
+  - derive explorer and canvas state
+### Artifact
+- files under `src/deepscientist/artifact/`
+- responsibility:
+  - durable structured artifacts
+  - Git-backed quest operations
+  - baselines, approvals, progress, reports, interactions
+### Memory
+- files under `src/deepscientist/memory/`
+- responsibility:
+  - global and quest-scoped Markdown memory
+### Bash Execution
+- files under `src/deepscientist/bash_exec/`
+- responsibility:
+  - managed, stoppable, durable shell sessions
+## Public MCP Boundary
+The public built-in MCP surface must remain:
+- `memory`
+- `artifact`
+- `bash_exec`
+Do not introduce public `git`, `connector`, or `runtime_tool` MCP namespaces.
+Git behavior stays inside `artifact`. Managed local tooling is a daemon/runtime concern, not a public MCP surface.
+## Registry-First Extension Points
+DeepScientist prefers small explicit registries over large dispatch branches.
+Current registry-backed areas include:
+- runners
+- channels
+- connector bridges
+- skill discovery
+- managed local runtime tools
+Patterns should stay close to:
+- `register_*()`
+- `get_*()`
+- `list_*()`
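The `register_*()` / `get_*()` / `list_*()` pattern can be sketched in a few lines. The names `register_tool`, `get_tool_factory`, and `list_tool_names` are hypothetical, chosen only to mirror the pattern; the signatures and error behavior are assumptions, not the repository's code.

```python
from typing import Callable, Dict, List

# Module-level table: one explicit entry per registered name.
_FACTORIES: Dict[str, Callable[[], object]] = {}


def register_tool(name: str, factory: Callable[[], object]) -> None:
    """Register a factory under a unique name."""
    if name in _FACTORIES:
        raise ValueError(f"{name!r} is already registered")
    _FACTORIES[name] = factory


def get_tool_factory(name: str) -> Callable[[], object]:
    """Look up a factory; unknown names fail loudly."""
    try:
        return _FACTORIES[name]
    except KeyError:
        raise KeyError(f"no tool registered under {name!r}") from None


def list_tool_names() -> List[str]:
    """Return registered names in a stable order."""
    return sorted(_FACTORIES)


# The factory here returns a placeholder string instead of a real provider.
register_tool("tinytex", lambda: "tinytex-provider")
print(list_tool_names(), get_tool_factory("tinytex")())
```

Adding a new entry then means one `register_tool(...)` call rather than another branch in a dispatch function, which is the trade-off the registry-first approach favors.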
+## Managed Local Runtime Tools
+Managed local tools are optional helper runtimes installed under `~/DeepScientist/runtime/tools/`.
+Examples:
+- TinyTeX for local `pdflatex`
+- future candidates such as `pandoc`, `graphviz`, or `ffmpeg`
+The runtime-tool layer lives under `src/deepscientist/runtime_tools/` and exists to:
+- keep install logic out of unrelated subsystems
+- let the daemon and CLI inspect tool health in one place
+- resolve binaries consistently
+- make new managed tools register the same way as runners or channels
+Current pieces:
+- `runtime_tools/registry.py`
+  - `register_runtime_tool()`, `get_runtime_tool_factory()`, `list_runtime_tool_names()`
+- `runtime_tools/service.py`
+  - high-level access for status, install, and binary resolution
+- `runtime_tools/builtins.py`
+  - built-in registrations
+- `runtime_tools/tinytex.py`
+  - TinyTeX provider adapter
+Low-level TinyTeX implementation remains in `src/deepscientist/tinytex.py`.
+## Prompt And Skill Flow
+Research workflow behavior should primarily live in:
+- `src/prompts/system.md`
+- `src/deepscientist/prompts/builder.py`
+- `src/skills/*/SKILL.md`
+The daemon should persist and route state, but avoid becoming a rigid workflow scheduler.
+## UI Contract
+The web UI and TUI share the same daemon API contract.
+If an API route changes:
+- update daemon handlers/router
+- update `src/ui/src/lib/api.ts`
+- update `src/tui/src/lib/api.ts`
+- update tests that enforce the contract
+Preserve `/projects` and `/projects/:quest_id` as the main web workspace routes.
+## Documentation Contract
+Maintainer docs:
+- this file: `docs/en/90_ARCHITECTURE.md`
+- `docs/en/91_DEVELOPMENT.md`
+User docs:
+- `docs/en/*.md`
+- `docs/zh/*.md`
+Do not put temporary implementation checklists or planning drafts under `docs/`.
+## Testing Layers
+Typical validation layers:
+- Python unit and integration tests under `tests/`
+- API contract tests
+- prompt and skill tests
+- web/TUI bundle builds
+- packaging checks such as `npm pack --dry-run --ignore-scripts`
+When changing registries, quest layout, API contracts, prompts, or packaging, update the corresponding tests in the same change.