npm - @researai/deepscientist - Versions diffs - 1.5.0 → 1.5.1 - Mend

@researai/deepscientist 1.5.0 → 1.5.1

Files changed (163)

package/src/skills/experiment/SKILL.md CHANGED

@@ -12,10 +12,11 @@ Use this skill for the main evidence-producing runs of the quest.
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the run plan.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
-- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only at real checkpoints, but poll more actively during live work: usually every 3 to 8 tool calls, before another multi-step batch, and before or after long-running `bash_exec` work. Keep updates high-signal and never filler.
-- Each progress update must state completed work, the durable output touched, and the immediate next run step.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only when there is real user-visible progress: the first meaningful signal of long work, a meaningful checkpoint, or an occasional keepalive during truly long work. Do not update by tool-call cadence.
+- Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
+- Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - Keep ordinary subtask completions concise. When a main experiment actually finishes or reaches a stage-significant checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report rather than another short progress line.
-- That richer experiment-stage milestone report should normally cover: what run finished, where the durable run artifacts/logs/metrics live, the key metrics versus baseline or expectation, the main failure modes or caveats, and the exact recommended next action.
+- That richer experiment-stage milestone report should normally cover: what run finished, the headline result versus baseline or expectation, the main caveat, and the exact recommended next action.
 - That richer milestone report is still normally non-blocking. If the next route is already justified locally, continue automatically after reporting rather than idling for acknowledgment.
 - If the active communication surface is QQ and QQ milestone media is enabled in config, a completed main experiment may attach one summary PNG to that richer milestone update.
 - That PNG should be a connector-facing report chart, not a raw debug plot and not a draft paper figure.
@@ -478,7 +479,7 @@ That milestone should state:
 - the research question that was tested
 - the primary result and baseline delta
 - whether the run supports, weakens, or leaves the idea inconclusive
-- the durable files or artifact ids that now represent the evidence
+- the main caveat or confidence note that still matters
 - the exact recommended next move
 Do not treat a main run as durably complete until `artifact.record_main_experiment(...)` succeeds.
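
Note on the revised progress rule in the hunks above: it replaces the tool-call cadence ("every 3 to 8 tool calls") with three triggers: the first meaningful signal, a meaningful checkpoint, or an occasional keepalive. The following is only a minimal sketch of how such a gate might look. `WorkState`, `should_emit_progress`, and the 30-minute `KEEPALIVE_SECONDS` threshold are hypothetical names and values chosen for illustration; the skill text gives no number for "occasional" and the package does not define these helpers.

```python
from dataclasses import dataclass

# Hypothetical threshold: the skill only says "occasional keepalive during
# truly long work" and does not specify an interval.
KEEPALIVE_SECONDS = 30 * 60


@dataclass
class WorkState:
    sent_first_signal: bool       # has any progress update gone out for this task?
    at_checkpoint: bool           # did the work just reach a meaningful checkpoint?
    has_visible_progress: bool    # is there something new the user would care about?
    seconds_since_update: float   # time elapsed since the last progress update
    tool_calls_since_update: int  # tracked but deliberately ignored by the gate


def should_emit_progress(state: WorkState) -> bool:
    """Return True only for the three triggers named in the revised rule."""
    # Trigger 1: the first meaningful signal of long work.
    if state.has_visible_progress and not state.sent_first_signal:
        return True
    # Trigger 2: a meaningful checkpoint.
    if state.at_checkpoint:
        return True
    # Trigger 3: an occasional keepalive. The tool-call count plays no role.
    return state.seconds_since_update >= KEEPALIVE_SECONDS
```

In this sketch, a state with fifty tool calls but no checkpoint and only a minute of elapsed time yields no update, which is the behavioral difference from the 1.5.0 wording.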

package/src/skills/finalize/SKILL.md CHANGED

@@ -12,13 +12,14 @@ Use this skill to close or pause a quest responsibly.
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before closing or pausing the quest.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
-- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only at real checkpoints, but poll more actively during live work: usually every 3 to 8 tool calls, before another multi-step batch, and before or after long-running `bash_exec` work. Keep updates high-signal and never filler.
-- Each progress update must state completed consolidation work, the durable output touched, and the immediate next closure step.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only when there is real user-visible progress: the first meaningful signal of long work, a meaningful checkpoint, or an occasional keepalive during truly long work. Do not update by tool-call cadence.
+- Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
+- Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - If the runtime starts an auto-continue turn with no new user message, keep finalizing from the durable quest state and active requirements instead of replaying the previous user turn.
 - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
 - For any blocking decision request, provide 1 to 3 concrete options, put the recommended option first, explain each option's actual content plus pros and cons, wait up to 1 day when feasible, then choose the best option yourself and notify the user of the chosen option if the timeout expires.
 - If a threaded user reply arrives, interpret it relative to the latest finalize progress update before assuming the task changed completely.
-- When finalize reaches a real closure state, pause-ready packet, or route-back decision, send one threaded `artifact.interact(kind='milestone', ...)` update that names the recommendation, the strongest evidence behind it, and any reopen condition that still matters.
+- When finalize reaches a real closure state, pause-ready packet, or route-back decision, send one threaded `artifact.interact(kind='milestone', ...)` update that names the recommendation, why it is the right call, and any reopen condition that still matters.
 - True quest completion still requires explicit user approval through the runtime completion flow before calling `artifact.complete_quest(...)`.
 ## Stage purpose
@@ -114,7 +115,7 @@ State clearly:
 - key deliverables that exist and where they live
 Do not only say that evidence exists.
-Name the paths or artifact ids that matter.
+Say clearly what exists and why it matters. Name concrete paths or artifact ids only when the user asks for them or needs them to act.
 When a paper bundle exists, verify the manifest inventory explicitly, including:
 - `paper/paper_bundle_manifest.json`

package/src/skills/idea/SKILL.md CHANGED

@@ -12,13 +12,14 @@ Use this skill to turn the current baseline and problem frame into concrete, lit
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before selecting or refining ideas.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
-- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only at real checkpoints, but poll more actively during live work: usually every 3 to 8 tool calls, before another multi-step batch, and before or after long-running `bash_exec` work. Keep updates high-signal and never filler.
-- Each progress update must state completed analysis, the durable output touched, and the immediate next ideation step.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only when there is real user-visible progress: the first meaningful signal of long work, a meaningful checkpoint, or an occasional keepalive during truly long work. Do not update by tool-call cadence.
+- Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
+- Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - Keep ordinary subtask completions concise. When the idea stage actually finishes a meaningful deliverable such as a selected idea package, a rejected-ideas summary, or a route-shaping ideation checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report.
-- That richer idea-stage milestone report should normally cover: the final selected/rejected directions, the strongest supporting literature/codebase evidence, the concrete files or artifacts produced, the main residual risks, and the exact recommended next stage or experiment.
+- That richer idea-stage milestone report should normally cover: the final selected or rejected direction, why it won or lost, the main remaining risk, and the exact recommended next stage or experiment.
 - That richer milestone report is still normally non-blocking. If the next experiment or route is already clear from durable evidence, continue automatically after reporting instead of waiting.
 - If the runtime starts an auto-continue turn with no new user message, keep advancing from the active requirements and current durable state instead of re-answering the previous user turn.
-- Message templates are references only. Adapt to the actual context and vary wording so updates feel respectful, human, and non-robotic.
+- Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
 - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
 - For any blocking decision request, provide 1 to 3 concrete options, put the recommended option first, explain each option's actual content plus pros and cons, wait up to 1 day when feasible, then choose the best option yourself and notify the user of the chosen option if the timeout expires.
 - If a threaded user reply arrives, interpret it relative to the latest idea progress update before assuming the task changed completely.

package/src/skills/intake-audit/SKILL.md ADDED

@@ -0,0 +1,277 @@
+---
+name: intake-audit
+description: Use when a quest does not start from a blank state and the agent must first audit, trust-rank, and reconcile existing baselines, results, drafts, or review materials before choosing the next anchor.
+---
+# Intake Audit
+Use this skill when the quest already has meaningful state and the first job is to normalize that state instead of restarting the canonical research loop from zero.
+## Interaction discipline
+- Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
+- If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the audit.
+- Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` only when there is real user-visible progress: the first meaningful signal of the audit, a meaningful checkpoint, or an occasional keepalive during truly long work. Do not update by tool-call cadence.
+- Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
+- Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
+- Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
+- Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
+- For any blocking decision request, provide 1 to 3 concrete options, put the recommended option first, explain each option's actual content plus pros and cons, wait up to 1 day when feasible, then choose the best option yourself and notify the user of the chosen option if the timeout expires.
+- If a threaded user reply arrives, interpret it relative to the latest intake-audit progress update before assuming the task changed completely.
+- When the audit reaches a durable route recommendation, send one richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` update that says what state is trusted, what still needs work, and which anchor should run next.
+## Purpose
+`intake-audit` is an auxiliary entry skill, not a normal long-running anchor.
+Its purpose is to answer four questions before deeper work begins:
+1. what already exists?
+2. what is trustworthy?
+3. what can be reused directly?
+4. which skill should take over next?
+This skill exists because many quests do **not** start from a clean slate.
+Common non-blank starts include:
+- a baseline already exists and may already be confirmed
+- a main experiment has already finished and only needs durable recording or interpretation
+- analysis results already exist across child branches or worktrees
+- a draft or paper bundle already exists
+- reviewer comments already exist and the quest is really a revision/rebuttal task
+- the user explicitly says not to rerun from scratch
+Do not treat these as edge cases.
+They are common research entry states.
+## Use when
+- `startup_contract.launch_mode = custom` and the profile implies existing work
+- the quest root already contains meaningful baseline, experiment, analysis, or paper assets
+- the user says:
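+  - “the baseline already exists”
+  - “do not reproduce it again”
+  - “organize the existing results first”
+  - “a paper/draft already exists; continue from the existing state first”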
+- review materials exist but the current paper/result state is still unclear
+## Do not use when
+- the quest is genuinely blank and should start with ordinary `scout` or `baseline`
+- the active state is already well-normalized and the next anchor is obvious
+- the task is a pure non-research request
+## Non-negotiable rules
+- Do not rerun expensive work just because files exist. First decide whether a trust gap actually requires rerunning.
+- Do not fabricate missing durable records in order to make the quest look cleaner.
+- Do not mark an existing baseline as trusted unless the metric contract, source, and comparability are clear enough.
+- Do not mark an existing experiment as a durable main result unless it is genuinely the main run for an accepted idea line.
+- Do not silently import old drafts, plots, or notes as the active contract if they belong to a different idea line or branch line.
+- Do not lose provenance. If an artifact is reused, record where it came from and why it is trusted enough.
+- If the quest is really a review/revision task, route to `rebuttal` instead of pretending this is a normal fresh paper-writing pass.
+## Typical intake states
+Classify the current quest into one or more of these buckets:
+- `baseline_ready`
+- `baseline_partial`
+- `main_result_ready`
+- `analysis_ready`
+- `draft_ready`
+- `paper_bundle_ready`
+- `review_package_ready`
+- `unclear_state`
+Also classify every important asset by trust:
+- `trusted`
+- `usable_with_verification`
+- `reference_only`
+- `stale_or_conflicting`
+- `missing_context`
+## Primary truth sources
+Use, in roughly this order:
+- `startup_contract`
+  - especially `launch_mode`, `custom_profile`, `entry_state_summary`, `review_summary`, and `custom_brief` when present
+- quest continuity files:
+  - `brief.md`
+  - `plan.md`
+  - `status.md`
+  - `SUMMARY.md`
+- recent durable artifact state and quest snapshot
+- current workspace tree and visible quest files
+- prior memory cards and decisions
+- git history and current branch topology when needed
+- user messages
+Do not trust chat recollection over durable state.
+## Workflow
+### 1. Read startup intent first
+Before touching the workspace, inspect:
+- `startup_contract`
+- the latest user message
+- recent quest status
+Interpret these fields specially when present:
+- `launch_mode = custom`
+  - do not force the standard full-research route
+- `custom_profile = continue_existing_state`
+  - expect reusable assets and state normalization
+- `custom_profile = revision_rebuttal`
+  - expect a paper/review package and likely handoff to `rebuttal`
+- `custom_profile = freeform`
+  - prefer the custom brief over the default stage ordering
+### 2. Retrieve memory before filesystem triage
+Stage-start requirement:
+- run `memory.list_recent(scope='quest', limit=5)`
+- run at least one `memory.search(...)` using:
+  - the quest title or central topic
+  - any known baseline id or method name
+  - any known paper title or venue short name
+  - any known review keyword such as `rebuttal`, `review`, or `revision`
+The point is to reuse prior route knowledge before re-auditing the same state from scratch.
+### 3. Inventory the quest state
+Create or refresh a durable audit note using `references/state-audit-template.md`.
+The inventory should cover:
+- baseline assets
+- main experiment assets
+- analysis assets
+- writing assets
+- review assets
+- git / branch / worktree state
+- missing or conflicting state
+Useful places to inspect include:
+- `artifacts/`
+- `baselines/`
+- `experiments/main/`
+- `experiments/analysis/`
+- `paper/`
+- `reviews/` or equivalent user-provided review folders
+Do not over-read the entire tree.
+Read enough to classify the state and locate the likely trust anchors.
+### 4. Trust-rank and reconcile
+For each major asset, decide:
+- can it be trusted as-is?
+- does it need a light verification pass?
+- is it only reference material?
+- is it stale or conflicting?
+Then reconcile it with the durable artifact layer:
+- existing reusable baseline:
+  - `artifact.attach_baseline(...)`
+  - then `artifact.confirm_baseline(...)` when trust is justified
+- existing main result:
+  - `artifact.record_main_experiment(...)` only if the run is genuinely the accepted main run and the required fields can be filled honestly
+- existing analysis results:
+  - if the campaign already exists, use `artifact.record_analysis_slice(...)` for each real finished slice that needs durable registration
+- existing outline:
+  - `artifact.submit_paper_outline(mode='select'|'revise', ...)` when there is a real durable outline contract
+- existing paper bundle:
+  - `artifact.submit_paper_bundle(...)` when the draft/package state is genuinely ready
+If the evidence is insufficient for a durable backfill, record that insufficiency explicitly instead of inventing a cleaned-up history.
+### 5. Choose the next anchor
+After reconciliation, write one durable route decision with `artifact.record(kind='decision', ...)`.
+Typical next anchors:
+- baseline exists but trust is incomplete -> `baseline`
+- baseline and route are ready, but no durable main result exists -> `experiment`
+- main result exists, but follow-up evidence is missing -> `analysis-campaign`
+- evidence is strong and writing should begin -> `write`
+- review package is active -> `rebuttal`
+- the quest is effectively complete or should pause -> `finalize`
+### 6. Report and hand off
+At the end of the intake pass, send one threaded `artifact.interact(kind='milestone', ...)` update that says:
+- what already exists and is trusted
+- what remains untrusted or incomplete
+- which next skill should take over
+- whether the user needs to provide anything else
+## Recommended durable outputs
+- `artifacts/intake/state_audit.md`
+- `artifacts/intake/recommended_next_step.md`
+- one `decision` artifact for the post-audit route
+- one or more repair/backfill artifact calls when justified
+## Companion skill routing
+Open additional skills only when the audit indicates they are necessary:
+- `baseline`
+  - when an existing baseline must be validated, repaired, confirmed, or waived
+- `experiment`
+  - when the accepted route lacks a durably recorded main result
+- `analysis-campaign`
+  - when the main result exists but the evidence boundary is still weak
+- `write`
+  - when a trustworthy draft or outline should become the active writing line
+- `rebuttal`
+  - when reviewer comments, revision requests, or meta-review materials define the real task
+- `decision`
+  - when more than one next anchor remains plausible
+## Memory discipline
+Stage-end requirement:
+- if the intake pass produced a durable route choice, trust judgment, or asset-reuse rule, write at least one `memory.write(...)`
+Useful tags include:
+- `stage:intake-audit`
+- `type:state-audit`
+- `type:route-handoff`
+- `type:reuse-rule`
+- `state:trusted`
+- `state:needs-verification`
+When the audit concerns a specific existing line, include identifiers when known:
+- `baseline_id`
+- `idea_id`
+- `run_id`
+- `branch`
+- `paper_state`
+## Success condition
+`intake-audit` is successful when:
+- the quest's current state is understandable
+- the trustworthy reusable assets are explicit
+- the untrusted gaps are explicit
+- the next anchor is explicit
+- the system can continue without pretending the quest started from zero
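
Note on the added skill: its "Typical intake states" buckets and "Choose the next anchor" list together imply a routing table from audited state to the next skill. The following is a rough sketch of that mapping, not the package's implementation. The `IntakeState` enum mirrors the bucket names in the skill text; the `next_anchor` function, the precedence order between rules, the mapping of `paper_bundle_ready` to `finalize`, and the fallback to `decision` for unclear or empty state are all assumptions made for illustration.

```python
from enum import Enum


class IntakeState(str, Enum):
    # Bucket names copied from the "Typical intake states" list.
    BASELINE_READY = "baseline_ready"
    BASELINE_PARTIAL = "baseline_partial"
    MAIN_RESULT_READY = "main_result_ready"
    ANALYSIS_READY = "analysis_ready"
    DRAFT_READY = "draft_ready"
    PAPER_BUNDLE_READY = "paper_bundle_ready"
    REVIEW_PACKAGE_READY = "review_package_ready"
    UNCLEAR_STATE = "unclear_state"


def next_anchor(states: set[IntakeState]) -> str:
    """Pick the next skill name; the precedence order here is an assumption."""
    s = IntakeState
    if not states or s.UNCLEAR_STATE in states:
        return "decision"           # more than one anchor may still be plausible
    if s.REVIEW_PACKAGE_READY in states:
        return "rebuttal"           # review package is active
    if s.PAPER_BUNDLE_READY in states:
        return "finalize"           # assumed: a ready bundle means close or pause
    if s.BASELINE_PARTIAL in states:
        return "baseline"           # baseline exists but trust is incomplete
    if s.ANALYSIS_READY in states or s.DRAFT_READY in states:
        return "write"              # evidence is strong enough to write
    if s.MAIN_RESULT_READY in states:
        return "analysis-campaign"  # main result exists, follow-up missing
    if s.BASELINE_READY in states:
        return "experiment"         # baseline ready, no durable main result
    return "decision"
```

For example, `next_anchor({IntakeState.BASELINE_READY})` returns `"experiment"`, matching the skill's "baseline and route are ready, but no durable main result exists" case. A real audit would also weigh the per-asset trust levels, which this sketch leaves out.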

package/src/skills/intake-audit/references/state-audit-template.md ADDED

@@ -0,0 +1,41 @@
+# State Audit Template
+## Intake Summary
+- launch mode:
+- custom profile:
+- user intent:
+- recommended next anchor:
+## Asset Matrix
+| Area | Current asset | Trust level | Why | Missing proof | Recommended action |
+| --- | --- | --- | --- | --- | --- |
+| Baseline |  |  |  |  |  |
+| Main experiment |  |  |  |  |  |
+| Analysis |  |  |  |  |  |
+| Draft / paper |  |  |  |  |  |
+| Review package |  |  |  |  |  |
+| Git / branches |  |  |  |  |  |
+## Reusable Assets
+- baseline:
+- metrics:
+- figures:
+- draft sections:
+- review materials:
+## Conflicts / Unknowns
+- conflicting baseline/result state:
+- unclear provenance:
+- missing metric contract:
+- stale draft risk:
+## Route Recommendation
+- next anchor:
+- why now:
+- what should not be repeated:
+- what still needs verification:
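
Note on the template: the Asset Matrix is a fixed six-row Markdown table keyed by area, with trust levels drawn from the list in `intake-audit/SKILL.md`. As an illustration only, the table could be filled from structured records roughly as follows. `AssetRow` and `render_asset_matrix` are hypothetical helpers that do not ship with the package; the area names and trust levels are copied from the diff above.

```python
from dataclasses import dataclass

# Row order and labels copied from the template's Asset Matrix.
AREAS = ["Baseline", "Main experiment", "Analysis",
         "Draft / paper", "Review package", "Git / branches"]
# Trust levels copied from the intake-audit skill text.
TRUST_LEVELS = {"trusted", "usable_with_verification", "reference_only",
                "stale_or_conflicting", "missing_context"}


@dataclass
class AssetRow:
    area: str
    asset: str = ""
    trust: str = ""
    why: str = ""
    missing_proof: str = ""
    action: str = ""


def render_asset_matrix(rows: list[AssetRow]) -> str:
    """Render the table; areas without a record stay blank, as in the template."""
    by_area = {row.area: row for row in rows}  # rows for unknown areas are ignored
    lines = [
        "| Area | Current asset | Trust level | Why | Missing proof | Recommended action |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for area in AREAS:
        row = by_area.get(area, AssetRow(area))
        if row.trust and row.trust not in TRUST_LEVELS:
            raise ValueError(f"unknown trust level: {row.trust!r}")
        cells = [row.area, row.asset, row.trust, row.why,
                 row.missing_proof, row.action]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling `render_asset_matrix([])` reproduces the empty table from the template, and rejecting unknown trust levels keeps the audit note aligned with the five levels the skill defines.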