@researai/deepscientist (npm) 1.5.11 → 1.5.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (102)

package/src/skills/analysis-campaign/SKILL.md CHANGED

@@ -15,12 +15,19 @@ Use the same route for:
 - rebuttal-driven extra experiments
 - writing-driven evidence gaps
+For paper-facing work, treat “analysis campaign” broadly:
+- not only post-hoc interpretation
+- also ablations, sensitivity checks, robustness checks, efficiency or cost checks, highlight-validation runs, and limitation-boundary work beyond the main result
+Do not assume a writing-facing campaign means “analysis only”.
 Do not invent a separate experiment system for those cases.
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Prefer `bash_exec` for campaign slice commands so each run has a durable session id, quest-local log folder, and later `read/list/kill` control.
 - Keep ordinary subtask completions concise. When an analysis campaign or a stage-significant campaign checkpoint is complete, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report.
 - That richer campaign milestone report should normally cover: which slices completed, the main takeaway, whether the claim got stronger or weaker, and the exact recommended next route.
@@ -69,11 +76,12 @@ For campaign prioritization and writing-facing slice design, read `references/ca
 Treat this as the compressed campaign map. The authoritative slice protocol and aggregation rules remain in `Workflow`.
 1. Bind the campaign to the parent run or idea and, when writing-facing, to the selected outline.
-2. Before launching slices, create `PLAN.md` and `CHECKLIST.md`.
-3. Use `PLAN.md` as the durable charter and `CHECKLIST.md` as the living execution surface while launching, monitoring, recording, and aggregating slices.
-4. Run claim-critical slices first and smoke-test long slices before their real runs.
-5. Revise the plan if slice feasibility, ordering, comparators, or campaign interpretation changes materially, and record every slice durably, including honest non-success states.
-6. Close meaningful campaign milestones with a concise `1-2` sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, and what happens next.
+2. When the campaign is writing-facing, refresh `paper/paper_experiment_matrix.*` before freezing the slice frontier.
+3. Before launching slices, create `PLAN.md` and `CHECKLIST.md`.
+4. Use `PLAN.md` as the durable charter and `CHECKLIST.md` as the living execution surface while launching, monitoring, recording, and aggregating slices.
+5. Run claim-critical slices first and smoke-test long slices before their real runs.
+6. Revise the plan and matrix if slice feasibility, ordering, comparators, or campaign interpretation changes materially, and record every slice durably, including honest non-success states.
+7. Close meaningful campaign milestones with a concise `1-2` sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, what the matrix frontier now looks like, and what happens next.
 ## Non-negotiable rules
@@ -83,6 +91,8 @@ Treat this as the compressed campaign map. The authoritative slice protocol and
 - Every analysis slice must have a specific research question and a falsifiable or at least decision-relevant expectation.
 - If the campaign is supporting a paper or paper-like report, do not launch it until a selected outline exists.
 - When a selected outline exists, every slice should map to a named `research_question` and `experimental_design` from that outline.
+- When the campaign is supporting a paper or paper-like report, do not launch or reorder the slice set without first reading `paper/paper_experiment_matrix.md` when it exists.
+- For writing-facing campaigns, every slice should correspond to a stable matrix row such as `exp_id`, not just a free-form note.
 - Do not aggregate campaign conclusions without per-run evidence.
 - Do not bury null or contradictory findings.
@@ -110,6 +120,7 @@ Before launching a campaign, confirm:
 - the list of specific analysis questions
 - the current quest / user-provided assets that each planned slice will actually use
 - whether each slice is executable with the current assets, tooling, and available credentials
+- for paper-facing campaigns, the current paper experiment matrix frontier and which rows are actually feasible now
 - if durable state exposes `active_baseline_metric_contract_json`, read that JSON file before defining slice success criteria or comparison tables
 - treat `active_baseline_metric_contract_json` as the default baseline comparison contract unless a slice is explicitly testing a different evaluation contract
@@ -150,6 +161,8 @@ A campaign should usually leave behind:
 - a campaign identifier
 - a selected outline reference when the campaign is writing-facing
+- a refreshed `paper/paper_experiment_matrix.md`
+- a refreshed `paper/paper_experiment_matrix.json`
 - one directory per analysis run
 - any supplementary baseline reproduced for analysis under `baselines/local/<baseline_id>/` or attached under `baselines/imported/<baseline_id>/`
 - one quest-level supplementary baseline inventory at `artifacts/baselines/analysis_inventory.json`
@@ -198,17 +211,28 @@ If the campaign exists to support a paper or paper-like report:
 - do not proceed until one selected outline exists
 - if no selected outline exists yet, route to `write` or `decision` first so the outline can be created and selected durably
+- before deciding the slice list, create or refresh `paper/paper_experiment_matrix.md` when it is missing or stale
+- treat that matrix as the upstream paper experiment contract, not `todo_items` alone
+- use the matrix to decide:
+  - which rows are `main_required`
+  - which are `main_optional`
+  - which are appendix-only
+  - which are optional or should be dropped
+- do not start stable experiments-section drafting while currently feasible non-optional matrix rows remain unresolved
 - call `artifact.create_analysis_campaign(...)` with:
   - `selected_outline_ref`
   - `research_questions`
   - `experimental_designs`
   - `todo_items`
 - ensure each todo item names at least:
+  - `exp_id`
   - `todo_id`
   - `slice_id`
   - `title`
   - `research_question`
   - `experimental_design`
+  - `tier`
+  - `paper_placement`
   - `completion_condition`
 This keeps the analysis campaign aligned with the paper plan instead of becoming a free-floating batch of slices.
@@ -229,6 +253,7 @@ The charter should also include:
 - campaign type priority order
 - expected slice count
 - dependency structure between slices
+- the matrix path and current execution frontier
 - whether any slice requires isolated code changes or only reruns/config changes
 - the top-level success condition for ending the campaign
 - the top-level abandonment condition for stopping it early
@@ -238,6 +263,7 @@ Prefer to keep this charter in `PLAN.md` first and mirror the execution frontier
 For each analysis question, also state:
 - why it matters to the main claim
+- whether it exists mainly to support a core claim, validate a highlight, answer an efficiency or cost concern, or bound a limitation
 - what result would strengthen the claim
 - what result would weaken or complicate the claim
 - whether the run is:
@@ -267,6 +293,8 @@ Each analysis run should correspond to one need, such as:
 - run additional seeds
 - inspect one failure bucket
 - test one environment variation
+- measure one efficiency or cost dimension
+- validate one highlight hypothesis
 Avoid changing many factors at once unless the campaign is explicitly exploratory.
@@ -283,9 +311,13 @@ For each slice, define at minimum:
 Recommended extra per-slice fields:
+- `exp_id`
 - `slice_id`
 - `run_kind`
 - `slice_class`, such as `auxiliary`, `claim-carrying`, or `supporting`
+- `tier`, such as `main_required`, `main_optional`, `appendix`, or `optional`
+- `paper_placement`
+- `highlight_ids`
 - `required_baselines`, where each item records at least `baseline_id` plus the reason, benchmark, and split when known
 If a slice needs an extra comparator baseline:
@@ -321,6 +353,14 @@ Treat `campaign_id` as system-owned, and treat `slice_id` / `todo_id` as agent-a
 Do not replace the normal campaign flow with repeated manual `artifact.prepare_branch(...)` calls.
 After each slice finishes, call `artifact.record_analysis_slice(...)` immediately so the result is mirrored back to the parent branch and the next slice can be activated.
 If a slice fails or becomes infeasible, still call `artifact.record_analysis_slice(...)` with an honest non-success status plus the real blocker and next recommendation; do not leave the campaign state ambiguous.
+After every completed, excluded, or blocked writing-facing slice:
+- reopen `paper/paper_experiment_matrix.md`
+- update the row status, feasibility, and result artifacts
+- update whether the row now belongs in main text, appendix, or omission
+- update the remaining execution frontier before choosing the next slice
+Do not keep launching writing-facing slices from stale memory when the matrix has changed.
 For slice recording, `deviations` and `evidence_paths` are optional context fields, not mandatory ceremony; include them only when they materially help explanation or auditability.
 Each `artifact.record_analysis_slice(...)` call should also include an `evaluation_summary` with exactly these six fields:
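
The todo-item fields and the matrix gating rule added in this file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key names come from the diff, but `MatrixRow`, its `feasible` flag, the `status` values, and the reading of "non-optional" as "tier is not `optional`" are hypothetical and not taken from the package.

```python
from dataclasses import dataclass

# Key names mirror the todo-item list in the diff; everything else here
# (class, flags, status values) is an illustrative assumption.
REQUIRED_FIELDS = (
    "exp_id", "todo_id", "slice_id", "title", "research_question",
    "experimental_design", "tier", "paper_placement", "completion_condition",
)


def missing_fields(todo_item: dict) -> list:
    """Return the required keys that are absent or empty in a todo item."""
    return [name for name in REQUIRED_FIELDS if not todo_item.get(name)]


@dataclass
class MatrixRow:
    exp_id: str
    tier: str                 # main_required / main_optional / appendix / optional
    feasible: bool            # hypothetical: can the row run with current assets?
    status: str = "pending"   # hypothetical: pending / completed / excluded / blocked


def execution_frontier(rows):
    """Rows that are feasible now, not tier 'optional', and still unresolved."""
    return [
        row for row in rows
        if row.feasible and row.tier != "optional" and row.status == "pending"
    ]


def drafting_allowed(rows) -> bool:
    """Stable experiments-section drafting waits until the frontier is empty."""
    return not execution_frontier(rows)
```

Recomputing `execution_frontier` after every recorded slice corresponds to the "reopen the matrix before choosing the next slice" rule above.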

package/src/skills/analysis-campaign/references/campaign-plan-template.md CHANGED

@@ -10,6 +10,9 @@ Treat it as the durable version of the charter, not a separate optional memo.
 - main claim under test:
 - user's core requirements:
 - campaign outcome needed:
+- selected outline ref:
+- paper experiment matrix path:
+- current matrix execution frontier:
 ## 2. Boundary And Comparability
@@ -20,18 +23,26 @@ Treat it as the durable version of the charter, not a separate optional memo.
 ## 3. Slice Plan
-| Slice id | Slice class | Research question | Expected value | Priority | Needs code change? | Needs extra baseline? |
-|---|---|---|---|---|---|---|
-| | auxiliary / claim-carrying / supporting | | | | yes / no | yes / no |
+| Exp id | Slice id | Tier | Slice class | Experiment type | Research question | Expected value | Priority | Paper placement | Needs code change? | Needs extra baseline? |
+|---|---|---|---|---|---|---|---|---|---|---|
+| | | main_required / main_optional / appendix / optional | auxiliary / claim-carrying / supporting | ablation / sensitivity / robustness / efficiency / highlight / boundary / case-study | | | | main_text / appendix / maybe / omit | yes / no | yes / no |
-## 4. Assets And Dependencies
+## 4. Highlight Hypotheses
+- highlight id:
+- one-line claim:
+- why it is plausible:
+- which slices validate or falsify it:
+- what happens if it fails:
+## 5. Assets And Dependencies
 - quest-local assets already available:
 - checkpoints / baselines already available:
 - downloads or services still needed:
 - fallback options if external assets are blocked:
-## 5. Execution Strategy
+## 6. Execution Strategy
 - first slices to run:
 - smoke-test policy:
@@ -49,19 +60,21 @@ Monitoring and sleep plan:
 - health signals that justify continued monitoring:
 - conditions that trigger slice redesign, kill, or campaign revision:
-## 6. Reporting Plan
+## 7. Reporting Plan
 - what will count as stable support:
 - what will count as contradiction:
 - what will count as unresolved ambiguity:
 - campaign summary should say in `1-2` sentences:
+- matrix refresh rule after every slice:
+- main-text gating rule:
-## 7. Checklist Link
+## 8. Checklist Link
 - checklist path:
 - next unchecked item:
-## 8. Revision Log
+## 9. Revision Log
 | Time | What changed | Why it changed | Impact on slices or interpretation |
 |---|---|---|---|

package/src/skills/baseline/SKILL.md CHANGED

@@ -11,7 +11,7 @@ It absorbs the essential old DeepScientist reproducer discipline into one stage
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep ordinary setup and debugging updates concise. Reserve richer milestone reports for accepted / waived / blocked baseline outcomes or other route-changing checkpoints instead of narrating every small setup step.
 - Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
 - If a threaded user reply arrives, interpret it relative to the latest baseline progress update before assuming the task changed completely.
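
The tightened update cadence that recurs across these skill files can be read as a soft and a hard threshold. The sketch below is one possible interpretation, not code from the package: the function name, parameter names, and return labels are hypothetical, and the skill text itself treats the numbers as rough guidance ("roughly 6", "roughly 12", "about 8 minutes").

```python
def update_decision(tool_calls, minutes, meaningful_delta,
                    soft=6, hard=12, max_minutes=8):
    """Classify how urgent a user-visible progress update is.

    tool_calls and minutes are counted since the last user-visible update.
    Defaults follow the 1.5.12 wording; 1.5.11 used 10, 20, and 15.
    """
    if tool_calls >= hard or minutes >= max_minutes:
        return "overdue"      # do not drift past this point silently
    if tool_calls >= soft and meaningful_delta:
        return "suggested"    # a concise update is preferred
    return "not_yet"
```

Passing `soft=10, hard=20, max_minutes=15` reproduces the 1.5.11 thresholds, which makes the behavioural change between the two versions easy to compare.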

package/src/skills/decision/SKILL.md CHANGED

@@ -10,7 +10,7 @@ Use this skill whenever continuation is non-trivial.
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Message templates are references only. Adapt to context and vary wording so updates feel natural and non-robotic.
 - If the runtime starts an auto-continue turn with no new user message, continue from the active requirements and durable quest state instead of replaying the previous user turn.
 - If `startup_contract.decision_policy = autonomous`, do not emit ordinary `artifact.interact(kind='decision_request', ...)` calls; decide the route yourself, record the reason, and continue.

package/src/skills/experiment/SKILL.md CHANGED

@@ -10,7 +10,7 @@ Use this skill for the main evidence-producing runs of the quest.
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep ordinary subtask completions concise. When a main experiment actually finishes or reaches a stage-significant checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report rather than another short progress line.
 - That richer experiment-stage milestone report should normally cover: what run finished, the headline result versus baseline or expectation, the main caveat, and the exact recommended next action.
 - That richer milestone report is still normally non-blocking. If the next route is already justified locally, continue automatically after reporting rather than idling for acknowledgment.

package/src/skills/finalize/SKILL.md CHANGED

@@ -10,7 +10,7 @@ Use this skill to close or pause a quest responsibly.
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - If the runtime starts an auto-continue turn with no new user message, keep finalizing from the durable quest state and active requirements instead of replaying the previous user turn.
 - If a threaded user reply arrives, interpret it relative to the latest finalize progress update before assuming the task changed completely.
 - When finalize reaches a real closure state, pause-ready packet, or route-back decision, send one threaded `artifact.interact(kind='milestone', ...)` update that names the recommendation, why it is the right call, and any reopen condition that still matters.

package/src/skills/idea/SKILL.md CHANGED

@@ -10,7 +10,7 @@ Use this skill to turn the current baseline and problem frame into concrete, lit
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep ordinary subtask completions concise. When the idea stage actually finishes a meaningful deliverable such as a selected idea package, a rejected-ideas summary, or a route-shaping ideation checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report.
 - That richer idea-stage milestone report should normally cover: the final selected or rejected direction, why it won or lost, the main remaining risk, and the exact recommended next stage or experiment.
 - That richer milestone report is still normally non-blocking. If the next experiment or route is already clear from durable evidence, continue automatically after reporting instead of waiting.

package/src/skills/intake-audit/SKILL.md CHANGED

@@ -10,7 +10,7 @@ Use this skill when the quest already has meaningful state and the first job is
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
 - If a threaded user reply arrives, interpret it relative to the latest intake-audit progress update before assuming the task changed completely.
 - When the audit reaches a durable route recommendation, send one richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` update that says what state is trusted, what still needs work, and which anchor should run next.

package/src/skills/rebuttal/SKILL.md CHANGED

@@ -14,7 +14,7 @@ The task is “respond to concrete reviewer pressure with the smallest honest se
 ## Interaction discipline
 - Follow the shared interaction contract injected by the system prompt.
-- For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
 - If a threaded user reply arrives, interpret it relative to the latest rebuttal progress update before assuming the task changed completely.
 - When the rebuttal plan, the main supplementary-evidence package, or the final response bundle becomes durable, send one richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` update that says what reviewer concerns are now addressed, what still remains open, and what happens next.
@@ -73,6 +73,16 @@ First decide whether the issue is actually:
 - Do not run supplementary experiments without first mapping them to named reviewer concerns.
 - Do not keep the original claim scope if the new evidence no longer supports it.
 - If a reviewer request cannot be fully satisfied, say so clearly and explain the honest limitation.
+- If `startup_contract.baseline_execution_policy` is present, honor it:
+  - `must_reproduce_or_verify`
+    - verify or recover the rebuttal-critical baseline/comparator before reviewer-linked follow-up work
+  - `reuse_existing_only`
+    - trust the current baseline/results unless you find concrete inconsistency, corruption, or missing-evidence problems
+  - `skip_unless_blocking`
+    - do not spend time rerunning baselines unless a named reviewer item truly depends on a missing comparator
+- If `startup_contract.manuscript_edit_mode = latex_required`, treat the provided LaTeX tree or `paper/latex/` as the preferred writing surface when manuscript revision is needed.
+- If LaTeX source is unavailable while `latex_required` is requested, do not pretend the manuscript was edited; produce LaTeX-ready replacement text and an explicit blocker note instead.
+- Accept review inputs from URLs, local file paths, local directories, or current-turn attachments; do not assume the review packet must already be neatly structured.
 ## Primary inputs
@@ -81,6 +91,7 @@ Use, in roughly this order:
 - the current paper or draft
 - the selected outline if one exists
 - review comments, meta-review, or editor letter
+- current-turn attachments and user-provided local paths / directories / URLs for the manuscript or review packet
 - the six-field `evaluation_summary` blocks from recent main experiments and analysis slices
 - recent main and analysis experiment results
 - prior decision and writing memory
@@ -88,6 +99,7 @@ Use, in roughly this order:
 If the current paper/result state is still unclear, open `intake-audit` first before continuing the rebuttal workflow.
 Before launching any new supplementary experiment, read those structured `evaluation_summary` blocks first so the rebuttal plan starts from the already-recorded evidence state rather than from raw narrative memory.
+If the user provided manuscript files or review-packet files directly, first normalize them into durable quest-visible paths under `paper/` or `paper/rebuttal/input/` before planning reviewer-linked experiments or draft replies.
 ## Core outputs
@@ -98,6 +110,8 @@ The rebuttal pass should usually leave behind:
 - `paper/rebuttal/response_letter.md`
 - `paper/rebuttal/text_deltas.md`
 - `paper/rebuttal/evidence_update.md`
+- `paper/paper_experiment_matrix.md` when reviewer concerns materially change the paper experiment plan
+- `paper/paper_experiment_matrix.json` when reviewer concerns materially change the paper experiment plan
 Use the templates in `references/` when needed:
@@ -212,6 +226,7 @@ For each reviewer issue, decide whether the right answer is:
 Then write one durable rebuttal plan in `paper/rebuttal/action_plan.md`.
 That plan should explicitly include the analysis-experiment TODO list for reviewer-linked follow-up work.
+If reviewer concerns materially change the paper's experiment story, also create or revise `paper/paper_experiment_matrix.*` so the rebuttal experiment package stays consistent with the paper-facing plan rather than drifting into a reviewer-only side list.
 The action plan should be the main thinking draft before execution.
 For each serious item, record:
@@ -237,6 +252,18 @@ Write at least:
 For novelty / comparison / positioning complaints, do not default to experiments.
 First decide whether the issue is better answered by a focused literature audit and clearer paper positioning.
+When a reviewer concern really does imply experimental follow-up, map it into the same paper experiment taxonomy used by the writing line:
+- `component_ablation`
+- `sensitivity`
+- `robustness`
+- `efficiency_cost`
+- `highlight_validation`
+- `failure_boundary`
+- `case_study_optional`
+Case study remains optional unless the reviewer concern is specifically qualitative and cannot be addressed better with quantitative evidence.
 ### 3. Route experiments only when genuinely needed
 If one or more comments truly require new runs:
@@ -252,9 +279,18 @@ If one or more comments truly require new runs:
 Do not launch a free-floating ablation batch.
 Every supplementary run should answer a named reviewer issue.
 Every slice should reference one or more stable reviewer item ids.
+Every rebuttal-linked slice should also reference the corresponding `exp_id` from `paper/paper_experiment_matrix.*` when that matrix exists.
 After each completed reviewer-linked slice, record the result, the implication for the manuscript, and the concrete modification advice in `paper/rebuttal/evidence_update.md`.
 Use the same shared supplementary-experiment protocol as ordinary analysis work; do not invent a rebuttal-only experiment system.
 If ids or refs are unclear, recover them first with `artifact.resolve_runtime_refs(...)`, `artifact.get_analysis_campaign(...)`, or `artifact.list_paper_outlines(...)`.
+After each completed, excluded, or blocked reviewer-linked slice:
+- reopen `paper/paper_experiment_matrix.*`
+- update the affected `exp_id`
+- update whether the result now belongs in main text, appendix, or omission
+- update which reviewer items are now fully answered
+Do not finalize the rebuttal package while reviewer-critical and currently feasible matrix rows remain unresolved without an explicit blocker note.
 ### 4. Route manuscript changes explicitly
@@ -279,6 +315,14 @@ If a reviewer request forces a narrower story, revise the outline before polishi
 Use `references/response-letter-template.md` when helpful.
+Before treating the response letter as final:
+- first complete every feasible reviewer-linked experiment or analysis slice that the current plan marked as necessary
+- ensure the necessary rows in `paper/paper_experiment_matrix.*` have been refreshed after those runs
+- use real completed experiment results directly in the reply wherever the concern is genuinely experimental
+- for non-experimental items, do not wait for unnecessary experiments; answer as strongly as the current manuscript, literature, and analysis already allow
+- if one experimental item cannot be completed in time, keep the reply honest and explicit about the remaining limitation or fallback wording
 The response should be:
 - professional
@@ -290,6 +334,8 @@ The response should be:
 Good response structure:
 - short appreciation / acknowledgement
+- overall response that summarizes the revision strategy and the strongest strengths acknowledged across reviewers
+- strengths recognized across reviewers
 - direct answer to the reviewer concern
 - keep stable item ids visible when helpful
 - restate reviewer wording faithfully before answering
@@ -300,6 +346,28 @@ Good response structure:
   - claim scope
 - if not fully addressed, why not and what honest limitation remains
+Drafting style rules for the actual author reply body:
+- Treat `response_letter.md` as rebuttal-ready author text, not as internal coaching notes.
+- Write in a calm, direct, precise author voice.
+- Sound like authors clarifying the record, not authors asking for approval.
+- Brief professional courtesy is allowed, but keep it short and move to substance immediately.
+- Avoid sycophancy, flattery, excessive gratitude, or approval-seeking language.
+- Do not default to conceding fault.
+- Use selective concede, selective clarify, and selective defend.
+- Answer the reviewer concern directly in the first 1 to 2 sentences.
+- For non-experimental items, reduce reviewer uncertainty as much as the real evidence allows; the goal is to make a score improvement reasonable for an honest reviewer, not to persuade through rhetoric alone.
+- Write strongly enough that a neutral reviewer or AC can judge the concern substantially addressed from the rebuttal text alone.
+- After the literal answer, address the underlying doubt about validity, novelty, scope, fairness, or completeness.
+- If the answer already exists in the manuscript, restate it in the rebuttal and then point to the manuscript change; do not only say “we will clarify”.
+- If the issue is about wording, interpretation, or claim strength, include the revised sentence or close paraphrase that should appear in the manuscript.
+- Keep the main response body for each item as 1 to 2 full paragraphs of polished prose.
+- Do not use bullets, numbered lists, bold labels, or checklist fragments inside the actual response paragraphs.
+- Do not narrate rebuttal strategy inside the author reply.
+- Do not rely on future edits alone when you can already give the clarification, argument, or wording now.
+- When pushing back, lead with evidence, scope, or feasibility constraints before intuition.
+- If `startup_contract.manuscript_edit_mode = latex_required`, keep manuscript-facing replacement text LaTeX-ready.
 If details are still genuinely unknown, use explicit placeholders such as `[[AUTHOR TO FILL]]` rather than inventing specifics.
 Avoid:
@@ -319,6 +387,8 @@ When the rebuttal package is durably ready:
 If a combined rebuttal note is useful, make sure the total package still covers:
+- overall response
+- strengths recognized across reviewers
 - overview and revision strategy
 - draft responses to reviewers
 - point-to-point triage
@@ -398,6 +468,9 @@ Useful tags include:
 - supplementary experiments, if needed, are routed cleanly
 - manuscript deltas are explicit
 - the response letter is evidence-backed and honest
+- the final package contains both:
+  - reviewer-specific replies
+  - one overall response that makes the paper strengths, the main resolved concerns, and the remaining limitations legible to a neutral reader or AC
 The goal is not just “write a nicer response”.
 The goal is to convert review pressure into a durable, auditable revision workflow.
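
The three `startup_contract.baseline_execution_policy` values added in this file amount to a small decision table. A minimal sketch follows, assuming the policy arrives as a plain string; the function and its two boolean inputs are hypothetical, and the case where the policy is absent is deliberately left out because the diff does not specify it.

```python
def should_rerun_baseline(policy, inconsistency_found,
                          reviewer_needs_missing_comparator):
    """Decide whether rebuttal work should spend time on the baseline.

    The policy names come from the diff; the boolean inputs are illustrative
    stand-ins for judgments the agent makes from the quest state.
    """
    if policy == "must_reproduce_or_verify":
        # Verify or recover the baseline before reviewer-linked follow-up.
        return True
    if policy == "reuse_existing_only":
        # Trust current results unless concrete problems were found.
        return inconsistency_found
    if policy == "skip_unless_blocking":
        # Rerun only when a named reviewer item depends on a missing comparator.
        return reviewer_needs_missing_comparator
    raise ValueError(f"unknown baseline_execution_policy: {policy!r}")
```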

package/src/skills/rebuttal/references/response-letter-template.md CHANGED

@@ -1,9 +1,55 @@
 # Response Letter Template
+## Drafting rules
+- Treat this file as rebuttal-ready author text, not as private coaching notes.
+- Write in a calm, direct, precise author voice.
+- Brief professional courtesy is allowed, but keep it short and move to substance immediately.
+- Avoid sycophancy, flattery, excessive gratitude, or approval-seeking language.
+- Do not default to conceding fault.
+- Use selective concede, selective clarify, and selective defend.
+- Answer the reviewer concern directly in the first 1 to 2 sentences.
+- Keep the actual response body for each item as 1 to 2 full paragraphs of polished prose.
+- If the issue is about wording, interpretation, or claim strength, include the revised sentence or close paraphrase that should appear in the manuscript.
+- Do not use bullets, numbered lists, or label-value schemas inside the actual response paragraphs.
+- Do not rely on future edits alone when you can already give the clarification, argument, or draft wording now.
+- If a concrete number, setup detail, or result is still unknown, use `[[AUTHOR TO FILL]]`.
 ## Cover note
 We thank the reviewers for the careful reading and constructive feedback. Below we respond point by point and indicate the corresponding manuscript changes and supplementary evidence when applicable.
+## Overview & Revision Strategy
+- main reviewer risks:
+- current strongest evidence:
+- current weakest evidence:
+- baseline handling decision:
+- response strategy:
+- manuscript_edit_mode:
+## Overall Response
+- strongest strengths recognized across reviewers:
+- overall revision strategy:
+- biggest concerns now addressed:
+- concerns still partially open:
+- claim-scope changes:
+- remaining limitation:
+## Strengths Recognized Across Reviewers
+- strength 1:
+- strength 2:
+- why these strengths still matter after revision:
+## Resolution Snapshot
+| Item ID | Status | What changed | Evidence basis | Manuscript delta |
+| --- | --- | --- | --- | --- |
+| R1-C1 |  |  |  |  |
+| R1-C2 |  |  |  |  |
 ## Reviewer 1
 ### Item R1-C1
@@ -20,9 +66,11 @@ We thank the reviewers for the careful reading and constructive feedback. Below
 - agree / partially_agree / clarify / respectful_disagree
-**Response**
+**Response Draft**
--
+Write 1 to 2 full paragraphs of rebuttal-ready prose here.
+The first 1 to 2 sentences should answer the concern directly.
+Then explain the evidence, manuscript rationale, and the exact clarification or wording that should appear in the revision.
 **What changed**
@@ -30,6 +78,7 @@ We thank the reviewers for the careful reading and constructive feedback. Below
 - evidence basis:
 - claim-scope effect:
 - remaining limitation:
+- latex-ready manuscript text:
 **If an experiment is still pending**
@@ -51,9 +100,9 @@ We thank the reviewers for the careful reading and constructive feedback. Below
 - agree / partially_agree / clarify / respectful_disagree
-**Response**
+**Response Draft**
--
+Write 1 to 2 full paragraphs of rebuttal-ready prose here.
 **What changed**
@@ -84,9 +133,9 @@ We thank the reviewers for the careful reading and constructive feedback. Below
 - agree / partially_agree / clarify / respectful_disagree
-**Response**
+**Response Draft**
--
+Write 1 to 2 full paragraphs of rebuttal-ready prose here.
 **What changed**
@@ -106,8 +155,3 @@ We thank the reviewers for the careful reading and constructive feedback. Below
 - what could not be fully addressed:
 - why:
 - how the manuscript now reflects that limitation:
-## Author placeholders
-- If a concrete number, setup detail, or result is still unknown, use `[[AUTHOR TO FILL]]`.
-- Do not fabricate missing details just to make the letter sound complete.
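
Two of the drafting rules above are mechanical enough to check automatically: unresolved `[[AUTHOR TO FILL]]` placeholders, and list-style lines inside a response paragraph. The following is a minimal sketch of such a check; the function name and return shape are hypothetical and nothing like it is shown in the package diff.

```python
import re

PLACEHOLDER = "[[AUTHOR TO FILL]]"
# Lines that start with a bullet marker or a numbered-list marker.
LIST_LINE = re.compile(r"^\s*(?:[-*+]\s+|\d+[.)]\s+)")


def lint_response_draft(text):
    """Report remaining placeholders and list-like lines in a response draft."""
    lines = text.splitlines()
    return {
        "placeholders": text.count(PLACEHOLDER),
        "list_lines": [
            number for number, line in enumerate(lines, start=1)
            if LIST_LINE.match(line)
        ],
    }
```

A draft with a nonzero placeholder count is still honest but not final, while any reported list line points at prose that should be rewritten as full paragraphs.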