@researai/deepscientist 1.5.11 → 1.5.12

Files changed (102)
  1. package/README.md +8 -8
  2. package/bin/ds.js +358 -61
  3. package/docs/en/00_QUICK_START.md +35 -3
  4. package/docs/en/01_SETTINGS_REFERENCE.md +11 -0
  5. package/docs/en/02_START_RESEARCH_GUIDE.md +68 -4
  6. package/docs/en/09_DOCTOR.md +28 -3
  7. package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +21 -2
  8. package/docs/en/15_CODEX_PROVIDER_SETUP.md +284 -0
  9. package/docs/en/README.md +4 -0
  10. package/docs/zh/00_QUICK_START.md +34 -2
  11. package/docs/zh/01_SETTINGS_REFERENCE.md +11 -0
  12. package/docs/zh/02_START_RESEARCH_GUIDE.md +69 -3
  13. package/docs/zh/09_DOCTOR.md +28 -1
  14. package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +21 -2
  15. package/docs/zh/15_CODEX_PROVIDER_SETUP.md +285 -0
  16. package/docs/zh/README.md +4 -1
  17. package/package.json +1 -1
  18. package/pyproject.toml +1 -1
  19. package/src/deepscientist/__init__.py +1 -1
  20. package/src/deepscientist/bash_exec/monitor.py +7 -5
  21. package/src/deepscientist/bash_exec/service.py +84 -21
  22. package/src/deepscientist/channels/local.py +3 -3
  23. package/src/deepscientist/channels/qq.py +7 -7
  24. package/src/deepscientist/channels/relay.py +7 -7
  25. package/src/deepscientist/channels/weixin_ilink.py +90 -19
  26. package/src/deepscientist/config/models.py +1 -0
  27. package/src/deepscientist/config/service.py +121 -20
  28. package/src/deepscientist/daemon/app.py +314 -6
  29. package/src/deepscientist/doctor.py +1 -5
  30. package/src/deepscientist/mcp/server.py +124 -3
  31. package/src/deepscientist/prompts/builder.py +113 -11
  32. package/src/deepscientist/quest/service.py +247 -31
  33. package/src/deepscientist/runners/codex.py +121 -22
  34. package/src/deepscientist/runners/runtime_overrides.py +6 -0
  35. package/src/deepscientist/shared.py +33 -14
  36. package/src/prompts/connectors/qq.md +2 -1
  37. package/src/prompts/connectors/weixin.md +2 -1
  38. package/src/prompts/contracts/shared_interaction.md +4 -1
  39. package/src/prompts/system.md +59 -9
  40. package/src/skills/analysis-campaign/SKILL.md +46 -6
  41. package/src/skills/analysis-campaign/references/campaign-plan-template.md +21 -8
  42. package/src/skills/baseline/SKILL.md +1 -1
  43. package/src/skills/decision/SKILL.md +1 -1
  44. package/src/skills/experiment/SKILL.md +1 -1
  45. package/src/skills/finalize/SKILL.md +1 -1
  46. package/src/skills/idea/SKILL.md +1 -1
  47. package/src/skills/intake-audit/SKILL.md +1 -1
  48. package/src/skills/rebuttal/SKILL.md +74 -1
  49. package/src/skills/rebuttal/references/response-letter-template.md +55 -11
  50. package/src/skills/review/SKILL.md +118 -1
  51. package/src/skills/review/references/experiment-todo-template.md +23 -0
  52. package/src/skills/review/references/review-report-template.md +16 -0
  53. package/src/skills/review/references/revision-log-template.md +4 -0
  54. package/src/skills/scout/SKILL.md +1 -1
  55. package/src/skills/write/SKILL.md +168 -7
  56. package/src/skills/write/references/paper-experiment-matrix-template.md +131 -0
  57. package/src/tui/package.json +1 -1
  58. package/src/ui/dist/assets/{AiManusChatView-D0mTXG4-.js → AiManusChatView-CnJcXynW.js} +12 -12
  59. package/src/ui/dist/assets/{AnalysisPlugin-Db0cTXxm.js → AnalysisPlugin-DeyzPEhV.js} +1 -1
  60. package/src/ui/dist/assets/{CliPlugin-DrV8je02.js → CliPlugin-CB1YODQn.js} +9 -9
  61. package/src/ui/dist/assets/{CodeEditorPlugin-QXMSCH71.js → CodeEditorPlugin-B-xicq1e.js} +8 -8
  62. package/src/ui/dist/assets/{CodeViewerPlugin-7hhtWj_E.js → CodeViewerPlugin-DT54ysXa.js} +5 -5
  63. package/src/ui/dist/assets/{DocViewerPlugin-BWMSnRJe.js → DocViewerPlugin-DQtKT-VD.js} +3 -3
  64. package/src/ui/dist/assets/{GitDiffViewerPlugin-7J9h9Vy_.js → GitDiffViewerPlugin-hqHbCfnv.js} +20 -20
  65. package/src/ui/dist/assets/{ImageViewerPlugin-CHJl_0lr.js → ImageViewerPlugin-OcVo33jV.js} +5 -5
  66. package/src/ui/dist/assets/{LabCopilotPanel-1qSow1es.js → LabCopilotPanel-DdGwhEUV.js} +11 -11
  67. package/src/ui/dist/assets/{LabPlugin-eQpPPCEp.js → LabPlugin-Ciz1gDaX.js} +2 -2
  68. package/src/ui/dist/assets/{LatexPlugin-BwRfi89Z.js → LatexPlugin-BhmjNQRC.js} +37 -11
  69. package/src/ui/dist/assets/{MarkdownViewerPlugin-836PVQWV.js → MarkdownViewerPlugin-BzdVH9Bx.js} +4 -4
  70. package/src/ui/dist/assets/{MarketplacePlugin-C2y_556i.js → MarketplacePlugin-DmyHspXt.js} +3 -3
  71. package/src/ui/dist/assets/{NotebookEditor-DIX7Mlzu.js → NotebookEditor-BMXKrDRk.js} +1 -1
  72. package/src/ui/dist/assets/{NotebookEditor-BRzJbGsn.js → NotebookEditor-BTVYRGkm.js} +11 -11
  73. package/src/ui/dist/assets/{PdfLoader-DzRaTAlq.js → PdfLoader-CvcjJHXv.js} +1 -1
  74. package/src/ui/dist/assets/{PdfMarkdownPlugin-DZUfIUnp.js → PdfMarkdownPlugin-DW2ej8Vk.js} +2 -2
  75. package/src/ui/dist/assets/{PdfViewerPlugin-BwtICzue.js → PdfViewerPlugin-CmlDxbhU.js} +10 -10
  76. package/src/ui/dist/assets/{SearchPlugin-DHeIAMsx.js → SearchPlugin-DAjQZPSv.js} +1 -1
  77. package/src/ui/dist/assets/{TextViewerPlugin-C3tCmFox.js → TextViewerPlugin-C-nVAZb_.js} +5 -5
  78. package/src/ui/dist/assets/{VNCViewer-CQsKVm3t.js → VNCViewer-D7-dIYon.js} +10 -10
  79. package/src/ui/dist/assets/{bot-BEA2vWuK.js → bot-C_G4WtNI.js} +1 -1
  80. package/src/ui/dist/assets/{code-XfbSR8K2.js → code-Cd7WfiWq.js} +1 -1
  81. package/src/ui/dist/assets/{file-content-BjxNaIfy.js → file-content-B57zsL9y.js} +1 -1
  82. package/src/ui/dist/assets/{file-diff-panel-D_lLVQk0.js → file-diff-panel-DVoheLFq.js} +1 -1
  83. package/src/ui/dist/assets/{file-socket-D9x_5vlY.js → file-socket-B5kXFxZP.js} +1 -1
  84. package/src/ui/dist/assets/{image-BhWT33W1.js → image-LLOjkMHF.js} +1 -1
  85. package/src/ui/dist/assets/{index-Dqj-Mjb4.css → index-BQG-1s2o.css} +40 -2
  86. package/src/ui/dist/assets/{index--c4iXtuy.js → index-C3r2iGrp.js} +12 -12
  87. package/src/ui/dist/assets/{index-DZTZ8mWP.js → index-CLQauncb.js} +911 -120
  88. package/src/ui/dist/assets/{index-PJbSbPTy.js → index-Dxa2eYMY.js} +1 -1
  89. package/src/ui/dist/assets/{index-BDxipwrC.js → index-hOUOWbW2.js} +2 -2
  90. package/src/ui/dist/assets/{monaco-K8izTGgo.js → monaco-BGGAEii3.js} +1 -1
  91. package/src/ui/dist/assets/{pdf-effect-queue-DfBors6y.js → pdf-effect-queue-DlEr1_y5.js} +1 -1
  92. package/src/ui/dist/assets/{popover-yFK1J4fL.js → popover-CWJbJuYY.js} +1 -1
  93. package/src/ui/dist/assets/{project-sync-PENr2zcz.js → project-sync-CRJiucYO.js} +18 -4
  94. package/src/ui/dist/assets/{select-CAbJDfYv.js → select-CoHB7pvH.js} +2 -2
  95. package/src/ui/dist/assets/{sigma-DEuYJqTl.js → sigma-D5aJWR8J.js} +1 -1
  96. package/src/ui/dist/assets/{square-check-big-omoSUmcd.js → square-check-big-DUK_mnkS.js} +1 -1
  97. package/src/ui/dist/assets/{trash--F119N47.js → trash-ChU3SEE3.js} +1 -1
  98. package/src/ui/dist/assets/{useCliAccess-D31UR23I.js → useCliAccess-BrJBV3tY.js} +1 -1
  99. package/src/ui/dist/assets/{useFileDiffOverlay-BH6KcMzq.js → useFileDiffOverlay-C2OQaVWc.js} +1 -1
  100. package/src/ui/dist/assets/{wrap-text-CZ613PM5.js → wrap-text-C7Qqh-om.js} +1 -1
  101. package/src/ui/dist/assets/{zoom-out-BgDLAv3z.js → zoom-out-rtX0FKya.js} +1 -1
  102. package/src/ui/dist/index.html +2 -2
@@ -19,6 +19,11 @@ from ..shared import append_jsonl, ensure_dir, generate_id, read_yaml, resolve_r
 from ..web_search import extract_web_search_payload
 from .base import RunRequest, RunResult
 
+_TOOL_EVENT_ARGS_TEXT_LIMIT = 8_000
+_TOOL_EVENT_OUTPUT_TEXT_LIMIT = 16_000
+_MAX_QUEST_EVENT_JSON_BYTES = 2_000_000
+_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT = 12_000
+
 
 def _compact_text(value: object, *, limit: int = 1200) -> str:
     if value is None:
@@ -35,15 +40,94 @@ def _compact_text(value: object, *, limit: int = 1200) -> str:
     return text[: limit - 1].rstrip() + "…"
 
 
-def _structured_text(value: object) -> str:
+def _truncate_leaf_text(text: str, *, limit: int) -> str:
+    if limit <= 0 or len(text) <= limit:
+        return text
+    head = max(int(limit * 0.7), 256)
+    tail = max(limit - head - 64, 128)
+    omitted = max(len(text) - head - tail, 0)
+    return f"{text[:head].rstrip()}\n...[truncated {omitted} chars]...\n{text[-tail:].lstrip()}"
+
+
+def _truncate_structured_value(value: object, *, string_limit: int) -> object:
+    if isinstance(value, str):
+        return _truncate_leaf_text(value.strip(), limit=string_limit)
+    if isinstance(value, list):
+        return [_truncate_structured_value(item, string_limit=string_limit) for item in value[:200]]
+    if isinstance(value, dict):
+        truncated: dict[object, object] = {}
+        for index, (key, item) in enumerate(value.items()):
+            if index >= 200:
+                truncated["__truncated__"] = f"truncated remaining {len(value) - 200} item(s)"
+                break
+            truncated[key] = _truncate_structured_value(item, string_limit=string_limit)
+        return truncated
+    return value
+
+
+def _structured_text(value: object, *, limit: int | None = None) -> str:
     if value is None:
         return ""
     if isinstance(value, str):
-        return value.strip()
+        return _truncate_leaf_text(value.strip(), limit=limit or len(value))
+    normalized_value = _truncate_structured_value(value, string_limit=max(limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT, 512))
     try:
-        return json.dumps(value, ensure_ascii=False, indent=2)
+        return json.dumps(normalized_value, ensure_ascii=False, indent=2)
     except TypeError:
-        return str(value)
+        return _truncate_leaf_text(str(value), limit=limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT)
+
+
+def _encoded_json_size(value: object) -> int:
+    try:
+        return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))
+    except Exception:
+        return len(str(value).encode("utf-8", errors="ignore"))
+
+
+def _compact_tool_event_payload(payload: dict[str, Any]) -> dict[str, Any]:
+    if _encoded_json_size(payload) <= _MAX_QUEST_EVENT_JSON_BYTES:
+        return payload
+
+    compacted = dict(payload)
+    output_text = str(compacted.get("output") or "")
+    if output_text:
+        compacted["output_bytes"] = len(output_text.encode("utf-8", errors="ignore"))
+        compacted["output"] = _truncate_leaf_text(
+            output_text,
+            limit=_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT,
+        )
+        compacted["output_truncated"] = True
+    args_text = str(compacted.get("args") or "")
+    if args_text and _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["args"] = _truncate_leaf_text(args_text, limit=4_000)
+        compacted["args_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        metadata = compacted.get("metadata")
+        if isinstance(metadata, dict):
+            allowed_keys = {
+                "mcp_server",
+                "mcp_tool",
+                "bash_id",
+                "status",
+                "command",
+                "workdir",
+                "cwd",
+                "started_at",
+                "finished_at",
+                "exit_code",
+                "stop_reason",
+                "log_path",
+            }
+            compacted["metadata"] = {
+                key: metadata.get(key)
+                for key in allowed_keys
+                if key in metadata
+            }
+            compacted["metadata_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["output"] = _compact_text(compacted.get("output"), limit=2_000)
+        compacted["output_truncated"] = True
+    return compacted
 
 
 def _iter_event_texts(event: dict[str, Any]) -> list[str]:
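The head-and-tail truncation added in this hunk can be exercised in isolation. The sketch below copies the body of `_truncate_leaf_text` from the diff under the hypothetical name `truncate_leaf_text`; the sample string and the 1,000-character limit are assumptions chosen only for illustration.

```python
def truncate_leaf_text(text: str, *, limit: int) -> str:
    """Keep the head and tail of an over-long string and mark the gap."""
    if limit <= 0 or len(text) <= limit:
        return text
    # About 70% of the budget goes to the head, with a 256-character floor.
    head = max(int(limit * 0.7), 256)
    # The tail gets the remainder after reserving 64 characters for the marker,
    # with a 128-character floor.
    tail = max(limit - head - 64, 128)
    omitted = max(len(text) - head - tail, 0)
    return f"{text[:head].rstrip()}\n...[truncated {omitted} chars]...\n{text[-tail:].lstrip()}"


# Hypothetical sample: 2,500 "H" characters followed by 2,500 "T" characters.
sample = "H" * 2500 + "T" * 2500
preview = truncate_leaf_text(sample, limit=1000)
short = truncate_leaf_text("already short", limit=1000)
```

Because of the floors on `head` and `tail`, the limit behaves as a budget rather than a strict cap: for limits below a few hundred characters the returned string can be longer than `limit`.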
@@ -209,7 +293,7 @@ def _tool_args(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("input"),
         event.get("input"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_ARGS_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -243,7 +327,7 @@ def _tool_output(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("aggregated_output"),
         event.get("aggregated_output"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_OUTPUT_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -361,7 +445,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -375,7 +459,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "web_search":
         tool_call_id = _tool_call_id(event, item)
@@ -399,7 +483,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -414,13 +498,13 @@ def _tool_event(
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "file_change":
         tool_call_id = _tool_call_id(event, item)
         tool_name = "file_change"
         known_tool_names[tool_call_id] = tool_name
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -433,7 +517,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "mcp_tool_call":
         tool_call_id = _tool_call_id(event, item)
@@ -466,7 +550,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -483,7 +567,7 @@ def _tool_event(
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type in {"function_call", "custom_tool_call", "tool_call"} or "function_call" in event_type or "tool_call" in event_type:
         tool_call_id = _tool_call_id(event, item)
@@ -507,7 +591,7 @@ def _tool_event(
     if item_type in {"function_call_output", "custom_tool_call_output", "tool_result", "tool_call_output"} or "function_call_output" in event_type or "tool_result" in event_type:
         tool_call_id = _tool_call_id(event, item)
         tool_name = known_tool_names.get(tool_call_id) or _tool_name(event, item)
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -521,7 +605,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     return None
 
@@ -582,6 +666,12 @@ class CodexRunner:
         )
 
         env = dict(**os.environ)
+        runner_env = runner_config.get("env") if isinstance(runner_config.get("env"), dict) else {}
+        for key, value in runner_env.items():
+            env_key = str(key or "").strip()
+            if not env_key or value is None:
+                continue
+            env[env_key] = str(value)
         env["CODEX_HOME"] = str(codex_home)
         env["DEEPSCIENTIST_HOME"] = str(self.home)
         env["DS_HOME"] = str(self.home)
@@ -809,16 +899,23 @@ class CodexRunner:
         workspace_root = request.worktree_root or request.quest_root
         resolved_binary = resolve_runner_binary(self.binary, runner_name="codex")
         resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        profile = str(resolved_runner_config.get("profile") or "").strip()
         normalized_model = str(request.model or "").strip()
         command = [
             resolved_binary or self.binary,
             "--search",
-            "exec",
-            "--json",
-            "--cd",
-            str(workspace_root),
-            "--skip-git-repo-check",
         ]
+        if profile:
+            command.extend(["--profile", profile])
+        command.extend(
+            [
+                "exec",
+                "--json",
+                "--cd",
+                str(workspace_root),
+                "--skip-git-repo-check",
+            ]
+        )
         if normalized_model.lower() not in {"", "inherit", "default", "codex-default"}:
             command.extend(["--model", normalized_model])
         if request.approval_policy:
@@ -846,7 +943,9 @@ class CodexRunner:
         runner_config: dict[str, Any] | None = None,
     ) -> Path:
         target = ensure_dir(workspace_root / ".codex")
-        source = Path(os.environ.get("CODEX_HOME", str(Path.home() / ".codex"))).expanduser()
+        resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        configured_home = str(resolved_runner_config.get("config_dir") or os.environ.get("CODEX_HOME") or str(Path.home() / ".codex"))
+        source = Path(configured_home).expanduser()
         for filename in ("config.toml", "auth.json"):
             source_path = source / filename
             target_path = target / filename
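The two `CodexRunner` changes above (merging configured environment variables, and placing `--profile` ahead of the `exec` subcommand) can be sketched as standalone helpers. The function names `merge_runner_env` and `build_codex_command` and all sample values are hypothetical; only the flag order and the skip rules are taken from the diff.

```python
from __future__ import annotations

from typing import Any, Mapping


def merge_runner_env(base_env: Mapping[str, str], runner_env: Any) -> dict[str, str]:
    """Overlay configured variables on a copy of the base environment."""
    env = dict(base_env)
    if not isinstance(runner_env, dict):
        return env
    for key, value in runner_env.items():
        env_key = str(key or "").strip()
        # Blank keys and None values are skipped, as in the diff.
        if not env_key or value is None:
            continue
        env[env_key] = str(value)
    return env


def build_codex_command(binary: str, workspace_root: str, *, profile: str = "", model: str = "") -> list[str]:
    """Assemble the argument list with --profile placed before `exec`."""
    command = [binary, "--search"]
    if profile.strip():
        command.extend(["--profile", profile.strip()])
    command.extend(["exec", "--json", "--cd", workspace_root, "--skip-git-repo-check"])
    # Placeholder model names mean "do not pass --model".
    if model.strip().lower() not in {"", "inherit", "default", "codex-default"}:
        command.extend(["--model", model.strip()])
    return command


# Hypothetical inputs for illustration only.
env = merge_runner_env({"PATH": "/usr/bin"}, {" EXAMPLE_API_KEY ": 123, "": "x", "SKIPPED": None})
command = build_codex_command("codex", "/tmp/quest", profile="example-profile", model="inherit")
```

In the diff, `CODEX_HOME`, `DEEPSCIENTIST_HOME`, and `DS_HOME` are assigned after the merge loop, so values supplied for those keys in the runner `env` mapping are overwritten.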
@@ -20,6 +20,8 @@ def _as_bool_env(name: str) -> bool:
 def codex_runtime_overrides() -> dict[str, str]:
     approval_policy = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_APPROVAL_POLICY"))
     sandbox_mode = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_SANDBOX_MODE"))
+    profile = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_PROFILE"))
+    model = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_MODEL"))
 
     if _as_bool_env("DEEPSCIENTIST_CODEX_YOLO"):
         approval_policy = approval_policy or "never"
@@ -30,6 +32,10 @@ def codex_runtime_overrides() -> dict[str, str]:
         overrides["approval_policy"] = approval_policy
     if sandbox_mode:
         overrides["sandbox_mode"] = sandbox_mode
+    if profile:
+        overrides["profile"] = profile
+    if model:
+        overrides["model"] = model
     return overrides
 
 
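A minimal sketch of how the two new environment variables might surface as overrides. `_as_text` is not shown in this diff, so the stand-in `as_text` below (strip, empty when unset) is an assumption, as is the function name `overrides_from_env`. The `DEEPSCIENTIST_CODEX_YOLO` branch is left out because the hunk shows only part of it.

```python
from __future__ import annotations

from typing import Mapping


def as_text(value: object) -> str:
    # Assumed behavior of the package's _as_text helper: trimmed text, "" when unset.
    return str(value or "").strip()


def overrides_from_env(environ: Mapping[str, str]) -> dict[str, str]:
    """Collect non-empty override values from environment variables."""
    variables = (
        ("approval_policy", "DEEPSCIENTIST_CODEX_APPROVAL_POLICY"),
        ("sandbox_mode", "DEEPSCIENTIST_CODEX_SANDBOX_MODE"),
        ("profile", "DEEPSCIENTIST_CODEX_PROFILE"),
        ("model", "DEEPSCIENTIST_CODEX_MODEL"),
    )
    overrides: dict[str, str] = {}
    for field, variable in variables:
        text = as_text(environ.get(variable))
        if text:
            overrides[field] = text
    return overrides


# Hypothetical environment: a padded profile name and an empty model value.
example = overrides_from_env({
    "DEEPSCIENTIST_CODEX_PROFILE": " example-profile ",
    "DEEPSCIENTIST_CODEX_MODEL": "",
})
```

Passing a mapping instead of reading `os.environ` directly keeps the sketch testable; the function in the diff reads `os.environ` itself.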
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+from collections import deque
 import hashlib
 import json
 import os
@@ -9,7 +10,7 @@ import subprocess
 import sys
 from datetime import UTC, datetime
 from pathlib import Path
-from typing import Any
+from typing import Any, Iterator
 from uuid import uuid4
 
 try:
@@ -90,21 +91,39 @@ def append_jsonl(path: Path, payload: dict[str, Any]) -> None:
         handle.write(json.dumps(payload, ensure_ascii=False) + "\n")
 
 
-def read_jsonl(path: Path) -> list[dict[str, Any]]:
+def iter_jsonl(path: Path | str) -> Iterator[dict[str, Any]]:
+    path = Path(path)
     if not path.exists():
+        return
+    with path.open("r", encoding="utf-8") as handle:
+        for raw_line in handle:
+            line = raw_line.strip()
+            if not line:
+                continue
+            try:
+                payload = json.loads(line)
+            except json.JSONDecodeError:
+                continue
+            if isinstance(payload, dict):
+                yield payload
+
+
+def read_jsonl(path: Path) -> list[dict[str, Any]]:
+    return list(iter_jsonl(path))
+
+
+def count_jsonl(path: Path | str) -> int:
+    return sum(1 for _ in iter_jsonl(path))
+
+
+def read_jsonl_tail(path: Path | str, limit: int) -> list[dict[str, Any]]:
+    normalized_limit = max(int(limit or 0), 0)
+    if normalized_limit <= 0:
         return []
-    items: list[dict[str, Any]] = []
-    for line in path.read_text(encoding="utf-8").splitlines():
-        line = line.strip()
-        if not line:
-            continue
-        try:
-            payload = json.loads(line)
-        except json.JSONDecodeError:
-            continue
-        if isinstance(payload, dict):
-            items.append(payload)
-    return items
+    items: deque[dict[str, Any]] = deque(maxlen=normalized_limit)
+    for payload in iter_jsonl(path):
+        items.append(payload)
+    return list(items)
 
 
 def read_yaml(path: Path, default: Any = None) -> Any:
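The streaming readers in this hunk can be tried against a temporary file. The sketch below reimplements `iter_jsonl` and a condensed `read_jsonl_tail` from the diff; the file name and its contents are invented for the demo.

```python
from __future__ import annotations

import json
import tempfile
from collections import deque
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: Path | str) -> Iterator[dict[str, Any]]:
    """Yield one dict per valid JSON-object line, reading the file lazily."""
    path = Path(path)
    if not path.exists():
        return
    with path.open("r", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue
            try:
                payload = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed lines are skipped, not raised
            if isinstance(payload, dict):
                yield payload


def read_jsonl_tail(path: Path | str, limit: int) -> list[dict[str, Any]]:
    """Return the last `limit` records while holding at most `limit` in memory."""
    normalized_limit = max(int(limit or 0), 0)
    if normalized_limit <= 0:
        return []
    # A bounded deque drops the oldest record each time a new one arrives.
    return list(deque(iter_jsonl(path), maxlen=normalized_limit))


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"  # hypothetical file name
    log.write_text('{"n": 1}\nnot json\n[1, 2]\n\n{"n": 2}\n{"n": 3}\n', encoding="utf-8")
    tail = read_jsonl_tail(log, 2)
    total = sum(1 for _ in iter_jsonl(log))
    missing = read_jsonl_tail(Path(tmp) / "absent.jsonl", 2)
```

The tail read still scans the whole file once, so it saves memory rather than I/O: only the last `limit` parsed records are retained at any point.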
@@ -10,7 +10,8 @@
 - qq_summary_first_rule: start with the conclusion the user cares about, then what it means, then the next action
 - qq_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - qq_eta_rule: for baseline reproduction, main experiments, analysis experiments, and other important long-running research phases, include a rough ETA for the next meaningful result or the next update; if uncertain, say that and still give the next check-in window
-- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly 10 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible checkpoint
+- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- qq_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short QQ-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - qq_internal_detail_rule: omit worker names, heartbeat timestamps, retry counters, pending/running/completed counts, file names, and monitor-window narration unless the user asked for them or the detail changes the recommended action
 - qq_translation_rule: convert internal execution and file-management work into user value, such as saying the baseline record is now organized for easier later comparison instead of listing touched files
 - qq_preflight_rule: before sending a QQ progress update, rewrite it if it still sounds like a monitoring log, execution diary, or file inventory
@@ -10,7 +10,8 @@
 - weixin_summary_first_rule: start with the user-facing conclusion, then what it means, then the next action
 - weixin_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - weixin_eta_rule: for important long-running phases such as baseline reproduction, main experiments, analysis, or paper packaging, include a rough ETA or next check-in window when you can
-- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 10 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible checkpoint
+- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- weixin_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short Weixin-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - weixin_internal_detail_rule: omit worker names, retry counters, pending/running/completed counts, low-level file listings, and monitor-window narration unless the user explicitly asked for them or they change the recommended action
 - weixin_translation_rule: translate internal execution and file-management work into user value instead of narrating tool or filesystem churn
 - weixin_preflight_rule: before sending a Weixin-facing progress update, rewrite it if it still reads like a monitor log, execution diary, or file inventory
@@ -7,7 +7,10 @@ This shared contract is injected once per turn and applies across the stage and
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the current stage or companion-skill task.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
-- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 10 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- Stage-kickoff rule: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work.
+- Reading/planning keepalive rule: if you spend 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet.
+- Subtask-boundary rule: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 6 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
 - Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
@@ -53,7 +53,7 @@ Your job is to keep a research quest moving forward in a durable, auditable, evi
53
53
  - for ordinary progress replies, usually stay within 2 to 4 short sentences or 3 short bullets at most
54
54
  - start with the conclusion the user cares about, then what it means, then the next action
55
55
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running research phases, also tell the user roughly how long until the next meaningful result, next step, or next update
56
- - for ordinary active multi-step work, prefer a concise update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 20 tool calls or about 15 minutes of active foreground work without a user-visible update unless a real milestone is imminent
56
+ - for ordinary active multi-step work, prefer a concise update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 12 tool calls or about 8 minutes of active foreground work without a user-visible update unless a real milestone is imminent
57
57
  - do not spam internal tool chatter, raw diffs, or every small checkpoint
58
58
  - do not proactively enumerate file paths, file inventories, or low-level file details unless the user explicitly asks
59
59
  - do not proactively expose worker names, heartbeat timestamps, retry counters, pending/running/completed counts, or monitor-window narration unless that detail changes the recommended action or is required for honesty about risk
@@ -203,7 +203,7 @@ When you send user-facing updates (especially via `artifact.interact(...)`), wri
203
203
  - what task you are currently working on
204
204
  - what the main difficulty, risk, or latest real progress is
205
205
  - what concrete next step or mitigation you will take
206
- - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without any user-visible checkpoint
206
+ - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without any user-visible checkpoint
207
207
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running phases, also make the timing expectation explicit:
208
208
  - roughly how long until the next meaningful result, next milestone, or next update, usually within a 10 to 30 minute window
209
209
  - if runtime is uncertain, say that directly and give the next check-in window instead of pretending to know an exact ETA
@@ -463,9 +463,12 @@ Each milestone update should usually state:
463
463
  Cadence defaults for ordinary active work:
464
464
 
465
465
  - treat `artifact.interact(...)` as the default user-visible heartbeat rather than an optional extra
466
- - soft trigger: after about 10 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
467
- - hard trigger: do not exceed about 20 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
468
- - time trigger: do not exceed about 15 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
466
+ - stage-kickoff trigger: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work
467
+ - reading/planning trigger: if you spend about 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet
468
+ - boundary trigger: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal
469
+ - soft trigger: after about 6 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
470
+ - hard trigger: do not exceed about 12 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
471
+ - time trigger: do not exceed about 8 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
469
472
  - immediate trigger: send a user-visible update as soon as a real blocker, recovery, route change, branch/worktree switch, baseline gate change, selected idea, recorded main experiment, or user-priority interruption becomes clear
470
473
  - de-duplication rule: do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a real milestone, blocker, route change, or new user message makes that extra update genuinely useful
471
474
  - keep ordinary subtask completions short; reserve richer milestone reports for stage-significant deliverables and route-changing checkpoints instead of narrating every small setup step
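The revised cadence defaults in this hunk can be read as a small decision procedure. The sketch below is only an illustration of that reading, not code from the package. The `WorkState` counters, the `due_trigger` function, and the order in which triggers are checked are all assumptions. In particular, it assumes immediate and boundary events bypass the de-duplication window and that the hard and time caps take priority over it. The thresholds are the "about" values from the added (`+`) lines and the unchanged de-duplication rule.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate thresholds taken from the new cadence defaults in this hunk.
KICKOFF_CALLS = 3      # stage-kickoff trigger
READING_CALLS = 5      # reading/planning trigger
SOFT_CALLS = 6         # soft trigger
HARD_CALLS = 12        # hard trigger
TIME_LIMIT_S = 8 * 60  # time trigger
DEDUP_CALLS = 2        # de-duplication rule (tool calls)
DEDUP_S = 90           # de-duplication rule (seconds)


@dataclass
class WorkState:
    """Hypothetical counters, all measured since the last user-visible update."""
    calls_since_update: int = 0
    seconds_since_update: float = 0.0
    reading_streak: int = 0          # consecutive reading/search/planning calls
    calls_in_stage: int = 0          # tool calls since entering the current stage
    stage_update_sent: bool = True   # has this stage already had its kickoff update?
    has_meaningful_delta: bool = False
    subtask_changed: bool = False    # material change of the active subtask
    immediate_event: bool = False    # blocker, recovery, route change, and so on


def due_trigger(s: WorkState) -> Optional[str]:
    """Return the name of the trigger that calls for an update now, or None."""
    # Assumption: these two are never suppressed by de-duplication.
    if s.immediate_event:
        return "immediate"
    if s.subtask_changed:
        return "boundary"
    # Assumption: the hard and time caps win over the de-duplication window.
    if s.calls_since_update >= HARD_CALLS:
        return "hard"
    if s.seconds_since_update >= TIME_LIMIT_S:
        return "time"
    # De-duplication: skip ordinary updates that would come too soon.
    if s.calls_since_update <= DEDUP_CALLS or s.seconds_since_update < DEDUP_S:
        return None
    if not s.stage_update_sent and s.calls_in_stage >= KICKOFF_CALLS:
        return "stage-kickoff"
    if s.reading_streak >= READING_CALLS:
        return "reading/planning"
    if s.calls_since_update >= SOFT_CALLS and s.has_meaningful_delta:
        return "soft"
    return None
```

A caller would evaluate `due_trigger(...)` after each tool call and, when it returns a name, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update and reset the counters.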
@@ -1080,9 +1083,10 @@ For `artifact.interact(...)` specifically:
1080
1083
  - raw logs
1081
1084
  - internal tool names
1082
1085
  - mention those details only if the user asked for them or needs them to act on the message
1083
- - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without a user-visible update
1086
+ - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without a user-visible update
1084
1087
  - during long active execution, after the first meaningful signal from long-running work, keep the user informed and never let active user-relevant work go more than 30 minutes without a real progress inspection and, if still running, a user-visible keepalive
1085
- - do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
1088
+ - if the active work is still mostly reading, comparison, synthesis, or planning, do not hide behind "no result yet"; send a short user-visible checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
1089
+ - do not send another ordinary progress update within about 2 additional tool calls or about 60 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
1086
1090
  - each ordinary progress update should usually answer only:
1087
1091
  - what changed
1088
1092
  - what it means now
@@ -1321,7 +1325,7 @@ If the field is absent, default to `freeform`.
1321
1325
  When `launch_mode = custom`:
1322
1326
 
1323
1327
  - do not force the quest back into the canonical full-research path if the custom brief is narrower
1324
- - treat `entry_state_summary`, `review_summary`, and `custom_brief` as real startup context rather than decorative metadata
1328
+ - treat `entry_state_summary`, `review_summary`, `review_materials`, and `custom_brief` as real startup context rather than decorative metadata
1325
1329
  - if the quest clearly starts from existing baseline / result / draft state, open `intake-audit` before restarting baseline discovery or fresh experimentation
1326
1330
  - if the quest clearly starts from reviewer comments, a revision request, or a rebuttal packet, open `rebuttal` before ordinary `write`
1327
1331
  - after the custom entry skill stabilizes the route, continue through the normal stage skills as needed
@@ -1331,12 +1335,58 @@ When `custom_profile = continue_existing_state`:
1331
1335
  - assume the quest may already contain reusable baselines, measured results, analysis assets, or writing assets
1332
1336
  - audit and trust-rank those assets first instead of reflexively rerunning everything
1333
1337
 
1338
+ When `custom_profile = review_audit`:
1339
+
1340
+ - assume the active contract is a substantial draft or paper package that needs an independent skeptical audit
1341
+ - open `review` before more writing or finalization
1342
+ - if the audit finds real gaps, route to the needed downstream skill instead of polishing blindly
1343
+
1344
+ When `startup_contract.review_followup_policy = auto_execute_followups`:
1345
+
1346
+ - after review artifacts are durable, continue automatically into the required experiments, manuscript deltas, and review-closure work
1347
+ - do not stop at the audit report if the route is already clear
1348
+
1349
+ When `startup_contract.review_followup_policy = user_gated_followups`:
1350
+
1351
+ - finish the review artifacts first
1352
+ - then raise one structured decision before expensive experiments or manuscript revisions continue
1353
+
1354
+ When `startup_contract.review_followup_policy = audit_only`:
1355
+
1356
+ - stop after the durable audit artifacts and route recommendation unless the user later asks for execution follow-up
1357
+
1334
1358
  When `custom_profile = revision_rebuttal`:
1335
1359
 
1336
1360
  - assume the active contract is a paper-review workflow rather than a blank research loop
1337
1361
  - preserve the existing paper, results, and reviewer package as the starting state
1338
1362
  - route supplementary experiments through `analysis-campaign` and manuscript deltas through `write`, but let `rebuttal` orchestrate that mapping
1339
1363
 
1364
+ When `startup_contract.baseline_execution_policy = must_reproduce_or_verify`:
1365
+
1366
+ - explicitly verify or recover the rebuttal-critical baseline or comparator before reviewer-linked follow-up work
1367
+
1368
+ When `startup_contract.baseline_execution_policy = reuse_existing_only`:
1369
+
1370
+ - trust the current confirmed baseline/results unless you find concrete inconsistency, corruption, or missing-evidence problems
1371
+
1372
+ When `startup_contract.baseline_execution_policy = skip_unless_blocking`:
1373
+
1374
+ - do not spend time rerunning baselines by default
1375
+ - only open `baseline` if a named review/rebuttal issue truly depends on a missing comparator or unusable prior evidence
1376
+
1377
+ When `startup_contract.manuscript_edit_mode = latex_required`:
1378
+
1379
+ - if manuscript revision is required, treat the provided LaTeX tree or `paper/latex/` as the writing surface
1380
+ - if LaTeX source is unavailable, do not pretend the manuscript was edited; produce LaTeX-ready replacement text and state the blocker explicitly
1381
+
1382
+ When `startup_contract.manuscript_edit_mode = copy_ready_text`:
1383
+
1384
+ - provide section-level copy-ready replacement text and explicit deltas when manuscript revision is required
1385
+
1386
+ When `startup_contract.manuscript_edit_mode = none`:
1387
+
1388
+ - revision planning artifacts are sufficient unless the user later broadens scope
1389
+
1340
1390
  When `custom_profile = freeform`:
1341
1391
 
1342
1392
  - treat the custom brief as the primary scope contract
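The three `startup_contract` policy fields added in this hunk each select one behavior from a fixed set of values. As a rough illustration only, the sketch below encodes them as lookup tables. The field names and values come from the added lines. The `summarize_contract` helper, the short behavior summaries, and the choice to return `None` for an absent or unrecognized value are assumptions, since the hunk does not state defaults for these fields.

```python
from typing import Dict, Optional

# Field values are from the diff; the summaries paraphrase the added bullets.
REVIEW_FOLLOWUP = {
    "auto_execute_followups": "continue into experiments, manuscript deltas, and review closure",
    "user_gated_followups": "raise one structured decision before expensive work continues",
    "audit_only": "stop after durable audit artifacts and route recommendation",
}
BASELINE_EXECUTION = {
    "must_reproduce_or_verify": "verify or recover the critical baseline before follow-up work",
    "reuse_existing_only": "trust the confirmed baseline unless concrete problems appear",
    "skip_unless_blocking": "open `baseline` only if a named issue depends on it",
}
MANUSCRIPT_EDIT = {
    "latex_required": "edit the LaTeX tree or paper/latex/, or state the blocker explicitly",
    "copy_ready_text": "provide section-level copy-ready replacement text and deltas",
    "none": "revision planning artifacts are sufficient",
}


def summarize_contract(startup_contract: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each policy field to its behavior summary (hypothetical helper)."""
    def look(table: Dict[str, str], key: str) -> Optional[str]:
        # Assumption: absent or unrecognized values yield None rather than a default.
        return table.get(startup_contract.get(key, ""))

    return {
        "review_followup": look(REVIEW_FOLLOWUP, "review_followup_policy"),
        "baseline_execution": look(BASELINE_EXECUTION, "baseline_execution_policy"),
        "manuscript_edit": look(MANUSCRIPT_EDIT, "manuscript_edit_mode"),
    }
```

For example, `summarize_contract({"review_followup_policy": "audit_only"})` fills in only the `review_followup` entry and leaves the other two as `None`.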
@@ -2078,7 +2128,7 @@ When summarizing long logs, campaigns, or multi-agent work:
2078
2128
  - the estimated next reply time (usually the next sleep interval you are about to use)
2079
2129
  - If the run still looks healthy but there is no human-meaningful delta yet, continue monitoring silently instead of sending a no-change keepalive just because a sleep finished.
2080
2130
  - For baseline reproduction, main experiments, analysis experiments, and similar user-relevant long runs, translate that monitoring ETA into user-facing language such as how long until the next meaningful result or the next expected update.
2081
- - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 20 tool calls or about 15 minutes without a user-visible checkpoint.
2131
+ - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 12 tool calls or about 8 minutes without a user-visible checkpoint.
2082
2132
  - If you forget a bash id, do not guess. Use `bash_exec(mode='history')` or `bash_exec(mode='list')` and recover it from the reverse-chronological session list.
2083
2133
  - If the long-running command or wrapper code can emit structured progress markers, prefer a concise `__DS_PROGRESS__ { ... }` JSON line with fields such as:
2084
2134
  - `current`