@researai/deepscientist 1.5.11 → 1.5.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (107)
  1. package/README.md +8 -8
  2. package/bin/ds.js +375 -61
  3. package/docs/en/00_QUICK_START.md +55 -4
  4. package/docs/en/01_SETTINGS_REFERENCE.md +15 -0
  5. package/docs/en/02_START_RESEARCH_GUIDE.md +68 -4
  6. package/docs/en/09_DOCTOR.md +48 -4
  7. package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +21 -2
  8. package/docs/en/15_CODEX_PROVIDER_SETUP.md +382 -0
  9. package/docs/en/README.md +4 -0
  10. package/docs/zh/00_QUICK_START.md +54 -3
  11. package/docs/zh/01_SETTINGS_REFERENCE.md +15 -0
  12. package/docs/zh/02_START_RESEARCH_GUIDE.md +69 -3
  13. package/docs/zh/09_DOCTOR.md +48 -2
  14. package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +21 -2
  15. package/docs/zh/15_CODEX_PROVIDER_SETUP.md +383 -0
  16. package/docs/zh/README.md +4 -1
  17. package/package.json +2 -1
  18. package/pyproject.toml +1 -1
  19. package/src/deepscientist/__init__.py +1 -1
  20. package/src/deepscientist/bash_exec/monitor.py +7 -5
  21. package/src/deepscientist/bash_exec/service.py +84 -21
  22. package/src/deepscientist/channels/local.py +3 -3
  23. package/src/deepscientist/channels/qq.py +7 -7
  24. package/src/deepscientist/channels/relay.py +7 -7
  25. package/src/deepscientist/channels/weixin_ilink.py +90 -19
  26. package/src/deepscientist/cli.py +3 -0
  27. package/src/deepscientist/codex_cli_compat.py +117 -0
  28. package/src/deepscientist/config/models.py +1 -0
  29. package/src/deepscientist/config/service.py +173 -25
  30. package/src/deepscientist/daemon/app.py +314 -6
  31. package/src/deepscientist/doctor.py +1 -5
  32. package/src/deepscientist/mcp/server.py +124 -3
  33. package/src/deepscientist/prompts/builder.py +113 -11
  34. package/src/deepscientist/quest/service.py +247 -31
  35. package/src/deepscientist/runners/codex.py +132 -24
  36. package/src/deepscientist/runners/runtime_overrides.py +9 -0
  37. package/src/deepscientist/shared.py +33 -14
  38. package/src/prompts/connectors/qq.md +2 -1
  39. package/src/prompts/connectors/weixin.md +2 -1
  40. package/src/prompts/contracts/shared_interaction.md +4 -1
  41. package/src/prompts/system.md +59 -9
  42. package/src/skills/analysis-campaign/SKILL.md +46 -6
  43. package/src/skills/analysis-campaign/references/campaign-plan-template.md +21 -8
  44. package/src/skills/baseline/SKILL.md +1 -1
  45. package/src/skills/baseline/references/artifact-payload-examples.md +39 -0
  46. package/src/skills/decision/SKILL.md +1 -1
  47. package/src/skills/experiment/SKILL.md +1 -1
  48. package/src/skills/finalize/SKILL.md +1 -1
  49. package/src/skills/idea/SKILL.md +1 -1
  50. package/src/skills/intake-audit/SKILL.md +1 -1
  51. package/src/skills/rebuttal/SKILL.md +74 -1
  52. package/src/skills/rebuttal/references/response-letter-template.md +55 -11
  53. package/src/skills/review/SKILL.md +118 -1
  54. package/src/skills/review/references/experiment-todo-template.md +23 -0
  55. package/src/skills/review/references/review-report-template.md +16 -0
  56. package/src/skills/review/references/revision-log-template.md +4 -0
  57. package/src/skills/scout/SKILL.md +1 -1
  58. package/src/skills/write/SKILL.md +168 -7
  59. package/src/skills/write/references/paper-experiment-matrix-template.md +131 -0
  60. package/src/tui/dist/lib/connectorConfig.js +90 -0
  61. package/src/tui/dist/lib/qr.js +21 -0
  62. package/src/tui/package.json +2 -1
  63. package/src/ui/dist/assets/{AiManusChatView-D0mTXG4-.js → AiManusChatView-CnJcXynW.js} +12 -12
  64. package/src/ui/dist/assets/{AnalysisPlugin-Db0cTXxm.js → AnalysisPlugin-DeyzPEhV.js} +1 -1
  65. package/src/ui/dist/assets/{CliPlugin-DrV8je02.js → CliPlugin-CB1YODQn.js} +9 -9
  66. package/src/ui/dist/assets/{CodeEditorPlugin-QXMSCH71.js → CodeEditorPlugin-B-xicq1e.js} +8 -8
  67. package/src/ui/dist/assets/{CodeViewerPlugin-7hhtWj_E.js → CodeViewerPlugin-DT54ysXa.js} +5 -5
  68. package/src/ui/dist/assets/{DocViewerPlugin-BWMSnRJe.js → DocViewerPlugin-DQtKT-VD.js} +3 -3
  69. package/src/ui/dist/assets/{GitDiffViewerPlugin-7J9h9Vy_.js → GitDiffViewerPlugin-hqHbCfnv.js} +20 -20
  70. package/src/ui/dist/assets/{ImageViewerPlugin-CHJl_0lr.js → ImageViewerPlugin-OcVo33jV.js} +5 -5
  71. package/src/ui/dist/assets/{LabCopilotPanel-1qSow1es.js → LabCopilotPanel-DdGwhEUV.js} +11 -11
  72. package/src/ui/dist/assets/{LabPlugin-eQpPPCEp.js → LabPlugin-Ciz1gDaX.js} +2 -2
  73. package/src/ui/dist/assets/{LatexPlugin-BwRfi89Z.js → LatexPlugin-BhmjNQRC.js} +37 -11
  74. package/src/ui/dist/assets/{MarkdownViewerPlugin-836PVQWV.js → MarkdownViewerPlugin-BzdVH9Bx.js} +4 -4
  75. package/src/ui/dist/assets/{MarketplacePlugin-C2y_556i.js → MarketplacePlugin-DmyHspXt.js} +3 -3
  76. package/src/ui/dist/assets/{NotebookEditor-DIX7Mlzu.js → NotebookEditor-BMXKrDRk.js} +1 -1
  77. package/src/ui/dist/assets/{NotebookEditor-BRzJbGsn.js → NotebookEditor-BTVYRGkm.js} +11 -11
  78. package/src/ui/dist/assets/{PdfLoader-DzRaTAlq.js → PdfLoader-CvcjJHXv.js} +1 -1
  79. package/src/ui/dist/assets/{PdfMarkdownPlugin-DZUfIUnp.js → PdfMarkdownPlugin-DW2ej8Vk.js} +2 -2
  80. package/src/ui/dist/assets/{PdfViewerPlugin-BwtICzue.js → PdfViewerPlugin-CmlDxbhU.js} +10 -10
  81. package/src/ui/dist/assets/{SearchPlugin-DHeIAMsx.js → SearchPlugin-DAjQZPSv.js} +1 -1
  82. package/src/ui/dist/assets/{TextViewerPlugin-C3tCmFox.js → TextViewerPlugin-C-nVAZb_.js} +5 -5
  83. package/src/ui/dist/assets/{VNCViewer-CQsKVm3t.js → VNCViewer-D7-dIYon.js} +10 -10
  84. package/src/ui/dist/assets/{bot-BEA2vWuK.js → bot-C_G4WtNI.js} +1 -1
  85. package/src/ui/dist/assets/{code-XfbSR8K2.js → code-Cd7WfiWq.js} +1 -1
  86. package/src/ui/dist/assets/{file-content-BjxNaIfy.js → file-content-B57zsL9y.js} +1 -1
  87. package/src/ui/dist/assets/{file-diff-panel-D_lLVQk0.js → file-diff-panel-DVoheLFq.js} +1 -1
  88. package/src/ui/dist/assets/{file-socket-D9x_5vlY.js → file-socket-B5kXFxZP.js} +1 -1
  89. package/src/ui/dist/assets/{image-BhWT33W1.js → image-LLOjkMHF.js} +1 -1
  90. package/src/ui/dist/assets/{index-Dqj-Mjb4.css → index-BQG-1s2o.css} +40 -2
  91. package/src/ui/dist/assets/{index--c4iXtuy.js → index-C3r2iGrp.js} +12 -12
  92. package/src/ui/dist/assets/{index-DZTZ8mWP.js → index-CLQauncb.js} +911 -120
  93. package/src/ui/dist/assets/{index-PJbSbPTy.js → index-Dxa2eYMY.js} +1 -1
  94. package/src/ui/dist/assets/{index-BDxipwrC.js → index-hOUOWbW2.js} +2 -2
  95. package/src/ui/dist/assets/{monaco-K8izTGgo.js → monaco-BGGAEii3.js} +1 -1
  96. package/src/ui/dist/assets/{pdf-effect-queue-DfBors6y.js → pdf-effect-queue-DlEr1_y5.js} +1 -1
  97. package/src/ui/dist/assets/{popover-yFK1J4fL.js → popover-CWJbJuYY.js} +1 -1
  98. package/src/ui/dist/assets/{project-sync-PENr2zcz.js → project-sync-CRJiucYO.js} +18 -4
  99. package/src/ui/dist/assets/{select-CAbJDfYv.js → select-CoHB7pvH.js} +2 -2
  100. package/src/ui/dist/assets/{sigma-DEuYJqTl.js → sigma-D5aJWR8J.js} +1 -1
  101. package/src/ui/dist/assets/{square-check-big-omoSUmcd.js → square-check-big-DUK_mnkS.js} +1 -1
  102. package/src/ui/dist/assets/{trash--F119N47.js → trash-ChU3SEE3.js} +1 -1
  103. package/src/ui/dist/assets/{useCliAccess-D31UR23I.js → useCliAccess-BrJBV3tY.js} +1 -1
  104. package/src/ui/dist/assets/{useFileDiffOverlay-BH6KcMzq.js → useFileDiffOverlay-C2OQaVWc.js} +1 -1
  105. package/src/ui/dist/assets/{wrap-text-CZ613PM5.js → wrap-text-C7Qqh-om.js} +1 -1
  106. package/src/ui/dist/assets/{zoom-out-BgDLAv3z.js → zoom-out-rtX0FKya.js} +1 -1
  107. package/src/ui/dist/index.html +2 -2
--- a/package/src/deepscientist/runners/codex.py
+++ b/package/src/deepscientist/runners/codex.py
@@ -11,14 +11,20 @@ from pathlib import Path
 from typing import Any
 
 from ..artifact import ArtifactService
+from ..codex_cli_compat import adapt_profile_only_provider_config, normalize_codex_reasoning_effort
 from ..config import ConfigManager
 from ..gitops import export_git_graph
 from ..prompts import PromptBuilder
 from ..runtime_logs import JsonlLogger
-from ..shared import append_jsonl, ensure_dir, generate_id, read_yaml, resolve_runner_binary, utc_now, write_json, write_text
+from ..shared import append_jsonl, ensure_dir, generate_id, read_text, read_yaml, resolve_runner_binary, utc_now, write_json, write_text
 from ..web_search import extract_web_search_payload
 from .base import RunRequest, RunResult
 
+_TOOL_EVENT_ARGS_TEXT_LIMIT = 8_000
+_TOOL_EVENT_OUTPUT_TEXT_LIMIT = 16_000
+_MAX_QUEST_EVENT_JSON_BYTES = 2_000_000
+_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT = 12_000
+
 
 def _compact_text(value: object, *, limit: int = 1200) -> str:
     if value is None:
@@ -35,15 +41,94 @@ def _compact_text(value: object, *, limit: int = 1200) -> str:
     return text[: limit - 1].rstrip() + "…"
 
 
-def _structured_text(value: object) -> str:
+def _truncate_leaf_text(text: str, *, limit: int) -> str:
+    if limit <= 0 or len(text) <= limit:
+        return text
+    head = max(int(limit * 0.7), 256)
+    tail = max(limit - head - 64, 128)
+    omitted = max(len(text) - head - tail, 0)
+    return f"{text[:head].rstrip()}\n...[truncated {omitted} chars]...\n{text[-tail:].lstrip()}"
+
+
+def _truncate_structured_value(value: object, *, string_limit: int) -> object:
+    if isinstance(value, str):
+        return _truncate_leaf_text(value.strip(), limit=string_limit)
+    if isinstance(value, list):
+        return [_truncate_structured_value(item, string_limit=string_limit) for item in value[:200]]
+    if isinstance(value, dict):
+        truncated: dict[object, object] = {}
+        for index, (key, item) in enumerate(value.items()):
+            if index >= 200:
+                truncated["__truncated__"] = f"truncated remaining {len(value) - 200} item(s)"
+                break
+            truncated[key] = _truncate_structured_value(item, string_limit=string_limit)
+        return truncated
+    return value
+
+
+def _structured_text(value: object, *, limit: int | None = None) -> str:
     if value is None:
         return ""
     if isinstance(value, str):
-        return value.strip()
+        return _truncate_leaf_text(value.strip(), limit=limit or len(value))
+    normalized_value = _truncate_structured_value(value, string_limit=max(limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT, 512))
     try:
-        return json.dumps(value, ensure_ascii=False, indent=2)
+        return json.dumps(normalized_value, ensure_ascii=False, indent=2)
     except TypeError:
-        return str(value)
+        return _truncate_leaf_text(str(value), limit=limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT)
+
+
+def _encoded_json_size(value: object) -> int:
+    try:
+        return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))
+    except Exception:
+        return len(str(value).encode("utf-8", errors="ignore"))
+
+
+def _compact_tool_event_payload(payload: dict[str, Any]) -> dict[str, Any]:
+    if _encoded_json_size(payload) <= _MAX_QUEST_EVENT_JSON_BYTES:
+        return payload
+
+    compacted = dict(payload)
+    output_text = str(compacted.get("output") or "")
+    if output_text:
+        compacted["output_bytes"] = len(output_text.encode("utf-8", errors="ignore"))
+        compacted["output"] = _truncate_leaf_text(
+            output_text,
+            limit=_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT,
+        )
+        compacted["output_truncated"] = True
+    args_text = str(compacted.get("args") or "")
+    if args_text and _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["args"] = _truncate_leaf_text(args_text, limit=4_000)
+        compacted["args_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        metadata = compacted.get("metadata")
+        if isinstance(metadata, dict):
+            allowed_keys = {
+                "mcp_server",
+                "mcp_tool",
+                "bash_id",
+                "status",
+                "command",
+                "workdir",
+                "cwd",
+                "started_at",
+                "finished_at",
+                "exit_code",
+                "stop_reason",
+                "log_path",
+            }
+            compacted["metadata"] = {
+                key: metadata.get(key)
+                for key in allowed_keys
+                if key in metadata
+            }
+            compacted["metadata_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["output"] = _compact_text(compacted.get("output"), limit=2_000)
+        compacted["output_truncated"] = True
+    return compacted
 
 
 def _iter_event_texts(event: dict[str, Any]) -> list[str]:
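The new head/tail truncation keeps both ends of an oversized string and records how much was dropped in between. A standalone sketch of the added helper (renamed without the leading underscore; behavior copied from the hunk above):

```python
def truncate_leaf_text(text: str, *, limit: int) -> str:
    # Mirrors the diff's _truncate_leaf_text: keep ~70% of the budget as head,
    # reserve the rest (minus marker overhead) for the tail, note what was cut.
    if limit <= 0 or len(text) <= limit:
        return text
    head = max(int(limit * 0.7), 256)
    tail = max(limit - head - 64, 128)
    omitted = max(len(text) - head - tail, 0)
    return f"{text[:head].rstrip()}\n...[truncated {omitted} chars]...\n{text[-tail:].lstrip()}"

short = truncate_leaf_text("hello", limit=100)           # under the limit: unchanged
long_out = truncate_leaf_text("x" * 10_000, limit=1_000)  # head + marker + tail
```

Note that for small limits the head/tail floors (256 and 128 characters) can make the result slightly longer than `limit`; the helper trades strict budgeting for always keeping context from both ends.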
@@ -209,7 +294,7 @@ def _tool_args(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("input"),
         event.get("input"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_ARGS_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -243,7 +328,7 @@ def _tool_output(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("aggregated_output"),
         event.get("aggregated_output"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_OUTPUT_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -361,7 +446,7 @@ def _tool_event(
                 "raw_event_type": event_type,
                 "created_at": created_at,
             }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -375,7 +460,7 @@
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "web_search":
         tool_call_id = _tool_call_id(event, item)
@@ -399,7 +484,7 @@
                 "raw_event_type": event_type,
                 "created_at": created_at,
             }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -414,13 +499,13 @@
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "file_change":
         tool_call_id = _tool_call_id(event, item)
         tool_name = "file_change"
         known_tool_names[tool_call_id] = tool_name
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -433,7 +518,7 @@
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type == "mcp_tool_call":
         tool_call_id = _tool_call_id(event, item)
@@ -466,7 +551,7 @@
                 "raw_event_type": event_type,
                 "created_at": created_at,
             }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -483,7 +568,7 @@
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     if item_type in {"function_call", "custom_tool_call", "tool_call"} or "function_call" in event_type or "tool_call" in event_type:
         tool_call_id = _tool_call_id(event, item)
@@ -507,7 +592,7 @@
     if item_type in {"function_call_output", "custom_tool_call_output", "tool_result", "tool_call_output"} or "function_call_output" in event_type or "tool_result" in event_type:
         tool_call_id = _tool_call_id(event, item)
         tool_name = known_tool_names.get(tool_call_id) or _tool_name(event, item)
-        return {
+        return _compact_tool_event_payload({
            "event_id": generate_id("evt"),
            "type": "runner.tool_result",
            "quest_id": quest_id,
@@ -521,7 +606,7 @@
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })
 
     return None
 
@@ -582,6 +667,12 @@ class CodexRunner:
         )
 
         env = dict(**os.environ)
+        runner_env = runner_config.get("env") if isinstance(runner_config.get("env"), dict) else {}
+        for key, value in runner_env.items():
+            env_key = str(key or "").strip()
+            if not env_key or value is None:
+                continue
+            env[env_key] = str(value)
         env["CODEX_HOME"] = str(codex_home)
         env["DEEPSCIENTIST_HOME"] = str(self.home)
         env["DS_HOME"] = str(self.home)
@@ -809,21 +900,31 @@
         workspace_root = request.worktree_root or request.quest_root
         resolved_binary = resolve_runner_binary(self.binary, runner_name="codex")
         resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        profile = str(resolved_runner_config.get("profile") or "").strip()
         normalized_model = str(request.model or "").strip()
         command = [
             resolved_binary or self.binary,
             "--search",
-            "exec",
-            "--json",
-            "--cd",
-            str(workspace_root),
-            "--skip-git-repo-check",
         ]
+        if profile:
+            command.extend(["--profile", profile])
+        command.extend(
+            [
+                "exec",
+                "--json",
+                "--cd",
+                str(workspace_root),
+                "--skip-git-repo-check",
+            ]
+        )
         if normalized_model.lower() not in {"", "inherit", "default", "codex-default"}:
             command.extend(["--model", normalized_model])
         if request.approval_policy:
             command.extend(["-c", f'approval_policy="{request.approval_policy}"'])
-        reasoning_effort = request.reasoning_effort
+        reasoning_effort, _ = normalize_codex_reasoning_effort(
+            request.reasoning_effort,
+            resolved_binary=resolved_binary or self.binary,
+        )
         if reasoning_effort:
             command.extend(["-c", f'model_reasoning_effort="{reasoning_effort}"'])
         tool_timeout_sec = self._positive_timeout_seconds(resolved_runner_config.get("mcp_tool_timeout_sec"))
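With a profile configured, the global `--profile` flag now lands before the `exec` subcommand rather than after it. A minimal sketch of the resulting argv, assuming the flag ordering shown in the hunk above (`build_command`, the profile name `work`, and the model name are hypothetical stand-ins, not the package's API):

```python
def build_command(binary: str, profile: str, workspace_root: str, model: str = "") -> list[str]:
    # Global flags (--search, --profile) must precede the exec subcommand.
    command = [binary, "--search"]
    if profile:
        command.extend(["--profile", profile])
    command.extend(["exec", "--json", "--cd", str(workspace_root), "--skip-git-repo-check"])
    # "inherit"-style sentinels fall through to the runner's default model.
    if model.strip().lower() not in {"", "inherit", "default", "codex-default"}:
        command.extend(["--model", model])
    return command

with_profile = build_command("codex", "work", "/tmp/ws", model="inherit")
no_profile = build_command("codex", "", "/tmp/ws", model="my-model")
```

The ordering matters because subcommand parsers typically stop accepting global flags once `exec` is seen, which is why the diff moves the `exec ...` tail into a second `extend` after the optional profile flags.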
@@ -846,7 +947,10 @@
         runner_config: dict[str, Any] | None = None,
     ) -> Path:
         target = ensure_dir(workspace_root / ".codex")
-        source = Path(os.environ.get("CODEX_HOME", str(Path.home() / ".codex"))).expanduser()
+        resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        configured_home = str(resolved_runner_config.get("config_dir") or os.environ.get("CODEX_HOME") or str(Path.home() / ".codex"))
+        profile = str(resolved_runner_config.get("profile") or "").strip()
+        source = Path(configured_home).expanduser()
         for filename in ("config.toml", "auth.json"):
             source_path = source / filename
             target_path = target / filename
@@ -854,6 +958,10 @@
             if source_path.resolve() == target_path.resolve():
                 continue
             shutil.copy2(source_path, target_path)
+        config_path = target / "config.toml"
+        if profile and config_path.exists():
+            adapted_text, _ = adapt_profile_only_provider_config(read_text(config_path), profile=profile)
+            write_text(config_path, adapted_text)
         ensure_dir(target / "skills")
         quest_skills_root = quest_root / ".codex" / "skills"
         if quest_skills_root.exists():
--- a/package/src/deepscientist/runners/runtime_overrides.py
+++ b/package/src/deepscientist/runners/runtime_overrides.py
@@ -18,18 +18,27 @@ def _as_bool_env(name: str) -> bool:
 
 
 def codex_runtime_overrides() -> dict[str, str]:
+    binary = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_BINARY") or os.environ.get("DS_CODEX_BINARY"))
     approval_policy = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_APPROVAL_POLICY"))
     sandbox_mode = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_SANDBOX_MODE"))
+    profile = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_PROFILE"))
+    model = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_MODEL"))
 
     if _as_bool_env("DEEPSCIENTIST_CODEX_YOLO"):
         approval_policy = approval_policy or "never"
         sandbox_mode = sandbox_mode or "danger-full-access"
 
     overrides: dict[str, str] = {}
+    if binary:
+        overrides["binary"] = binary
     if approval_policy:
         overrides["approval_policy"] = approval_policy
     if sandbox_mode:
         overrides["sandbox_mode"] = sandbox_mode
+    if profile:
+        overrides["profile"] = profile
+    if model:
+        overrides["model"] = model
     return overrides
 
 
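The override resolution above can be exercised in isolation. This sketch mirrors the new environment handling but takes the environment mapping as a parameter for testability (an assumption for the demo; the real `codex_runtime_overrides()` reads `os.environ` directly and also handles approval/sandbox keys):

```python
def _as_text(value: object) -> str:
    # Same normalization as the diff: None and whitespace collapse to "".
    return str(value or "").strip()

def codex_runtime_overrides(environ: dict[str, str]) -> dict[str, str]:
    # Only variables that are set and non-empty produce an override entry.
    overrides: dict[str, str] = {}
    binary = _as_text(environ.get("DEEPSCIENTIST_CODEX_BINARY") or environ.get("DS_CODEX_BINARY"))
    if binary:
        overrides["binary"] = binary
    profile = _as_text(environ.get("DEEPSCIENTIST_CODEX_PROFILE"))
    if profile:
        overrides["profile"] = profile
    model = _as_text(environ.get("DEEPSCIENTIST_CODEX_MODEL"))
    if model:
        overrides["model"] = model
    return overrides

# A whitespace-only value is treated the same as unset.
demo = codex_runtime_overrides({"DEEPSCIENTIST_CODEX_PROFILE": "work", "DEEPSCIENTIST_CODEX_MODEL": "  "})
```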
--- a/package/src/deepscientist/shared.py
+++ b/package/src/deepscientist/shared.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+from collections import deque
 import hashlib
 import json
 import os
@@ -9,7 +10,7 @@ import subprocess
 import sys
 from datetime import UTC, datetime
 from pathlib import Path
-from typing import Any
+from typing import Any, Iterator
 from uuid import uuid4
 
 try:
@@ -90,21 +91,39 @@ def append_jsonl(path: Path, payload: dict[str, Any]) -> None:
         handle.write(json.dumps(payload, ensure_ascii=False) + "\n")
 
 
-def read_jsonl(path: Path) -> list[dict[str, Any]]:
+def iter_jsonl(path: Path | str) -> Iterator[dict[str, Any]]:
+    path = Path(path)
     if not path.exists():
+        return
+    with path.open("r", encoding="utf-8") as handle:
+        for raw_line in handle:
+            line = raw_line.strip()
+            if not line:
+                continue
+            try:
+                payload = json.loads(line)
+            except json.JSONDecodeError:
+                continue
+            if isinstance(payload, dict):
+                yield payload
+
+
+def read_jsonl(path: Path) -> list[dict[str, Any]]:
+    return list(iter_jsonl(path))
+
+
+def count_jsonl(path: Path | str) -> int:
+    return sum(1 for _ in iter_jsonl(path))
+
+
+def read_jsonl_tail(path: Path | str, limit: int) -> list[dict[str, Any]]:
+    normalized_limit = max(int(limit or 0), 0)
+    if normalized_limit <= 0:
         return []
-    items: list[dict[str, Any]] = []
-    for line in path.read_text(encoding="utf-8").splitlines():
-        line = line.strip()
-        if not line:
-            continue
-        try:
-            payload = json.loads(line)
-        except json.JSONDecodeError:
-            continue
-        if isinstance(payload, dict):
-            items.append(payload)
-    return items
+    items: deque[dict[str, Any]] = deque(maxlen=normalized_limit)
+    for payload in iter_jsonl(path):
+        items.append(payload)
+    return list(items)
 
 
 def read_yaml(path: Path, default: Any = None) -> Any:
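The new `read_jsonl_tail` keeps memory bounded by streaming the file line by line and retaining only the last `limit` records in a `deque(maxlen=...)`, instead of materializing the whole file as the old `read_jsonl` did. A self-contained sketch of the same pattern:

```python
import json
from collections import deque
from pathlib import Path
from tempfile import TemporaryDirectory

def read_jsonl_tail(path: Path, limit: int) -> list[dict]:
    # Bounded-memory tail read: the deque silently drops the oldest record
    # once it holds `limit` items, so only the tail survives the scan.
    items: deque = deque(maxlen=max(int(limit or 0), 0))
    if not path.exists() or items.maxlen == 0:
        return []
    with path.open("r", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue
            try:
                payload = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines, matching the diff's behavior
            if isinstance(payload, dict):
                items.append(payload)
    return list(items)

with TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    log.write_text("\n".join(json.dumps({"i": i}) for i in range(100)), encoding="utf-8")
    tail = read_jsonl_tail(log, 3)
```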
--- a/package/src/prompts/connectors/qq.md
+++ b/package/src/prompts/connectors/qq.md
@@ -10,7 +10,8 @@
 - qq_summary_first_rule: start with the conclusion the user cares about, then what it means, then the next action
 - qq_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - qq_eta_rule: for baseline reproduction, main experiments, analysis experiments, and other important long-running research phases, include a rough ETA for the next meaningful result or the next update; if uncertain, say that and still give the next check-in window
-- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly 10 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible checkpoint
+- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- qq_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short QQ-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - qq_internal_detail_rule: omit worker names, heartbeat timestamps, retry counters, pending/running/completed counts, file names, and monitor-window narration unless the user asked for them or the detail changes the recommended action
 - qq_translation_rule: convert internal execution and file-management work into user value, such as saying the baseline record is now organized for easier later comparison instead of listing touched files
 - qq_preflight_rule: before sending a QQ progress update, rewrite it if it still sounds like a monitoring log, execution diary, or file inventory
--- a/package/src/prompts/connectors/weixin.md
+++ b/package/src/prompts/connectors/weixin.md
@@ -10,7 +10,8 @@
 - weixin_summary_first_rule: start with the user-facing conclusion, then what it means, then the next action
 - weixin_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - weixin_eta_rule: for important long-running phases such as baseline reproduction, main experiments, analysis, or paper packaging, include a rough ETA or next check-in window when you can
-- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 10 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible checkpoint
+- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- weixin_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short Weixin-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - weixin_internal_detail_rule: omit worker names, retry counters, pending/running/completed counts, low-level file listings, and monitor-window narration unless the user explicitly asked for them or they change the recommended action
 - weixin_translation_rule: translate internal execution and file-management work into user value instead of narrating tool or filesystem churn
 - weixin_preflight_rule: before sending a Weixin-facing progress update, rewrite it if it still reads like a monitor log, execution diary, or file inventory
--- a/package/src/prompts/contracts/shared_interaction.md
+++ b/package/src/prompts/contracts/shared_interaction.md
@@ -7,7 +7,10 @@ This shared contract is injected once per turn and applies across the stage and
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the current stage or companion-skill task.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
-- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 10 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
+- Stage-kickoff rule: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work.
+- Reading/planning keepalive rule: if you spend 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet.
+- Subtask-boundary rule: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 6 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
 - Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
@@ -53,7 +53,7 @@ Your job is to keep a research quest moving forward in a durable, auditable, evi
53
53
  - for ordinary progress replies, usually stay within 2 to 4 short sentences or 3 short bullets at most
54
54
  - start with the conclusion the user cares about, then what it means, then the next action
55
55
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running research phases, also tell the user roughly how long until the next meaningful result, next step, or next update
56
- - for ordinary active multi-step work, prefer a concise update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 20 tool calls or about 15 minutes of active foreground work without a user-visible update unless a real milestone is imminent
56
+ - for ordinary active multi-step work, prefer a concise update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 12 tool calls or about 8 minutes of active foreground work without a user-visible update unless a real milestone is imminent
57
57
  - do not spam internal tool chatter, raw diffs, or every small checkpoint
58
58
  - do not proactively enumerate file paths, file inventories, or low-level file details unless the user explicitly asks
59
59
  - do not proactively expose worker names, heartbeat timestamps, retry counters, pending/running/completed counts, or monitor-window narration unless that detail changes the recommended action or is required for honesty about risk
@@ -203,7 +203,7 @@ When you send user-facing updates (especially via `artifact.interact(...)`), wri
  - what task you are currently working on
  - what the main difficulty, risk, or latest real progress is
  - what concrete next step or mitigation you will take
- - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without any user-visible checkpoint
+ - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without any user-visible checkpoint
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running phases, also make the timing expectation explicit:
  - roughly how long until the next meaningful result, next milestone, or next update, usually within a 10 to 30 minute window
  - if runtime is uncertain, say that directly and give the next check-in window instead of pretending to know an exact ETA
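The three-part update shape described above can be sketched as a tiny formatter. This is an illustrative helper only: `artifact.interact(kind='progress', reply_mode='threaded', ...)` appears elsewhere in this prompt, but the function name and its message layout here are invented.

```python
def progress_update(task: str, status: str, next_step: str) -> dict:
    """Compose the three-part status the guidance asks for:
    current task, main difficulty or latest real progress, and
    the concrete next step or mitigation."""
    text = (
        f"Working on: {task}\n"
        f"Status: {status}\n"
        f"Next: {next_step}"
    )
    # In this sketch the dict would be splatted into the call site,
    # e.g. artifact.interact(**progress_update(...)).
    return {"kind": "progress", "reply_mode": "threaded", "text": text}
```

The point is that an ordinary update stays this small; richer milestone reports carry more structure.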
@@ -463,9 +463,12 @@ Each milestone update should usually state:
  Cadence defaults for ordinary active work:

  - treat `artifact.interact(...)` as the default user-visible heartbeat rather than an optional extra
- - soft trigger: after about 10 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
- - hard trigger: do not exceed about 20 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
- - time trigger: do not exceed about 15 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
+ - stage-kickoff trigger: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work
+ - reading/planning trigger: if you spend about 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet
+ - boundary trigger: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal
+ - soft trigger: after about 6 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
+ - hard trigger: do not exceed about 12 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
+ - time trigger: do not exceed about 8 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
  - immediate trigger: send a user-visible update as soon as a real blocker, recovery, route change, branch/worktree switch, baseline gate change, selected idea, recorded main experiment, or user-priority interruption becomes clear
  - de-duplication rule: do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a real milestone, blocker, route change, or new user message makes that extra update genuinely useful
  - keep ordinary subtask completions short; reserve richer milestone reports for stage-significant deliverables and route-changing checkpoints instead of narrating every small setup step
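The numeric triggers in the cadence defaults can be sketched as a small counter-and-clock tracker. This is a minimal illustration, not the package's implementation: the class, constant names, and the exact reading of the de-duplication window (suppress until either 2 more calls or 90 seconds have passed) are all assumptions.

```python
import time

# Thresholds taken from the cadence defaults above; names are invented.
SOFT_CALLS, HARD_CALLS, TIME_LIMIT_S = 6, 12, 8 * 60
DEDUP_CALLS, DEDUP_S = 2, 90


class CadenceTracker:
    """Track tool calls and wall time since the last user-visible update."""

    def __init__(self):
        self.calls_since_update = 0
        self.last_update_ts = time.monotonic()

    def record_tool_call(self):
        self.calls_since_update += 1

    def should_update(self, has_meaningful_delta: bool) -> bool:
        elapsed = time.monotonic() - self.last_update_ts
        # De-duplication: suppress back-to-back ordinary updates until
        # either a couple more calls or the time window has passed.
        if self.calls_since_update < DEDUP_CALLS and elapsed < DEDUP_S:
            return False
        # Hard and time triggers fire regardless of delta quality.
        if self.calls_since_update >= HARD_CALLS or elapsed >= TIME_LIMIT_S:
            return True
        # Soft trigger additionally needs a human-meaningful delta.
        return self.calls_since_update >= SOFT_CALLS and has_meaningful_delta

    def mark_updated(self):
        self.calls_since_update = 0
        self.last_update_ts = time.monotonic()
```

Immediate triggers (blockers, route changes, and the like) would bypass this tracker entirely in such a design.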
@@ -1080,9 +1083,10 @@ For `artifact.interact(...)` specifically:
  - raw logs
  - internal tool names
  - mention those details only if the user asked for them or needs them to act on the message
- - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without a user-visible update
+ - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without a user-visible update
  - during long active execution, after the first meaningful signal from long-running work, keep the user informed and never let active user-relevant work go more than 30 minutes without a real progress inspection and, if still running, a user-visible keepalive
- - do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
+ - if the active work is still mostly reading, comparison, synthesis, or planning, do not hide behind "no result yet"; send a short user-visible checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
+ - do not send another ordinary progress update within about 2 additional tool calls or about 60 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
  - each ordinary progress update should usually answer only:
  - what changed
  - what it means now
@@ -1321,7 +1325,7 @@ If the field is absent, default to `freeform`.
  When `launch_mode = custom`:

  - do not force the quest back into the canonical full-research path if the custom brief is narrower
- - treat `entry_state_summary`, `review_summary`, and `custom_brief` as real startup context rather than decorative metadata
+ - treat `entry_state_summary`, `review_summary`, `review_materials`, and `custom_brief` as real startup context rather than decorative metadata
  - if the quest clearly starts from existing baseline / result / draft state, open `intake-audit` before restarting baseline discovery or fresh experimentation
  - if the quest clearly starts from reviewer comments, a revision request, or a rebuttal packet, open `rebuttal` before ordinary `write`
  - after the custom entry skill stabilizes the route, continue through the normal stage skills as needed
@@ -1331,12 +1335,58 @@ When `custom_profile = continue_existing_state`:
  - assume the quest may already contain reusable baselines, measured results, analysis assets, or writing assets
  - audit and trust-rank those assets first instead of reflexively rerunning everything

+ When `custom_profile = review_audit`:
+
+ - assume the active contract is a substantial draft or paper package that needs an independent skeptical audit
+ - open `review` before more writing or finalization
+ - if the audit finds real gaps, route to the needed downstream skill instead of polishing blindly
+
+ When `startup_contract.review_followup_policy = auto_execute_followups`:
+
+ - after review artifacts are durable, continue automatically into the required experiments, manuscript deltas, and review-closure work
+ - do not stop at the audit report if the route is already clear
+
+ When `startup_contract.review_followup_policy = user_gated_followups`:
+
+ - finish the review artifacts first
+ - then raise one structured decision before expensive experiments or manuscript revisions continue
+
+ When `startup_contract.review_followup_policy = audit_only`:
+
+ - stop after the durable audit artifacts and route recommendation unless the user later asks for execution follow-up
+
  When `custom_profile = revision_rebuttal`:

  - assume the active contract is a paper-review workflow rather than a blank research loop
  - preserve the existing paper, results, and reviewer package as the starting state
  - route supplementary experiments through `analysis-campaign` and manuscript deltas through `write`, but let `rebuttal` orchestrate that mapping

+ When `startup_contract.baseline_execution_policy = must_reproduce_or_verify`:
+
+ - explicitly verify or recover the rebuttal-critical baseline or comparator before reviewer-linked follow-up work
+
+ When `startup_contract.baseline_execution_policy = reuse_existing_only`:
+
+ - trust the current confirmed baseline/results unless you find concrete inconsistency, corruption, or missing-evidence problems
+
+ When `startup_contract.baseline_execution_policy = skip_unless_blocking`:
+
+ - do not spend time rerunning baselines by default
+ - only open `baseline` if a named review/rebuttal issue truly depends on a missing comparator or unusable prior evidence
+
+ When `startup_contract.manuscript_edit_mode = latex_required`:
+
+ - if manuscript revision is required, treat the provided LaTeX tree or `paper/latex/` as the writing surface
+ - if LaTeX source is unavailable, do not pretend the manuscript was edited; produce LaTeX-ready replacement text and state the blocker explicitly
+
+ When `startup_contract.manuscript_edit_mode = copy_ready_text`:
+
+ - provide section-level copy-ready replacement text and explicit deltas when manuscript revision is required
+
+ When `startup_contract.manuscript_edit_mode = none`:
+
+ - revision planning artifacts are sufficient unless the user later broadens scope
+
  When `custom_profile = freeform`:

  - treat the custom brief as the primary scope contract
@@ -2078,7 +2128,7 @@ When summarizing long logs, campaigns, or multi-agent work:
  - the estimated next reply time (usually the next sleep interval you are about to use)
  - If the run still looks healthy but there is no human-meaningful delta yet, continue monitoring silently instead of sending a no-change keepalive just because a sleep finished.
  - For baseline reproduction, main experiments, analysis experiments, and similar user-relevant long runs, translate that monitoring ETA into user-facing language such as how long until the next meaningful result or the next expected update.
- - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 20 tool calls or about 15 minutes without a user-visible checkpoint.
+ - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 12 tool calls or about 8 minutes without a user-visible checkpoint.
  - If you forget a bash id, do not guess. Use `bash_exec(mode='history')` or `bash_exec(mode='list')` and recover it from the reverse-chronological session list.
  - If the long-running command or wrapper code can emit structured progress markers, prefer a concise `__DS_PROGRESS__ { ... }` JSON line with fields such as:
  - `current`