@researai/deepscientist 1.5.11 → 1.5.13
- package/README.md +8 -8
- package/bin/ds.js +375 -61
- package/docs/en/00_QUICK_START.md +55 -4
- package/docs/en/01_SETTINGS_REFERENCE.md +15 -0
- package/docs/en/02_START_RESEARCH_GUIDE.md +68 -4
- package/docs/en/09_DOCTOR.md +48 -4
- package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +21 -2
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +382 -0
- package/docs/en/README.md +4 -0
- package/docs/zh/00_QUICK_START.md +54 -3
- package/docs/zh/01_SETTINGS_REFERENCE.md +15 -0
- package/docs/zh/02_START_RESEARCH_GUIDE.md +69 -3
- package/docs/zh/09_DOCTOR.md +48 -2
- package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +21 -2
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +383 -0
- package/docs/zh/README.md +4 -1
- package/package.json +2 -1
- package/pyproject.toml +1 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/bash_exec/monitor.py +7 -5
- package/src/deepscientist/bash_exec/service.py +84 -21
- package/src/deepscientist/channels/local.py +3 -3
- package/src/deepscientist/channels/qq.py +7 -7
- package/src/deepscientist/channels/relay.py +7 -7
- package/src/deepscientist/channels/weixin_ilink.py +90 -19
- package/src/deepscientist/cli.py +3 -0
- package/src/deepscientist/codex_cli_compat.py +117 -0
- package/src/deepscientist/config/models.py +1 -0
- package/src/deepscientist/config/service.py +173 -25
- package/src/deepscientist/daemon/app.py +314 -6
- package/src/deepscientist/doctor.py +1 -5
- package/src/deepscientist/mcp/server.py +124 -3
- package/src/deepscientist/prompts/builder.py +113 -11
- package/src/deepscientist/quest/service.py +247 -31
- package/src/deepscientist/runners/codex.py +132 -24
- package/src/deepscientist/runners/runtime_overrides.py +9 -0
- package/src/deepscientist/shared.py +33 -14
- package/src/prompts/connectors/qq.md +2 -1
- package/src/prompts/connectors/weixin.md +2 -1
- package/src/prompts/contracts/shared_interaction.md +4 -1
- package/src/prompts/system.md +59 -9
- package/src/skills/analysis-campaign/SKILL.md +46 -6
- package/src/skills/analysis-campaign/references/campaign-plan-template.md +21 -8
- package/src/skills/baseline/SKILL.md +1 -1
- package/src/skills/baseline/references/artifact-payload-examples.md +39 -0
- package/src/skills/decision/SKILL.md +1 -1
- package/src/skills/experiment/SKILL.md +1 -1
- package/src/skills/finalize/SKILL.md +1 -1
- package/src/skills/idea/SKILL.md +1 -1
- package/src/skills/intake-audit/SKILL.md +1 -1
- package/src/skills/rebuttal/SKILL.md +74 -1
- package/src/skills/rebuttal/references/response-letter-template.md +55 -11
- package/src/skills/review/SKILL.md +118 -1
- package/src/skills/review/references/experiment-todo-template.md +23 -0
- package/src/skills/review/references/review-report-template.md +16 -0
- package/src/skills/review/references/revision-log-template.md +4 -0
- package/src/skills/scout/SKILL.md +1 -1
- package/src/skills/write/SKILL.md +168 -7
- package/src/skills/write/references/paper-experiment-matrix-template.md +131 -0
- package/src/tui/dist/lib/connectorConfig.js +90 -0
- package/src/tui/dist/lib/qr.js +21 -0
- package/src/tui/package.json +2 -1
- package/src/ui/dist/assets/{AiManusChatView-D0mTXG4-.js → AiManusChatView-CnJcXynW.js} +12 -12
- package/src/ui/dist/assets/{AnalysisPlugin-Db0cTXxm.js → AnalysisPlugin-DeyzPEhV.js} +1 -1
- package/src/ui/dist/assets/{CliPlugin-DrV8je02.js → CliPlugin-CB1YODQn.js} +9 -9
- package/src/ui/dist/assets/{CodeEditorPlugin-QXMSCH71.js → CodeEditorPlugin-B-xicq1e.js} +8 -8
- package/src/ui/dist/assets/{CodeViewerPlugin-7hhtWj_E.js → CodeViewerPlugin-DT54ysXa.js} +5 -5
- package/src/ui/dist/assets/{DocViewerPlugin-BWMSnRJe.js → DocViewerPlugin-DQtKT-VD.js} +3 -3
- package/src/ui/dist/assets/{GitDiffViewerPlugin-7J9h9Vy_.js → GitDiffViewerPlugin-hqHbCfnv.js} +20 -20
- package/src/ui/dist/assets/{ImageViewerPlugin-CHJl_0lr.js → ImageViewerPlugin-OcVo33jV.js} +5 -5
- package/src/ui/dist/assets/{LabCopilotPanel-1qSow1es.js → LabCopilotPanel-DdGwhEUV.js} +11 -11
- package/src/ui/dist/assets/{LabPlugin-eQpPPCEp.js → LabPlugin-Ciz1gDaX.js} +2 -2
- package/src/ui/dist/assets/{LatexPlugin-BwRfi89Z.js → LatexPlugin-BhmjNQRC.js} +37 -11
- package/src/ui/dist/assets/{MarkdownViewerPlugin-836PVQWV.js → MarkdownViewerPlugin-BzdVH9Bx.js} +4 -4
- package/src/ui/dist/assets/{MarketplacePlugin-C2y_556i.js → MarketplacePlugin-DmyHspXt.js} +3 -3
- package/src/ui/dist/assets/{NotebookEditor-DIX7Mlzu.js → NotebookEditor-BMXKrDRk.js} +1 -1
- package/src/ui/dist/assets/{NotebookEditor-BRzJbGsn.js → NotebookEditor-BTVYRGkm.js} +11 -11
- package/src/ui/dist/assets/{PdfLoader-DzRaTAlq.js → PdfLoader-CvcjJHXv.js} +1 -1
- package/src/ui/dist/assets/{PdfMarkdownPlugin-DZUfIUnp.js → PdfMarkdownPlugin-DW2ej8Vk.js} +2 -2
- package/src/ui/dist/assets/{PdfViewerPlugin-BwtICzue.js → PdfViewerPlugin-CmlDxbhU.js} +10 -10
- package/src/ui/dist/assets/{SearchPlugin-DHeIAMsx.js → SearchPlugin-DAjQZPSv.js} +1 -1
- package/src/ui/dist/assets/{TextViewerPlugin-C3tCmFox.js → TextViewerPlugin-C-nVAZb_.js} +5 -5
- package/src/ui/dist/assets/{VNCViewer-CQsKVm3t.js → VNCViewer-D7-dIYon.js} +10 -10
- package/src/ui/dist/assets/{bot-BEA2vWuK.js → bot-C_G4WtNI.js} +1 -1
- package/src/ui/dist/assets/{code-XfbSR8K2.js → code-Cd7WfiWq.js} +1 -1
- package/src/ui/dist/assets/{file-content-BjxNaIfy.js → file-content-B57zsL9y.js} +1 -1
- package/src/ui/dist/assets/{file-diff-panel-D_lLVQk0.js → file-diff-panel-DVoheLFq.js} +1 -1
- package/src/ui/dist/assets/{file-socket-D9x_5vlY.js → file-socket-B5kXFxZP.js} +1 -1
- package/src/ui/dist/assets/{image-BhWT33W1.js → image-LLOjkMHF.js} +1 -1
- package/src/ui/dist/assets/{index-Dqj-Mjb4.css → index-BQG-1s2o.css} +40 -2
- package/src/ui/dist/assets/{index--c4iXtuy.js → index-C3r2iGrp.js} +12 -12
- package/src/ui/dist/assets/{index-DZTZ8mWP.js → index-CLQauncb.js} +911 -120
- package/src/ui/dist/assets/{index-PJbSbPTy.js → index-Dxa2eYMY.js} +1 -1
- package/src/ui/dist/assets/{index-BDxipwrC.js → index-hOUOWbW2.js} +2 -2
- package/src/ui/dist/assets/{monaco-K8izTGgo.js → monaco-BGGAEii3.js} +1 -1
- package/src/ui/dist/assets/{pdf-effect-queue-DfBors6y.js → pdf-effect-queue-DlEr1_y5.js} +1 -1
- package/src/ui/dist/assets/{popover-yFK1J4fL.js → popover-CWJbJuYY.js} +1 -1
- package/src/ui/dist/assets/{project-sync-PENr2zcz.js → project-sync-CRJiucYO.js} +18 -4
- package/src/ui/dist/assets/{select-CAbJDfYv.js → select-CoHB7pvH.js} +2 -2
- package/src/ui/dist/assets/{sigma-DEuYJqTl.js → sigma-D5aJWR8J.js} +1 -1
- package/src/ui/dist/assets/{square-check-big-omoSUmcd.js → square-check-big-DUK_mnkS.js} +1 -1
- package/src/ui/dist/assets/{trash--F119N47.js → trash-ChU3SEE3.js} +1 -1
- package/src/ui/dist/assets/{useCliAccess-D31UR23I.js → useCliAccess-BrJBV3tY.js} +1 -1
- package/src/ui/dist/assets/{useFileDiffOverlay-BH6KcMzq.js → useFileDiffOverlay-C2OQaVWc.js} +1 -1
- package/src/ui/dist/assets/{wrap-text-CZ613PM5.js → wrap-text-C7Qqh-om.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-BgDLAv3z.js → zoom-out-rtX0FKya.js} +1 -1
- package/src/ui/dist/index.html +2 -2
|
@@ -11,14 +11,20 @@ from pathlib import Path
 from typing import Any

 from ..artifact import ArtifactService
+from ..codex_cli_compat import adapt_profile_only_provider_config, normalize_codex_reasoning_effort
 from ..config import ConfigManager
 from ..gitops import export_git_graph
 from ..prompts import PromptBuilder
 from ..runtime_logs import JsonlLogger
-from ..shared import append_jsonl, ensure_dir, generate_id, read_yaml, resolve_runner_binary, utc_now, write_json, write_text
+from ..shared import append_jsonl, ensure_dir, generate_id, read_text, read_yaml, resolve_runner_binary, utc_now, write_json, write_text
 from ..web_search import extract_web_search_payload
 from .base import RunRequest, RunResult

+_TOOL_EVENT_ARGS_TEXT_LIMIT = 8_000
+_TOOL_EVENT_OUTPUT_TEXT_LIMIT = 16_000
+_MAX_QUEST_EVENT_JSON_BYTES = 2_000_000
+_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT = 12_000
+

 def _compact_text(value: object, *, limit: int = 1200) -> str:
     if value is None:
@@ -35,15 +41,94 @@ def _compact_text(value: object, *, limit: int = 1200) -> str:
     return text[: limit - 1].rstrip() + "…"


-def
+def _truncate_leaf_text(text: str, *, limit: int) -> str:
+    if limit <= 0 or len(text) <= limit:
+        return text
+    head = max(int(limit * 0.7), 256)
+    tail = max(limit - head - 64, 128)
+    omitted = max(len(text) - head - tail, 0)
+    return f"{text[:head].rstrip()}\n...[truncated {omitted} chars]...\n{text[-tail:].lstrip()}"
+
+
+def _truncate_structured_value(value: object, *, string_limit: int) -> object:
+    if isinstance(value, str):
+        return _truncate_leaf_text(value.strip(), limit=string_limit)
+    if isinstance(value, list):
+        return [_truncate_structured_value(item, string_limit=string_limit) for item in value[:200]]
+    if isinstance(value, dict):
+        truncated: dict[object, object] = {}
+        for index, (key, item) in enumerate(value.items()):
+            if index >= 200:
+                truncated["__truncated__"] = f"truncated remaining {len(value) - 200} item(s)"
+                break
+            truncated[key] = _truncate_structured_value(item, string_limit=string_limit)
+        return truncated
+    return value
+
+
+def _structured_text(value: object, *, limit: int | None = None) -> str:
     if value is None:
         return ""
     if isinstance(value, str):
-        return value.strip()
+        return _truncate_leaf_text(value.strip(), limit=limit or len(value))
+    normalized_value = _truncate_structured_value(value, string_limit=max(limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT, 512))
     try:
-        return json.dumps(
+        return json.dumps(normalized_value, ensure_ascii=False, indent=2)
     except TypeError:
-        return str(value)
+        return _truncate_leaf_text(str(value), limit=limit or _TOOL_EVENT_OUTPUT_TEXT_LIMIT)
+
+
+def _encoded_json_size(value: object) -> int:
+    try:
+        return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))
+    except Exception:
+        return len(str(value).encode("utf-8", errors="ignore"))
+
+
+def _compact_tool_event_payload(payload: dict[str, Any]) -> dict[str, Any]:
+    if _encoded_json_size(payload) <= _MAX_QUEST_EVENT_JSON_BYTES:
+        return payload
+
+    compacted = dict(payload)
+    output_text = str(compacted.get("output") or "")
+    if output_text:
+        compacted["output_bytes"] = len(output_text.encode("utf-8", errors="ignore"))
+        compacted["output"] = _truncate_leaf_text(
+            output_text,
+            limit=_OVERSIZED_EVENT_PREVIEW_TEXT_LIMIT,
+        )
+        compacted["output_truncated"] = True
+    args_text = str(compacted.get("args") or "")
+    if args_text and _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["args"] = _truncate_leaf_text(args_text, limit=4_000)
+        compacted["args_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        metadata = compacted.get("metadata")
+        if isinstance(metadata, dict):
+            allowed_keys = {
+                "mcp_server",
+                "mcp_tool",
+                "bash_id",
+                "status",
+                "command",
+                "workdir",
+                "cwd",
+                "started_at",
+                "finished_at",
+                "exit_code",
+                "stop_reason",
+                "log_path",
+            }
+            compacted["metadata"] = {
+                key: metadata.get(key)
+                for key in allowed_keys
+                if key in metadata
+            }
+            compacted["metadata_truncated"] = True
+    if _encoded_json_size(compacted) > _MAX_QUEST_EVENT_JSON_BYTES:
+        compacted["output"] = _compact_text(compacted.get("output"), limit=2_000)
+        compacted["output_truncated"] = True
+    return compacted


 def _iter_event_texts(event: dict[str, Any]) -> list[str]:
|
|
@@ -209,7 +294,7 @@ def _tool_args(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("input"),
         event.get("input"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_ARGS_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -243,7 +328,7 @@ def _tool_output(event: dict[str, Any], item: dict[str, Any]) -> str:
         item.get("aggregated_output"),
         event.get("aggregated_output"),
     ):
-        text = _structured_text(value)
+        text = _structured_text(value, limit=_TOOL_EVENT_OUTPUT_TEXT_LIMIT)
         if text:
             return text
     return ""
@@ -361,7 +446,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -375,7 +460,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })

     if item_type == "web_search":
         tool_call_id = _tool_call_id(event, item)
@@ -399,7 +484,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -414,13 +499,13 @@ def _tool_event(
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })

     if item_type == "file_change":
         tool_call_id = _tool_call_id(event, item)
         tool_name = "file_change"
         known_tool_names[tool_call_id] = tool_name
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -433,7 +518,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })

     if item_type == "mcp_tool_call":
         tool_call_id = _tool_call_id(event, item)
@@ -466,7 +551,7 @@ def _tool_event(
             "raw_event_type": event_type,
             "created_at": created_at,
         }
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -483,7 +568,7 @@ def _tool_event(
             "metadata": metadata,
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })

     if item_type in {"function_call", "custom_tool_call", "tool_call"} or "function_call" in event_type or "tool_call" in event_type:
         tool_call_id = _tool_call_id(event, item)
@@ -507,7 +592,7 @@ def _tool_event(
     if item_type in {"function_call_output", "custom_tool_call_output", "tool_result", "tool_call_output"} or "function_call_output" in event_type or "tool_result" in event_type:
         tool_call_id = _tool_call_id(event, item)
         tool_name = known_tool_names.get(tool_call_id) or _tool_name(event, item)
-        return {
+        return _compact_tool_event_payload({
             "event_id": generate_id("evt"),
             "type": "runner.tool_result",
             "quest_id": quest_id,
@@ -521,7 +606,7 @@ def _tool_event(
             "output": _tool_output(event, item),
             "raw_event_type": event_type,
             "created_at": created_at,
-        }
+        })

     return None

|
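Every `runner.tool_result` return in `_tool_event` is now routed through `_compact_tool_event_payload`, which shrinks a payload in stages only while it stays over a byte budget. A minimal sketch of that staged pattern follows. It is a simplification, not the package's function: `compact_event`, `encoded_size`, `max_bytes`, and `preview_chars` are hypothetical names, plain slicing stands in for `_truncate_leaf_text`, and the metadata stage is left out.

```python
import json
from typing import Any


def encoded_size(value: object) -> int:
    """UTF-8 byte length of the JSON encoding (same measure as _encoded_json_size)."""
    return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))


def compact_event(payload: dict[str, Any], *, max_bytes: int, preview_chars: int) -> dict[str, Any]:
    # Hypothetical simplification: only the 'output' and 'args' stages are modeled.
    if encoded_size(payload) <= max_bytes:
        return payload  # fast path: the original object is returned untouched
    compacted = dict(payload)  # shallow copy so the caller's dict is not mutated
    output_text = str(compacted.get("output") or "")
    if output_text:
        # Record the original size before replacing the output with a preview.
        compacted["output_bytes"] = len(output_text.encode("utf-8"))
        compacted["output"] = output_text[:preview_chars]
        compacted["output_truncated"] = True
    args_text = str(compacted.get("args") or "")
    # The args stage runs only if the payload is still over budget.
    if args_text and encoded_size(compacted) > max_bytes:
        compacted["args"] = args_text[:preview_chars]
        compacted["args_truncated"] = True
    return compacted


# Made-up event: a tiny 'args' string and a 10,000-character output.
event = {"type": "runner.tool_result", "args": "ls", "output": "x" * 10_000}
small = compact_event(event, max_bytes=1_000, preview_chars=200)
```

In this example the first stage already brings the payload under the budget, so `args` is left alone and no `args_truncated` flag is set. The `*_truncated` and `output_bytes` fields let a reader of the event log tell a preview from a complete output.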
@@ -582,6 +667,12 @@ class CodexRunner:
         )

         env = dict(**os.environ)
+        runner_env = runner_config.get("env") if isinstance(runner_config.get("env"), dict) else {}
+        for key, value in runner_env.items():
+            env_key = str(key or "").strip()
+            if not env_key or value is None:
+                continue
+            env[env_key] = str(value)
         env["CODEX_HOME"] = str(codex_home)
         env["DEEPSCIENTIST_HOME"] = str(self.home)
         env["DS_HOME"] = str(self.home)
|
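The merge of a runner-level `env` mapping into the process environment can be sketched as a standalone function. `build_runner_env` and the sample values below are hypothetical; only the loop body mirrors the added lines.

```python
from typing import Any, Mapping


def build_runner_env(runner_config: Mapping[str, Any], base_env: Mapping[str, str]) -> dict[str, str]:
    env = dict(base_env)  # start from a copy of the inherited environment
    raw = runner_config.get("env")
    runner_env = raw if isinstance(raw, dict) else {}  # ignore non-dict values
    for key, value in runner_env.items():
        env_key = str(key or "").strip()
        if not env_key or value is None:
            continue  # skip blank keys and unset values
        env[env_key] = str(value)  # environment values must be strings
    return env


# Made-up configuration: a padded key, a None value, and an integer value.
merged = build_runner_env(
    {"env": {" HTTP_PROXY ": "http://127.0.0.1:7890", "EMPTY": None, "RETRIES": 3}},
    {"PATH": "/usr/bin"},
)
```

In the diff, `CODEX_HOME`, `DEEPSCIENTIST_HOME`, and `DS_HOME` are assigned after this loop, so a configured `env` entry with one of those names would be overwritten by the runner's own value.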
@@ -809,21 +900,31 @@ class CodexRunner:
         workspace_root = request.worktree_root or request.quest_root
         resolved_binary = resolve_runner_binary(self.binary, runner_name="codex")
         resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        profile = str(resolved_runner_config.get("profile") or "").strip()
         normalized_model = str(request.model or "").strip()
         command = [
             resolved_binary or self.binary,
             "--search",
-            "exec",
-            "--json",
-            "--cd",
-            str(workspace_root),
-            "--skip-git-repo-check",
         ]
+        if profile:
+            command.extend(["--profile", profile])
+        command.extend(
+            [
+                "exec",
+                "--json",
+                "--cd",
+                str(workspace_root),
+                "--skip-git-repo-check",
+            ]
+        )
         if normalized_model.lower() not in {"", "inherit", "default", "codex-default"}:
             command.extend(["--model", normalized_model])
         if request.approval_policy:
             command.extend(["-c", f'approval_policy="{request.approval_policy}"'])
-        reasoning_effort =
+        reasoning_effort, _ = normalize_codex_reasoning_effort(
+            request.reasoning_effort,
+            resolved_binary=resolved_binary or self.binary,
+        )
         if reasoning_effort:
             command.extend(["-c", f'model_reasoning_effort="{reasoning_effort}"'])
         tool_timeout_sec = self._positive_timeout_seconds(resolved_runner_config.get("mcp_tool_timeout_sec"))
|
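The argument order this hunk produces, with `--profile` placed before the `exec` subcommand, can be checked with a small standalone builder. `build_codex_command` is a hypothetical helper and the profile name `m27` is made up; the flags themselves are the ones that appear in the diff.

```python
def build_codex_command(binary: str, workspace_root: str, *, profile: str = "", model: str = "") -> list[str]:
    command = [binary, "--search"]
    if profile:
        # The profile flag is inserted ahead of the 'exec' subcommand, as in the diff.
        command.extend(["--profile", profile])
    command.extend(["exec", "--json", "--cd", workspace_root, "--skip-git-repo-check"])
    normalized_model = model.strip()
    # Placeholder model names mean "let the CLI decide", so no --model flag is added.
    if normalized_model.lower() not in {"", "inherit", "default", "codex-default"}:
        command.extend(["--model", normalized_model])
    return command


with_profile = build_codex_command("codex", "/tmp/quest", profile="m27")
inherited = build_codex_command("codex", "/tmp/quest", model="inherit")
```

Splitting the list into a prefix, an optional profile pair, and the `exec` arguments keeps the no-profile command identical to the 1.5.11 one.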
@@ -846,7 +947,10 @@ class CodexRunner:
         runner_config: dict[str, Any] | None = None,
     ) -> Path:
         target = ensure_dir(workspace_root / ".codex")
-
+        resolved_runner_config = runner_config if isinstance(runner_config, dict) else self._load_runner_config()
+        configured_home = str(resolved_runner_config.get("config_dir") or os.environ.get("CODEX_HOME") or str(Path.home() / ".codex"))
+        profile = str(resolved_runner_config.get("profile") or "").strip()
+        source = Path(configured_home).expanduser()
         for filename in ("config.toml", "auth.json"):
             source_path = source / filename
             target_path = target / filename
@@ -854,6 +958,10 @@ class CodexRunner:
             if source_path.resolve() == target_path.resolve():
                 continue
             shutil.copy2(source_path, target_path)
+        config_path = target / "config.toml"
+        if profile and config_path.exists():
+            adapted_text, _ = adapt_profile_only_provider_config(read_text(config_path), profile=profile)
+            write_text(config_path, adapted_text)
         ensure_dir(target / "skills")
         quest_skills_root = quest_root / ".codex" / "skills"
         if quest_skills_root.exists():
@@ -18,18 +18,27 @@ def _as_bool_env(name: str) -> bool:


 def codex_runtime_overrides() -> dict[str, str]:
+    binary = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_BINARY") or os.environ.get("DS_CODEX_BINARY"))
     approval_policy = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_APPROVAL_POLICY"))
     sandbox_mode = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_SANDBOX_MODE"))
+    profile = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_PROFILE"))
+    model = _as_text(os.environ.get("DEEPSCIENTIST_CODEX_MODEL"))

     if _as_bool_env("DEEPSCIENTIST_CODEX_YOLO"):
         approval_policy = approval_policy or "never"
         sandbox_mode = sandbox_mode or "danger-full-access"

     overrides: dict[str, str] = {}
+    if binary:
+        overrides["binary"] = binary
     if approval_policy:
         overrides["approval_policy"] = approval_policy
     if sandbox_mode:
         overrides["sandbox_mode"] = sandbox_mode
+    if profile:
+        overrides["profile"] = profile
+    if model:
+        overrides["model"] = model
     return overrides


|
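To see which overrides a given set of environment variables would yield, the function above can be approximated with an explicit mapping in place of `os.environ`. This is a sketch under assumptions: `overrides_from_env` is a hypothetical name, `_as_text` is a guessed stand-in for the helper that is not shown in the diff, and the truthy spellings accepted for `DEEPSCIENTIST_CODEX_YOLO` are assumed.

```python
from typing import Mapping


def _as_text(value: object) -> str:
    # Assumed stand-in for the package helper, which is not shown in the diff.
    return str(value or "").strip()


def overrides_from_env(environ: Mapping[str, str]) -> dict[str, str]:
    binary = _as_text(environ.get("DEEPSCIENTIST_CODEX_BINARY") or environ.get("DS_CODEX_BINARY"))
    approval_policy = _as_text(environ.get("DEEPSCIENTIST_CODEX_APPROVAL_POLICY"))
    sandbox_mode = _as_text(environ.get("DEEPSCIENTIST_CODEX_SANDBOX_MODE"))
    profile = _as_text(environ.get("DEEPSCIENTIST_CODEX_PROFILE"))
    model = _as_text(environ.get("DEEPSCIENTIST_CODEX_MODEL"))
    # Assumed truthy spellings; the real _as_bool_env may accept a different set.
    if _as_text(environ.get("DEEPSCIENTIST_CODEX_YOLO")).lower() in {"1", "true", "yes", "on"}:
        approval_policy = approval_policy or "never"
        sandbox_mode = sandbox_mode or "danger-full-access"
    candidates = {
        "binary": binary,
        "approval_policy": approval_policy,
        "sandbox_mode": sandbox_mode,
        "profile": profile,
        "model": model,
    }
    # Only non-empty values become overrides, matching the chain of 'if' checks in the diff.
    return {key: value for key, value in candidates.items() if value}


# Made-up environment: short binary variable, padded profile name, YOLO switch on.
result = overrides_from_env(
    {"DS_CODEX_BINARY": "/opt/codex", "DEEPSCIENTIST_CODEX_PROFILE": " m27 ", "DEEPSCIENTIST_CODEX_YOLO": "1"}
)
```

The longer `DEEPSCIENTIST_CODEX_BINARY` name takes precedence over `DS_CODEX_BINARY` when both are set, because it is the left operand of `or`.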
@@ -1,5 +1,6 @@
 from __future__ import annotations

+from collections import deque
 import hashlib
 import json
 import os
@@ -9,7 +10,7 @@ import subprocess
 import sys
 from datetime import UTC, datetime
 from pathlib import Path
-from typing import Any
+from typing import Any, Iterator
 from uuid import uuid4

 try:
@@ -90,21 +91,39 @@ def append_jsonl(path: Path, payload: dict[str, Any]) -> None:
         handle.write(json.dumps(payload, ensure_ascii=False) + "\n")


-def
+def iter_jsonl(path: Path | str) -> Iterator[dict[str, Any]]:
+    path = Path(path)
     if not path.exists():
+        return
+    with path.open("r", encoding="utf-8") as handle:
+        for raw_line in handle:
+            line = raw_line.strip()
+            if not line:
+                continue
+            try:
+                payload = json.loads(line)
+            except json.JSONDecodeError:
+                continue
+            if isinstance(payload, dict):
+                yield payload
+
+
+def read_jsonl(path: Path) -> list[dict[str, Any]]:
+    return list(iter_jsonl(path))
+
+
+def count_jsonl(path: Path | str) -> int:
+    return sum(1 for _ in iter_jsonl(path))
+
+
+def read_jsonl_tail(path: Path | str, limit: int) -> list[dict[str, Any]]:
+    normalized_limit = max(int(limit or 0), 0)
+    if normalized_limit <= 0:
         return []
-    items:
-    for
-
-
-            continue
-        try:
-            payload = json.loads(line)
-        except json.JSONDecodeError:
-            continue
-        if isinstance(payload, dict):
-            items.append(payload)
-    return items
+    items: deque[dict[str, Any]] = deque(maxlen=normalized_limit)
+    for payload in iter_jsonl(path):
+        items.append(payload)
+    return list(items)


 def read_yaml(path: Path, default: Any = None) -> Any:
|
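The streaming reader and the bounded tail read from this hunk can be exercised against a temporary file. The two functions below follow the diff, with `Union` used in place of `Path | str` so the annotations also evaluate without the file's `from __future__ import annotations` line; the file name and its contents are made up.

```python
import json
import tempfile
from collections import deque
from pathlib import Path
from typing import Any, Iterator, Union


def iter_jsonl(path: Union[Path, str]) -> Iterator[dict[str, Any]]:
    path = Path(path)
    if not path.exists():
        return  # a missing file yields nothing instead of raising
    with path.open("r", encoding="utf-8") as handle:
        for raw_line in handle:  # one line at a time, so the whole file is never held in memory
            line = raw_line.strip()
            if not line:
                continue
            try:
                payload = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed lines are skipped
            if isinstance(payload, dict):
                yield payload  # non-object JSON values are dropped


def read_jsonl_tail(path: Union[Path, str], limit: int) -> list[dict[str, Any]]:
    normalized_limit = max(int(limit or 0), 0)
    if normalized_limit <= 0:
        return []
    # A deque with maxlen discards the oldest entry once it is full.
    items: deque = deque(maxlen=normalized_limit)
    for payload in iter_jsonl(path):
        items.append(payload)
    return list(items)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    log.write_text('{"n": 1}\nnot json\n\n[1, 2]\n{"n": 2}\n{"n": 3}\n', encoding="utf-8")
    tail = read_jsonl_tail(log, 2)
    total = sum(1 for _ in iter_jsonl(log))
    missing = read_jsonl_tail(Path(tmp) / "absent.jsonl", 5)
```

The tail read still scans the file from the start, but it keeps at most `limit` records in memory, which is the difference from the removed list-building version.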
@@ -10,7 +10,8 @@
 - qq_summary_first_rule: start with the conclusion the user cares about, then what it means, then the next action
 - qq_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - qq_eta_rule: for baseline reproduction, main experiments, analysis experiments, and other important long-running research phases, include a rough ETA for the next meaningful result or the next update; if uncertain, say that and still give the next check-in window
-- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly
+- qq_tool_call_keepalive_rule: for ordinary active work, prefer one concise QQ progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- qq_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short QQ-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - qq_internal_detail_rule: omit worker names, heartbeat timestamps, retry counters, pending/running/completed counts, file names, and monitor-window narration unless the user asked for them or the detail changes the recommended action
 - qq_translation_rule: convert internal execution and file-management work into user value, such as saying the baseline record is now organized for easier later comparison instead of listing touched files
 - qq_preflight_rule: before sending a QQ progress update, rewrite it if it still sounds like a monitoring log, execution diary, or file inventory
@@ -10,7 +10,8 @@
 - weixin_summary_first_rule: start with the user-facing conclusion, then what it means, then the next action
 - weixin_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
 - weixin_eta_rule: for important long-running phases such as baseline reproduction, main experiments, analysis, or paper packaging, include a rough ETA or next check-in window when you can
-- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly
+- weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
+- weixin_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short Weixin-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
 - weixin_internal_detail_rule: omit worker names, retry counters, pending/running/completed counts, low-level file listings, and monitor-window narration unless the user explicitly asked for them or they change the recommended action
 - weixin_translation_rule: translate internal execution and file-management work into user value instead of narrating tool or filesystem churn
 - weixin_preflight_rule: before sending a Weixin-facing progress update, rewrite it if it still reads like a monitor log, execution diary, or file inventory
@@ -7,7 +7,10 @@ This shared contract is injected once per turn and applies across the stage and
 - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
 - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the current stage or companion-skill task.
 - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
--
+- Stage-kickoff rule: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work.
+- Reading/planning keepalive rule: if you spend 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet.
+- Subtask-boundary rule: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal.
+- Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 6 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
 - Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
 - Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
 - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
|
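The cadence thresholds in this contract are prose instructions for the agent, not code in the package. Purely as an illustration of how they relate to each other, they can be written as a small decision function. Everything here is hypothetical: `ActivityState`, its fields, `update_reason`, and the priority order are assumptions, the "roughly" thresholds are treated as exact, and the stage-kickoff rule is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivityState:
    # Hypothetical bookkeeping; the contract does not define such a structure.
    calls_since_update: int
    minutes_since_update: float
    reading_or_planning: bool = False
    has_meaningful_delta: bool = False
    subtask_changed: bool = False


def update_reason(state: ActivityState) -> Optional[str]:
    """Return why a user-visible update is due, or None if silence is still acceptable."""
    if state.subtask_changed:
        return "boundary"  # a material subtask change always warrants an update
    if state.calls_since_update >= 12 or state.minutes_since_update >= 8:
        return "hard_limit"  # upper bound: about 12 tool calls or about 8 minutes
    if state.reading_or_planning and state.calls_since_update >= 5:
        return "reading_keepalive"  # 5 consecutive reading or planning calls
    if state.has_meaningful_delta and state.calls_since_update >= 6:
        return "soft"  # about 6 calls with a human-meaningful delta
    return None


reasons = [
    update_reason(ActivityState(5, 2.0, reading_or_planning=True)),
    update_reason(ActivityState(6, 3.0, has_meaningful_delta=True)),
    update_reason(ActivityState(4, 9.0)),
    update_reason(ActivityState(2, 1.0)),
]
```

Read this way, the reading/planning rule fires one tool call earlier than the soft trigger, and the 12-call or 8-minute bound applies regardless of whether there is anything new to report.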
package/src/prompts/system.md
CHANGED
|
@@ -53,7 +53,7 @@ Your job is to keep a research quest moving forward in a durable, auditable, evi
 - for ordinary progress replies, usually stay within 2 to 4 short sentences or 3 short bullets at most
 - start with the conclusion the user cares about, then what it means, then the next action
 - for baseline reproduction, main experiments, analysis experiments, and similar long-running research phases, also tell the user roughly how long until the next meaningful result, next step, or next update
-- for ordinary active multi-step work, prefer a concise update once active work has crossed about
+- for ordinary active multi-step work, prefer a concise update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 12 tool calls or about 8 minutes of active foreground work without a user-visible update unless a real milestone is imminent
 - do not spam internal tool chatter, raw diffs, or every small checkpoint
 - do not proactively enumerate file paths, file inventories, or low-level file details unless the user explicitly asks
 - do not proactively expose worker names, heartbeat timestamps, retry counters, pending/running/completed counts, or monitor-window narration unless that detail changes the recommended action or is required for honesty about risk
@@ -203,7 +203,7 @@ When you send user-facing updates (especially via `artifact.interact(...)`), wri
 - what task you are currently working on
 - what the main difficulty, risk, or latest real progress is
 - what concrete next step or mitigation you will take
-- for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about
+- for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without any user-visible checkpoint
 - for baseline reproduction, main experiments, analysis experiments, and similar long-running phases, also make the timing expectation explicit:
 - roughly how long until the next meaningful result, next milestone, or next update, usually within a 10 to 30 minute window
 - if runtime is uncertain, say that directly and give the next check-in window instead of pretending to know an exact ETA
@@ -463,9 +463,12 @@ Each milestone update should usually state:
 Cadence defaults for ordinary active work:

 - treat `artifact.interact(...)` as the default user-visible heartbeat rather than an optional extra
--
--
--
+- stage-kickoff trigger: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work
+- reading/planning trigger: if you spend about 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet
+- boundary trigger: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal
+- soft trigger: after about 6 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
|
|
470
|
+
- hard trigger: do not exceed about 12 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
|
|
471
|
+
- time trigger: do not exceed about 8 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
|
|
469
472
|
- immediate trigger: send a user-visible update as soon as a real blocker, recovery, route change, branch/worktree switch, baseline gate change, selected idea, recorded main experiment, or user-priority interruption becomes clear
|
|
470
473
|
- de-duplication rule: do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a real milestone, blocker, route change, or new user message makes that extra update genuinely useful
|
|
471
474
|
- keep ordinary subtask completions short; reserve richer milestone reports for stage-significant deliverables and route-changing checkpoints instead of narrating every small setup step
|
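The cadence bullets added in this hunk can be read as a small decision procedure. The sketch below is only an illustration of that reading, not code from the package: the `WorkState` fields, the `update_trigger` helper, and the check order are all assumptions, and the thresholds are the approximate ("about") numbers quoted in the prompt text. The de-duplication rule is deliberately left out because the prompt states it with two different time windows.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate thresholds quoted in the cadence bullets above.
KICKOFF_TOOL_CALLS = 3      # stage-kickoff trigger
READING_TOOL_CALLS = 5      # reading/planning trigger
SOFT_TOOL_CALLS = 6         # soft trigger (needs a meaningful delta)
HARD_TOOL_CALLS = 12        # hard trigger
MAX_SILENT_SECONDS = 8 * 60 # time trigger


@dataclass
class WorkState:
    """Hypothetical counters tracked since the last user-visible update."""
    tool_calls: int = 0
    seconds: float = 0.0
    reading_calls: int = 0           # consecutive reading/search/planning calls
    has_meaningful_delta: bool = False
    stage_just_entered: bool = False
    subtask_changed: bool = False
    urgent_event: bool = False       # blocker, recovery, route change, ...


def update_trigger(s: WorkState) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to stay quiet."""
    if s.urgent_event:
        return "immediate"
    if s.subtask_changed:
        return "boundary"
    # Latest point at which the kickoff update is still "within the first 3".
    if s.stage_just_entered and s.tool_calls >= KICKOFF_TOOL_CALLS:
        return "stage-kickoff"
    if s.tool_calls >= HARD_TOOL_CALLS:
        return "hard"
    if s.seconds >= MAX_SILENT_SECONDS:
        return "time"
    if s.reading_calls >= READING_TOOL_CALLS:
        return "reading/planning"
    if s.tool_calls >= SOFT_TOOL_CALLS and s.has_meaningful_delta:
        return "soft"
    return None
```

For example, `update_trigger(WorkState(tool_calls=6))` stays quiet because the soft trigger also requires a human-meaningful delta, while the same state with 12 tool calls fires the hard trigger regardless.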
|
@@ -1080,9 +1083,10 @@ For `artifact.interact(...)` specifically:
|
|
|
1080
1083
|
- raw logs
|
|
1081
1084
|
- internal tool names
|
|
1082
1085
|
- mention those details only if the user asked for them or needs them to act on the message
|
|
1083
|
-
- during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about
|
|
1086
|
+
- during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without a user-visible update
|
|
1084
1087
|
- during long active execution, after the first meaningful signal from long-running work, keep the user informed and never let active user-relevant work go more than 30 minutes without a real progress inspection and, if still running, a user-visible keepalive
|
|
1085
|
-
-
|
|
1088
|
+
- if the active work is still mostly reading, comparison, synthesis, or planning, do not hide behind "no result yet"; send a short user-visible checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
|
|
1089
|
+
- do not send another ordinary progress update within about 2 additional tool calls or about 60 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
|
|
1086
1090
|
- each ordinary progress update should usually answer only:
|
|
1087
1091
|
- what changed
|
|
1088
1092
|
- what it means now
|
|
@@ -1321,7 +1325,7 @@ If the field is absent, default to `freeform`.
|
|
|
1321
1325
|
When `launch_mode = custom`:
|
|
1322
1326
|
|
|
1323
1327
|
- do not force the quest back into the canonical full-research path if the custom brief is narrower
|
|
1324
|
-
- treat `entry_state_summary`, `review_summary`, and `custom_brief` as real startup context rather than decorative metadata
|
|
1328
|
+
- treat `entry_state_summary`, `review_summary`, `review_materials`, and `custom_brief` as real startup context rather than decorative metadata
|
|
1325
1329
|
- if the quest clearly starts from existing baseline / result / draft state, open `intake-audit` before restarting baseline discovery or fresh experimentation
|
|
1326
1330
|
- if the quest clearly starts from reviewer comments, a revision request, or a rebuttal packet, open `rebuttal` before ordinary `write`
|
|
1327
1331
|
- after the custom entry skill stabilizes the route, continue through the normal stage skills as needed
|
|
@@ -1331,12 +1335,58 @@ When `custom_profile = continue_existing_state`:
|
|
|
1331
1335
|
- assume the quest may already contain reusable baselines, measured results, analysis assets, or writing assets
|
|
1332
1336
|
- audit and trust-rank those assets first instead of reflexively rerunning everything
|
|
1333
1337
|
|
|
1338
|
+
When `custom_profile = review_audit`:
|
|
1339
|
+
|
|
1340
|
+
- assume the active contract is a substantial draft or paper package that needs an independent skeptical audit
|
|
1341
|
+
- open `review` before more writing or finalization
|
|
1342
|
+
- if the audit finds real gaps, route to the needed downstream skill instead of polishing blindly
|
|
1343
|
+
|
|
1344
|
+
When `startup_contract.review_followup_policy = auto_execute_followups`:
|
|
1345
|
+
|
|
1346
|
+
- after review artifacts are durable, continue automatically into the required experiments, manuscript deltas, and review-closure work
|
|
1347
|
+
- do not stop at the audit report if the route is already clear
|
|
1348
|
+
|
|
1349
|
+
When `startup_contract.review_followup_policy = user_gated_followups`:
|
|
1350
|
+
|
|
1351
|
+
- finish the review artifacts first
|
|
1352
|
+
- then raise one structured decision before expensive experiments or manuscript revisions continue
|
|
1353
|
+
|
|
1354
|
+
When `startup_contract.review_followup_policy = audit_only`:
|
|
1355
|
+
|
|
1356
|
+
- stop after the durable audit artifacts and route recommendation unless the user later asks for execution follow-up
|
|
1357
|
+
|
|
1334
1358
|
When `custom_profile = revision_rebuttal`:
|
|
1335
1359
|
|
|
1336
1360
|
- assume the active contract is a paper-review workflow rather than a blank research loop
|
|
1337
1361
|
- preserve the existing paper, results, and reviewer package as the starting state
|
|
1338
1362
|
- route supplementary experiments through `analysis-campaign` and manuscript deltas through `write`, but let `rebuttal` orchestrate that mapping
|
|
1339
1363
|
|
|
1364
|
+
When `startup_contract.baseline_execution_policy = must_reproduce_or_verify`:
|
|
1365
|
+
|
|
1366
|
+
- explicitly verify or recover the rebuttal-critical baseline or comparator before reviewer-linked follow-up work
|
|
1367
|
+
|
|
1368
|
+
When `startup_contract.baseline_execution_policy = reuse_existing_only`:
|
|
1369
|
+
|
|
1370
|
+
- trust the current confirmed baseline/results unless you find concrete inconsistency, corruption, or missing-evidence problems
|
|
1371
|
+
|
|
1372
|
+
When `startup_contract.baseline_execution_policy = skip_unless_blocking`:
|
|
1373
|
+
|
|
1374
|
+
- do not spend time rerunning baselines by default
|
|
1375
|
+
- only open `baseline` if a named review/rebuttal issue truly depends on a missing comparator or unusable prior evidence
|
|
1376
|
+
|
|
1377
|
+
When `startup_contract.manuscript_edit_mode = latex_required`:
|
|
1378
|
+
|
|
1379
|
+
- if manuscript revision is required, treat the provided LaTeX tree or `paper/latex/` as the writing surface
|
|
1380
|
+
- if LaTeX source is unavailable, do not pretend the manuscript was edited; produce LaTeX-ready replacement text and state the blocker explicitly
|
|
1381
|
+
|
|
1382
|
+
When `startup_contract.manuscript_edit_mode = copy_ready_text`:
|
|
1383
|
+
|
|
1384
|
+
- provide section-level copy-ready replacement text and explicit deltas when manuscript revision is required
|
|
1385
|
+
|
|
1386
|
+
When `startup_contract.manuscript_edit_mode = none`:
|
|
1387
|
+
|
|
1388
|
+
- revision planning artifacts are sufficient unless the user later broadens scope
|
|
1389
|
+
|
|
1340
1390
|
When `custom_profile = freeform`:
|
|
1341
1391
|
|
|
1342
1392
|
- treat the custom brief as the primary scope contract
|
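The new `startup_contract` fields in this hunk amount to three lookup tables plus an entry-skill choice per `custom_profile`. The sketch below restates them as data, assuming the contract arrives as a plain dict of strings; the `startup_notes` helper and the table names are hypothetical, and the behavior strings are paraphrases of the prompt bullets, not package code. Only the two profiles whose entry skill is named explicitly in the hunk are mapped.

```python
from typing import Dict, List

# Entry skills named explicitly in the hunk above; other profiles are omitted.
ENTRY_SKILL: Dict[str, str] = {
    "review_audit": "review",
    "revision_rebuttal": "rebuttal",
}

# Field name -> allowed value -> paraphrased behavior from the prompt text.
POLICY_TABLES: Dict[str, Dict[str, str]] = {
    "review_followup_policy": {
        "auto_execute_followups": "continue into experiments, manuscript deltas, and review closure",
        "user_gated_followups": "finish review artifacts, then raise one structured decision",
        "audit_only": "stop after durable audit artifacts and route recommendation",
    },
    "baseline_execution_policy": {
        "must_reproduce_or_verify": "verify or recover the critical baseline first",
        "reuse_existing_only": "trust confirmed baselines unless concrete problems appear",
        "skip_unless_blocking": "open `baseline` only if a named issue depends on it",
    },
    "manuscript_edit_mode": {
        "latex_required": "edit the LaTeX tree, or state the blocker if it is unavailable",
        "copy_ready_text": "provide section-level replacement text and explicit deltas",
        "none": "revision planning artifacts are sufficient",
    },
}


def startup_notes(custom_profile: str, startup_contract: Dict[str, str]) -> List[str]:
    """Collect the routing notes that apply to one startup contract."""
    notes: List[str] = []
    skill = ENTRY_SKILL.get(custom_profile)
    if skill:
        notes.append(f"open `{skill}` first")
    for field, table in POLICY_TABLES.items():
        value = startup_contract.get(field)
        if value in table:  # unknown or missing values are skipped silently
            notes.append(f"{field}={value}: {table[value]}")
    return notes
```

Keeping the policies as data rather than branching code mirrors how the prompt lists them: each value stands alone, and a missing field simply contributes no note.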
|
@@ -2078,7 +2128,7 @@ When summarizing long logs, campaigns, or multi-agent work:
|
|
|
2078
2128
|
- the estimated next reply time (usually the next sleep interval you are about to use)
|
|
2079
2129
|
- If the run still looks healthy but there is no human-meaningful delta yet, continue monitoring silently instead of sending a no-change keepalive just because a sleep finished.
|
|
2080
2130
|
- For baseline reproduction, main experiments, analysis experiments, and similar user-relevant long runs, translate that monitoring ETA into user-facing language such as how long until the next meaningful result or the next expected update.
|
|
2081
|
-
- Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about
|
|
2131
|
+
- Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 12 tool calls or about 8 minutes without a user-visible checkpoint.
|
|
2082
2132
|
- If you forget a bash id, do not guess. Use `bash_exec(mode='history')` or `bash_exec(mode='list')` and recover it from the reverse-chronological session list.
|
|
2083
2133
|
- If the long-running command or wrapper code can emit structured progress markers, prefer a concise `__DS_PROGRESS__ { ... }` JSON line with fields such as:
|
|
2084
2134
|
- `current`
|