@researai/deepscientist 1.5.3 → 1.5.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -1
- package/bin/ds.js +27 -187
- package/docs/en/00_QUICK_START.md +1 -1
- package/docs/en/01_SETTINGS_REFERENCE.md +13 -4
- package/docs/en/99_ACKNOWLEDGEMENTS.md +1 -0
- package/docs/images/connectors/discord-setup-overview.svg +52 -0
- package/docs/images/connectors/feishu-setup-overview.svg +53 -0
- package/docs/images/connectors/slack-setup-overview.svg +51 -0
- package/docs/images/connectors/telegram-setup-overview.svg +55 -0
- package/docs/images/connectors/whatsapp-setup-overview.svg +51 -0
- package/docs/images/lingzhu/lingzhu-openclaw-config.svg +17 -0
- package/docs/images/lingzhu/lingzhu-platform-values.svg +16 -0
- package/docs/images/lingzhu/lingzhu-settings-overview.svg +30 -0
- package/docs/images/qq/tencent-cloud-qq-chat.png +0 -0
- package/docs/images/qq/tencent-cloud-qq-register.png +0 -0
- package/docs/images/quickstart/00-home.png +0 -0
- package/docs/images/quickstart/01-start-research.png +0 -0
- package/docs/images/quickstart/02-list-quest.png +0 -0
- package/docs/zh/00_QUICK_START.md +1 -1
- package/docs/zh/01_SETTINGS_REFERENCE.md +14 -5
- package/docs/zh/99_ACKNOWLEDGEMENTS.md +1 -0
- package/package.json +8 -4
- package/pyproject.toml +1 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/bridges/base.py +2 -1
- package/src/deepscientist/bridges/connectors.py +2 -1
- package/src/deepscientist/channels/discord_gateway.py +2 -2
- package/src/deepscientist/channels/qq_gateway.py +2 -2
- package/src/deepscientist/channels/slack_socket.py +2 -2
- package/src/deepscientist/channels/telegram_polling.py +2 -1
- package/src/deepscientist/cli.py +4 -1
- package/src/deepscientist/config/models.py +7 -3
- package/src/deepscientist/config/service.py +52 -2
- package/src/deepscientist/daemon/api/handlers.py +6 -6
- package/src/deepscientist/daemon/app.py +65 -10
- package/src/deepscientist/network.py +78 -0
- package/src/deepscientist/prompts/builder.py +22 -3
- package/src/deepscientist/quest/service.py +1 -1
- package/src/deepscientist/skills/installer.py +77 -1
- package/src/prompts/system.md +78 -2
- package/src/skills/analysis-campaign/SKILL.md +8 -0
- package/src/skills/idea/SKILL.md +245 -3
- package/src/skills/write/SKILL.md +151 -1
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/{AiManusChatView-qzChi9uh.js → AiManusChatView-BGLArZRn.js} +11 -11
- package/src/ui/dist/assets/{AnalysisPlugin-CcC_-UqN.js → AnalysisPlugin-BgDGSigG.js} +1 -1
- package/src/ui/dist/assets/{AutoFigurePlugin-DD8LkJLe.js → AutoFigurePlugin-B65HD7L4.js} +5 -5
- package/src/ui/dist/assets/{CliPlugin-DJJFfVmW.js → CliPlugin-CUqgsFHC.js} +9 -9
- package/src/ui/dist/assets/{CodeEditorPlugin-CrjkHNLh.js → CodeEditorPlugin-CF5EdvaS.js} +8 -8
- package/src/ui/dist/assets/{CodeViewerPlugin-obnD6G5R.js → CodeViewerPlugin-DEeU063D.js} +5 -5
- package/src/ui/dist/assets/{DocViewerPlugin-DB9SUQVd.js → DocViewerPlugin-Df-FuDlZ.js} +3 -3
- package/src/ui/dist/assets/{GitDiffViewerPlugin-DZLlNlD2.js → GitDiffViewerPlugin-RAnNaRxM.js} +1 -1
- package/src/ui/dist/assets/{ImageViewerPlugin-BGwfDZ0Y.js → ImageViewerPlugin-DXJ0ZJGg.js} +5 -5
- package/src/ui/dist/assets/{LabCopilotPanel-dfLptQcR.js → LabCopilotPanel-BlO-sKsj.js} +10 -10
- package/src/ui/dist/assets/{LabPlugin-CeGjAl3A.js → LabPlugin-BajPZW5v.js} +1 -1
- package/src/ui/dist/assets/{LatexPlugin-BBJ7kd1V.js → LatexPlugin-F1OEol8D.js} +7 -7
- package/src/ui/dist/assets/{MarkdownViewerPlugin-DKZi7BcB.js → MarkdownViewerPlugin-MhUupqwT.js} +4 -4
- package/src/ui/dist/assets/{MarketplacePlugin-C_k-9jD0.js → MarketplacePlugin-DxhIEsv0.js} +3 -3
- package/src/ui/dist/assets/{NotebookEditor-4R88_BMO.js → NotebookEditor-q7TkhewC.js} +1 -1
- package/src/ui/dist/assets/{PdfLoader-DwEFQLrw.js → PdfLoader-B8ZOTKFc.js} +1 -1
- package/src/ui/dist/assets/{PdfMarkdownPlugin-D-jdsqF8.js → PdfMarkdownPlugin-xFPvzvWh.js} +3 -3
- package/src/ui/dist/assets/{PdfViewerPlugin-CmeBGDY0.js → PdfViewerPlugin-EjEcsIB8.js} +10 -10
- package/src/ui/dist/assets/{SearchPlugin-Dlz2WKJ4.js → SearchPlugin-ixY-1lgW.js} +1 -1
- package/src/ui/dist/assets/{Stepper-ClOgzWM3.js → Stepper-gYFK2Pgz.js} +1 -1
- package/src/ui/dist/assets/{TextViewerPlugin-DDQWxibk.js → TextViewerPlugin-Cym6pv_n.js} +4 -4
- package/src/ui/dist/assets/{VNCViewer-CJXT0Nm8.js → VNCViewer-BPmIHcmK.js} +9 -9
- package/src/ui/dist/assets/{bibtex-DLr4Rtk4.js → bibtex-Btv6Wi7f.js} +1 -1
- package/src/ui/dist/assets/{code-DgKK408Y.js → code-BlG7g85c.js} +1 -1
- package/src/ui/dist/assets/{file-content-6HBqQnvQ.js → file-content-DBT5OfTZ.js} +1 -1
- package/src/ui/dist/assets/{file-diff-panel-Dhu0TbBM.js → file-diff-panel-BWXYzqHk.js} +1 -1
- package/src/ui/dist/assets/{file-socket-CP3iwVZG.js → file-socket-wDlx6byM.js} +1 -1
- package/src/ui/dist/assets/{file-utils-BsS-Aw68.js → file-utils-Ba3nJmH0.js} +1 -1
- package/src/ui/dist/assets/{image-ByeK-Zcv.js → image-BwtCyguk.js} +1 -1
- package/src/ui/dist/assets/{index-BdsE0uRz.js → index-B-2scqCJ.js} +11 -11
- package/src/ui/dist/assets/{index-BLjo5--a.js → index-Bz5AaWL7.js} +50793 -50661
- package/src/ui/dist/assets/{index-DyremSIv.js → index-CfRpE209.js} +2 -2
- package/src/ui/dist/assets/{index-C-eX-N6A.js → index-DcqvKzeJ.js} +1 -1
- package/src/ui/dist/assets/{index-CuQhlrR-.css → index-DpMZw8aM.css} +2 -2
- package/src/ui/dist/assets/{message-square-DnagiLnc.js → message-square-BnlyWVH0.js} +1 -1
- package/src/ui/dist/assets/{monaco-4kBFeprs.js → monaco-CXe0pAVe.js} +1 -1
- package/src/ui/dist/assets/{popover-hRCXZzs2.js → popover-BCHmVhHj.js} +1 -1
- package/src/ui/dist/assets/{project-sync-O_85YuP6.js → project-sync-Brk6kaOD.js} +1 -1
- package/src/ui/dist/assets/{sigma-DvKopSnL.js → sigma-D72eSUep.js} +1 -1
- package/src/ui/dist/assets/{tooltip-BmlPc6kc.js → tooltip-BMWd0dqX.js} +1 -1
- package/src/ui/dist/assets/{trash-n-UvdZFR.js → trash-BIt_eWIS.js} +1 -1
- package/src/ui/dist/assets/{useCliAccess-WDd3_wIh.js → useCliAccess-N1hkTRrR.js} +1 -1
- package/src/ui/dist/assets/{useFileDiffOverlay-rXLIL2NF.js → useFileDiffOverlay-DPRPv6rv.js} +1 -1
- package/src/ui/dist/assets/{wrap-text-qIYQ4a_W.js → wrap-text-E5-UheyP.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-fZXCEFsy.js → zoom-out-D4TR-ZZ_.js} +1 -1
- package/src/ui/dist/index.html +2 -2
--- a/package/src/deepscientist/daemon/app.py
+++ b/package/src/deepscientist/daemon/app.py
@@ -8,12 +8,14 @@ import shutil
 import subprocess
 import threading
 import time
+from datetime import UTC, datetime, timedelta
 from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
 from pathlib import Path
 from typing import Any
 from urllib.parse import parse_qs, urlencode, urlparse
-from urllib.request import Request
+from urllib.request import Request
 
+from .. import __version__
 from ..artifact import ArtifactService
 from ..bash_exec import BashExecService
 from ..bash_exec.runtime import TerminalClient
@@ -32,6 +34,7 @@ from ..connector_runtime import conversation_identity_key, format_conversation_i
 from ..config import ConfigManager
 from ..home import repo_root
 from ..memory import MemoryService
+from ..network import urlopen_with_proxy as urlopen
 from ..latex_runtime import QuestLatexService
 from ..prompts import PromptBuilder
 from ..prompts.builder import STANDARD_SKILLS
@@ -52,6 +55,13 @@ from websockets.sync.server import Server as WebSocketServer
 from websockets.sync.server import ServerConnection, serve as websocket_serve
 
 TERMINAL_STREAM_IDLE_SLEEP_SECONDS = 0.02
+CODEX_RETRY_DEFAULT_MAX_ATTEMPTS = 5
+CODEX_RETRY_DEFAULT_INITIAL_BACKOFF_SEC = 10.0
+CODEX_RETRY_DEFAULT_BACKOFF_MULTIPLIER = 6.0
+CODEX_RETRY_DEFAULT_MAX_BACKOFF_SEC = 1800.0
+LEGACY_CODEX_RETRY_INITIAL_BACKOFF_SEC = 1.0
+LEGACY_CODEX_RETRY_BACKOFF_MULTIPLIER = 2.0
+LEGACY_CODEX_RETRY_MAX_BACKOFF_SEC = 8.0
 
 
 class DaemonApp:
@@ -74,6 +84,12 @@ class DaemonApp:
         self.team_service = SingleTeamService(home)
         self.cloud_service = CloudLinkService(home)
         config = self.config_manager.load_named("config")
+        skill_config = config.get("skills") if isinstance(config.get("skills"), dict) else {}
+        self.skill_sync_summary = self.skill_installer.ensure_release_sync(
+            installed_version=__version__,
+            sync_global_enabled=bool(skill_config.get("sync_global_on_init", True)),
+            sync_existing_quests_enabled=bool(skill_config.get("sync_quest_on_open", True)),
+        )
         self.logger = JsonlLogger(home / "logs", level=config.get("logging", {}).get("level", "info"))
         self.reconciled_quests = self.quest_service.reconcile_runtime_state()
         for item in self.reconciled_quests:
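The init hook above reads an optional `skills` config table defensively before calling `ensure_release_sync`, with both toggles defaulting to on. A minimal standalone sketch of just that flag resolution (the function name and example configs are hypothetical; the config keys come from the diff):

```python
def skill_sync_flags(config: dict) -> tuple[bool, bool]:
    # Tolerate a missing or non-dict "skills" section, as the diff does.
    skill_config = config.get("skills") if isinstance(config.get("skills"), dict) else {}
    return (
        bool(skill_config.get("sync_global_on_init", True)),
        bool(skill_config.get("sync_quest_on_open", True)),
    )

print(skill_sync_flags({}))                                          # (True, True)
print(skill_sync_flags({"skills": {"sync_global_on_init": False}}))  # (False, True)
```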
@@ -633,7 +649,7 @@ class DaemonApp:
 
     def _preferred_locale(self) -> str:
         config = self.config_manager.load_named("config")
-        return str(config.get("default_locale") or "
+        return str(config.get("default_locale") or "en-US").lower()
 
     def _polite_copy(self, *, zh: str, en: str) -> str:
         return zh if self._preferred_locale().startswith("zh") else en
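The locale hunks in this release (daemon/app.py, prompts/builder.py, quest/service.py) share one fix: the fallback locale is now an explicit "en-US". A tiny standalone sketch of the resulting selection behavior (free functions standing in for the methods in the diff):

```python
def preferred_locale(config: dict) -> str:
    # Fall back to "en-US" when default_locale is unset or empty.
    return str(config.get("default_locale") or "en-US").lower()

def polite_copy(config: dict, *, zh: str, en: str) -> str:
    # Any zh-* locale selects the Chinese copy; everything else gets English.
    return zh if preferred_locale(config).startswith("zh") else en

print(polite_copy({}, zh="你好", en="hello"))                           # hello
print(polite_copy({"default_locale": "zh-CN"}, zh="你好", en="hello"))  # 你好
```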
@@ -1304,7 +1320,7 @@ class DaemonApp:
                 message_id=claimed_message_id,
                 run_id=run_id,
             )
-            retry_policy = self._runner_retry_policy(runner_cfg if isinstance(runner_cfg, dict) else {})
+            retry_policy = self._runner_retry_policy(runner_name, runner_cfg if isinstance(runner_cfg, dict) else {})
             max_attempts = int(retry_policy.get("max_attempts") or 1)
             turn_id = generate_id("turn")
             retry_context: dict[str, Any] | None = None
@@ -1380,7 +1396,7 @@ class DaemonApp:
                 )
                 if bool(retry_policy.get("enabled")) and attempt_index < max_attempts:
                     delay_seconds = self._retry_delay_seconds(retry_policy, attempt_index=attempt_index + 1)
-                    next_retry_at =
+                    next_retry_at = self._retry_next_timestamp(delay_seconds)
                     self.quest_service.update_runtime_state(
                         quest_root=quest_root,
                         status="running",
@@ -1505,6 +1521,7 @@ class DaemonApp:
                 )
                 if bool(retry_policy.get("enabled")) and attempt_index < max_attempts:
                     delay_seconds = self._retry_delay_seconds(retry_policy, attempt_index=attempt_index + 1)
+                    next_retry_at = self._retry_next_timestamp(delay_seconds)
                     self.quest_service.update_runtime_state(
                         quest_root=quest_root,
                         status="running",
@@ -1516,7 +1533,7 @@ class DaemonApp:
                             "max_attempts": max_attempts,
                             "last_run_id": result.run_id,
                             "last_error": failure_summary,
-                            "next_retry_at":
+                            "next_retry_at": next_retry_at,
                         },
                     )
                     self._append_retry_event(
@@ -1658,12 +1675,43 @@ class DaemonApp:
             return default
         return resolved if resolved >= 0 else default
 
-
+    @staticmethod
+    def _float_matches(left: float, right: float) -> bool:
+        return abs(left - right) < 1e-9
+
+    def _runner_retry_policy(self, runner_name: str, runner_cfg: dict[str, Any]) -> dict[str, Any]:
         enabled = bool(runner_cfg.get("retry_on_failure", True))
-        max_attempts = min(
-
-
-
+        max_attempts = min(
+            CODEX_RETRY_DEFAULT_MAX_ATTEMPTS,
+            self._coerce_positive_int(runner_cfg.get("retry_max_attempts", CODEX_RETRY_DEFAULT_MAX_ATTEMPTS), CODEX_RETRY_DEFAULT_MAX_ATTEMPTS),
+        )
+        initial_backoff = self._coerce_nonnegative_float(
+            runner_cfg.get("retry_initial_backoff_sec", CODEX_RETRY_DEFAULT_INITIAL_BACKOFF_SEC),
+            CODEX_RETRY_DEFAULT_INITIAL_BACKOFF_SEC,
+        )
+        multiplier = max(
+            1.0,
+            self._coerce_nonnegative_float(
+                runner_cfg.get("retry_backoff_multiplier", CODEX_RETRY_DEFAULT_BACKOFF_MULTIPLIER),
+                CODEX_RETRY_DEFAULT_BACKOFF_MULTIPLIER,
+            ),
+        )
+        max_backoff = max(
+            initial_backoff,
+            self._coerce_nonnegative_float(
+                runner_cfg.get("retry_max_backoff_sec", CODEX_RETRY_DEFAULT_MAX_BACKOFF_SEC),
+                CODEX_RETRY_DEFAULT_MAX_BACKOFF_SEC,
+            ),
+        )
+        if (
+            runner_name == "codex"
+            and self._float_matches(initial_backoff, LEGACY_CODEX_RETRY_INITIAL_BACKOFF_SEC)
+            and self._float_matches(multiplier, LEGACY_CODEX_RETRY_BACKOFF_MULTIPLIER)
+            and self._float_matches(max_backoff, LEGACY_CODEX_RETRY_MAX_BACKOFF_SEC)
+        ):
+            initial_backoff = CODEX_RETRY_DEFAULT_INITIAL_BACKOFF_SEC
+            multiplier = CODEX_RETRY_DEFAULT_BACKOFF_MULTIPLIER
+            max_backoff = CODEX_RETRY_DEFAULT_MAX_BACKOFF_SEC
         return {
             "enabled": enabled,
             "max_attempts": max_attempts,
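The guard at the end of `_runner_retry_policy` exists so configs still carrying the old hard-coded codex backoff (1s initial, x2 multiplier, 8s cap) are silently upgraded to the new schedule instead of being honored verbatim. A rough standalone re-implementation of just that normalization (function names are simplified stand-ins for the methods in the diff):

```python
LEGACY = (1.0, 2.0, 8.0)       # old codex defaults: initial backoff, multiplier, cap
CURRENT = (10.0, 6.0, 1800.0)  # the new CODEX_RETRY_DEFAULT_* values

def _matches(left: float, right: float) -> bool:
    # Same tolerance the diff uses in DaemonApp._float_matches.
    return abs(left - right) < 1e-9

def normalize_backoff(runner_name: str, initial: float, mult: float, cap: float) -> tuple:
    """Remap the exact legacy triple to the current defaults, codex runners only."""
    if runner_name == "codex" and all(
        _matches(value, legacy) for value, legacy in zip((initial, mult, cap), LEGACY)
    ):
        return CURRENT
    return (initial, mult, cap)

print(normalize_backoff("codex", 1.0, 2.0, 8.0))   # remapped to (10.0, 6.0, 1800.0)
print(normalize_backoff("codex", 1.0, 2.0, 30.0))  # left alone: the cap was customized
```

Any deliberate customization of even one of the three values leaves the whole triple untouched, so only the untouched legacy default is migrated.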
@@ -1689,6 +1737,12 @@ class DaemonApp:
         delay = initial_backoff * pow(multiplier, max(0, attempt_index - 2))
         return max(0.0, min(delay, max_backoff))
 
+    @staticmethod
+    def _retry_next_timestamp(delay_seconds: float) -> str:
+        if delay_seconds <= 0:
+            return utc_now()
+        return (datetime.now(UTC) + timedelta(seconds=delay_seconds)).replace(microsecond=0).isoformat()
+
     def _wait_for_retry_delay(self, quest_id: str, delay_seconds: float) -> bool:
         if delay_seconds <= 0:
             return not self._turn_stop_requested(quest_id)
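Combining the new defaults with the `delay = initial_backoff * pow(multiplier, max(0, attempt_index - 2))` formula visible as context in this hunk gives the concrete retry schedule; a standalone sketch (constants from the diff, the wrapper function is illustrative):

```python
# Backoff schedule implied by the new CODEX_RETRY_DEFAULT_* values.
INITIAL_BACKOFF_SEC = 10.0
BACKOFF_MULTIPLIER = 6.0
MAX_BACKOFF_SEC = 1800.0

def retry_delay_seconds(attempt_index: int) -> float:
    # Attempt 2 waits the initial backoff; each later attempt multiplies by 6,
    # clamped to the 30-minute cap.
    delay = INITIAL_BACKOFF_SEC * pow(BACKOFF_MULTIPLIER, max(0, attempt_index - 2))
    return max(0.0, min(delay, MAX_BACKOFF_SEC))

schedule = [retry_delay_seconds(i) for i in range(2, 6)]
print(schedule)  # [10.0, 60.0, 360.0, 1800.0]
```

So a run that keeps failing waits 10s, 60s, 360s, then 1800s before the fifth and final attempt, versus 1s/2s/4s/8s under the legacy values.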
@@ -4212,6 +4266,7 @@ class DaemonApp:
             "document_asset",
             "terminal_restore",
             "terminal_history",
+            "latex_builds",
         }:
             payload = result(**params, path=self.path)
         elif method == "GET":
--- /dev/null
+++ b/package/src/deepscientist/network.py
@@ -0,0 +1,78 @@
+from __future__ import annotations
+
+import os
+from urllib.parse import urlparse
+from urllib.request import ProxyHandler, Request, build_opener, urlopen as stdlib_urlopen
+
+from websockets.sync.client import connect as stdlib_websocket_connect
+
+_RUNTIME_PROXY_URL: str | None = None
+_NO_PROXY_OPENER = build_opener(ProxyHandler({}))
+_PROXY_OPENERS: dict[str, object] = {}
+
+
+def normalize_proxy_url(value: str | None) -> str | None:
+    text = str(value or "").strip()
+    return text or None
+
+
+def configure_runtime_proxy(proxy_url: str | None) -> str | None:
+    normalized = normalize_proxy_url(proxy_url)
+    global _RUNTIME_PROXY_URL
+    previous = _RUNTIME_PROXY_URL
+    _RUNTIME_PROXY_URL = normalized
+    if normalized is None:
+        if previous is not None:
+            for key in ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "http_proxy", "https_proxy", "all_proxy"):
+                if os.environ.get(key) == previous:
+                    os.environ.pop(key, None)
+        return None
+    for key in ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "http_proxy", "https_proxy", "all_proxy"):
+        os.environ[key] = normalized
+    # Keep local daemon traffic and loopback websocket attaches off the proxy path.
+    for key in ("NO_PROXY", "no_proxy"):
+        current = str(os.environ.get(key) or "").strip()
+        values = [item.strip() for item in current.split(",") if item.strip()]
+        for host in ("127.0.0.1", "localhost", "::1", "0.0.0.0"):
+            if host not in values:
+                values.append(host)
+        os.environ[key] = ",".join(values)
+    return normalized
+
+
+def runtime_proxy_url() -> str | None:
+    return _RUNTIME_PROXY_URL
+
+
+def should_bypass_proxy(url: str) -> bool:
+    parsed = urlparse(str(url or "").strip())
+    host = (parsed.hostname or "").strip().lower()
+    return host in {"", "127.0.0.1", "localhost", "::1", "0.0.0.0"}
+
+
+def _proxy_opener(proxy_url: str):
+    opener = _PROXY_OPENERS.get(proxy_url)
+    if opener is None:
+        opener = build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
+        _PROXY_OPENERS[proxy_url] = opener
+    return opener
+
+
+def urlopen_with_proxy(request: Request | str, timeout: float | None = None):
+    url = request.full_url if isinstance(request, Request) else str(request)
+    if should_bypass_proxy(url):
+        return _NO_PROXY_OPENER.open(request, timeout=timeout)
+    proxy_url = runtime_proxy_url()
+    if proxy_url:
+        return _proxy_opener(proxy_url).open(request, timeout=timeout)
+    return stdlib_urlopen(request, timeout=timeout)
+
+
+def websocket_connect_with_proxy(uri: str, /, **kwargs):
+    if should_bypass_proxy(uri):
+        kwargs.setdefault("proxy", None)
+    else:
+        proxy_url = runtime_proxy_url()
+        if proxy_url:
+            kwargs.setdefault("proxy", proxy_url)
+    return stdlib_websocket_connect(uri, **kwargs)
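The bypass rule at the heart of the new network module is a simple loopback-hostname allowlist: local daemon traffic never goes through the configured proxy, everything else may. A self-contained copy of that decision (the example URLs are illustrative, not from the package):

```python
from urllib.parse import urlparse

_LOOPBACK_HOSTS = {"", "127.0.0.1", "localhost", "::1", "0.0.0.0"}

def should_bypass_proxy(url: str) -> bool:
    # Empty or unparseable hosts are treated as local and kept off the proxy,
    # matching the set used in network.py.
    host = (urlparse(str(url or "").strip()).hostname or "").strip().lower()
    return host in _LOOPBACK_HOSTS

print(should_bypass_proxy("http://127.0.0.1:8787/api"))  # True
print(should_bypass_proxy("ws://localhost:9000"))        # True
print(should_bypass_proxy("https://example.com/data"))   # False
```

The same predicate gates both HTTP requests (`urlopen_with_proxy`) and websocket attaches (`websocket_connect_with_proxy`), so one rule covers every outbound path.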
--- a/package/src/deepscientist/prompts/builder.py
+++ b/package/src/deepscientist/prompts/builder.py
@@ -90,7 +90,7 @@ class PromptBuilder:
         connectors_config = self.config_manager.load_named_normalized("connectors")
         quest_root = Path(snapshot["quest_root"])
         active_anchor = str(snapshot.get("active_anchor") or skill_id)
-        default_locale = str(runtime_config.get("default_locale") or "
+        default_locale = str(runtime_config.get("default_locale") or "en-US")
         system_block = self._prompt_fragment("src/prompts/system.md")
         connector_contract_block = self._connector_contract_block(quest_id=quest_id, snapshot=snapshot)
         sections = [
@@ -748,8 +748,12 @@ class PromptBuilder:
             "- acknowledgment_protocol: after artifact.interact returns any human message, immediately call artifact.interact(...) again to confirm receipt; if answerable, answer directly, otherwise state the short plan, nearest checkpoint, and that the current background subtask is paused",
             "- progress_protocol: emit artifact.interact(kind='progress', reply_mode='threaded', ...) at real human-meaningful checkpoints; if no natural checkpoint appears during active user-relevant work, send a concise keepalive before you drift beyond roughly 10 to 30 tool calls without a user-visible update",
             "- smoke_then_detach_protocol: for baseline reproduction, main experiments, and analysis experiments, first validate the command path with a bounded smoke test; once the smoke test passes, launch the real long run with bash_exec(mode='detach', ...) and usually leave timeout_seconds unset rather than guessing a fake deadline",
-            "-
+            "- progress_first_monitoring_protocol: when supervising a long-running bash_exec session, judge health by forward progress rather than by whether the final artifact has already appeared within a short window",
+            "- delta_monitoring_protocol: compare deltas such as new sample counters, new task counters, new saved files, new last_output_seq values, or changed last_progress payloads; if any of these move forward, treat the run as alive and keep observing",
+            "- long_run_reporting_protocol: for long-running bash_exec monitoring loops, inspect real logs or status after each completed sleep/await cycle and at least once every 30 minutes at worst, but only send a user-visible update when there is a human-meaningful delta or when the 30-minute visibility bound would otherwise be exceeded",
             "- long_run_watchdog_protocol: for baseline reproduction, baseline-running stages, main experiments, and other important detached runs, do not let more than 30 minutes pass without a real progress inspection and, if the run is still active, a user-visible artifact.interact progress update",
+            "- intervention_threshold_protocol: do not kill or restart a run merely because a short watch window passed without final completion; intervene only on explicit failure, clear invalidity, process exit, or no meaningful delta across a sufficiently long observation window",
+            "- slow_model_patience_protocol: if the user says the model, endpoint, or workload is expected to be slow, widen the observation window before intervention and avoid repeated no-change updates",
             "- tail_monitoring_protocol: when monitoring a detached run, prefer bash_exec(mode='read', id=..., tail_limit=..., order='desc') so you inspect the newest evidence first instead of re-reading full logs every time",
             "- managed_recovery_protocol: if a detached baseline, main-experiment, or analysis run is clearly invalid, wedged, or superseded, stop it with bash_exec(mode='kill', id=...), document the reason, fix the issue, and relaunch cleanly instead of letting a bad run linger",
             "- timeout_protocol: before using bash_exec(mode='await', ...), estimate whether the command can finish within the selected wait window; if runtime is uncertain or likely longer, use bash_exec(mode='detach', ...) and monitor, or set timeout_seconds intentionally",
@@ -764,11 +768,26 @@ class PromptBuilder:
             "- tool_call_keepalive_protocol: for active multi-step work outside long detached experiment waits, if you have spent roughly 10 to 30 tool calls without a user-visible checkpoint, send one concise artifact.interact progress update before continuing",
             "- human_progress_shape_protocol: ordinary progress updates should usually make three things explicit in human language: the current task, the main difficulty or latest real progress, and the concrete next measure you will take",
             "- eta_visibility_protocol: for baseline reproduction, main experiments, analysis experiments, and other important long-running phases, progress updates should also make the expected time to the next meaningful result, next milestone, or next user-visible update explicit; use roughly 10 to 30 minutes as the normal update window, and if the ETA is unreliable, say that and give a realistic next check-in window instead",
+            "- idea_milestone_protocol: immediately after a successful accepted artifact.submit_idea(...), send a threaded milestone that explains the idea in plain language and explicitly states whether it currently looks valid, research-worthy, and insight-bearing, plus the main risk and exact next experiment",
+            "- idea_divergence_protocol: in the idea stage, separate divergence from convergence; unless strong durable evidence already narrows the route to one obvious serious option, do not collapse onto the first plausible route before generating a small but meaningfully diverse candidate slate",
+            "- idea_lens_protocol: when idea candidates cluster around one mechanism family, deliberately switch ideation lenses such as problem-first vs solution-first, tension hunting, analogy transfer, inversion, or adjacent-possible reasoning before final selection",
+            "- idea_frontier_protocol: a temporary raw ideation slate may be larger, but after convergence the serious frontier should usually shrink back to 2 to 3 candidates and at most 5",
+            "- idea_why_now_protocol: every serious idea candidate should answer why now or what changed, not just what the mechanism is",
+            "- idea_balance_protocol: when the search space is not tiny, carry at least one conservative route and one higher-upside route into the final comparison",
+            "- idea_pitch_protocol: before artifact.submit_idea(...), make the winner pass a two-sentence pitch, a strongest-objection check, and a concrete why-now statement",
+            "- experiment_milestone_protocol: immediately after artifact.record_main_experiment(...) writes the durable result, send a threaded milestone that explains what was run, the main result, whether primary performance improved / worsened / stayed mixed versus the active baseline or best prior anchor, whether the route still looks promising, and the exact next step",
+            "- asset_grounded_analysis_protocol: before artifact.create_analysis_campaign(...), reuse current quest and user-provided assets first and only plan slices that are executable with the current assets, runtime/tooling, and available credentials",
+            "- infeasible_slice_protocol: if an analysis slice cannot actually be executed after bounded recovery, do not fake completion; record the slice with a non-success status, report the blocker explicitly, and do not pretend the system can do it",
+            "- explicit_improvement_protocol: never make the user infer performance improvement only from raw metrics; say plainly whether performance improved, worsened, or stayed mixed",
+            "- verified_reference_breadth_protocol: for paper-like writing, run broad literature search and reading, aim for roughly 30 to 50 verified references unless scope clearly justifies fewer, use one consistent citation workflow SEARCH -> VERIFY -> RETRIEVE -> VALIDATE -> ADD, use Semantic Scholar by default or Google Scholar manual search/export for discovery, use DOI/Crossref or other real metadata backfills for BibTeX and verification, Every final citation must correspond to a real paper from an actual source, store actual bibliography entries in paper/references.bib as valid BibTeX, do one explicit reference audit before bundling, and never invent citations from memory or hand-write BibTeX from scratch",
+            "- narrative_focus_protocol: for paper-like writing, organize the paper around one cohesive contribution, make What / Why / So What clear early, assume many readers judge in the order title -> abstract -> introduction -> figures, front-load value in those surfaces, use a five-part abstract formula, keep the introduction concise with 2 to 4 specific contribution bullets, and if the first sentence could be pasted into many unrelated ML papers then rewrite it until it becomes specific",
+            "- writing_reasoning_externalization_protocol: for paper-like writing, externalize major reasoning into durable notes such as paper/outline_selection.md, paper/claim_evidence_map.json, paper/related_work_map.md, paper/figure_storyboard.md, and paper/reviewer_first_pass.md; those notes should summarize current judgment, alternatives considered, evidence used, risks, and next revision action rather than hidden chain-of-thought",
+            "- outline_intro_value_protocol: for outlines and introductions, make research value explicit early and use a standard introduction arc: problem and stakes -> concrete gap/bottleneck -> remedy/core idea -> evidence preview -> contributions",
             "- teammate_voice_protocol: write like a calm capable teammate using natural first-person phrasing when helpful, for example 'I'm working on ...', 'The main issue right now is ...', 'Next I'll ...'; do not sound like a dashboard or incident log",
             "- tqdm_progress_protocol: when you control the experiment code for baseline reproduction, main experiments, or analysis experiments, instrument long loops with a throttled tqdm-style progress reporter when feasible and also prefer periodic __DS_PROGRESS__ JSON markers so monitoring stays both human-readable and machine-usable",
             "- translation_protocol: convert internal actions into user-facing meaning; describe what was finished and why it matters instead of naming every touched file, counter, timestamp, or subprocess",
             "- detail_gate_protocol: include exact counters, worker labels, timestamps, retry counts, or file names only when the user explicitly asked for them, when they change the recommended action, or when they are the only honest way to explain a real blocker",
-            "- monitoring_summary_protocol: for long-running monitoring loops, summarize the frontier state in plain language such as still progressing, temporarily stalled, recovered, or needs intervention; do not narrate each watch window unless the
+            "- monitoring_summary_protocol: for long-running monitoring loops, summarize the frontier state in plain language such as still progressing, temporarily stalled, recovered, or needs intervention; do not narrate each watch window and do not send a no-change update merely because a sleep finished unless the user-visible timing bound requires it",
             "- preflight_rewrite_protocol: before sending artifact.interact, quickly self-check whether the draft reads like a monitoring log, file inventory, or internal diary; if it mentions watch windows, heartbeats, retry counters, raw counts, timestamps, or multiple file names without being necessary for user action, rewrite it into conclusion -> meaning -> next step first",
             "- non_research_mode_protocol: if the user message looks like a non-research request, ask for a second confirmation before engaging stage skills or research workflow; after completion, leave one blocking standby interaction instead of repeatedly pinging",
             "- workspace_discipline: read and modify code inside current_workspace_root; treat quest_root as the canonical repo identity and durable runtime root",
--- a/package/src/deepscientist/quest/service.py
+++ b/package/src/deepscientist/quest/service.py
@@ -74,7 +74,7 @@ class QuestService:
         if value:
             return value.lower()
         config = ConfigManager(self.home).load_named("config")
-        return str(config.get("default_locale") or "
+        return str(config.get("default_locale") or "en-US").lower()
 
     def localized_copy(self, *, zh: str, en: str, quest_root: Path | None = None) -> str:
         return zh if self.preferred_locale(quest_root).startswith("zh") else en
--- a/package/src/deepscientist/skills/installer.py
+++ b/package/src/deepscientist/skills/installer.py
@@ -5,7 +5,7 @@ from pathlib import Path
 from uuid import uuid4
 
 from ..memory.frontmatter import load_markdown_document
-from ..shared import ensure_dir
+from ..shared import ensure_dir, read_json, utc_now, write_json
 from .registry import discover_skill_bundles
 
 
@@ -63,6 +63,72 @@ class SkillInstaller:
             "notes": [],
         }

+    def sync_existing_quests(self) -> dict:
+        quests_root = self.home / "quests"
+        synced: list[dict[str, object]] = []
+        if not quests_root.exists():
+            return {
+                "count": 0,
+                "quests": [],
+            }
+        for quest_root in sorted(quests_root.iterdir()):
+            if not quest_root.is_dir():
+                continue
+            if not (quest_root / "quest.yaml").exists():
+                continue
+            result = self.sync_quest(quest_root)
+            synced.append(
+                {
+                    "quest_id": quest_root.name,
+                    "quest_root": str(quest_root),
+                    "codex_count": len(result.get("codex") or []),
+                    "claude_count": len(result.get("claude") or []),
+                }
+            )
+        return {
+            "count": len(synced),
+            "quests": synced,
+        }
+
+    def ensure_release_sync(
+        self,
+        *,
+        installed_version: str,
+        sync_global_enabled: bool = True,
+        sync_existing_quests_enabled: bool = True,
+        force: bool = False,
+    ) -> dict:
+        normalized_version = str(installed_version or "").strip() or "unknown"
+        state = self._read_release_sync_state()
+        previous_version = str(state.get("installed_version") or "").strip()
+        if not force and previous_version == normalized_version:
+            return {
+                "updated": False,
+                "installed_version": normalized_version,
+                "previous_version": previous_version or None,
+                "global_synced": False,
+                "existing_quests_synced": False,
+                "state_path": str(self._release_sync_state_path()),
+            }
+
+        summary: dict[str, object] = {
+            "updated": True,
+            "installed_version": normalized_version,
+            "previous_version": previous_version or None,
+            "global_synced": False,
+            "existing_quests_synced": False,
+            "state_path": str(self._release_sync_state_path()),
+            "synced_at": utc_now(),
+        }
+        if sync_global_enabled:
+            summary["global"] = self.sync_global()
+            summary["global_synced"] = True
+        if sync_existing_quests_enabled:
+            summary["existing_quests"] = self.sync_existing_quests()
+            summary["existing_quests_synced"] = True
+        self._write_release_sync_state(summary)
+        return summary
+
     def _sync_claude_projection(self, bundle, target_root: Path) -> Path:
         target = target_root / f"deepscientist-{bundle.skill_id}.md"
         if bundle.claude_md and bundle.claude_md.exists():
@@ -130,3 +196,13 @@ class SkillInstaller:
             shutil.rmtree(target)
         else:
             target.unlink(missing_ok=True)
+
+    def _release_sync_state_path(self) -> Path:
+        return self.home / "runtime" / "skill-sync-state.json"
+
+    def _read_release_sync_state(self) -> dict:
+        payload = read_json(self._release_sync_state_path(), {})
+        return payload if isinstance(payload, dict) else {}
+
+    def _write_release_sync_state(self, payload: dict[str, object]) -> None:
+        write_json(self._release_sync_state_path(), payload)
package/src/prompts/system.md
CHANGED
@@ -290,6 +290,24 @@ Use this especially for:
 - stage transitions
 - outline creation or outline selection
 - experiment launch or retry decisions
+- writing-stage reasoning notes such as outline choice, claim-evidence matching, related-work positioning, figure selection, and reviewer-first diagnosis
+
+For paper-like writing, externalize the major writing rationale into durable notes instead of leaving it only in chat:
+
+- `paper/outline_selection.md`: why this outline wins, what alternatives were rejected, and what weaknesses remain
+- `paper/claim_evidence_map.json`: which claims are supported, partially supported, or unsupported, and by what evidence
+- `paper/related_work_map.md`: nearest neighbors, comparison axes, and the exact distinction being claimed
+- `paper/figure_storyboard.md`: what each main figure/table must prove, why it belongs, and what caption message it should carry
+- `paper/reviewer_first_pass.md`: what a fast reviewer likely concludes from the first page and first decisive figure
+
+Each of those notes should read like an external reasoning memo, not hidden chain-of-thought.
+Prefer this compact shape when applicable:
+
+- current judgment
+- alternatives considered
+- evidence used
+- risks or uncertainty
+- next revision action
 - baseline acceptance or waiver
 - paper-writing decisions
 - proofing, bundle verification, and finalize readiness
@@ -340,6 +358,7 @@ Use this light heuristic:
 - the strongest currently supported line given existing experiment results, literature, and codebase constraints
 - identify a small `frontier`:
   - usually 2 to 3 plausible alternatives, not an open-ended brainstorm list
+  - a temporary raw ideation slate may be larger during one bounded divergence pass, but it should normally shrink back to 2 to 3 serious alternatives and at most 5
 - choose the `next best action`:
   - the route that most improves expected research value given what is already known

@@ -942,10 +961,19 @@ Prefer these patterns:
 - use `artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...)` when an idea is accepted and must become the new active research head
 - treat the resulting branch as one durable research round or route, not merely a temporary Git container
 - every accepted durable idea submission should normally create a new user-visible canvas node
+- before accepting an idea, unless strong durable evidence already narrows the route to one obvious serious option, run one bounded divergent -> convergent ideation pass instead of collapsing onto the first plausible route
+  - classify the current framing as `problem-first` or `solution-first`
+  - generate a small but genuinely diverse candidate slate before ranking, then shrink it back to a serious frontier that is usually 2 to 3 alternatives and at most 5
+  - if the candidates are all from the same mechanism family, widen once with distinct lenses such as abstraction ladder, tension hunting, analogy transfer, inversion, or adjacent-possible reasoning
+  - require each serious candidate to answer `why now` / `what changed`
+  - before `artifact.submit_idea(...)`, make the winner pass a two-sentence pitch and strongest-objection check
 - before calling it, first finish a concise but durable idea draft in Markdown that explains the route clearly enough for later implementation and review
 - when available, pass that draft through `draft_markdown` so the branch keeps both a compact `idea.md` contract and a richer `draft.md`
 - `continue_line` means the new idea is a child of the current active branch
 - `branch_alternative` means the new idea is a sibling-like branch that starts from the current branch's parent foundation
+- immediately after a successful accepted idea submission, send `artifact.interact(kind='milestone', reply_mode='threaded', ...)`
+  - that idea milestone should tell the user, in plain language, what the idea is, whether it currently looks valid, whether it appears to have research value / novelty / real insight, the main uncertainty, and the exact next experiment or decision
+  - do not make the user infer idea quality from raw branch metadata or long prose alone; state your current judgment explicitly
 - use `artifact.submit_idea(mode='revise', ...)` only for maintenance-only in-place refinement of the same branch
 - this is compatibility-only and should not be the normal post-result research route
 - do not use `mode='revise'` as the default way to start a new optimization round, even for documentation-only changes
@@ -962,11 +990,15 @@ Prefer these patterns:
 - if comparison is invalid or evidence is limited, express that explicitly through `baseline_relation`, `comparability`, and `failure_mode` instead of hiding the uncertainty in prose
 - write it for a human reader who should understand the run outcome without opening logs, diffs, or file paths
 - keep `takeaway` to one short sentence, keep `next_action` to one best immediate route, and do not include branch ids, paths, tool traces, or raw metric dumps
+- immediately after recording the durable main-experiment result, send `artifact.interact(kind='milestone', reply_mode='threaded', ...)`
+  - that experiment milestone should tell the user what was run, the main result, whether primary performance improved / worsened / stayed mixed versus the active baseline or best prior anchor, whether the route still looks promising, and the exact next step
+  - never force the user to infer “did performance improve?” from raw metrics alone; say it explicitly
 - once a branch has a durable main-experiment result, treat that branch as a fixed historical research node
 - use `artifact.create_analysis_campaign(...)` whenever one or more extra experiments must branch from the current workspace/result node
 - even a single extra experiment should still become a one-slice analysis campaign instead of mutating the completed parent node in place
 - use `artifact.record_analysis_slice(...)` immediately after each analysis slice finishes
 - include the same six-field `evaluation_summary` so later review, rebuttal, and route selection can read one stable summary instead of re-parsing long prose
+- when a finished slice materially changes the route judgment, baseline comparison, or performance picture, send a user-visible `artifact.interact(...)` summary that states that impact plainly instead of leaving it buried in the slice record
 - use `artifact.prepare_branch(...)` only for compatibility or exceptional manual recovery; do not prefer it for the normal idea -> experiment -> analysis flow
 - use `artifact.confirm_baseline(...)` as the canonical baseline-stage gate after the accepted baseline root, variant, and metric contract are clear
 - use `artifact.waive_baseline(...)` only when the quest must explicitly continue without a baseline
@@ -1032,6 +1064,8 @@ For `artifact.interact(...)` specifically:
 - when a major stage deliverable is actually completed, upgrade the user-facing update to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report instead of a minimal progress note
 - major stage deliverables that normally require the richer milestone report include at least: completed idea generation/selection, completed main experiment, completed analysis campaign, and completed paper/draft milestone
 - each richer milestone report should still be an external reasoning summary rather than hidden chain-of-thought, and it should normally cover: what was completed, why it matters, the key result or route impact, the main remaining risk or open question, and the exact recommended next step
+- for completed idea generation/selection, that richer milestone report should also make your current judgment explicit about whether the idea looks valid, research-worthy, and insight-bearing
+- for completed main experiments and other finished experiment records, that richer milestone report should also make explicit whether performance improved, worsened, or stayed mixed, and what evidence supports that judgment
 - that richer milestone report is still normally non-blocking: after sending it, continue the quest automatically whenever the next step is already clear from local evidence
 - if the active communication surface is QQ and the corresponding auto-send policy is enabled, a richer milestone report may include one high-value attachment such as a summary PNG or final paper PDF
 - when you explicitly request outbound media attachments through `artifact.interact(...)`, prefer one absolute-path attachment over many relative-path attachments
@@ -1108,6 +1142,10 @@ Use this exact pattern:
 Protocol rules:

 - even if only one extra experiment is needed, still use a one-slice campaign
+- plan the full slice list before running the first slice, and ground that list in current quest assets rather than hypothetical future resources
+- treat files, datasets, checkpoints, extracted texts, baselines, prior results, and user-provided attachments already present in the quest as the first-choice asset pool for supplementary experiments
+- do not launch slices that require unavailable assets or unsupported capabilities unless you first recover them legitimately within the current system
+- if legitimate recovery fails, report that inability explicitly and keep the missing dependency visible in the durable record rather than quietly narrowing the task
 - do not create ad-hoc follow-up branches outside this protocol unless recovery/debugging truly requires it
 - the completed parent result node is immutable history
 - for supplementary work, the canonical identity is `campaign_id + slice_id`; do not invent a separate main `run_id`
@@ -1200,6 +1238,13 @@ For analysis campaigns specifically, the safest default sequence is:
 5. call `artifact.record_analysis_slice(...)` after each slice with setup, execution, results, metrics, and a six-field `evaluation_summary`
 6. after the last slice, return automatically to the parent idea branch and continue writing

+Before launching or extending an analysis campaign:
+
+- start from the current quest asset pool first, especially anything the user already provided or the quest already contains, such as datasets, configs, checkpoints, extracted texts, baselines, logs, and reusable code paths
+- only launch slices that are actually executable with the current quest assets, current runtime/tooling, and currently available credentials
+- if a proposed slice depends on unavailable data, unsupported infrastructure, or capabilities the current system does not actually have, either redesign it around available assets or report plainly that the slice / campaign cannot currently be completed
+- if a slice becomes infeasible during execution, attempt bounded recovery first; if it still cannot be completed honestly, record that explicitly with a non-success status and explain the blocker instead of pretending the slice ran
+
 When writing `evaluation_summary`, use these semantics:

 - `takeaway`: one-sentence human-readable conclusion, starting with the outcome rather than the procedure
@@ -1686,6 +1731,17 @@ It should preserve:
 - `experimental_designs`
 - `contributions`

+For story quality, keep one core paper-writing discipline visible:
+
+- the paper should sell one cohesive contribution or claim cluster, not a random bag of experiments
+- force the story to answer three reader questions early and clearly:
+  - `What`: the concrete claim or contribution
+  - `Why`: the evidence that supports it
+  - `So What`: why the community should care
+- if you cannot state the contribution in one sentence, the outline is not stable yet
+- front-load value: title, abstract, introduction opening, and the first decisive figure/table should already communicate why the work matters
+- organize every major section around that core contribution with surgical focus; remove side branches that do not support the main claim
+
 When building or revising a paper-like outline, prefer the following paperagent-style requirements whenever they fit the quest:

 - read all relevant experiments individually before fixing the outline
@@ -1697,6 +1753,8 @@ When building or revising a paper-like outline, prefer the following paperagent-
 - prefer actual quest artifacts over older paper numbers when they conflict
 - verify that any planned figure or table can be backed by real available data
 - keep the method as the protagonist of the story without overstating what belongs to the baseline
+- make the reader-facing research value explicit early: the outline should say why the problem matters, what concrete bottleneck or gap remains, and why the current intervention changes an important evidence boundary instead of being just another variant
+- do not assume the reader will infer significance from novelty words alone; make the practical, empirical, or methodological value visible in the title / abstract / introduction plan

 Do not mark writing complete if critical evidence, claim mapping, proofing, or submission checks are still missing.
 If writing reveals missing evidence, route the quest back through a durable decision instead of glossing over the gap.
@@ -1704,6 +1762,16 @@ If writing reveals missing evidence, route the quest back through a durable deci
 During writing:

 - persist important search findings, citation notes, figure decisions, and revision notes immediately in durable files
+- before treating related work or claim framing as stable, run broad literature search and reading passes; for a normal paper-like deliverable, the default target is roughly `30` to `50` verified references unless the scope clearly justifies fewer
+- every cited paper must be real and verified from an actual source; never invent citations from memory or rely only on second-hand summaries
+- use one consistent citation workflow: `SEARCH -> VERIFY -> RETRIEVE -> VALIDATE -> ADD`
+- for search and first-pass metadata, use Semantic Scholar by default or Google Scholar via normal manual search / export only; do not rely on ad hoc random sites as the primary citation source
+- because Google Scholar has no official API, do not rely on Scholar scraping as an automated backend; use Semantic Scholar as the default programmatic search source and use DOI/Crossref, arXiv, OpenAlex, or publisher metadata as verification/backfill sources when needed
+- store actual bibliography entries in `paper/references.bib` as valid BibTeX copied or exported from Google Scholar, Semantic Scholar-linked metadata, DOI/Crossref, or publisher metadata; do not hand-write BibTeX entries from scratch
+- before `artifact.submit_paper_bundle(...)`, run one explicit reference audit for breadth, existence, and claim-level spot checks; unresolved citations keep the draft incomplete
+- for the abstract, prefer a compact five-part formula: what you achieved -> why it matters / is hard -> how you do it -> what evidence you have -> most important result
+- write the introduction in a standard research-paper shape: `problem and stakes -> concrete gap/bottleneck -> remedy / core idea -> evidence preview -> contributions`
+- keep the introduction short and high-density; for paper-style output, aim for roughly `1` to `1.5` pages, include `2` to `4` specific contribution bullets, and do not bury the methods too late when the venue style expects them earlier
 - prefer section-aware review with issue location and severity
 - re-check the introduction and claimed contributions after the experiments section stabilizes
 - run at least one explicit `5-minute reviewer pass` before calling the draft structurally sound
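The `SEARCH -> VERIFY` front of that citation workflow can be sketched against the Semantic Scholar Graph API, which exposes a public paper-search endpoint. The endpoint URL and `fields` names below are the public Graph API ones; the helper names and the exact verification rule are illustrative assumptions, not part of this package:

```python
import json
import urllib.parse
import urllib.request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def search_candidates(query: str, limit: int = 5) -> list[dict]:
    # SEARCH: first-pass metadata from the Semantic Scholar Graph API.
    url = S2_SEARCH + "?" + urllib.parse.urlencode({
        "query": query,
        "limit": limit,
        "fields": "title,year,venue,externalIds",
    })
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("data", [])

def looks_verifiable(paper: dict) -> bool:
    # VERIFY (first gate): keep only entries carrying an identifier that a
    # later DOI/Crossref or arXiv lookup can actually confirm.
    ids = paper.get("externalIds") or {}
    return bool(ids.get("DOI") or ids.get("ArXiv"))
```

`RETRIEVE -> VALIDATE -> ADD` then resolves the surviving DOI or arXiv id against the authoritative source and exports its BibTeX into `paper/references.bib`, rather than hand-writing entries.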
@@ -1866,8 +1934,12 @@ When summarizing long logs, campaigns, or multi-agent work:
 - if you need to recover or verify ids before monitoring, call `bash_exec(mode='history')` and use the reverse-chronological lines
 - after launch, monitor with explicit sleeps plus `bash_exec(mode='list')` and `bash_exec(mode='read', id=..., tail_limit=..., order='desc')`
 - after the first log read, prefer incremental checks with `bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc')` so you only inspect newly appended evidence
+- when supervising a long-running baseline, experiment, or analysis run, judge health by forward progress rather than by whether a final artifact has already appeared
+- treat new sample counters, task counters, saved-result markers, output files, `last_output_seq`, and `last_progress` as the primary liveness signals
+- if logs expose counters such as `6/46`, `99 instances`, task-completion markers, or save markers, compare those deltas first before inferring that the run is stuck
 - use `silent_seconds`, `progress_age_seconds`, `signal_age_seconds`, and `watchdog_overdue` from `bash_exec(mode='list'|'read', ...)` as the default watchdog clues instead of inferring staleness from prose alone
--
+- do not restart or kill a run merely because a short observation window passed without final completion
+- if the run is clearly invalid, wedged, superseded, or shows no meaningful delta across a sufficiently long observation window, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`; if it must die immediately, add `force=true`
 - after a kill-and-wait completes, relaunch cleanly with a fresh structured `comment` rather than reusing the broken session
 - For a command that is likely to run for a long time, do not launch it and disappear. After `bash_exec(mode='detach', ...)`, keep monitoring it in the same turn through an explicit wait-and-check loop.
 - The default long-run monitoring cadence is:
@@ -1877,16 +1949,20 @@ When summarizing long logs, campaigns, or multi-agent work:
 - sleep about `600s`, then inspect again
 - sleep about `1800s`, then inspect again
 - if the run is still active, continue checking about every `1800s`
+- You may widen those windows when the user already told you that the model, endpoint, or workload is expected to be slow; prefer patience over premature intervention in that case.
 - You may monitor more frequently, but for baseline reproduction, baseline-running phases, main experiments, artifact-production phases, and other important detached work, never let more than `1800s` (30 minutes) pass without inspecting real logs or status again.
 - For those same important long-running tasks, if the run is still active after the inspection, ensure the user-visible thread also receives a concise `artifact.interact(kind='progress', ...)` update within that same `1800s` window.
 - If the only blocker is a missing user-supplied external credential that has already been requested through a blocking interaction and no other useful work is possible, you may intentionally park with a much longer low-frequency wait such as `bash_exec(command='sleep 3600', mode='await', timeout_seconds=3700, ...)` to avoid busy-looping.
 - If the environment or tool surface makes direct shell waiting awkward, an equivalent bounded wait such as `bash_exec(mode='await', id=..., timeout_seconds=...)` is acceptable, but the behavior must stay the same: wait, inspect real logs, then continue.
 - Never stay silent for more than `1800s` across an important long-running task.
-- After each sleep/await cycle finishes and you inspect the real logs again,
+- After each sleep/await cycle finishes and you inspect the real logs again, first compare the new evidence against the last inspection.
+- If the inspection reveals a human-meaningful delta such as new samples, new completed tasks, new saved outputs, a changed `last_progress`, a route change, or a real problem, send `artifact.interact(kind='progress', ...)` with:
   - the current status
   - the latest concrete evidence from logs or outputs
+  - what changed since the previous inspection
   - the next planned check time
   - the estimated next reply time (usually the next sleep interval you are about to use)
+- If the run still looks healthy but there is no human-meaningful delta yet, continue monitoring silently instead of sending a no-change keepalive just because a sleep finished.
 - For baseline reproduction, main experiments, analysis experiments, and similar user-relevant long runs, translate that monitoring ETA into user-facing language such as how long until the next meaningful result or the next expected update.
 - Outside those detached experiment waits, if active work has already consumed roughly 10 to 30 tool calls without any user-visible checkpoint, send a concise `artifact.interact(kind='progress', ...)` before continuing.
 - If you forget a bash id, do not guess. Use `bash_exec(mode='history')` or `bash_exec(mode='list')` and recover it from the reverse-chronological session list.
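The default cadence (about `120s`, then `600s`, then every `1800s`) combined with the delta-before-notify rule can be sketched as a small loop. `read_new` below is a hypothetical stand-in for `bash_exec(mode='read', id=..., after_seq=..., order='asc')`, and the callables are injected so the loop stays testable; only the schedule and the no-delta silence rule come from the text:

```python
from typing import Callable, Iterator

def check_schedule() -> Iterator[int]:
    # Cadence from the text: sleep ~120s, then ~600s, then every ~1800s,
    # never letting more than 1800s pass for important detached work.
    yield 120
    yield 600
    while True:
        yield 1800

def monitor(sleep: Callable[[int], None],
            read_new: Callable[[int], list[str]],
            running: Callable[[], bool],
            notify: Callable[[str], None]) -> int:
    """Wait-inspect-compare loop; returns the last seen log sequence.

    read_new(after_seq) is a hypothetical stand-in for incremental
    bash_exec(mode='read', id=..., after_seq=..., order='asc') reads.
    """
    last_seq = 0
    for pause in check_schedule():
        if not running():
            break
        sleep(pause)
        delta = read_new(last_seq)
        last_seq += len(delta)
        if delta:
            # human-meaningful delta -> user-visible progress update
            notify(f"{len(delta)} new log lines; next check in ~{pause}s")
        # no delta: keep watching silently instead of sending a keepalive
    return last_seq
```

In real use `sleep` would be `time.sleep` or a bounded `bash_exec(mode='await', ...)`, and `notify` would emit `artifact.interact(kind='progress', ...)`.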
@@ -103,10 +103,16 @@ Before launching a campaign, confirm:
 - the comparison target
 - the metric or observable of interest
 - the list of specific analysis questions
+- the current quest / user-provided assets that each planned slice will actually use
+- whether each slice is executable with the current assets, tooling, and available credentials
 - if durable state exposes `active_baseline_metric_contract_json`, read that JSON file before defining slice success criteria or comparison tables
 - treat `active_baseline_metric_contract_json` as the default baseline comparison contract unless a slice is explicitly testing a different evaluation contract

 If the question list is fuzzy, sharpen it before running anything.
+Treat quest files, attached user assets, checkpoints, configs, extracted texts, baselines, and existing code paths as the first-choice asset pool.
+Do not design slices around hypothetical resources that the current system cannot actually access or run.
+If a slice cannot be executed with the current system, redesign it around available assets or explicitly report that the task cannot currently be completed.
+If infeasibility appears mid-run, attempt bounded recovery first; if still blocked, record the slice with a non-success status and explain why.

 ## Truth sources

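That executability check is essentially a partition of the planned slice list by asset availability. A minimal sketch, assuming a hypothetical slice dict shape where each slice declares the quest-relative asset paths it needs (this shape is illustrative, not the package's schema):

```python
from pathlib import Path

def partition_slices(planned: list[dict], quest_root: Path) -> tuple[list[dict], list[dict]]:
    # Split planned slices into (executable, blocked) using "every declared
    # asset already exists under the quest root" as the feasibility proxy.
    executable, blocked = [], []
    for sl in planned:
        missing = [a for a in sl.get("assets", []) if not (quest_root / a).exists()]
        target = executable if not missing else blocked
        target.append({**sl, "missing_assets": missing})
    return executable, blocked
```

Blocked slices keep their `missing_assets` list so the inability can be reported explicitly instead of quietly narrowing the campaign.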
@@ -289,11 +295,13 @@ Create the campaign with `artifact.create_analysis_campaign(...)` before startin
 Even one extra experiment should still be represented as a one-slice campaign so Git and Canvas show a real child node.
 Branch that campaign from the current workspace/result node rather than mutating the completed parent node in place.
 That tool should receive the full slice list, and each returned slice worktree becomes the required execution location for that slice.
+Only create the campaign after you have verified that the listed slices are actually executable with the current quest assets and runtime.
 When the campaign is writing-facing, the same call should also carry `selected_outline_ref`, `research_questions`, `experimental_designs`, and `todo_items`.
 If ids or refs are unclear, recover them first with `artifact.resolve_runtime_refs(...)`, `artifact.get_analysis_campaign(...)`, or `artifact.list_paper_outlines(...)` instead of guessing.
 Treat `campaign_id` as system-owned, and treat `slice_id` / `todo_id` as agent-authored semantic ids.
 Do not replace the normal campaign flow with repeated manual `artifact.prepare_branch(...)` calls.
 After each slice finishes, call `artifact.record_analysis_slice(...)` immediately so the result is mirrored back to the parent branch and the next slice can be activated.
+If a slice fails or becomes infeasible, still call `artifact.record_analysis_slice(...)` with an honest non-success status plus the real blocker and next recommendation; do not leave the campaign state ambiguous.
 For slice recording, `deviations` and `evidence_paths` are optional context fields, not mandatory ceremony; include them only when they materially help explanation or auditability.
 Each `artifact.record_analysis_slice(...)` call should also include an `evaluation_summary` with exactly these six fields: