EvoScientist 0.0.1.dev2__py3-none-any.whl

Files changed (107)
  1. EvoScientist/EvoScientist.py +157 -0
  2. EvoScientist/__init__.py +24 -0
  3. EvoScientist/__main__.py +4 -0
  4. EvoScientist/backends.py +392 -0
  5. EvoScientist/cli.py +1553 -0
  6. EvoScientist/middleware.py +35 -0
  7. EvoScientist/prompts.py +277 -0
  8. EvoScientist/skills/accelerate/SKILL.md +332 -0
  9. EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
  10. EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
  11. EvoScientist/skills/accelerate/references/performance.md +525 -0
  12. EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
  13. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
  14. EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
  15. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
  16. EvoScientist/skills/find-skills/SKILL.md +133 -0
  17. EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
  18. EvoScientist/skills/flash-attention/SKILL.md +367 -0
  19. EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
  20. EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
  21. EvoScientist/skills/llama-cpp/SKILL.md +258 -0
  22. EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
  23. EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
  24. EvoScientist/skills/llama-cpp/references/server.md +125 -0
  25. EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
  26. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  27. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  28. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  29. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  30. EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
  31. EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
  32. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  33. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  34. EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
  35. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
  36. EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
  37. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  38. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  39. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  40. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  41. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  42. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  43. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
  44. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  45. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  46. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  47. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  48. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  49. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  50. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  51. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  52. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  53. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  54. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  55. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  59. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  60. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  61. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  62. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  63. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  64. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  67. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  68. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  69. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  70. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  71. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  72. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  76. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  77. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  78. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  79. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  80. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  81. EvoScientist/skills/peft/SKILL.md +431 -0
  82. EvoScientist/skills/peft/references/advanced-usage.md +514 -0
  83. EvoScientist/skills/peft/references/troubleshooting.md +480 -0
  84. EvoScientist/skills/ray-data/SKILL.md +326 -0
  85. EvoScientist/skills/ray-data/references/integration.md +82 -0
  86. EvoScientist/skills/ray-data/references/transformations.md +83 -0
  87. EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
  88. EvoScientist/skills/skill-creator/SKILL.md +356 -0
  89. EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
  90. EvoScientist/skills/skill-creator/references/workflows.md +28 -0
  91. EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
  92. EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
  93. EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
  94. EvoScientist/stream/__init__.py +53 -0
  95. EvoScientist/stream/emitter.py +94 -0
  96. EvoScientist/stream/formatter.py +168 -0
  97. EvoScientist/stream/tracker.py +115 -0
  98. EvoScientist/stream/utils.py +255 -0
  99. EvoScientist/subagent.yaml +147 -0
  100. EvoScientist/tools.py +135 -0
  101. EvoScientist/utils.py +207 -0
  102. evoscientist-0.0.1.dev2.dist-info/METADATA +227 -0
  103. evoscientist-0.0.1.dev2.dist-info/RECORD +107 -0
  104. evoscientist-0.0.1.dev2.dist-info/WHEEL +5 -0
  105. evoscientist-0.0.1.dev2.dist-info/entry_points.txt +5 -0
  106. evoscientist-0.0.1.dev2.dist-info/licenses/LICENSE +21 -0
  107. evoscientist-0.0.1.dev2.dist-info/top_level.txt +1 -0
EvoScientist/stream/formatter.py +168 -0
@@ -0,0 +1,168 @@
+"""
+ToolResultFormatter - content-aware tool result formatting with Rich.
+
+Detects content type (success/error/json/markdown/text) and formats accordingly.
+"""
+
+import json
+from dataclasses import dataclass
+from enum import Enum
+from typing import Any, List
+
+from rich.panel import Panel
+from rich.syntax import Syntax
+from rich.text import Text
+from rich.markdown import Markdown
+
+from .utils import SUCCESS_PREFIX, FAILURE_PREFIX, is_success as _is_success, truncate
+
+
+class ContentType(Enum):
+    """Content type categories."""
+    SUCCESS = "success"
+    ERROR = "error"
+    JSON = "json"
+    MARKDOWN = "markdown"
+    TEXT = "text"
+
+
+@dataclass
+class FormattedResult:
+    """Formatted result container."""
+    content_type: ContentType
+    elements: List[Any]  # Rich renderable elements
+    success: bool = True
+
+
+class ToolResultFormatter:
+    """Tool result formatter with content type detection.
+
+    Usage:
+        formatter = ToolResultFormatter()
+        result = formatter.format("execute", output, max_length=800)
+        for elem in result.elements:
+            console.print(elem)
+    """
+
+    def detect_type(self, content: str) -> ContentType:
+        """Detect content type."""
+        content = content.strip()
+
+        if content.startswith(SUCCESS_PREFIX):
+            body = self._extract_body(content)
+            if self._is_json(body):
+                return ContentType.JSON
+            return ContentType.SUCCESS
+
+        if content.startswith(FAILURE_PREFIX):
+            return ContentType.ERROR
+
+        if self._is_json(content):
+            return ContentType.JSON
+
+        if self._is_error(content):
+            return ContentType.ERROR
+
+        if self._is_markdown(content):
+            return ContentType.MARKDOWN
+
+        return ContentType.TEXT
+
+    def is_success(self, content: str) -> bool:
+        """Check if content indicates successful execution."""
+        return _is_success(content)
+
+    def format(self, name: str, content: str, max_length: int = 800) -> FormattedResult:
+        """Format tool result based on detected content type."""
+        content_type = self.detect_type(content)
+        success = self.is_success(content)
+
+        formatter_map = {
+            ContentType.SUCCESS: self._format_success,
+            ContentType.ERROR: self._format_error,
+            ContentType.JSON: self._format_json,
+            ContentType.MARKDOWN: self._format_markdown,
+            ContentType.TEXT: self._format_text,
+        }
+
+        formatter = formatter_map.get(content_type, self._format_text)
+        elements = formatter(name, content, max_length)
+
+        return FormattedResult(content_type=content_type, elements=elements, success=success)
+
+    def _extract_body(self, content: str) -> str:
+        """Extract body after status prefix."""
+        lines = content.split("\n", 2)
+        return lines[2].strip() if len(lines) > 2 else ""
+
+    def _is_json(self, content: str) -> bool:
+        content = content.strip()
+        if not content:
+            return False
+        if (content.startswith('{') and content.endswith('}')) or \
+           (content.startswith('[') and content.endswith(']')):
+            try:
+                json.loads(content)
+                return True
+            except (json.JSONDecodeError, ValueError):
+                pass
+        return False
+
+    def _is_error(self, content: str) -> bool:
+        error_patterns = [
+            'Traceback (most recent call last)',
+            'Exception:',
+            'Error:',
+        ]
+        return any(pattern in content for pattern in error_patterns)
+
+    def _is_markdown(self, content: str) -> bool:
+        md_patterns = ['```', '**', '##', '- **']
+        return content.startswith('#') or any(p in content for p in md_patterns)
+
+    def _format_success(self, name: str, content: str, max_length: int) -> List[Any]:
+        display = truncate(content, max_length)
+        return [Panel(
+            Text(display, style="green"),
+            title=f"{name} OK",
+            border_style="green",
+        )]
+
+    def _format_error(self, name: str, content: str, max_length: int) -> List[Any]:
+        display = truncate(content, max_length)
+        return [Panel(
+            Text(display, style="red"),
+            title=f"{name} FAILED",
+            border_style="red",
+        )]
+
+    def _format_json(self, name: str, content: str, max_length: int) -> List[Any]:
+        json_content = content
+        if content.startswith(SUCCESS_PREFIX):
+            json_content = self._extract_body(content)
+
+        try:
+            data = json.loads(json_content)
+            formatted = json.dumps(data, indent=2, ensure_ascii=False)
+            formatted = truncate(formatted, max_length)
+            return [
+                Text(f"{name} OK", style="cyan bold"),
+                Syntax(formatted, "json", theme="monokai", line_numbers=False),
+            ]
+        except (json.JSONDecodeError, ValueError):
+            return self._format_text(name, content, max_length)
+
+    def _format_markdown(self, name: str, content: str, max_length: int) -> List[Any]:
+        display = truncate(content, max_length)
+        return [Panel(
+            Markdown(display),
+            title=f"{name}",
+            border_style="cyan dim",
+        )]
+
+    def _format_text(self, name: str, content: str, max_length: int) -> List[Any]:
+        display = truncate(content, max_length)
+        return [
+            Text(f"{name}:", style="cyan bold"),
+            Text(f" {display}", style="dim"),
+        ]
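
The order of checks in `detect_type` above determines the result when several patterns match: a status prefix wins outright, JSON is tested before the error substrings, and Markdown is tried last. The following is a dependency-free sketch of that ordering, not the package's code. The function name `classify` and its string return values are hypothetical stand-ins; the two prefix values are copied from the constants that appear later in this diff.

```python
import json

SUCCESS_PREFIX = "[OK]"       # copied from the constants shown later in this diff
FAILURE_PREFIX = "[FAILED]"
FENCE = "`" * 3               # the triple-backtick Markdown marker


def _is_json(text: str) -> bool:
    """True only if the text is wrapped like an object/array and actually parses."""
    text = text.strip()
    wrapped = (text.startswith("{") and text.endswith("}")) or (
        text.startswith("[") and text.endswith("]")
    )
    if not wrapped:
        return False
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def classify(content: str) -> str:
    """Hypothetical stand-in that follows the same order of checks as detect_type."""
    content = content.strip()
    if content.startswith(SUCCESS_PREFIX):
        # The body is whatever follows the first two lines of the output.
        parts = content.split("\n", 2)
        body = parts[2].strip() if len(parts) > 2 else ""
        return "json" if _is_json(body) else "success"
    if content.startswith(FAILURE_PREFIX):
        return "error"
    if _is_json(content):
        return "json"
    if any(p in content for p in ("Traceback (most recent call last)", "Exception:", "Error:")):
        return "error"
    if content.startswith("#") or any(p in content for p in (FENCE, "**", "##", "- **")):
        return "markdown"
    return "text"
```

Two consequences follow from this ordering. A JSON document that merely contains the substring `Error:` is still classified as JSON. And because the body is taken from the third line onward, a JSON payload placed directly on the line after `[OK]` is classified as plain success rather than JSON.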
EvoScientist/stream/tracker.py +115 -0
@@ -0,0 +1,115 @@
+"""
+ToolCallTracker - manages incremental JSON parsing for tool parameters.
+
+Handles tool_use blocks where arguments arrive in fragments via input_json_delta.
+"""
+
+import json
+from dataclasses import dataclass, field
+from typing import Dict, Optional
+
+
+@dataclass
+class ToolCallInfo:
+    """Tool call information."""
+    id: str
+    name: str
+    args: Dict = field(default_factory=dict)
+    emitted: bool = False
+    args_complete: bool = False
+    _json_buffer: str = ""
+
+
+class ToolCallTracker:
+    """Tool call tracker for incremental argument parsing.
+
+    Usage:
+        tracker = ToolCallTracker()
+        tracker.update(tool_id, name="execute")
+        tracker.append_json_delta('{"command')
+        tracker.append_json_delta('": "ls"}')
+        tracker.finalize_all()
+        info = tracker.get(tool_id)
+        yield emitter.tool_call(info.name, info.args)
+    """
+
+    def __init__(self):
+        self._calls: Dict[str, ToolCallInfo] = {}
+        self._last_tool_id: Optional[str] = None
+
+    def update(
+        self,
+        tool_id: str,
+        name: Optional[str] = None,
+        args: Optional[Dict] = None,
+        args_complete: bool = False,
+    ) -> None:
+        """Update tool call info (accumulative)."""
+        if tool_id not in self._calls:
+            self._calls[tool_id] = ToolCallInfo(
+                id=tool_id,
+                name=name or "",
+                args=args or {},
+                args_complete=args_complete,
+            )
+            self._last_tool_id = tool_id
+        else:
+            info = self._calls[tool_id]
+            if name:
+                info.name = name
+            if args:
+                info.args = args
+            if args_complete:
+                info.args_complete = True
+
+    def append_json_delta(self, partial_json: str, index: int = 0) -> None:
+        """Accumulate input_json_delta fragment."""
+        tool_id = self._last_tool_id
+        if tool_id and tool_id in self._calls:
+            self._calls[tool_id]._json_buffer += partial_json
+
+    def finalize_all(self) -> None:
+        """Finalize all tool calls: parse accumulated JSON and mark complete."""
+        for info in self._calls.values():
+            if info._json_buffer:
+                try:
+                    info.args = json.loads(info._json_buffer)
+                except json.JSONDecodeError:
+                    pass
+                info._json_buffer = ""
+            info.args_complete = True
+
+    def is_ready(self, tool_id: str) -> bool:
+        """Check if a tool call is ready to emit (has name and not yet emitted)."""
+        if tool_id not in self._calls:
+            return False
+        info = self._calls[tool_id]
+        return bool(info.name) and not info.emitted
+
+    def get_all(self) -> list[ToolCallInfo]:
+        """Get all tool calls."""
+        return list(self._calls.values())
+
+    def mark_emitted(self, tool_id: str) -> None:
+        """Mark a tool call as emitted."""
+        if tool_id in self._calls:
+            self._calls[tool_id].emitted = True
+
+    def get(self, tool_id: str) -> Optional[ToolCallInfo]:
+        """Get tool call info by ID."""
+        return self._calls.get(tool_id)
+
+    def get_pending(self) -> list[ToolCallInfo]:
+        """Get all unemitted tool calls."""
+        return [info for info in self._calls.values() if not info.emitted]
+
+    def emit_all_pending(self) -> list[ToolCallInfo]:
+        """Emit all pending tool calls and mark them."""
+        pending = self.get_pending()
+        for info in pending:
+            info.emitted = True
+        return pending
+
+    def clear(self) -> None:
+        """Clear all tracked tool calls."""
+        self._calls.clear()
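
The tracker above buffers streamed fragments and parses them only once, in `finalize_all`, because no single fragment is valid JSON on its own. The sketch below illustrates that idea in isolation, under the assumption that fragments arrive in order for a single tool call. `parse_fragments` and `parses_alone` are hypothetical helpers written for this example, not functions from the package.

```python
import json
from typing import Dict, List


def parse_fragments(fragments: List[str], fallback: Dict) -> Dict:
    """Concatenate streamed fragments and parse them once, at the end.

    Like finalize_all above, a decode error is swallowed and the
    previous args (here: `fallback`) are kept.
    """
    buffer = "".join(fragments)
    if not buffer:
        return fallback
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return fallback


def parses_alone(fragment: str) -> bool:
    """Demo helper: is this single fragment valid JSON by itself?"""
    try:
        json.loads(fragment)
        return True
    except json.JSONDecodeError:
        return False


fragments = ['{"command', '": "ls"}']   # same split as the docstring example above
complete = parse_fragments(fragments, fallback={})
truncated = parse_fragments(fragments[:1], fallback={})   # stream cut off early
```

One behavior worth keeping in mind when reading the hunk: since the decode error is swallowed, a call whose stream was cut off ends up with its earlier (typically empty) `args` while still being marked `args_complete`. Also, `append_json_delta` ignores its `index` parameter and always appends to the most recently registered call.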
EvoScientist/stream/utils.py +255 -0
@@ -0,0 +1,255 @@
+"""
+Stream utility functions and constants.
+
+Provides tool status indicators, display limits, and formatting helpers
+adapted for deepagents tool names.
+"""
+
+import sys
+from pathlib import PurePath
+from enum import Enum
+
+
+# === Status marker constants ===
+SUCCESS_PREFIX = "[OK]"
+FAILURE_PREFIX = "[FAILED]"
+
+
+# === Tool status indicators ===
+class ToolStatus(str, Enum):
+    """Tool execution status indicators."""
+    RUNNING = "\u25cf"  # Running - yellow
+    SUCCESS = "\u25cf"  # Success - green
+    ERROR = "\u25cf"  # Failed - red
+    PENDING = "\u25cb"  # Pending - gray
+
+
+def get_status_symbol(status: ToolStatus) -> str:
+    """Get status symbol with ASCII fallback for terminals without Unicode."""
+    try:
+        supports_unicode = (
+            sys.stdout.encoding
+            and 'utf' in sys.stdout.encoding.lower()
+        )
+    except Exception:
+        supports_unicode = False
+
+    if supports_unicode:
+        return status.value
+
+    fallback = {
+        ToolStatus.RUNNING: "*",
+        ToolStatus.SUCCESS: "+",
+        ToolStatus.ERROR: "x",
+        ToolStatus.PENDING: "-",
+    }
+    return fallback.get(status, "?")
+
+
+# === Display limit constants ===
+class DisplayLimits:
+    """Display length limits."""
+    THINKING_STREAM = 1000
+    THINKING_FINAL = 2000
+    ARGS_INLINE = 100
+    ARGS_FORMATTED = 300
+    TOOL_RESULT_STREAM = 500
+    TOOL_RESULT_FINAL = 800
+    TOOL_RESULT_MAX = 2000
+
+
+def has_args(args) -> bool:
+    """Check if args has content (handles empty dict falsy issue)."""
+    return args is not None and args != {}
+
+
+def is_success(content: str) -> bool:
+    """Determine if tool output indicates successful execution."""
+    content = content.strip()
+    if content.startswith(SUCCESS_PREFIX):
+        return True
+    if content.startswith(FAILURE_PREFIX):
+        return False
+    error_patterns = [
+        'Traceback (most recent call last)',
+        'Exception:',
+        'Error:',
+    ]
+    return not any(pattern in content for pattern in error_patterns)
+
+
+def truncate(content: str, max_length: int, suffix: str = "\n... (truncated)") -> str:
+    """Truncate content to specified length."""
+    if len(content) > max_length:
+        return content[:max_length] + suffix
+    return content
+
+
+# === Compact formatting for deepagents tools ===
+
+def _shorten_path(path: str, max_len: int = 40) -> str:
+    """Shorten a file path for display."""
+    if len(path) <= max_len:
+        return path
+    path_obj = PurePath(path)
+    parts = path_obj.parts
+    if len(parts) > 2:
+        return ".../" + "/".join(parts[-2:])
+    return path
+
+
+def format_tool_compact(name: str, args: dict | None) -> str:
+    """Format as compact tool call string: ToolName(key_arg).
+
+    Adapted for deepagents tool names: execute, read_file, write_file,
+    edit_file, grep, glob, ls, write_todos, read_todos, task, load_skill,
+    tavily_search, think_tool.
+    """
+    if not args:
+        return f"{name}()"
+
+    name_lower = name.lower()
+
+    # Shell execution
+    if name_lower == "execute":
+        cmd = args.get("command", "")
+        if len(cmd) > 50:
+            cmd = cmd[:47] + "..."
+        return f"execute({cmd})"
+
+    # File operations
+    if name_lower == "read_file":
+        path = _shorten_path(args.get("path", ""))
+        return f"read_file({path})"
+
+    if name_lower == "write_file":
+        path = _shorten_path(args.get("path", ""))
+        return f"write_file({path})"
+
+    if name_lower == "edit_file":
+        path = _shorten_path(args.get("path", ""))
+        return f"edit_file({path})"
+
+    # Search operations
+    if name_lower == "glob":
+        pattern = args.get("pattern", "")
+        if len(pattern) > 40:
+            pattern = pattern[:37] + "..."
+        return f"glob({pattern})"
+
+    if name_lower == "grep":
+        pattern = args.get("pattern", "")
+        path = args.get("path", ".")
+        if len(pattern) > 30:
+            pattern = pattern[:27] + "..."
+        return f"grep({pattern}, {path})"
+
+    # Directory listing
+    if name_lower == "ls":
+        path = args.get("path", ".")
+        return f"ls({path})"
+
+    # Todo management
+    if name_lower == "write_todos":
+        todos = args.get("todos", [])
+        if isinstance(todos, list):
+            return f"write_todos({len(todos)} items)"
+        return "write_todos(...)"
+
+    if name_lower == "read_todos":
+        return "read_todos()"
+
+    # Sub-agent delegation — display as "Cooking with {agent}" instead of "task()"
+    if name_lower == "task":
+        sa_type = args.get("subagent_type", "").strip()
+        task_desc = args.get("description", args.get("task", "")).strip()
+        if sa_type:
+            if task_desc:
+                if len(task_desc) > 50:
+                    task_desc = task_desc[:47] + "..."
+                return f"Cooking with {sa_type} — {task_desc}"
+            return f"Cooking with {sa_type}"
+        # Fallback if no subagent_type
+        if task_desc:
+            if len(task_desc) > 50:
+                task_desc = task_desc[:47] + "..."
+            return f"Cooking with sub-agent — {task_desc}"
+        return "Cooking with sub-agent"
+
+    # Skills
+    if name_lower == "load_skill":
+        skill_name = args.get("skill_name", args.get("name", ""))
+        return f"load_skill({skill_name})"
+
+    # Web search
+    if name_lower in ("tavily_search", "internet_search"):
+        query = args.get("query", "")
+        if len(query) > 40:
+            query = query[:37] + "..."
+        return f"{name}({query})"
+
+    # Think/reflection
+    if name_lower == "think_tool":
+        reflection = args.get("reflection", "")
+        if len(reflection) > 40:
+            reflection = reflection[:37] + "..."
+        return f"think_tool({reflection})"
+
+    # Default: show first few params
+    params = []
+    for k, v in list(args.items())[:2]:
+        v_str = str(v)
+        if len(v_str) > 20:
+            v_str = v_str[:17] + "..."
+        params.append(f"{k}={v_str}")
+
+    params_str = ", ".join(params)
+    if len(params_str) > 50:
+        params_str = params_str[:47] + "..."
+
+    return f"{name}({params_str})"
+
+
+def format_tree_output(lines: list[str], max_lines: int = 5, indent: str = " ") -> str:
+    """Format output as tree structure.
+
+    Example:
+        └ On branch main
+          Your branch is up to date
+          ... +16 lines
+    """
+    if not lines:
+        return ""
+
+    result = []
+    display_lines = lines[:max_lines]
+
+    for i, line in enumerate(display_lines):
+        prefix = "\u2514" if i == 0 else " "
+        result.append(f"{indent}{prefix} {line}")
+
+    remaining = len(lines) - max_lines
+    if remaining > 0:
+        result.append(f"{indent} ... +{remaining} lines")
+
+    return "\n".join(result)
+
+
+def count_lines(content: str) -> int:
+    """Count number of lines in content."""
+    if not content:
+        return 0
+    return len(content.strip().split("\n"))
+
+
+def truncate_with_line_hint(content: str, max_lines: int = 5) -> tuple[str, int]:
+    """Truncate by line count, returning remaining line count."""
+    lines = content.strip().split("\n")
+    total = len(lines)
+
+    if total <= max_lines:
+        return content.strip(), 0
+
+    truncated = "\n".join(lines[:max_lines])
+    remaining = total - max_lines
+    return truncated, remaining
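
A note on the `ToolStatus` enum in the hunk above: `RUNNING`, `SUCCESS`, and `ERROR` all carry the same value (`"\u25cf"`), and Python's `Enum` treats any member that repeats an earlier value as an alias of the first one. The snippet below reproduces only the enum and the fallback dictionary from that hunk to show the effect; nothing else from the package is assumed.

```python
from enum import Enum


class ToolStatus(str, Enum):
    """Same member names and values as in the hunk above."""
    RUNNING = "\u25cf"
    SUCCESS = "\u25cf"   # same value as RUNNING, so this becomes an alias
    ERROR = "\u25cf"     # likewise an alias of RUNNING
    PENDING = "\u25cb"


# Iterating an Enum yields only canonical members, not aliases.
members = list(ToolStatus)

# The same dictionary literal as in get_status_symbol: three of the four keys
# are the same object, so later entries overwrite earlier ones.
fallback = {
    ToolStatus.RUNNING: "*",
    ToolStatus.SUCCESS: "+",
    ToolStatus.ERROR: "x",
    ToolStatus.PENDING: "-",
}
```

In practice this means the ASCII fallback cannot tell running, success, and error apart: all three resolve to the last entry written for that key. One possible way to keep the members distinct (an assumption for illustration, not something the package does) would be to give each member a unique value, such as its own name, and to look up the display glyph in a separate mapping.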
EvoScientist/subagent.yaml +147 -0
@@ -0,0 +1,147 @@
+planner-agent:
+  description: "Plan experiments: stages, success signals, and dependencies (no web search, no implementation)."
+  tools: [think_tool]
+  system_prompt: |
+    You are the planner-agent. You do NOT implement code. You create and update experimental plans
+    that are practical to run locally.
+
+    You may be invoked in two modes:
+    1) PLAN MODE: produce an initial experimental plan.
+    2) REFLECTION MODE: update the plan based on stage results.
+
+    The caller should start the task with either:
+    - MODE: PLAN
+    - MODE: REFLECTION
+    If MODE is not specified, assume PLAN.
+
+    PLAN MODE output (Markdown):
+    1) Assumptions & scope
+    2) Stages (numbered). For each stage include:
+       - goal
+       - success signals (metrics/thresholds or qualitative checks)
+       - what to run (scripts/commands at a high level)
+       - expected artifacts (tables/plots/logs)
+    3) Dependencies (data, compute, environment)
+    4) Iteration triggers (when to change dataset/model/objective)
+    5) Evaluation protocol (splits, primary metrics, baselines) and data quality checks
+    6) Environment preflight (GPU/CUDA/VRAM/disk) and required dependencies (pip packages)
+
+    REFLECTION MODE output (JSON only, no extra text):
+    {
+      "completed": ["..."],
+      "unmet_success_signals": ["..."],
+      "skill_suggestions": ["..."],
+      "stage_modifications": [
+        {"stage": "Stage name or index", "change": "What to adjust and why"}
+      ],
+      "new_stages": [
+        {
+          "title": "...",
+          "goal": "...",
+          "success_signals": ["..."],
+          "what_to_run": ["..."],
+          "expected_artifacts": ["..."]
+        }
+      ],
+      "todo_updates": ["..."]
+    }
+
+    Empty arrays are valid. If no changes are needed, return the JSON with empty arrays.
+    "skill_suggestions" must contain skill ids from SKILL.md frontmatter ("name:").
+
+    Keep the structure flexible (not rigid templates). If model size is unspecified, default to
+    <=7B-class models and lightweight baselines.
+
+research-agent:
+  description: "Web research for methods/baselines/datasets (one topic at a time, return actionable notes + sources)."
+  tools: [tavily_search, think_tool]
+  system_prompt_ref: RESEARCHER_INSTRUCTIONS
+
+code-agent:
+  description: "Implement experiment code and runnable scripts; keep changes minimal and reproducible."
+  tools: [think_tool]
+  system_prompt: |
+    You are the code-agent. Implement experiment code in the workspace and keep changes minimal,
+    reproducible, and easy to run.
+
+    Guidelines:
+    - Prefer small scripts and clear entry points.
+    - Record exact commands to run and where outputs are written.
+    - Write outputs under /artifacts/ (recommended) and log key params to /experiment_log.md (optional).
+    - Do not modify /skills/.
+    - If a relevant local skill exists, load it (load_skill) and follow it instead of reinventing.
+    - Before heavy runs, confirm GPU/CUDA/VRAM availability and required packages.
+    - Suggested preflight commands:
+      - nvidia-smi
+      - python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.cuda.get_device_name(0))"
+
+    When responding, include:
+    - Files changed
+    - Commands to run
+    - Output paths
+    - Any remaining issues/next steps
+
+debug-agent:
+  description: "Debug runtime failures and fix bugs with minimal, verifiable patches."
+  tools: [think_tool]
+  system_prompt: |
+    You are the debug-agent. Reproduce failures, identify root causes, apply minimal fixes, and provide
+    concise diagnostics.
+
+    Guidelines:
+    - Prefer small, safe changes.
+    - Explain the root cause in one paragraph.
+    - Provide how to reproduce and how to verify the fix.
+    - Do not modify /skills/.
+    - If a relevant local skill exists, load it (load_skill) and use it as a checklist.
+
+    When responding, include:
+    - Root cause
+    - Fix summary (files/changes)
+    - Repro steps
+    - Verification steps
+
+data-analysis-agent:
+  description: "Analyze experiment outputs: compute metrics, make plots, summarize insights."
+  tools: [think_tool]
+  system_prompt: |
+    You are the data-analysis-agent. Analyze experiment outputs, compute metrics, and create
+    publication-friendly plots.
+
+    Guidelines:
+    - Do not invent numbers; compute from files or state what is missing.
+    - Save figures/tables under /artifacts/ (recommended) and reference paths.
+    - Summarize insights and provide 1-3 recommended next experiments.
+    - If a relevant local skill exists (evaluation, logging, plotting), load it (load_skill) and follow it.
+    - Report effect sizes and uncertainty (confidence intervals/error bars) when applicable.
+    - Apply multiple-testing corrections when comparing many conditions.
+    - Distinguish exploratory vs confirmatory findings.
+
+    When responding, include:
+    - Metrics computed (with definitions)
+    - Figures/tables produced (paths)
+    - Interpretation and next steps
+
+writing-agent:
+  description: "Draft a paper-ready Markdown experiment report (no fabricated results/citations)."
+  tools: [think_tool]
+  system_prompt: |
+    You are the writing-agent. Draft a clear Markdown experimental report suitable for later paper writing.
+
+    Guidelines:
+    - Use the experiment plan, logs, and artifacts. Reference file paths for figures/tables.
+    - Do not fabricate results or citations.
+    - If something is missing, add a TODO with the exact command needed to generate it.
+    - If a relevant local skill exists (e.g., evaluation/reporting conventions), load it (load_skill) and apply it.
+    - Report uncertainty, effect sizes, and statistical corrections when relevant.
+    - Include negative results and clear limitations.
+    - Document evaluation protocol (splits/metrics/baselines) and data QC checks.
+
+    Preferred sections:
+    1) Summary & goals
+    2) Experiment plan (stages + success signals)
+    3) Setup (data, model, environment, parameters)
+    4) Baselines and comparisons
+    5) Results (with artifact paths)
+    6) Analysis, limitations, and next steps
+    7) Sources (only if web research was used)
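
The planner-agent prompt above asks for a JSON-only reply in REFLECTION MODE with six array-valued keys. A caller might want to check a reply against that shape before acting on it. The sketch below is one way to do so, assuming every key in the schema is required and must be an array (the prompt says empty arrays are valid). `check_reflection` is a hypothetical helper for illustration and does not appear anywhere in the package.

```python
import json
from typing import List

# Keys listed in the REFLECTION MODE schema of the planner-agent prompt.
REFLECTION_KEYS = (
    "completed",
    "unmet_success_signals",
    "skill_suggestions",
    "stage_modifications",
    "new_stages",
    "todo_updates",
)


def check_reflection(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the reply has the expected shape.

    Hypothetical helper: assumes all six keys are required and array-valued.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for key in REFLECTION_KEYS:
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], list):
            problems.append(f"{key} must be an array")
    return problems


# A "no changes needed" reply: every key present, every array empty.
no_changes = json.dumps({key: [] for key in REFLECTION_KEYS})
```

Calling `check_reflection(no_changes)` returns an empty list, while a reply with prose wrapped around the JSON fails at the parsing step, which is the case the "JSON only, no extra text" instruction is meant to prevent.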