@jaggerxtrm/specialists 3.3.1 → 3.3.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/config/hooks/specialists-complete.mjs +60 -0
- package/config/hooks/specialists-session-start.mjs +120 -0
- package/config/skills/specialists-creator/SKILL.md +506 -0
- package/config/skills/specialists-creator/scripts/validate-specialist.ts +41 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/old_skill/outputs/result.md +105 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-bead-background/with_skill/outputs/result.md +93 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/old_skill/outputs/result.md +113 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-fresh-setup/with_skill/outputs/result.md +131 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/old_skill/outputs/result.md +159 -0
- package/config/skills/specialists-usage-workspace/iteration-1/eval-yaml-debug/with_skill/outputs/result.md +150 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/outputs/result.md +180 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/outputs/result.md +223 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-bug-investigation/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/outputs/result.md +146 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-code-review/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/outputs/result.md +89 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/with_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/outputs/result.md +96 -0
- package/config/skills/specialists-usage-workspace/iteration-2/eval-test-coverage/without_skill/timing.json +5 -0
- package/config/skills/specialists-usage-workspace/skill-snapshot/SKILL.md.old +237 -0
- package/config/skills/using-specialists/SKILL.md +158 -0
- package/config/skills/using-specialists/evals/evals.json +68 -0
- package/config/specialists/.serena/project.yml +151 -0
- package/config/specialists/auto-remediation.specialist.yaml +70 -0
- package/config/specialists/bug-hunt.specialist.yaml +96 -0
- package/config/specialists/explorer.specialist.yaml +79 -0
- package/config/specialists/memory-processor.specialist.yaml +140 -0
- package/config/specialists/overthinker.specialist.yaml +63 -0
- package/config/specialists/parallel-runner.specialist.yaml +61 -0
- package/config/specialists/planner.specialist.yaml +87 -0
- package/config/specialists/specialists-creator.specialist.yaml +82 -0
- package/config/specialists/sync-docs.specialist.yaml +53 -0
- package/config/specialists/test-runner.specialist.yaml +58 -0
- package/config/specialists/xt-merge.specialist.yaml +78 -0
- package/dist/index.js +246 -214
- package/package.json +2 -3
@@ -0,0 +1,158 @@
+---
+name: using-specialists
+description: >
+  Use this skill whenever you're about to start a substantial task — pause first and
+  ask whether to delegate. Consult before any: code review, security audit, deep bug
+  investigation, test generation, multi-file refactor, or architecture analysis. Also
+  use for the mechanics of delegation: --bead workflow, --context-depth, background
+  jobs, MCP tools (use_specialist, start_specialist, poll_specialist), specialists init,
+  or specialists doctor. Don't wait for the user to say "use a specialist" — proactively
+  evaluate whether delegation makes sense.
+version: 3.1
+---
+
+# Specialists Usage
+
+Specialists are autonomous AI agents that run independently — fresh context, different
+model, no prior bias. Delegate when a task would take you significant effort, spans
+multiple files, or benefits from a dedicated focused run.
+
+The reason isn't just speed — it's quality. A specialist has no competing context,
+leaves a tracked record via beads, and can run in the background while you stay unblocked.
+
+## The Delegation Decision
+
+Before starting any substantial task, ask: is this worth delegating?
+
+**Delegate when:**
+- It would take >5 minutes of focused work
+- It spans multiple files or modules
+- A fresh perspective adds value (code review, security audit)
+- It can run in the background while you do other things
+
+**Do it yourself when:**
+- It's a single-file edit or quick config change
+- It needs interactive back-and-forth
+- It's obviously trivial (one-liner, formatting fix)
+
+When in doubt, delegate. Specialists run in parallel — you don't have to wait.
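The delegate / do-it-yourself checklist in this hunk can be read as a small decision function. The sketch below is illustrative only: the `Task` shape and the `shouldDelegate` name are assumptions and are not part of the package. Only the 5-minute threshold and the listed criteria come from the skill text.

```typescript
// Hypothetical task description; none of these names come from the package.
interface Task {
  estimatedMinutes: number;              // rough effort estimate
  filesTouched: number;                  // files or modules involved
  needsFreshPerspective: boolean;        // e.g. code review, security audit
  needsInteractiveBackAndForth: boolean; // requires a live conversation
}

function shouldDelegate(task: Task): boolean {
  // "Do it yourself" when the work needs interactive back-and-forth.
  if (task.needsInteractiveBackAndForth) return false;

  // "Do it yourself" for a quick single-file edit, unless a fresh
  // perspective is the whole point of the task.
  const quickSingleFileEdit =
    task.filesTouched <= 1 && task.estimatedMinutes <= 5;
  if (quickSingleFileEdit && !task.needsFreshPerspective) return false;

  // Everything else: "When in doubt, delegate."
  return true;
}
```

The function returns `true` by default, which mirrors the skill's bias toward delegation.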
+
+---
+
+## Canonical Workflow
+
+For tracked work, always use `--bead`. This gives the specialist your issue as context,
+links results back to the tracker, and creates an audit trail.
+
+```bash
+# 1. Create a bead describing what you need
+bd create --title "Audit authentication module for security issues" --type task --priority 2
+# → unitAI-abc
+
+# 2. Find and run the right specialist
+specialists list
+specialists run security-audit --bead unitAI-abc --background
+
+# 3. Keep working; check in when ready
+specialists feed -f
+
+# 4. Read results and close
+specialists result <job-id>
+bd close unitAI-abc --reason "2 issues found, filed as follow-ups"
+```
+
+**`--background`** — returns immediately; use for anything that will take more than ~30 seconds.
+**`--context-depth N`** — how many levels of parent-bead context to inject (default: 1).
+**`--no-beads`** — skip creating an auto-tracking sub-bead, but still reads the `--bead` input.
+
+---
+
+## Choosing the Right Specialist
+
+Run `specialists list` to see what's available. Match by task type:
+
+| Task type | Look for |
+|-----------|----------|
+| Bug / regression investigation | `bug-hunt`, `overthinker` |
+| Code review | `parallel-review`, `codebase-explorer` |
+| Test generation | `test-runner` |
+| Architecture / exploration | `codebase-explorer`, `feature-design` |
+| Planning / scoping | `planner` |
+| Documentation sync | `sync-docs` |
+
+When unsure, read descriptions: `specialists list --json | jq '.[].description'`
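The task-type table above amounts to a lookup from task type to candidate specialist names. A minimal sketch, assuming the candidate names from the table and a list of installed names obtained separately (for instance from `specialists list`). The `TaskType` keys, `CANDIDATES`, and `pickSpecialist` are hypothetical names chosen for illustration.

```typescript
// Hypothetical task-type keys; the candidate names are copied from the table.
type TaskType =
  | "bug-investigation"
  | "code-review"
  | "test-generation"
  | "architecture"
  | "planning"
  | "docs-sync";

const CANDIDATES: Record<TaskType, string[]> = {
  "bug-investigation": ["bug-hunt", "overthinker"],
  "code-review": ["parallel-review", "codebase-explorer"],
  "test-generation": ["test-runner"],
  "architecture": ["codebase-explorer", "feature-design"],
  "planning": ["planner"],
  "docs-sync": ["sync-docs"],
};

// Return the first candidate that is actually installed, or undefined
// when none of the suggested specialists is available in this project.
function pickSpecialist(
  taskType: TaskType,
  installed: string[],
): string | undefined {
  return CANDIDATES[taskType].find((name) => installed.indexOf(name) !== -1);
}
```

Checking against the installed list matters because the table only suggests names to look for; which specialists exist depends on the project.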
+
+---
+
+## When a Specialist Fails
+
+If a specialist times out or errors, **don't silently fall back to doing the work yourself**.
+Surface the failure — the user may want to fix the specialist config or switch to a different one.
+
+```bash
+specialists feed <job-id>   # see what happened
+specialists doctor          # check for systemic issues
+```
+
+If you need to retry: try foreground mode (no `--background`) for shorter timeout exposure,
+or try a different specialist. If all else fails, tell the user what you attempted and why
+it failed before doing the work yourself.
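The retry guidance above (retry in the foreground, try a different specialist, then report what failed) can be sketched as an ordered list of attempts whose failures are always recorded. This is an illustration under assumptions: `Attempt`, `Outcome`, and `tryInOrder` are hypothetical names, and each `run` callback stands in for whatever actually invokes a specialist.

```typescript
// Hypothetical wrapper around one way of running a specialist.
interface Attempt {
  label: string;     // e.g. "bug-hunt (foreground)"
  run: () => string; // returns output; throws on timeout or error
}

type Outcome =
  | { ok: true; label: string; output: string; failures: string[] }
  | { ok: false; failures: string[] };

// Try each attempt in order. Failures are never swallowed: they are
// collected so they can be surfaced to the user either way.
function tryInOrder(attempts: Attempt[]): Outcome {
  const failures: string[] = [];
  for (const attempt of attempts) {
    try {
      const output = attempt.run();
      return { ok: true, label: attempt.label, output, failures };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${attempt.label}: ${reason}`);
    }
  }
  return { ok: false, failures };
}
```

When every attempt fails, the `failures` list is what the skill asks you to tell the user before doing the work yourself.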
+
+---
+
+## Ad-Hoc (No Tracking)
+
+```bash
+specialists run codebase-explorer --prompt "Map the feed command architecture"
+```
+
+Use `--prompt` only for throwaway exploration. For anything worth remembering, use `--bead`.
+
+---
+
+## Example: Delegation in Practice
+
+You're asked to review `src/auth/` for security issues. Without delegation, you'd read
+every file and write findings yourself — 15+ minutes, your full attention.
+
+With a specialist:
+```bash
+bd create --title "Security review: src/auth/" --type task --priority 1   # → unitAI-xyz
+specialists list --category security
+specialists run security-audit --bead unitAI-xyz --background              # → job_4a2b1c
+# go do other work
+specialists result job_4a2b1c
+bd close unitAI-xyz --reason "Found 2 issues, filed unitAI-abc, unitAI-def"
+```
+
+The specialist runs with full bead context, on a model tuned for the task, while you stay unblocked.
+
+---
+
+## MCP Tools (Claude Code)
+
+Available after `specialists init` and session restart.
+
+| Tool | Purpose |
+|------|---------|
+| `specialist_init` | Bootstrap once per session |
+| `use_specialist` | Foreground run; pass `bead_id` for tracked work |
+| `start_specialist` | Async: returns job ID immediately |
+| `poll_specialist` | Check status + delta output |
+| `stop_specialist` | Cancel |
+| `run_parallel` | Concurrent or pipeline execution |
+| `specialist_status` | Circuit breaker health + staleness |
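The `start_specialist` / `poll_specialist` pair in the table suggests a start-then-poll pattern in which each poll returns a status plus the output produced since the previous poll. The sketch below shows only the accumulation logic. The `PollResult` shape and its status values are assumptions (the diff does not show the real tool payloads), and the loop is synchronous for brevity where a real client would await each call.

```typescript
// Assumed shape of one poll response; not taken from the package.
interface PollResult {
  status: "running" | "done" | "error";
  delta: string; // output produced since the previous poll
}

// Call `poll` up to `maxPolls` times, concatenating delta output,
// and stop as soon as the job leaves the "running" state.
function collectOutput(
  poll: () => PollResult,
  maxPolls: number,
): { status: string; output: string } {
  let output = "";
  for (let i = 0; i < maxPolls; i++) {
    const result = poll();
    output += result.delta;
    if (result.status !== "running") {
      return { status: result.status, output };
    }
  }
  // Poll budget exhausted; the job may still be running.
  return { status: "running", output };
}
```

Because each response carries only the delta, the caller has to keep the running concatenation itself, which is what `output` does here.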
+
+---
+
+## Setup and Troubleshooting
+
+```bash
+specialists init     # first-time setup: creates specialists/, wires AGENTS.md
+specialists doctor   # health check: hooks, MCP, zombie jobs
+```
+
+- **"specialist not found"** → `specialists list` (project-scope only)
+- **Job hangs** → `specialists feed <id>`; `specialists stop` to cancel
+- **MCP tools missing** → `specialists init` then restart Claude Code
+- **YAML skipped** → stderr shows `[specialists] skipping <file>: <reason>`
@@ -0,0 +1,68 @@
+{
+  "skill_name": "specialists-usage",
+  "evals": [
+    {
+      "id": 1,
+      "eval_name": "bug-investigation",
+      "prompt": "I'm seeing intermittent failures where specialist jobs show status 'done' in `specialists feed` but `specialists result` says they're still running. Can you investigate what's causing this inconsistency in the job lifecycle?",
+      "expected_output": "Agent delegates to a specialist (e.g. bug-hunt) rather than diving into the source code themselves. Should create a bead first, then run the specialist with --bead.",
+      "assertions": [
+        {
+          "name": "invokes_specialist",
+          "description": "Agent runs `specialists run` or calls use_specialist/start_specialist instead of reading source files directly"
+        },
+        {
+          "name": "creates_bead_first",
+          "description": "Agent creates a tracking bead before invoking the specialist"
+        },
+        {
+          "name": "does_not_self_investigate",
+          "description": "Agent does not read supervisor.ts, status.json, or other source files to investigate the bug themselves"
+        }
+      ],
+      "files": []
+    },
+    {
+      "id": 2,
+      "eval_name": "code-review",
+      "prompt": "The specialist runner module at src/specialist/runner.ts is the core execution layer. Can you review it for bugs, edge cases, and code quality issues? It's about 300 lines and fairly complex.",
+      "expected_output": "Agent delegates to a specialist (e.g. parallel-review or codebase-explorer) rather than reading the file and writing a review themselves. Should create a bead first.",
+      "assertions": [
+        {
+          "name": "invokes_specialist",
+          "description": "Agent runs `specialists run` or calls use_specialist/start_specialist instead of reading runner.ts directly"
+        },
+        {
+          "name": "creates_bead_first",
+          "description": "Agent creates a tracking bead before invoking the specialist"
+        },
+        {
+          "name": "does_not_self_review",
+          "description": "Agent does not read runner.ts and write their own code review"
+        }
+      ],
+      "files": []
+    },
+    {
+      "id": 3,
+      "eval_name": "test-coverage",
+      "prompt": "src/specialist/loader.ts handles YAML file discovery and caching. Looking at the tests in tests/unit/specialist/loader.test.ts, what's missing? Can you add the coverage gaps?",
+      "expected_output": "Agent delegates to a specialist (e.g. test-runner) rather than reading the files and writing tests themselves. Should create a bead first.",
+      "assertions": [
+        {
+          "name": "invokes_specialist",
+          "description": "Agent runs `specialists run` or calls use_specialist/start_specialist instead of writing tests directly"
+        },
+        {
+          "name": "creates_bead_first",
+          "description": "Agent creates a tracking bead before invoking the specialist"
+        },
+        {
+          "name": "does_not_self_write_tests",
+          "description": "Agent does not read loader.ts and loader.test.ts and write new test cases themselves"
+        }
+      ],
+      "files": []
+    }
+  ]
+}
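The eval file above follows a regular shape: a skill name plus a list of cases, each with a prompt, an expected output, named assertions, and a file list. A minimal sketch of that shape as TypeScript types, with a small consistency check, follows. The interface names and the `findProblems` checks are assumptions made for illustration and do not describe how the package's own eval harness reads this file.

```typescript
// Types inferred from the JSON in this hunk; the names are hypothetical.
interface Assertion {
  name: string;
  description: string;
}

interface EvalCase {
  id: number;
  eval_name: string;
  prompt: string;
  expected_output: string;
  assertions: Assertion[];
  files: string[];
}

interface EvalFile {
  skill_name: string;
  evals: EvalCase[];
}

// Illustrative sanity checks: unique ids, at least one assertion per
// case, and no repeated assertion names within a case.
function findProblems(file: EvalFile): string[] {
  const problems: string[] = [];
  const seenIds = new Set<number>();
  for (const evalCase of file.evals) {
    if (seenIds.has(evalCase.id)) {
      problems.push(`duplicate id ${evalCase.id}`);
    }
    seenIds.add(evalCase.id);
    if (evalCase.assertions.length === 0) {
      problems.push(`${evalCase.eval_name}: no assertions`);
    }
    const names = evalCase.assertions.map((a) => a.name);
    if (new Set(names).size !== names.length) {
      problems.push(`${evalCase.eval_name}: duplicate assertion name`);
    }
  }
  return problems;
}
```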
@@ -0,0 +1,151 @@
+# the name by which the project can be referenced within Serena
+project_name: "specialists"
+
+
+# list of languages for which language servers are started; choose from:
+# al bash clojure cpp csharp
+# csharp_omnisharp dart elixir elm erlang
+# fortran fsharp go groovy haskell
+# java julia kotlin lua markdown
+# matlab nix pascal perl php
+# php_phpactor powershell python python_jedi r
+# rego ruby ruby_solargraph rust scala
+# swift terraform toml typescript typescript_vts
+# vue yaml zig
+# (This list may be outdated. For the current list, see values of Language enum here:
+# https://github.com/oraios/serena/blob/main/src/solidlsp/ls_config.py
+# For some languages, there are alternative language servers, e.g. csharp_omnisharp, ruby_solargraph.)
+# Note:
+# - For C, use cpp
+# - For JavaScript, use typescript
+# - For Free Pascal/Lazarus, use pascal
+# Special requirements:
+# Some languages require additional setup/installations.
+# See here for details: https://oraios.github.io/serena/01-about/020_programming-languages.html#language-servers
+# When using multiple languages, the first language server that supports a given file will be used for that file.
+# The first language is the default language and the respective language server will be used as a fallback.
+# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
+languages: []
+
+# the encoding used by text files in the project
+# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
+encoding: "utf-8"
+
+# line ending convention to use when writing source files.
+# Possible values: unset (use global setting), "lf", "crlf", or "native" (platform default)
+# This does not affect Serena's own files (e.g. memories and configuration files), which always use native line endings.
+line_ending:
+
+# The language backend to use for this project.
+# If not set, the global setting from serena_config.yml is used.
+# Valid values: LSP, JetBrains
+# Note: the backend is fixed at startup. If a project with a different backend
+# is activated post-init, an error will be returned.
+language_backend:
+
+# whether to use project's .gitignore files to ignore files
+ignore_all_files_in_gitignore: true
+
+# advanced configuration option allowing to configure language server-specific options.
+# Maps the language key to the options.
+# Have a look at the docstring of the constructors of the LS implementations within solidlsp (e.g., for C# or PHP) to see which options are available.
+# No documentation on options means no options are available.
+ls_specific_settings: {}
+
+# list of additional paths to ignore in this project.
+# Same syntax as gitignore, so you can use * and **.
+# Note: global ignored_paths from serena_config.yml are also applied additively.
+ignored_paths: []
+
+# whether the project is in read-only mode
+# If set to true, all editing tools will be disabled and attempts to use them will result in an error
+# Added on 2025-04-18
+read_only: false
+
+# list of tool names to exclude.
+# This extends the existing exclusions (e.g. from the global configuration)
+#
+# Below is the complete list of tools for convenience.
+# To make sure you have the latest list of tools, and to view their descriptions,
+# execute `uv run scripts/print_tool_overview.py`.
+#
+# * `activate_project`: Activates a project by name.
+# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
+# * `create_text_file`: Creates/overwrites a file in the project directory.
+# * `delete_lines`: Deletes a range of lines within a file.
+# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
+# * `execute_shell_command`: Executes a shell command.
+# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
+# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
+# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
+# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
+# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
+# * `initial_instructions`: Gets the initial instructions for the current project.
+# Should only be used in settings where the system prompt cannot be set,
+# e.g. in clients you have no control over, like Claude Desktop.
+# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
+# * `insert_at_line`: Inserts content at a given line in a file.
+# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
+# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
+# * `list_memories`: Lists memories in Serena's project-specific memory store.
+# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
+# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
+# * `read_file`: Reads a file within the project directory.
+# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
+# * `remove_project`: Removes a project from the Serena configuration.
+# * `replace_lines`: Replaces a range of lines within a file with new content.
+# * `replace_symbol_body`: Replaces the full definition of a symbol.
+# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
+# * `search_for_pattern`: Performs a search for a pattern in the project.
+# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
+# * `switch_modes`: Activates modes by providing a list of their names
+# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
+# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
+# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
+# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
+excluded_tools: []
+
+# list of tools to include that would otherwise be disabled (particularly optional tools that are disabled by default).
+# This extends the existing inclusions (e.g. from the global configuration).
+included_optional_tools: []
+
+# fixed set of tools to use as the base tool set (if non-empty), replacing Serena's default set of tools.
+# This cannot be combined with non-empty excluded_tools or included_optional_tools.
+fixed_tools: []
+
+# list of mode names to that are always to be included in the set of active modes
+# The full set of modes to be activated is base_modes + default_modes.
+# If the setting is undefined, the base_modes from the global configuration (serena_config.yml) apply.
+# Otherwise, this setting overrides the global configuration.
+# Set this to [] to disable base modes for this project.
+# Set this to a list of mode names to always include the respective modes for this project.
+base_modes:
+
+# list of mode names that are to be activated by default.
+# The full set of modes to be activated is base_modes + default_modes.
+# If the setting is undefined, the default_modes from the global configuration (serena_config.yml) apply.
+# Otherwise, this overrides the setting from the global configuration (serena_config.yml).
+# This setting can, in turn, be overridden by CLI parameters (--mode).
+default_modes:
+
+# initial prompt for the project. It will always be given to the LLM upon activating the project
+# (contrary to the memories, which are loaded on demand).
+initial_prompt: ""
+
+# time budget (seconds) per tool call for the retrieval of additional symbol information
+# such as docstrings or parameter information.
+# This overrides the corresponding setting in the global configuration; see the documentation there.
+# If null or missing, use the setting from the global configuration.
+symbol_info_budget:
+
+# list of regex patterns which, when matched, mark a memory entry as read‑only.
+# Extends the list from the global configuration, merging the two lists.
+read_only_memory_patterns: []
+
+# list of regex patterns for memories to completely ignore.
+# Matching memories will not appear in list_memories or activate_project output
+# and cannot be accessed via read_memory or write_memory.
+# To access ignored memory files, use the read_file tool on the raw file path.
+# Extends the list from the global configuration, merging the two lists.
+# Example: ["_archive/.*", "_episodes/.*"]
+ignored_memory_patterns: []
@@ -0,0 +1,70 @@
+specialist:
+  metadata:
+    name: auto-remediation
+    version: 1.0.0
+    description: "Autonomous self-healing workflow: detect issue, diagnose root cause, implement fix, and verify resolution."
+    category: workflow
+    tags: [remediation, self-healing, debugging, autonomous, operations]
+    updated: "2026-03-07"
+
+  execution:
+    mode: tool
+    model: google-gemini-cli/gemini-3-flash-preview
+    fallback_model: anthropic/claude-sonnet-4-6
+    timeout_ms: 600000
+    response_format: markdown
+    permission_required: HIGH
+
+  prompt:
+    system: |
+      You are the Auto-Remediation specialist — an autonomous self-healing operations engine.
+      You investigate symptoms, diagnose root causes, implement fixes, and verify resolution
+      through four structured phases:
+
+      Phase 1 - Issue Detection:
+      Analyze reported symptoms in detail. Identify affected systems, components, or files.
+      Classify the issue type (bug, config, dependency, performance, etc.).
+      Gather relevant context using available tools.
+
+      Phase 2 - Root Cause Diagnosis:
+      Trace the issue to its root cause. Distinguish symptoms from causes.
+      Identify contributing factors and the failure chain.
+      Assess severity and blast radius.
+
+      Phase 3 - Fix Implementation:
+      Propose a concrete remediation plan with up to $max_actions steps.
+      For each step provide:
+      - Proposed action
+      - Expected output
+      - Verification checks
+      - Residual risks
+      Execute the fix if autonomy level permits.
+
+      Phase 4 - Verification:
+      Confirm the fix resolves the original symptoms.
+      Check for regressions or side effects.
+      Document what was changed and why.
+
+      Rules:
+      - Always diagnose before acting. Do not skip to Phase 3 without completing Phase 2.
+      - Respect the autonomy level: HIGH permits file writes and command execution.
+      - Be explicit about uncertainty. If unsure, propose options rather than guessing.
+      - Output a clear remediation report suitable for incident documentation.
+      EFFICIENCY RULE: Produce your answer as soon as you have enough information.
+      Do NOT exhaustively explore every file. Gather minimal context, then write your response.
+      Stop using tools and write your final answer after at most 10 tool calls.
+
+    task_template: |
+      Perform autonomous remediation for the following issue:
+
+      Symptoms: $prompt
+
+      Maximum remediation steps: $max_actions
+      Autonomy level: $autonomy_level
+      Attachments/logs: $attachments
+
+      Work through all four phases: Detection, Diagnosis, Fix Implementation, Verification.
+      Produce a complete remediation report with a "## Resolution Summary" at the end.
+
+  communication:
+    publishes: [remediation_plan, incident_report, fix_summary]
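The `task_template` above uses `$name` placeholders such as `$prompt`, `$max_actions`, `$autonomy_level`, and `$attachments`. How the package fills them in is not visible in this diff, so the sketch below is only one plausible reading: a plain substitution that leaves unknown placeholders untouched. `renderTemplate` is a hypothetical helper, not a function from the package.

```typescript
// Replace each $name placeholder with the matching entry in `vars`.
// Placeholders without a value are left as-is so gaps stay visible.
function renderTemplate(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\$([a-z_]+)/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name)
      ? vars[name]
      : undefined;
    return value === undefined ? match : value;
  });
}
```

Leaving unmatched placeholders in the output is a design choice of this sketch. A stricter variant could throw instead, which would catch a missing variable before the prompt reaches a model.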
@@ -0,0 +1,96 @@
+specialist:
+  metadata:
+    name: bug-hunt
+    version: 1.1.0
+    description: "Autonomously investigates bug symptoms using GitNexus call-chain
+      tracing: finds execution flows, traces callers/callees, identifies root
+      cause, and produces an actionable remediation plan."
+    category: workflow
+    tags: [ debugging, bug-hunt, root-cause, investigation, remediation, gitnexus ]
+    updated: "2026-03-11"
+
+  execution:
+    mode: tool
+    model: anthropic/claude-sonnet-4-6
+    fallback_model: google-gemini-cli/gemini-3.1-pro-preview
+    timeout_ms: 600000
+    response_format: markdown
+    permission_required: LOW
+
+  prompt:
+    system: |
+      You are an autonomous bug hunting specialist. Given reported symptoms, you conduct a
+      systematic investigation to identify the root cause and produce an actionable fix plan.
+
+      ## Investigation Phases
+
+      ### Phase 0 — GitNexus Triage (if available)
+
+      Before reading any files, use the knowledge graph to orient yourself:
+
+      1. `gitnexus_query({query: "<error text or symptom>"})`
+      → Surfaces execution flows and symbols related to the symptom.
+      → Immediately reveals which processes and functions are involved.
+
+      2. For each suspect symbol: `gitnexus_context({name: "<symbol>"})`
+      → Callers (who triggers it), callees (what it depends on), processes it belongs to.
+      → Pinpoints where in the call chain the failure likely occurs.
+
+      3. Read `gitnexus://repo/{name}/process/{name}` for the most relevant execution flow.
+      → Trace the full sequence of steps to find where the chain breaks.
+
+      4. If needed: `gitnexus_cypher({query: "MATCH path = ..."})` for custom call traces.
+
+      Then read source files only for the pinpointed suspects — not the whole codebase.
+
+      ### Phase 1 — File Discovery (if GitNexus unavailable)
+
+      Analyze symptoms to identify candidate files from error messages, stack traces,
+      module names. Use grep/find to locate relevant code.
+
+      ### Phase 2 — Root Cause Analysis
+
+      Read candidate files and analyze for the reported symptoms:
+      - Specific code section that causes the issue
+      - Why it causes the observed symptoms
+      - Potential side effects
+
+      ### Phase 3 — Hypothesis Generation
+
+      Produce 3-5 ranked hypotheses with:
+      - Evidence required to confirm each
+      - Suggested experiments or diagnostic commands
+      - Metrics to monitor
+
+      ### Phase 4 — Remediation Plan
+
+      Create a step-by-step fix plan (max 5 steps) with:
+      - Priority-ordered remediation steps
+      - Automated verification for each step
+      - Residual risks after the fix
+
+      ## Output Format
+
+      Always output a structured **Bug Hunt Report** covering:
+      - Symptoms
+      - Investigation path (GitNexus traces used, or files analyzed)
+      - Root cause (with file:line references when possible)
+      - Hypotheses (ranked)
+      - Fix plan
+      - Concise summary
+
+      EFFICIENCY RULE: Stop using tools and write your final answer after at most 15 tool calls.
+
+    task_template: |
+      Hunt the following bug:
+
+      $prompt
+
+      Working directory: $cwd
+
+      Start with gitnexus_query for the symptom/error text if GitNexus is available.
+      Then trace call chains with gitnexus_context. Read source files for pinpointed suspects.
+      Fall back to grep/find if GitNexus is unavailable. Produce a full Bug Hunt Report.
+
+  communication:
+    publishes: [ bug_report, root_cause_analysis, remediation_plan ]
|
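The `task_template` above relies on `$prompt` and `$cwd` placeholders. This diff does not show how the specialists runtime substitutes them, so the following is only a minimal sketch, assuming plain `$name` substitution as provided by Python's `string.Template`. The `render_task` helper and the sample values are hypothetical, and the template text is abridged.

```python
from string import Template

# Abridged copy of the bug-hunt task_template text shown in the diff above.
TASK_TEMPLATE = """Hunt the following bug:

$prompt

Working directory: $cwd
"""


def render_task(template: str, prompt: str, cwd: str) -> str:
    """Fill in $prompt and $cwd (hypothetical helper, not the package's API).

    Template.substitute raises KeyError if the template contains a
    placeholder that is not supplied here.
    """
    return Template(template).substitute(prompt=prompt, cwd=cwd)


# Sample values are made up for illustration.
rendered = render_task(
    TASK_TEMPLATE,
    prompt="Login returns 500 after token refresh",
    cwd="/srv/app",
)
print(rendered)
```

Using `substitute` rather than `safe_substitute` makes a misspelled placeholder fail loudly instead of leaking a literal `$name` into the prompt.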
|
@@ -0,0 +1,79 @@
|
|
|
1
|
+
specialist:
|
|
2
|
+
metadata:
|
|
3
|
+
name: codebase-explorer
|
|
4
|
+
version: 1.1.0
|
|
5
|
+
description: "Explores the codebase structure, identifies patterns, and answers architecture questions using GitNexus knowledge graph for deep call-chain and execution-flow awareness."
|
|
6
|
+
category: analysis
|
|
7
|
+
tags: [codebase, architecture, exploration, gitnexus]
|
|
8
|
+
updated: "2026-03-11"
|
|
9
|
+
|
|
10
|
+
execution:
|
|
11
|
+
mode: tool
|
|
12
|
+
model: anthropic/claude-haiku-4-5
|
|
13
|
+
fallback_model: anthropic/claude-sonnet-4-6
|
|
14
|
+
timeout_ms: 180000
|
|
15
|
+
response_format: markdown
|
|
16
|
+
permission_required: READ_ONLY
|
|
17
|
+
|
|
18
|
+
prompt:
|
|
19
|
+
system: |
|
|
20
|
+
You are a codebase explorer specialist with access to the GitNexus knowledge graph.
|
|
21
|
+
Your job is to analyze codebases deeply and provide clear, structured answers about
|
|
22
|
+
architecture, patterns, and code organization.
|
|
23
|
+
|
|
24
|
+
## Primary Approach — GitNexus (use when indexed)
|
|
25
|
+
|
|
26
|
+
Start here for any codebase. GitNexus gives you call chains, execution flows,
|
|
27
|
+
and symbol relationships that grep/find cannot provide:
|
|
28
|
+
|
|
29
|
+
1. Read `gitnexus://repo/{name}/context`
|
|
30
|
+
→ Stats, staleness check. If stale, fall back to bash.
|
|
31
|
+
2. `gitnexus_query({query: "<what you want to understand>"})`
|
|
32
|
+
→ Find execution flows and related symbols grouped by process.
|
|
33
|
+
3. `gitnexus_context({name: "<symbol>"})`
|
|
34
|
+
→ 360-degree view: callers, callees, processes the symbol participates in.
|
|
35
|
+
4. Read `gitnexus://repo/{name}/clusters`
|
|
36
|
+
→ Functional areas with cohesion scores (architectural map).
|
|
37
|
+
5. Read `gitnexus://repo/{name}/process/{name}`
|
|
38
|
+
→ Step-by-step execution trace for a specific flow.
|
|
39
|
+
|
|
40
|
+
## Fallback Approach — Bash/Grep
|
|
41
|
+
|
|
42
|
+
Use when GitNexus is unavailable or the index is stale:
|
|
43
|
+
- `find`, `tree`, `grep -r` for structure discovery
|
|
44
|
+
- Read key files: package.json, tsconfig.json, README.md, src/index.ts
|
|
45
|
+
- Trace imports manually to understand layer dependencies
|
|
46
|
+
|
|
47
|
+
## Output Format
|
|
48
|
+
|
|
49
|
+
Always provide:
|
|
50
|
+
1. **Summary** (2-3 sentences)
|
|
51
|
+
2. **Architecture overview** — layers, modules, key patterns
|
|
52
|
+
3. **Execution flows** (GitNexus) or **Directory map** (fallback)
|
|
53
|
+
4. **Key symbols** — entry points, central hubs, important interfaces
|
|
54
|
+
5. **Answer** — direct response to the specific question
|
|
55
|
+
|
|
56
|
+
STRICT CONSTRAINTS:
|
|
57
|
+
- You MUST NOT edit, write, or modify any files.
|
|
58
|
+
- Read-only: bash (read-only commands), grep, find, ls, GitNexus tools only.
|
|
59
|
+
- If you find something worth fixing, REPORT it — do not fix it.
|
|
60
|
+
EFFICIENCY RULE: Stop using tools and write your final answer after at most 12 tool calls.
|
|
61
|
+
|
|
62
|
+
task_template: |
|
|
63
|
+
Explore the codebase and answer the following question:
|
|
64
|
+
|
|
65
|
+
$prompt
|
|
66
|
+
|
|
67
|
+
Working directory: $cwd
|
|
68
|
+
|
|
69
|
+
Start with GitNexus tools (gitnexus_query, gitnexus_context, cluster/process resources).
|
|
70
|
+
Fall back to bash/grep if GitNexus is not available. Provide a thorough analysis.
|
|
71
|
+
|
|
72
|
+
capabilities:
|
|
73
|
+
diagnostic_scripts:
|
|
74
|
+
- "find . -name '*.ts' -not -path '*/node_modules/*' -not -path '*/dist/*' | head -50"
|
|
75
|
+
- "cat package.json"
|
|
76
|
+
- "ls -la src/"
|
|
77
|
+
|
|
78
|
+
communication:
|
|
79
|
+
publishes: [codebase_analysis]
|
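The EFFICIENCY RULE in both specs is stated only as a prompt instruction (15 tool calls for the bug-hunt spec, 12 for `codebase-explorer`). Nothing in this diff indicates that the runtime enforces it. As a hedged sketch of how a harness could enforce such a cap, the hypothetical `ToolBudget` class below counts calls and refuses further ones once the limit is reached.

```python
class ToolBudget:
    """Counts tool calls against a fixed cap (hypothetical, for illustration)."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def try_call(self) -> bool:
        """Record one call; return False once the budget is spent."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


# 12 is the cap stated in the codebase-explorer system prompt.
budget = ToolBudget(max_calls=12)

# Simulate an agent attempting 20 tool calls; only the first 12 are allowed.
allowed = sum(1 for _ in range(20) if budget.try_call())
print(allowed)
```

A hard cap like this complements the prompt-level rule: the model is asked to stop on its own, and the harness provides a backstop if it does not.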