@songsid/agend 0.0.17-beta.7 → 0.0.17
- package/dist/fleet-manager.d.ts +1 -1
- package/dist/fleet-manager.js +40 -13
- package/dist/fleet-manager.js.map +1 -1
- package/dist/general-knowledge/skills/fleet-config/SKILL.md +65 -0
- package/dist/general-knowledge/skills/fleet-health/SKILL.md +37 -0
- package/dist/general-knowledge/skills/fleet-restart/SKILL.md +48 -0
- package/dist/general-knowledge/skills/instance-lifecycle/SKILL.md +20 -0
- package/dist/general-knowledge/skills/model-discovery/SKILL.md +34 -0
- package/dist/general-knowledge/skills/session-management/SKILL.md +66 -0
- package/dist/general-knowledge/steering/core-rules.md +56 -0
- package/package.json +1 -1
- package/dist/general-knowledge/skills.md +0 -297
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: fleet-config
|
|
3
|
+
description: fleet.yaml and classicBot.yaml structure, validation, common mistakes
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Configuration Quick Reference
|
|
7
|
+
|
|
8
|
+
**fleet.yaml structure:**
|
|
9
|
+
```yaml
|
|
10
|
+
channel: # Telegram/Discord connection
|
|
11
|
+
defaults: # Shared defaults for all instances
|
|
12
|
+
backend: kiro-cli
|
|
13
|
+
startup:
|
|
14
|
+
concurrency: 6 # Max simultaneous instance startups
|
|
15
|
+
stagger_delay_ms: 2000 # Delay between startup batches
|
|
16
|
+
instances: # Per-instance config (topic_id, working_directory, etc.)
|
|
17
|
+
templates: # Reusable fleet deployment templates
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
**classicBot.yaml** — manages classic bot channels (separate from fleet.yaml):
|
|
21
|
+
- `defaults.allowed_guilds` — Discord server whitelist
|
|
22
|
+
- `defaults.allowed_groups` — Telegram group whitelist
|
|
23
|
+
- `channels` — per-channel backend override
|
|
24
|
+
- Hot-reloads every 30 seconds (no restart needed)
|
|
25
|
+
|
|
26
|
+
**Key config locations:**
|
|
27
|
+
- Fleet config: `~/.agend/fleet.yaml`
|
|
28
|
+
- Classic bot: `~/.agend/classicBot.yaml`
|
|
29
|
+
- Environment: `~/.agend/.env` (bot tokens, API keys)
|
|
30
|
+
- Instance logs: `~/.agend/instances/<name>/output.log`
|
|
31
|
+
- Fleet log: `~/.agend/fleet.log`
|
|
32
|
+
|
|
33
|
+
## Config Validation
|
|
34
|
+
|
|
35
|
+
**After editing fleet.yaml or classicBot.yaml, always validate the file:**
|
|
36
|
+
|
|
37
|
+
```bash
|
|
38
|
+
# Validate fleet.yaml syntax
|
|
39
|
+
agend fleet start --dry-run 2>&1 | head -5
|
|
40
|
+
# Or simply:
|
|
41
|
+
node -e "const yaml = require('js-yaml'); const fs = require('fs'); yaml.load(fs.readFileSync('$HOME/.agend/fleet.yaml', 'utf-8')); console.log('✓ valid YAML')"
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
**Common fleet.yaml mistakes:**
|
|
45
|
+
- Missing `channel.mode` field → error on start
|
|
46
|
+
- Wrong indentation (YAML is indent-sensitive)
|
|
47
|
+
- `topic_id` as string vs number (both work, but be consistent)
|
|
48
|
+
- `backend` typo (valid: `claude-code`, `gemini-cli`, `codex`, `opencode`, `kiro-cli`, `antigravity`)
|
|
49
|
+
- `model` using wrong format for the backend
|
|
50
|
+
|
|
51
|
+
**classicBot.yaml validation:**
|
|
52
|
+
```bash
|
|
53
|
+
node -e "const yaml = require('js-yaml'); const fs = require('fs'); yaml.load(fs.readFileSync('$HOME/.agend/classicBot.yaml', 'utf-8')); console.log('✓ valid YAML')"
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
**Common classicBot.yaml mistakes:**
|
|
57
|
+
- `allowed_guilds` values must be strings (Discord IDs are too large for YAML integers)
|
|
58
|
+
- Channel IDs as keys must be quoted strings
|
|
59
|
+
- Missing `defaults` section (optional but recommended)
|
|
60
|
+
|
|
61
|
+
**After editing config:**
|
|
62
|
+
```bash
|
|
63
|
+
agend reload # hot-reload (SIGHUP) — adds/removes instances without restart
|
|
64
|
+
agend fleet restart # if channel/defaults changed — needs full restart
|
|
65
|
+
```
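
The validate-then-reload sequence above can be combined so that a reload is only attempted when the file parses. This is a minimal sketch, not part of the package: `reload_if_valid` is a hypothetical helper name, and it assumes `js-yaml` is resolvable by `node`, as the validation commands above already do.

```bash
# reload_if_valid FILE: run `agend reload` only when FILE parses as YAML.
# Hypothetical wrapper; assumes node can resolve js-yaml, as in the commands above.
reload_if_valid() {
  local file="$1"
  # With `node -e`, the first extra argument is available as process.argv[1].
  if node -e "require('js-yaml').load(require('fs').readFileSync(process.argv[1], 'utf-8'))" "$file"; then
    agend reload
  else
    echo "invalid YAML: $file (not reloading)" >&2
    return 1
  fi
}
```

Example call: `reload_if_valid "$HOME/.agend/fleet.yaml"`. If `channel` or `defaults` changed, a full `agend fleet restart` is still needed, as noted above.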
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: fleet-health
|
|
3
|
+
description: Check instance health via tmux, detect stuck agents, fleet-wide health scan
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Instance Health Check via tmux
|
|
7
|
+
|
|
8
|
+
When user asks to check an instance's status or what it's doing:
|
|
9
|
+
- Use `execute_bash` to run: `tmux capture-pane -t agend:<instance-name> -p | tail -20`
|
|
10
|
+
- This shows the actual CLI screen (what the agent sees right now)
|
|
11
|
+
- More useful than just "running/stopped" status
|
|
12
|
+
- If the instance appears stuck, suggest `/raw /compact` or restart
|
|
13
|
+
|
|
14
|
+
## Fleet Health Check
|
|
15
|
+
|
|
16
|
+
Check all instances for stuck/error state:
|
|
17
|
+
|
|
18
|
+
```bash
|
|
19
|
+
for win in $(tmux list-windows -t agend -F '#{window_name}' | grep -v bash); do
|
|
20
|
+
last=$(tmux capture-pane -t "agend:$win" -p | tail -3 | tr '\n' ' ')
|
|
21
|
+
if echo "$last" | grep -q "!>"; then
|
|
22
|
+
echo "✅ $win — idle"
|
|
23
|
+
elif echo "$last" | grep -q "error:"; then
|
|
24
|
+
echo "❌ $win — ERROR"
|
|
25
|
+
else
|
|
26
|
+
echo "⏳ $win — busy"
|
|
27
|
+
fi
|
|
28
|
+
done
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
States:
|
|
32
|
+
- ✅ idle — prompt visible (X% !>), ready for input
|
|
33
|
+
- ⏳ busy — processing a task, wait for it to finish
|
|
34
|
+
- ❌ error — check tmux pane for details, may need restart
|
|
35
|
+
|
|
36
|
+
If an instance is stuck (busy for >10 minutes with no output), restart it:
|
|
37
|
+
- `restart_instance("<instance-name>")`
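
The classification used in the loop above can be pulled into a small function so it can be checked without a live tmux session. This is a sketch only: `classify_pane` is a hypothetical name, and the `!>` and `error:` markers are the ones the script above already greps for.

```bash
# classify_pane TEXT: print idle, error, or busy for a captured pane tail.
# Hypothetical helper; mirrors the order of checks in the loop above
# (the idle prompt wins over an error string).
classify_pane() {
  case "$1" in
    *'!>'*)     echo idle ;;
    *'error:'*) echo error ;;
    *)          echo busy ;;
  esac
}
```

Example call inside the loop: `classify_pane "$(tmux capture-pane -t "agend:$win" -p | tail -3)"`.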
|
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: fleet-restart
|
|
3
|
+
description: Fleet restart types, recovery from tmux crash, rate limit handling, safe update
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Fleet Restart & Recovery
|
|
7
|
+
|
|
8
|
+
**Restart types:**
|
|
9
|
+
- `agend fleet restart` — full stop + start (picks up new code after build & link)
|
|
10
|
+
- `agend reload` — SIGHUP hot-reload, reconciles instances without restarting the fleet process
|
|
11
|
+
- `restart_instance("<name>")` — single instance restart, reloads fleet.yaml first
|
|
12
|
+
|
|
13
|
+
**After tmux crash:**
|
|
14
|
+
- Fleet auto-detects tmux server death and triggers circuit breaker (30s pause)
|
|
15
|
+
- Some instances may fail to restart due to rate limits from simultaneous startup
|
|
16
|
+
- Fix: manually restart failed instances, or do another `agend fleet restart`
|
|
17
|
+
- Check failed instances: `agend ls` shows "stopped" status
|
|
18
|
+
|
|
19
|
+
**Rate limit recovery:**
|
|
20
|
+
- If you see "PTY error: Rate limit reached" or "crash loop — respawn paused", wait 1-2 minutes
|
|
21
|
+
- Then `restart_instance` the affected instance
|
|
22
|
+
- Do NOT restart all instances simultaneously — this worsens rate limits
|
|
23
|
+
|
|
24
|
+
## Safe Update & Restart
|
|
25
|
+
|
|
26
|
+
**Update AgEnD to latest version:**
|
|
27
|
+
```bash
|
|
28
|
+
agend update # update to latest
|
|
29
|
+
agend update --version 0.0.6 # pin specific version
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
The `agend update` command automatically:
|
|
33
|
+
- Detects if sudo is needed (switches to nvm if so)
|
|
34
|
+
- Installs new version
|
|
35
|
+
- Verifies installation succeeded
|
|
36
|
+
- Updates service file (ExecStart path)
|
|
37
|
+
- Restarts fleet
|
|
38
|
+
|
|
39
|
+
**Manual restart (if update isn't needed):**
|
|
40
|
+
```bash
|
|
41
|
+
agend fleet restart # graceful restart (SIGUSR2) — keeps sessions, reloads config
|
|
42
|
+
agend fleet stop && agend fleet start # full restart — new code takes effect
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
**NEVER do:**
|
|
46
|
+
- `kill -9` on the fleet process (corrupts state)
|
|
47
|
+
- Edit fleet.yaml while fleet is restarting
|
|
48
|
+
- Run `agend update` while another update is in progress
|
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: instance-lifecycle
|
|
3
|
+
description: Replace vs restart instances, monitoring state, when to use each
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Instance Lifecycle Management
|
|
7
|
+
|
|
8
|
+
**Replace vs Restart:**
|
|
9
|
+
- `restart_instance` — keeps session, reloads config. Use when config changed.
|
|
10
|
+
- `replace_instance` — kills old, creates fresh with handover context. Use when context is polluted or instance is stuck in a loop.
|
|
11
|
+
|
|
12
|
+
**When to replace (not restart):**
|
|
13
|
+
- Instance keeps hallucinating or referencing stale information
|
|
14
|
+
- Instance is stuck in a tool-call loop
|
|
15
|
+
- Context is reported >80% full and responses are degrading (only applicable to backends that report context usage)
|
|
16
|
+
|
|
17
|
+
**Monitoring instance state:**
|
|
18
|
+
- `describe_instance("<name>")` — shows status, last activity, description
|
|
19
|
+
- `tmux capture-pane -t agend:<name> -p | tail -20` — see actual CLI screen
|
|
20
|
+
- Look for `X% !>` prompt = idle, `Thinking...` = busy, `error` = needs attention
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: model-discovery
|
|
3
|
+
description: List available models per backend, configure model in fleet.yaml
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Model Names by Backend
|
|
7
|
+
|
|
8
|
+
Models are specified in fleet.yaml `defaults.model` or per-instance `model` field.
|
|
9
|
+
|
|
10
|
+
| Backend | How to list models | Default |
|
|
11
|
+
|---------|-------------------|---------|
|
|
12
|
+
| **kiro-cli** | In tmux: send `/model` + Enter → read model list → Esc to close | auto (latest) |
|
|
13
|
+
| **claude-code** | `sonnet`, `opus`, `haiku`, `opusplan`, `best`, `sonnet[1m]`, `opus[1m]` | sonnet |
|
|
14
|
+
| **antigravity** | Run `agy models` to see available models | Gemini 3.5 Flash (Medium) |
|
|
15
|
+
| **codex** | `gpt-4o`, `o3`, `o4-mini` | gpt-4o |
|
|
16
|
+
| **opencode** | `opencode models` | depends on provider |
|
|
17
|
+
|
|
18
|
+
**To discover available models for a backend, run the CLI's model listing command:**
|
|
19
|
+
- `agy models` — lists all available models for antigravity
|
|
20
|
+
- `opencode models` — lists all available models for opencode
|
|
21
|
+
- `codex` — check config.toml
|
|
22
|
+
|
|
23
|
+
**Important:** Model names vary by backend. Always check the actual CLI output rather than guessing names. For antigravity, use the exact display name shown by `agy models` (e.g. "Gemini 3.5 Flash (High)").
|
|
24
|
+
|
|
25
|
+
Example fleet.yaml:
|
|
26
|
+
```yaml
|
|
27
|
+
defaults:
|
|
28
|
+
backend: kiro-cli
|
|
29
|
+
model: claude-sonnet-4-20250514
|
|
30
|
+
|
|
31
|
+
instances:
|
|
32
|
+
heavy-task:
|
|
33
|
+
model: claude-opus-4-20250514
|
|
34
|
+
```
|
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: session-management
|
|
3
|
+
description: Save/load/fork sessions, batch backup, reviewer session setup
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Reviewer Session Management
|
|
7
|
+
|
|
8
|
+
For reviewer instances using kiro-cli:
|
|
9
|
+
- Recommend setting `pre_task_command: "/chat load reviewer-base.json"` in fleet.yaml
|
|
10
|
+
- This loads a base session with review guidelines on every restart
|
|
11
|
+
- Help user create the base session:
|
|
12
|
+
1. Attach to reviewer: `agend attach <reviewer-instance>`
|
|
13
|
+
2. Set up review context and guidelines
|
|
14
|
+
3. Save: `/chat save reviewer-base.json -f`
|
|
15
|
+
4. Add to fleet.yaml under the reviewer instance config:
|
|
16
|
+
```yaml
|
|
17
|
+
instances:
|
|
18
|
+
reviewer-xxx:
|
|
19
|
+
pre_task_command: "/chat load reviewer-base.json"
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## Fork Instance (Session Cloning)
|
|
23
|
+
|
|
24
|
+
When user wants to fork/clone an instance's session to a new instance:
|
|
25
|
+
|
|
26
|
+
Steps:
|
|
27
|
+
1. Wait for source instance to be idle (check with tmux capture-pane, look for "X% !>" prompt)
|
|
28
|
+
|
|
29
|
+
2. Save current session on source instance via tmux:
|
|
30
|
+
- `execute_bash`: `tmux send-keys -t agend:<source-instance> '/chat save YYYYMMDD.json -f' Enter`
|
|
31
|
+
- Wait a few seconds for save to complete
|
|
32
|
+
|
|
33
|
+
3. Create new instance:
|
|
34
|
+
- `create_instance` with same backend and working_directory (or new one)
|
|
35
|
+
|
|
36
|
+
4. Copy session file to new instance workspace:
|
|
37
|
+
- `execute_bash`: `cp ~/.agend/workspaces/<source>/YYYYMMDD.json ~/.agend/workspaces/<target>/`
|
|
38
|
+
|
|
39
|
+
5. Wait for new instance to be idle, then load session via tmux:
|
|
40
|
+
- `execute_bash`: `tmux send-keys -t agend:<new-instance-name> '/chat load YYYYMMDD.json' Enter`
|
|
41
|
+
- Or configure `pre_task_command: "/chat load YYYYMMDD.json"` for auto-load on restart
|
|
42
|
+
|
|
43
|
+
## Batch Session Backup
|
|
44
|
+
|
|
45
|
+
Save all instances' sessions to a dated backup directory:
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
DATE=$(date +%Y%m%d)
|
|
49
|
+
BACKUP_DIR="$HOME/.agend/session-backups/$DATE"
|
|
50
|
+
mkdir -p "$BACKUP_DIR"
|
|
51
|
+
MY_NAME="<your-own-instance-name>" # skip yourself to avoid paste collision
|
|
52
|
+
for win in $(tmux list-windows -t agend -F '#{window_name}' | grep -v bash); do
|
|
53
|
+
if [ "$win" = "$MY_NAME" ]; then continue; fi
|
|
54
|
+
tmux send-keys -t "agend:$win" "/chat save $BACKUP_DIR/${win}.json -f" Enter
|
|
55
|
+
sleep 3
|
|
56
|
+
done
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
Important:
|
|
60
|
+
- Skip your own instance (the one executing this) to avoid paste collision
|
|
61
|
+
- Use `sleep 3` between saves
|
|
62
|
+
- Run the fleet health check first and back up only idle instances
|
|
63
|
+
- Do NOT back up while instances are busy
|
|
64
|
+
|
|
65
|
+
Restore a single instance:
|
|
66
|
+
- `tmux send-keys -t agend:<instance> '/chat load /path/to/backup.json' Enter`
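
The "only back up idle instances" rule above could be enforced per window rather than by a separate manual check. A minimal sketch, assuming the same `agend:<window>` tmux target and `!>` idle prompt used elsewhere in this skill; `save_if_idle` is a hypothetical helper name.

```bash
# save_if_idle WINDOW BACKUP_DIR: send /chat save only when the idle prompt is visible.
# Hypothetical helper; the "!>" marker and tmux target format come from the scripts above.
save_if_idle() {
  local win="$1" dir="$2" tail_text
  tail_text=$(tmux capture-pane -t "agend:$win" -p | tail -3)
  case "$tail_text" in
    *'!>'*)
      tmux send-keys -t "agend:$win" "/chat save $dir/${win}.json -f" Enter
      ;;
    *)
      echo "skipping $win: not idle" >&2
      return 1
      ;;
  esac
}
```

In the backup loop above, the `tmux send-keys` line would become `save_if_idle "$win" "$BACKUP_DIR"`, keeping the `sleep 3` between saves.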
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# Core Rules
|
|
2
|
+
|
|
3
|
+
> **These rules are mandatory for all general instances.**
|
|
4
|
+
|
|
5
|
+
## Instance Creation Safety
|
|
6
|
+
|
|
7
|
+
When creating a new instance with `create_instance`:
|
|
8
|
+
|
|
9
|
+
**Pre-checks (mandatory):**
|
|
10
|
+
1. **Check for duplicate working directory** — Run `list_instances` and verify no existing instance uses the same `working_directory`. Two instances sharing a directory causes file conflicts and race conditions.
|
|
11
|
+
2. **Verify the directory exists** — If specifying a directory, confirm it exists on disk before calling create.
|
|
12
|
+
3. **Use unique names** — The instance name is derived from `topic_name` or `basename(directory)`. Avoid generic names like "dev" or "test" that may collide.
|
|
13
|
+
|
|
14
|
+
**Post-checks (mandatory):**
|
|
15
|
+
4. **Confirm topic/channel creation** — After `create_instance` returns, verify the response includes a valid `topic_id`. This confirms the Discord channel or Telegram topic was actually created.
|
|
16
|
+
5. **Verify instance is running** — Use `describe_instance` to confirm the new instance reached "running" status. If it shows "stopped" or errors, check the output log.
|
|
17
|
+
|
|
18
|
+
**Common mistakes to avoid:**
|
|
19
|
+
- Do NOT create an instance pointing to another instance's worktree path
|
|
20
|
+
- Do NOT reuse a `topic_name` that already exists (Discord will create a duplicate channel)
|
|
21
|
+
- Do NOT omit `topic_name` when `directory` is not provided — it will error
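
The first two pre-checks above (directory exists, directory not already in use) can be sketched as a shell function. This is illustrative only: `precheck_dir` is a hypothetical name, and the list of directories already in use is assumed to be copied by hand from the `list_instances` output. Paths are compared as plain strings, so symlinks or trailing slashes are not normalized.

```bash
# precheck_dir DIR [EXISTING_DIR...]: fail if DIR is missing or already in use.
# Hypothetical helper; EXISTING_DIR values would come from list_instances output.
# Returns 1 when DIR does not exist, 2 when another instance already uses it.
precheck_dir() {
  local dir="$1"; shift
  if [ ! -d "$dir" ]; then
    echo "directory does not exist: $dir" >&2
    return 1
  fi
  local used
  for used in "$@"; do
    if [ "$used" = "$dir" ]; then
      echo "directory already used by another instance: $dir" >&2
      return 2
    fi
  done
  return 0
}
```

Example call: `precheck_dir "$HOME/projects/api" "$HOME/projects/web" && echo "ok to create"` (both paths are placeholders).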
|
|
22
|
+
|
|
23
|
+
## What NOT to Do (Dangerous Operations)
|
|
24
|
+
|
|
25
|
+
- **Don't delete `~/.agend/fleet.yaml`** while fleet is running
|
|
26
|
+
- **Don't delete `~/.agend/fleet.pid`** manually — use `agend fleet stop`
|
|
27
|
+
- **Don't kill tmux server** (`tmux kill-server`) — kills all agent sessions
|
|
28
|
+
- **Don't edit instance output.log** — it's actively written by the daemon
|
|
29
|
+
- **Don't run two fleet processes** on the same AGEND_HOME — port/socket conflicts
|
|
30
|
+
- **Don't change `channel.group_id`** without re-creating all topics — routing breaks
|
|
31
|
+
- **Don't remove an instance from fleet.yaml** that has active work — stop it first
|
|
32
|
+
|
|
33
|
+
## Access Mode Reference
|
|
34
|
+
|
|
35
|
+
fleet.yaml `channel.access.mode` valid values:
|
|
36
|
+
|
|
37
|
+
| Mode | Behavior |
|
|
38
|
+
|------|----------|
|
|
39
|
+
| `locked` | Only `allowed_users` can interact (default) |
|
|
40
|
+
| `pairing` | Users can request access via `/pair` command |
|
|
41
|
+
| `open` | All users can interact, no restrictions |
|
|
42
|
+
|
|
43
|
+
Example:
|
|
44
|
+
```yaml
|
|
45
|
+
channel:
|
|
46
|
+
access:
|
|
47
|
+
mode: open # everyone can use
|
|
48
|
+
# mode: locked # whitelist only (add allowed_users)
|
|
49
|
+
# mode: pairing # users self-register via /pair
|
|
50
|
+
allowed_users: [123456789] # only needed for locked/pairing
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
**When to use each:**
|
|
54
|
+
- `locked` — production, private bot, security-sensitive
|
|
55
|
+
- `pairing` — semi-open, users request access with admin approval
|
|
56
|
+
- `open` — public demo, shared team bot, testing
|
|
@@ -1,297 +0,0 @@
|
|
|
1
|
-
# Advanced Skills
|
|
2
|
-
|
|
3
|
-
> **Note: These skills apply to kiro-cli backend instances only.**
|
|
4
|
-
|
|
5
|
-
## 1. Instance Health Check via tmux
|
|
6
|
-
|
|
7
|
-
When user asks to check an instance's status or what it's doing:
|
|
8
|
-
- Use `execute_bash` to run: `tmux capture-pane -t agend:<instance-name> -p | tail -20`
|
|
9
|
-
- This shows the actual CLI screen (what the agent sees right now)
|
|
10
|
-
- More useful than just "running/stopped" status
|
|
11
|
-
- If the instance appears stuck, suggest `/raw /compact` or restart
|
|
12
|
-
|
|
13
|
-
## 2. Reviewer Session Management
|
|
14
|
-
|
|
15
|
-
For reviewer instances using kiro-cli:
|
|
16
|
-
- Recommend setting `pre_task_command: "/chat load reviewer-base.json"` in fleet.yaml
|
|
17
|
-
- This loads a base session with review guidelines on every restart
|
|
18
|
-
- Help user create the base session:
|
|
19
|
-
1. Attach to reviewer: `agend attach <reviewer-instance>`
|
|
20
|
-
2. Set up review context and guidelines
|
|
21
|
-
3. Save: `/chat save reviewer-base.json -f`
|
|
22
|
-
4. Add to fleet.yaml under the reviewer instance config:
|
|
23
|
-
```yaml
|
|
24
|
-
instances:
|
|
25
|
-
reviewer-xxx:
|
|
26
|
-
pre_task_command: "/chat load reviewer-base.json"
|
|
27
|
-
```
|
|
28
|
-
|
|
29
|
-
## 3. Fork Instance (Session Cloning)
|
|
30
|
-
|
|
31
|
-
When user wants to fork/clone an instance's session to a new instance:
|
|
32
|
-
|
|
33
|
-
Steps:
|
|
34
|
-
1. Wait for source instance to be idle (check with tmux capture-pane, look for "X% !>" prompt)
|
|
35
|
-
|
|
36
|
-
2. Save current session on source instance via tmux:
|
|
37
|
-
- `execute_bash`: `tmux send-keys -t agend:<source-instance> '/chat save YYYYMMDD.json -f' Enter`
|
|
38
|
-
- Wait a few seconds for save to complete
|
|
39
|
-
|
|
40
|
-
3. Create new instance:
|
|
41
|
-
- `create_instance` with same backend and working_directory (or new one)
|
|
42
|
-
|
|
43
|
-
4. Copy session file to new instance workspace:
|
|
44
|
-
- `execute_bash`: `cp ~/.agend/workspaces/<source>/YYYYMMDD.json ~/.agend/workspaces/<target>/`
|
|
45
|
-
|
|
46
|
-
5. Wait for new instance to be idle, then load session via tmux:
|
|
47
|
-
- `execute_bash`: `tmux send-keys -t agend:<new-instance-name> '/chat load YYYYMMDD.json' Enter`
|
|
48
|
-
- Or configure `pre_task_command: "/chat load YYYYMMDD.json"` for auto-load on restart
|
|
49
|
-
|
|
50
|
-
## 4. Batch Session Backup
|
|
51
|
-
|
|
52
|
-
Save all instances' sessions to a dated backup directory:
|
|
53
|
-
|
|
54
|
-
```bash
|
|
55
|
-
DATE=$(date +%Y%m%d)
|
|
56
|
-
BACKUP_DIR="$HOME/.agend/session-backups/$DATE"
|
|
57
|
-
mkdir -p "$BACKUP_DIR"
|
|
58
|
-
MY_NAME="<your-own-instance-name>" # skip yourself to avoid paste collision
|
|
59
|
-
for win in $(tmux list-windows -t agend -F '#{window_name}' | grep -v bash); do
|
|
60
|
-
if [ "$win" = "$MY_NAME" ]; then continue; fi
|
|
61
|
-
tmux send-keys -t "agend:$win" "/chat save $BACKUP_DIR/${win}.json -f" Enter
|
|
62
|
-
sleep 3
|
|
63
|
-
done
|
|
64
|
-
```
|
|
65
|
-
|
|
66
|
-
Important:
|
|
67
|
-
- Skip your own instance (the one executing this) to avoid paste collision
|
|
68
|
-
- Use `sleep 3` between saves
|
|
69
|
-
- Run the fleet health check first and back up only idle instances
|
|
70
|
-
- Do NOT back up while instances are busy
|
|
71
|
-
|
|
72
|
-
Restore a single instance:
|
|
73
|
-
- `tmux send-keys -t agend:<instance> '/chat load /path/to/backup.json' Enter`
|
|
74
|
-
|
|
75
|
-
## 5. Fleet Health Check
|
|
76
|
-
|
|
77
|
-
Check all instances for stuck/error state:
|
|
78
|
-
|
|
79
|
-
```bash
|
|
80
|
-
for win in $(tmux list-windows -t agend -F '#{window_name}' | grep -v bash); do
|
|
81
|
-
last=$(tmux capture-pane -t "agend:$win" -p | tail -3 | tr '\n' ' ')
|
|
82
|
-
if echo "$last" | grep -q "!>"; then
|
|
83
|
-
echo "✅ $win — idle"
|
|
84
|
-
elif echo "$last" | grep -q "error:"; then
|
|
85
|
-
echo "❌ $win — ERROR"
|
|
86
|
-
else
|
|
87
|
-
echo "⏳ $win — busy"
|
|
88
|
-
fi
|
|
89
|
-
done
|
|
90
|
-
```
|
|
91
|
-
|
|
92
|
-
States:
|
|
93
|
-
- ✅ idle — prompt visible (X% !>), ready for input
|
|
94
|
-
- ⏳ busy — processing a task, wait for it to finish
|
|
95
|
-
- ❌ error — check tmux pane for details, may need restart
|
|
96
|
-
|
|
97
|
-
If an instance is stuck (busy for >10 minutes with no output), restart it:
|
|
98
|
-
- `restart_instance("<instance-name>")`
|
|
99
|
-
|
|
100
|
-
## 6. Instance Creation Safety
|
|
101
|
-
|
|
102
|
-
When creating a new instance with `create_instance`:
|
|
103
|
-
|
|
104
|
-
**Pre-checks (mandatory):**
|
|
105
|
-
1. **Check for duplicate working directory** — Run `list_instances` and verify no existing instance uses the same `working_directory`. Two instances sharing a directory causes file conflicts and race conditions.
|
|
106
|
-
2. **Verify the directory exists** — If specifying a directory, confirm it exists on disk before calling create.
|
|
107
|
-
3. **Use unique names** — The instance name is derived from `topic_name` or `basename(directory)`. Avoid generic names like "dev" or "test" that may collide.
|
|
108
|
-
|
|
109
|
-
**Post-checks (mandatory):**
|
|
110
|
-
4. **Confirm topic/channel creation** — After `create_instance` returns, verify the response includes a valid `topic_id`. This confirms the Discord channel or Telegram topic was actually created.
|
|
111
|
-
5. **Verify instance is running** — Use `describe_instance` to confirm the new instance reached "running" status. If it shows "stopped" or errors, check the output log.
|
|
112
|
-
|
|
113
|
-
**Common mistakes to avoid:**
|
|
114
|
-
- Do NOT create an instance pointing to another instance's worktree path
|
|
115
|
-
- Do NOT reuse a `topic_name` that already exists (Discord will create a duplicate channel)
|
|
116
|
-
- Do NOT omit `topic_name` when `directory` is not provided — it will error
|
|
117
|
-
|
|
118
|
-
## 7. Fleet Restart & Recovery
|
|
119
|
-
|
|
120
|
-
**Restart types:**
|
|
121
|
-
- `agend fleet restart` — full stop + start (picks up new code after build & link)
|
|
122
|
-
- `agend reload` — SIGHUP hot-reload, reconciles instances without restarting the fleet process
|
|
123
|
-
- `restart_instance("<name>")` — single instance restart, reloads fleet.yaml first
|
|
124
|
-
|
|
125
|
-
**After tmux crash:**
|
|
126
|
-
- Fleet auto-detects tmux server death and triggers circuit breaker (30s pause)
|
|
127
|
-
- Some instances may fail to restart due to rate limits from simultaneous startup
|
|
128
|
-
- Fix: manually restart failed instances, or do another `agend fleet restart`
|
|
129
|
-
- Check failed instances: `agend ls` shows "stopped" status
|
|
130
|
-
|
|
131
|
-
**Rate limit recovery:**
|
|
132
|
-
- If you see "PTY error: Rate limit reached" or "crash loop — respawn paused", wait 1-2 minutes
|
|
133
|
-
- Then `restart_instance` the affected instance
|
|
134
|
-
- Do NOT restart all instances simultaneously — this worsens rate limits
|
|
135
|
-
|
|
136
|
-
## 8. Configuration Quick Reference
|
|
137
|
-
|
|
138
|
-
**fleet.yaml structure:**
|
|
139
|
-
```yaml
|
|
140
|
-
channel: # Telegram/Discord connection
|
|
141
|
-
defaults: # Shared defaults for all instances
|
|
142
|
-
backend: kiro-cli
|
|
143
|
-
startup:
|
|
144
|
-
concurrency: 6 # Max simultaneous instance startups
|
|
145
|
-
stagger_delay_ms: 2000 # Delay between startup batches
|
|
146
|
-
instances: # Per-instance config (topic_id, working_directory, etc.)
|
|
147
|
-
templates: # Reusable fleet deployment templates
|
|
148
|
-
```
|
|
149
|
-
|
|
150
|
-
**classicBot.yaml** — manages classic bot channels (separate from fleet.yaml):
|
|
151
|
-
- `defaults.allowed_guilds` — Discord server whitelist
|
|
152
|
-
- `defaults.allowed_groups` — Telegram group whitelist
|
|
153
|
-
- `channels` — per-channel backend override
|
|
154
|
-
- Hot-reloads every 30 seconds (no restart needed)
|
|
155
|
-
|
|
156
|
-
**Key config locations:**
|
|
157
|
-
- Fleet config: `~/.agend/fleet.yaml`
|
|
158
|
-
- Classic bot: `~/.agend/classicBot.yaml`
|
|
159
|
-
- Environment: `~/.agend/.env` (bot tokens, API keys)
|
|
160
|
-
- Instance logs: `~/.agend/instances/<name>/output.log`
|
|
161
|
-
- Fleet log: `~/.agend/fleet.log`
|
|
162
|
-
|
|
163
|
-
## 9. Instance Lifecycle Management
|
|
164
|
-
|
|
165
|
-
**Replace vs Restart:**
|
|
166
|
-
- `restart_instance` — keeps session, reloads config. Use when config changed.
|
|
167
|
-
- `replace_instance` — kills old, creates fresh with handover context. Use when context is polluted or instance is stuck in a loop.
|
|
168
|
-
|
|
169
|
-
**When to replace (not restart):**
|
|
170
|
-
- Instance keeps hallucinating or referencing stale information
|
|
171
|
-
- Instance is stuck in a tool-call loop
|
|
172
|
-
- Context is reported >80% full and responses are degrading (only applicable to backends that report context usage)
|
|
173
|
-
|
|
174
|
-
**Monitoring instance state:**
|
|
175
|
-
- `describe_instance("<name>")` — shows status, last activity, description
|
|
176
|
-
- `tmux capture-pane -t agend:<name> -p | tail -20` — see actual CLI screen
|
|
177
|
-
- Look for `X% !>` prompt = idle, `Thinking...` = busy, `error` = needs attention
|
|
178
|
-
|
|
179
|
-
## 10. Safe Update & Restart
|
|
180
|
-
|
|
181
|
-
**Update AgEnD to latest version:**
|
|
182
|
-
```bash
|
|
183
|
-
agend update # update to latest
|
|
184
|
-
agend update --version 0.0.6 # pin specific version
|
|
185
|
-
```
|
|
186
|
-
|
|
187
|
-
The `agend update` command automatically:
|
|
188
|
-
- Detects if sudo is needed (switches to nvm if so)
|
|
189
|
-
- Installs new version
|
|
190
|
-
- Verifies installation succeeded
|
|
191
|
-
- Updates service file (ExecStart path)
|
|
192
|
-
- Restarts fleet
|
|
193
|
-
|
|
194
|
-
**Manual restart (if update isn't needed):**
|
|
195
|
-
```bash
|
|
196
|
-
agend fleet restart # graceful restart (SIGUSR2) — keeps sessions, reloads config
|
|
197
|
-
agend fleet stop && agend fleet start # full restart — new code takes effect
|
|
198
|
-
```
|
|
199
|
-
|
|
200
|
-
**NEVER do:**
|
|
201
|
-
- `kill -9` on the fleet process (corrupts state)
|
|
202
|
-
- Edit fleet.yaml while fleet is restarting
|
|
203
|
-
- Run `agend update` while another update is in progress
|
|
204
|
-
|
|
205
|
-
## 11. Model Names by Backend
|
|
206
|
-
|
|
207
|
-
Models are specified in fleet.yaml `defaults.model` or per-instance `model` field.
|
|
208
|
-
|
|
209
|
-
| Backend | Model Names | Default |
|
|
210
|
-
|---------|-------------|---------|
|
|
211
|
-
| **kiro-cli** | `claude-sonnet-4-20250514`, `claude-opus-4-20250514`, `claude-haiku-3-20250307` | auto (latest) |
|
|
212
|
-
| **claude-code** | `sonnet`, `opus`, `haiku`, `opusplan`, `best`, `sonnet[1m]`, `opus[1m]` | sonnet |
|
|
213
|
-
| **gemini-cli** | `gemini-2.5-pro`, `gemini-2.5-flash` | auto |
|
|
214
|
-
| **codex** | `gpt-4o`, `o3`, `o4-mini` | gpt-4o |
|
|
215
|
-
| **opencode** | depends on provider config | — |
|
|
216
|
-
|
|
217
|
-
**Important:** kiro-cli uses FULL model IDs (e.g. `claude-sonnet-4-20250514`), NOT short names like `sonnet`. Claude Code uses short names. Don't mix them up.
|
|
218
|
-
|
|
219
|
-
Example fleet.yaml:
|
|
220
|
-
```yaml
|
|
221
|
-
defaults:
|
|
222
|
-
backend: kiro-cli
|
|
223
|
-
model: claude-sonnet-4-20250514
|
|
224
|
-
|
|
225
|
-
instances:
|
|
226
|
-
heavy-task:
|
|
227
|
-
model: claude-opus-4-20250514
|
|
228
|
-
```
|
|
229
|
-
|
|
230
|
-
## 12. Config Validation
|
|
231
|
-
|
|
232
|
-
**After editing fleet.yaml or classicBot.yaml, always validate the file:**
|
|
233
|
-
|
|
234
|
-
```bash
|
|
235
|
-
# Validate fleet.yaml syntax
|
|
236
|
-
agend fleet start --dry-run 2>&1 | head -5
|
|
237
|
-
# Or simply:
|
|
238
|
-
node -e "const yaml = require('js-yaml'); const fs = require('fs'); yaml.load(fs.readFileSync('$HOME/.agend/fleet.yaml', 'utf-8')); console.log('✓ valid YAML')"
|
|
239
|
-
```
|
|
240
|
-
|
|
241
|
-
**Common fleet.yaml mistakes:**
|
|
242
|
-
- Missing `channel.mode` field → error on start
|
|
243
|
-
- Wrong indentation (YAML is indent-sensitive)
|
|
244
|
-
- `topic_id` as string vs number (both work, but be consistent)
|
|
245
|
-
- `backend` typo (valid: `claude-code`, `gemini-cli`, `codex`, `opencode`, `kiro-cli`)
|
|
246
|
-
- `model` using wrong format for the backend
|
|
247
|
-
|
|
248
|
-
**classicBot.yaml validation:**
|
|
249
|
-
```bash
|
|
250
|
-
node -e "const yaml = require('js-yaml'); const fs = require('fs'); yaml.load(fs.readFileSync('$HOME/.agend/classicBot.yaml', 'utf-8')); console.log('✓ valid YAML')"
|
|
251
|
-
```
|
|
252
|
-
|
|
253
|
-
**Common classicBot.yaml mistakes:**
|
|
254
|
-
- `allowed_guilds` values must be strings (Discord IDs are too large for YAML integers)
|
|
255
|
-
- Channel IDs as keys must be quoted strings
|
|
256
|
-
- Missing `defaults` section (optional but recommended)
|
|
257
|
-
|
|
258
|
-
**After editing config:**
|
|
259
|
-
```bash
|
|
260
|
-
agend reload # hot-reload (SIGHUP) — adds/removes instances without restart
|
|
261
|
-
agend fleet restart # if channel/defaults changed — needs full restart
|
|
262
|
-
```
|
|
263
|
-
|
|
264
|
-
## 13. What NOT to Do (Dangerous Operations)
|
|
265
|
-
|
|
266
|
-
- **Don't delete `~/.agend/fleet.yaml`** while fleet is running
|
|
267
|
-
- **Don't delete `~/.agend/fleet.pid`** manually — use `agend fleet stop`
|
|
268
|
-
- **Don't kill tmux server** (`tmux kill-server`) — kills all agent sessions
|
|
269
|
-
- **Don't edit instance output.log** — it's actively written by the daemon
|
|
270
|
-
- **Don't run two fleet processes** on the same AGEND_HOME — port/socket conflicts
|
|
271
|
-
- **Don't change `channel.group_id`** without re-creating all topics — routing breaks
|
|
272
|
-
- **Don't remove an instance from fleet.yaml** that has active work — stop it first
|
|
273
|
-
|
|
274
|
-
## 14. Access Mode Reference
|
|
275
|
-
|
|
276
|
-
fleet.yaml `channel.access.mode` valid values:
|
|
277
|
-
|
|
278
|
-
| Mode | Behavior |
|
|
279
|
-
|------|----------|
|
|
280
|
-
| `locked` | Only `allowed_users` can interact (default) |
|
|
281
|
-
| `pairing` | Users can request access via `/pair` command |
|
|
282
|
-
| `open` | All users can interact, no restrictions |
|
|
283
|
-
|
|
284
|
-
Example:
|
|
285
|
-
```yaml
|
|
286
|
-
channel:
|
|
287
|
-
access:
|
|
288
|
-
mode: open # everyone can use
|
|
289
|
-
# mode: locked # whitelist only (add allowed_users)
|
|
290
|
-
# mode: pairing # users self-register via /pair
|
|
291
|
-
allowed_users: [123456789] # only needed for locked/pairing
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
**When to use each:**
|
|
295
|
-
- `locked` — production, private bot, security-sensitive
|
|
296
|
-
- `pairing` — semi-open, users request access with admin approval
|
|
297
|
-
- `open` — public demo, shared team bot, testing
|