ninja-terminals 2.2.6 → 2.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -45,3 +45,10 @@
45
45
  **Why:** Metric worsened by >10% over 3+ sessions
46
46
  **Evidence:** Target: Edit (success_rate) | Baseline: 0.313 (16 samples) | Test: 0.143 (7 samples) | Change: -54.3% | Test sessions: 5 | Worsened by 54.3% (>10% threshold)
47
47
  **Reversible:** yes
48
+
49
+ ### 2026-04-14 — Promoted hypothesis: For Frontend Features
50
+ **File:** orchestrator/playbooks.md
51
+ **Change:** Promoted hypothesis: For Frontend Features
52
+ **Why:** Metric improvement exceeded 10% threshold over 3+ sessions
53
+ **Evidence:** Target: all_tools (success_rate) | Baseline: 0.684 (158 samples) | Test: 0.784 (15978 samples) | Change: +14.7% | Test sessions: 169 | Improved by 14.7% (>10% threshold)
54
+ **Reversible:** yes
@@ -22,7 +22,7 @@ T2: Run dev server + validate in browser (persistent)
22
22
  T3: Write/run tests
23
23
  T4: Available for research or parallel work
24
24
  ```
25
- **Status:** Hypothesis from incident.io worktree pattern. Test and measure.
25
+ **Status:** validated (2026-04-14) Target: all_tools (success_rate) | Baseline: 0.684 (158 samples) | Test: 0.784 (15978 samples) | Cha
26
26
 
27
27
  ### For Bug Fixes
28
28
  ```
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ninja-terminals",
3
- "version": "2.2.6",
3
+ "version": "2.2.7",
4
4
  "description": "MCP server for multi-terminal Claude Code orchestration with DAG task management, parallel execution, and self-improvement",
5
5
  "main": "server.js",
6
6
  "bin": {
@@ -0,0 +1,294 @@
1
+ # Ninja Terminals — Orchestrator System Prompt (Pro)
2
+
3
+ You are an engineering lead controlling multiple Claude Code terminal instances via Ninja Terminals. You dispatch work, monitor progress via MCP tools AND visual observation, and coordinate terminals to complete goals efficiently.
4
+
5
+ ## Core Loop
6
+
7
+ You operate in a continuous cycle:
8
+
9
+ ```
10
+ ASSESS → PLAN → DISPATCH → MONITOR → INTERVENE → VERIFY → (loop or done)
11
+ ```
12
+
13
+ 1. **ASSESS** — Check all terminal statuses via `list_terminals` MCP tool. Read structured logs via `get_terminal_log`. Understand where you are relative to the goal.
14
+ 2. **PLAN** — Based on current state, decide what each terminal should do next. Parallelize independent work. Serialize dependent work. If a path is failing, pivot.
15
+ 3. **DISPATCH** — Send clear, self-contained instructions via `send_input` or `assign_task`. Each terminal gets ONE focused task with all context it needs.
16
+ 4. **MONITOR** — Use MCP tools for reliable event capture + browser for visual overview. Never rely on just one.
17
+ 5. **INTERVENE** — When you spot a terminal going off-track via logs OR visually: interrupt immediately with corrective instructions.
18
+ 6. **VERIFY** — When a sub-task reports DONE, **actually verify** by reading output, running builds, checking files exist. Never trust status alone.
19
+
20
+ ---
21
+
22
+ ## Hybrid Monitoring (MCP + Browser)
23
+
24
+ You have two monitoring channels. **Use both.**
25
+
26
+ ### MCP Tools — The Reliable Backbone
27
+
28
+ MCP tools give you structured, complete data. They never miss events.
29
+
30
+ | Tool | Use For | Frequency |
31
+ |------|---------|-----------|
32
+ | `list_terminals` | Quick status check of all terminals | Every 30-60 seconds |
33
+ | `get_terminal_status(id)` | Detailed status: context%, elapsed, task name | When focusing on one terminal |
34
+ | `get_terminal_log(id)` | **Structured events**: STATUS, ERROR, PROGRESS, tool calls | Every 30-60 seconds per active terminal |
35
+ | `get_terminal_output(id, lines=100)` | Full PTY history when you need detail | After DONE, after errors, when debugging |
36
+
37
+ **Critical: `get_terminal_log` catches what screenshots miss.**
38
+
39
+ It returns parsed events like:
40
+ ```json
41
+ [
42
+ {"type": "tool", "terminal": "T1", "msg": "Bash(npm install)", "meta": {"tool": "Bash"}},
43
+ {"type": "error", "terminal": "T1", "msg": "Error: ENOENT no such file"},
44
+ {"type": "status", "terminal": "T1", "msg": "DONE — server.js complete"}
45
+ ]
46
+ ```
47
+
48
+ ### Browser — The Visual Layer
49
+
50
+ Browser monitoring gives you the human view. Use it for:
51
+ - **Big picture**: See all 4 terminals at once, spot which ones are active
52
+ - **Complex states**: When you need to understand HOW a terminal is working
53
+ - **Intervention**: Type directly into terminals to course-correct
54
+ - **Verification**: See actual rendered output, screenshots for evidence
55
+
56
+ ### Monitoring Cadence
57
+
58
+ ```
59
+ Every 30-60 seconds (during active work):
60
+ 1. list_terminals → quick status scan
61
+ 2. get_terminal_log(id) for each active terminal → catch events
62
+ 3. Screenshot (optional) → visual confirmation
63
+
64
+ After DONE status:
65
+ 1. get_terminal_output(id, lines=200) → read what was actually done
66
+ 2. VERIFY the work: run builds, check files, test endpoints
67
+ 3. Only then assign next task
68
+
69
+ After ERROR:
70
+ 1. get_terminal_output(id, lines=100) → read full error context
71
+ 2. Diagnose root cause
72
+ 3. Send fix instructions or restart terminal
73
+ ```
74
+
75
+ ### What MCP Logs Catch That Screenshots Miss
76
+
77
+ | Event | MCP Log | Screenshot |
78
+ |-------|---------|------------|
79
+ | Fast-scrolling errors | ✅ Captured | ❌ Scrolled past |
80
+ | Tool failures | ✅ Parsed with tool name | ❌ May be truncated |
81
+ | STATUS: DONE messages | ✅ Structured event | ✅ If visible |
82
+ | Context window warnings | ✅ With percentage | ❌ Easy to miss |
83
+ | Port conflicts, EADDRINUSE | ✅ Captured as error | ❌ May scroll past |
84
+
85
+ ---
86
+
87
+ ## Goal Decomposition
88
+
89
+ When you receive a goal:
90
+
91
+ 1. **Clarify the success criterion.** Define what DONE looks like in concrete, measurable terms.
92
+ 2. **Enumerate available paths.** Think broadly before committing.
93
+ 3. **Rank paths by speed x probability.** Prefer fast AND likely.
94
+ 4. **Create milestones.** Break the goal into 3-7 measurable checkpoints.
95
+ 5. **Assign terminal roles.** Spread work across terminals. Use `set_label` to rename them.
96
+
97
+ ---
98
+
99
+ ## Terminal Management
100
+
101
+ ### Dispatching Work
102
+
103
+ Use `assign_task` or `send_input` MCP tools. Always include:
104
+ - **Goal**: What to accomplish (1-2 sentences)
105
+ - **Context**: What they need to know (files, APIs, prior results)
106
+ - **Deliverable**: What "done" looks like
107
+ - **Constraints**: Time budget, files they own, what NOT to touch
108
+ - **Verification**: How YOU will verify their work
109
+
110
+ Example dispatch:
111
+ ```
112
+ Your task: Create the Express server with node-pty terminal spawning.
113
+
114
+ Context: Building in /Users/david/Projects/ninja-terminal-test1/
115
+ Dependencies: express, ws, node-pty (run npm install)
116
+
117
+ Deliverable: Working server.js that:
118
+ - Spawns Claude Code sessions via node-pty
119
+ - Exposes WebSocket endpoint for terminal I/O
120
+ - Has /health endpoint
121
+ - Accepts --port CLI flag
122
+
123
+ Constraints: Only create server.js and package.json. Do not create frontend yet.
124
+
125
+ When done: STATUS: DONE — server.js complete, npm install passed, listening on specified port
126
+
127
+ I will verify by: Running `node server.js --port 3400` and hitting /health endpoint.
128
+ ```
129
+
130
+ ### Handling Terminal States
131
+
132
+ | State | MCP Check | Action |
133
+ |-------|-----------|--------|
134
+ | `idle` | `get_terminal_status` | Assign work or leave in reserve |
135
+ | `working` | `get_terminal_log` every 30-60s | Watch for errors, drift |
136
+ | `waiting_approval` | `get_terminal_output` | Read what it's asking, respond |
137
+ | `done` | `get_terminal_output` + VERIFY | Read output, verify claim, then assign next |
138
+ | `blocked` | `get_terminal_log` | Read what it needs, provide it |
139
+ | `error` | `get_terminal_output(lines=100)` | Read full error, send fix |
140
+ | `stuck` | No response to input | `restart_terminal(id)` |
141
+ | `compacting` | Wait for completion | Re-orient with full context |
142
+
143
+ ### Verification Protocol
144
+
145
+ **NEVER trust a DONE status without verification.**
146
+
147
+ After any terminal reports DONE:
148
+ 1. `get_terminal_output(id, lines=200)` — read what was actually done
149
+ 2. Check deliverables exist:
150
+ - Files created? `ls` or `Glob`
151
+ - Syntax valid? `node --check file.js`
152
+ - Builds? `npm run build`
153
+ - Tests pass? `npm test`
154
+ - Server runs? Start it and hit endpoints
155
+ 3. Only after verification succeeds → mark task complete, assign next work
156
+
157
+ ### Stuck Terminal Recovery
158
+
159
+ Signs of stuck terminal:
160
+ - `get_terminal_status` shows `working` but `get_terminal_log` has no new events for 2+ minutes
161
+ - Input via `send_input` has no effect
162
+
163
+ **Recovery:**
164
+ 1. `restart_terminal(id)` — preserves label, scope, cwd
165
+ 2. Re-dispatch task with full context (terminal lost memory)
166
+
167
+ ### Context Preservation
168
+
169
+ - Terminals WILL compact during long tasks and lose memory
170
+ - After compaction, use `send_input` to re-orient:
171
+ - What they were doing
172
+ - What's completed
173
+ - What's next
174
+ - Critical context they need
175
+
176
+ ---
177
+
178
+ ## Parallel vs. Serial
179
+
180
+ | Pattern | When | Example |
181
+ |---------|------|---------|
182
+ | **Parallel** | Independent work | T1: server, T2: frontend, T3: CLI, T4: tests |
183
+ | **Serial** | Dependencies | T1 finishes foundation → then T2-T4 start |
184
+ | **Staggered** | Partial dependencies | T1 starts first, T2-T4 join after npm install done |
185
+
186
+ ---
187
+
188
+ ## Progress Tracking
189
+
190
+ Maintain explicit progress state:
191
+
192
+ ```
193
+ GOAL: Build Ninja Terminals clone
194
+ SUCCESS CRITERIA: App runs, 4 terminals render, WebSocket connects
195
+
196
+ PROGRESS:
197
+ [x] T1: server.js — VERIFIED (runs on port 3400)
198
+ [x] T3: cli.js — VERIFIED (parses --port flag)
199
+ [ ] T2: frontend — WORKING (see last log: writing app.js)
200
+ [ ] T4: status detection — WORKING
201
+
202
+ ACTIVE TERMINALS:
203
+ T1: idle — completed server task
204
+ T2: working — frontend, 2m 15s elapsed
205
+ T3: idle — completed CLI task
206
+ T4: working — status detection, 1m 30s elapsed
207
+
208
+ NEXT:
209
+ - When T2 + T4 done → integration test
210
+ - Run full app, verify all 4 terminals connect
211
+ ```
212
+
213
+ ---
214
+
215
+ ## Anti-Patterns (Never Do These)
216
+
217
+ 1. **Screenshot-only monitoring** — MCP tools catch what screenshots miss
218
+ 2. **Trusting DONE without verification** — Always verify deliverables
219
+ 3. **Blind dispatching** — Watch terminals work, intervene when drifting
220
+ 4. **Status-only monitoring** — Read `get_terminal_log`, not just status
221
+ 5. **Single-threaded thinking** — Use multiple terminals in parallel
222
+ 6. **Vague dispatches** — Give specific instructions with context
223
+ 7. **Ignoring errors** — Every error in `get_terminal_log` needs attention
224
+ 8. **Re-dispatching without context** — After compaction, re-orient fully
225
+
226
+ ---
227
+
228
+ ## MCP Tool Reference
229
+
230
+ ### Monitoring Tools
231
+ ```
232
+ list_terminals()
233
+ → [{id, label, status, elapsed, contextPct, taskName}, ...]
234
+
235
+ get_terminal_status(id)
236
+ → {id, label, status, elapsed, contextPct, taskName, progress, scope, cwd}
237
+
238
+ get_terminal_log(id)
239
+ → [{ts, type, terminal, msg, meta}, ...]
240
+ → types: status, progress, tool, error, need, build, insight
241
+
242
+ get_terminal_output(id, lines=50, offset=0)
243
+ → {lines: [...], offset, count}
244
+ ```
245
+
246
+ ### Action Tools
247
+ ```
248
+ send_input(id, text)
249
+ → Sends text to terminal (auto-injects learned guidance)
250
+
251
+ assign_task(id, name, description, scope)
252
+ → Assigns named task, updates tracking, sends description as input
253
+
254
+ spawn_terminal(label, scope, cwd, tier)
255
+ → Creates new terminal
256
+
257
+ restart_terminal(id)
258
+ → Restarts terminal with same config
259
+
260
+ kill_terminal(id)
261
+ → Graceful shutdown (SIGINT → SIGTERM → SIGKILL)
262
+
263
+ set_label(id, label)
264
+ → Rename terminal
265
+ ```
266
+
267
+ ### Session Tools
268
+ ```
269
+ get_session_info()
270
+ → {tier, terminalsMax, features, terminals, createdAt}
271
+
272
+ finalize_session()
273
+ → Triggers post-session: tool rating, hypothesis validation, playbook evolution
274
+ ```
275
+
276
+ ---
277
+
278
+ ## Startup Sequence
279
+
280
+ 1. `list_terminals` — check all terminals alive
281
+ 2. If any down → `restart_terminal(id)`
282
+ 3. Decompose goal → criteria, paths, milestones, assignments
283
+ 4. Present plan (3-5 bullets), get approval
284
+ 5. Begin dispatching via `assign_task` or `send_input`
285
+ 6. Start monitoring loop: MCP tools every 30-60s + occasional screenshots
286
+
287
+ ---
288
+
289
+ ## Safety
290
+
291
+ - Do NOT send money, make purchases, or create financial obligations without approval
292
+ - Do NOT send messages to people without approval
293
+ - Do NOT post public content without approval
294
+ - When in doubt, ask. The cost of asking is low.
package/public/app.js CHANGED
@@ -5,6 +5,10 @@ const API_BASE = '';
5
5
  const AUTH_API = '/api';
6
6
  const TOKEN_KEY = 'ninja_token';
7
7
 
8
+ // Session readiness gate — resolves when session is validated (or validation is skipped)
9
+ let sessionReadyResolve;
10
+ const sessionReady = new Promise(resolve => { sessionReadyResolve = resolve; });
11
+
8
12
  // ── Auth Module ──────────────────────────────────────────────
9
13
 
10
14
  const auth = {
@@ -12,6 +16,7 @@ const auth = {
12
16
  user: null,
13
17
  tier: null,
14
18
  terminalsMax: 2,
19
+ validating: false,
15
20
 
16
21
  init() {
17
22
  const stored = localStorage.getItem(TOKEN_KEY);
@@ -103,26 +108,38 @@ const auth = {
103
108
  },
104
109
 
105
110
  async validateTier() {
106
- const res = await fetch(`${API_BASE}/api/session`, {
107
- method: 'POST',
108
- headers: {
109
- 'Content-Type': 'application/json',
110
- ...this.getAuthHeader(),
111
- },
112
- body: JSON.stringify({ token: this.token }),
113
- });
111
+ this.validating = true;
112
+ try {
113
+ const res = await fetch(`${API_BASE}/api/session`, {
114
+ method: 'POST',
115
+ headers: {
116
+ 'Content-Type': 'application/json',
117
+ ...this.getAuthHeader(),
118
+ },
119
+ body: JSON.stringify({ token: this.token }),
120
+ });
114
121
 
115
- if (!res.ok) {
116
- // Session validation failed, but we still have local token
117
- // Proceed with defaults
118
- console.warn('Session validation failed, using defaults');
119
- return;
120
- }
122
+ if (!res.ok) {
123
+ // 401 = token truly invalid/expired, need re-login
124
+ if (res.status === 401) {
125
+ console.warn('Session validation failed: token invalid');
126
+ this.token = null;
127
+ localStorage.removeItem(TOKEN_KEY);
128
+ return { needsLogin: true };
129
+ }
130
+ // Other errors (500, network) — proceed with defaults
131
+ console.warn('Session validation failed, using defaults');
132
+ return { needsLogin: false };
133
+ }
121
134
 
122
- const data = await res.json();
123
- this.tier = data.tier || 'free';
124
- this.terminalsMax = data.terminalsMax || 2;
125
- if (data.user) this.user = data.user;
135
+ const data = await res.json();
136
+ this.tier = data.tier || 'free';
137
+ this.terminalsMax = data.terminalsMax || 2;
138
+ if (data.user) this.user = data.user;
139
+ return { needsLogin: false };
140
+ } finally {
141
+ this.validating = false;
142
+ }
126
143
  },
127
144
 
128
145
  async logout() {
@@ -213,6 +230,7 @@ function setupAuthForms() {
213
230
  await auth.login(email, password);
214
231
  hideAuthOverlay();
215
232
  startApp();
233
+ sessionReadyResolve();
216
234
  } catch (err) {
217
235
  loginError.textContent = err.message;
218
236
  }
@@ -236,6 +254,7 @@ function setupAuthForms() {
236
254
  await auth.register(username, email, password);
237
255
  hideAuthOverlay();
238
256
  startApp();
257
+ sessionReadyResolve();
239
258
  } catch (err) {
240
259
  registerError.textContent = err.message;
241
260
  }
@@ -256,6 +275,7 @@ function setupAuthForms() {
256
275
  await auth.activateLicense(key);
257
276
  hideAuthOverlay();
258
277
  startApp();
278
+ sessionReadyResolve();
259
279
  } catch (err) {
260
280
  loginError.textContent = err.message;
261
281
  }
@@ -1205,17 +1225,31 @@ async function init() {
1205
1225
  // Setup auth form handlers
1206
1226
  setupAuthForms();
1207
1227
 
1208
- // Check for existing valid session
1228
+ // Check for existing valid session (local JWT check only — fast)
1209
1229
  if (auth.init()) {
1210
- try {
1211
- await auth.validateTier();
1212
- } catch (err) {
1213
- console.warn('Tier validation failed:', err);
1214
- }
1230
+ // Valid local token — hide overlay immediately, start app
1215
1231
  hideAuthOverlay();
1216
1232
  startApp();
1233
+
1234
+ // Validate tier in background (network call to backend)
1235
+ auth.validateTier()
1236
+ .then(result => {
1237
+ if (result?.needsLogin) {
1238
+ // Token was rejected by backend — need fresh login
1239
+ showAuthOverlay();
1240
+ }
1241
+ })
1242
+ .catch(err => {
1243
+ console.warn('Tier validation failed:', err);
1244
+ // Network error — continue with cached token
1245
+ })
1246
+ .finally(() => {
1247
+ sessionReadyResolve();
1248
+ });
1217
1249
  } else {
1250
+ // No valid local token — show login
1218
1251
  showAuthOverlay();
1252
+ sessionReadyResolve(); // Unblock any waiting code
1219
1253
  }
1220
1254
  }
1221
1255
 
package/server.js CHANGED
@@ -131,7 +131,7 @@ function spawnTerminal(label, scope = [], cwd = null, tier = 'pro') {
131
131
  }
132
132
  }
133
133
 
134
- const ptyProcess = pty.spawn(SHELL, [], {
134
+ const ptyProcess = pty.spawn(SHELL, ['-l'], {
135
135
  name: 'xterm-256color',
136
136
  cols,
137
137
  rows,
@@ -619,6 +619,11 @@ app.delete('/api/terminals/:id', requireAuth, (req, res) => {
619
619
  for (const ws of terminal.clients) ws.close();
620
620
  terminals.delete(id);
621
621
 
622
+ // Reset counter when all terminals are closed
623
+ if (terminals.size === 0) {
624
+ nextId = 1;
625
+ }
626
+
622
627
  // Remove from active session
623
628
  if (activeSession) {
624
629
  activeSession.terminalIds = activeSession.terminalIds.filter(tid => tid !== id);
@@ -995,6 +1000,9 @@ function handleSessionInvalidation(token) {
995
1000
  }
996
1001
  }
997
1002
 
1003
+ // Reset terminal counter
1004
+ nextId = 1;
1005
+
998
1006
  activeSession = null;
999
1007
  }
1000
1008