@ducci/jarvis 1.0.6 → 1.0.7

package/docs/agent.md ADDED
@@ -0,0 +1,525 @@
1
+ # Agent System
2
+
3
+ This document defines the v1 agent loop, tool handling, and logging. It is intended as the implementation source of truth.
4
+
5
+ ## Goals
6
+
7
+ - Simple request/response flow (no streaming)
8
+ - Serial tool execution for predictable behavior
9
+ - Clear, minimal logs that explain what happened
10
+
11
+ ## Core Loop (v1)
12
+
13
+ 1. Build a request from the current conversation state and system prompt.
14
+ 2. Send a single model request.
15
+ 3. If the model returns no tool calls, return the response and stop.
16
+ 4. If the model returns tool calls:
17
+ - Execute tools in order, serially (no parallel execution).
18
+ - Capture each tool result or error.
19
+ - Send one follow-up model request that includes the tool results.
20
+ 5. Repeat steps 3 and 4 until no tools are requested or the **max iteration limit** is reached.
21
+
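The loop steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped implementation: `callModel` and `runTool` are hypothetical injected functions, one iteration is taken to mean one model request, and the error result shape is a guess.

```javascript
// Minimal sketch of the v1 core loop.
// `callModel(messages)` and `runTool(name, args)` are hypothetical helpers
// supplied by the caller; they are not part of the documented API.
async function runAgentLoop({ messages, callModel, runTool, maxIterations = 10 }) {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const assistant = await callModel(messages);
    const toolCalls = assistant.tool_calls ?? [];

    // No tool calls: the assistant content is the final answer.
    if (toolCalls.length === 0) {
      return { status: 'ok', iterations: iteration, final: assistant.content };
    }

    // Keep the assistant message as-is, then run its tools serially, in order.
    messages.push(assistant);
    for (const call of toolCalls) {
      let result;
      try {
        result = await runTool(call.function.name, JSON.parse(call.function.arguments));
      } catch (err) {
        // A failed tool still produces a result that the model will see.
        result = { status: 'error', error: String(err && err.message ? err.message : err) };
      }
      messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  // Limit reached: the caller would now trigger the wrap-up call.
  return { status: 'checkpoint_reached', iterations: maxIterations, final: null };
}
```

Injecting `callModel` and `runTool` keeps the loop testable without a provider or real tools.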
22
+ **Iteration Limit & Handoff**: The default limit is **10 iterations**. If the limit is reached before the task is complete, the server triggers a dedicated wrap-up call:
23
+
24
+ 1. The server appends a one-time, non-stored system note to the conversation and sends one extra model request (does not count toward the iteration limit):
25
+
26
+ ```
27
+ [System: You have reached the iteration limit. This is your final response for this run.
28
+ Respond with your normal JSON, but add a checkpoint field:
29
+
30
+ {
31
+ "response": "Brief message to the user that the task is still in progress.",
32
+ "logSummary": "Human-readable summary of what happened in this run.",
33
+ "checkpoint": {
34
+ "progress": "What has been fully completed so far.",
35
+ "remaining": "What still needs to be done to finish the task."
36
+ }
37
+ }
38
+
39
+ The checkpoint field will be used to automatically resume the task in the next run.]
40
+ ```
41
+
42
+ 2. The server reads `checkpoint.remaining` from the response and uses it as the starting prompt for a fresh agent run.
43
+ 3. The server marks the run status as `checkpoint_reached`.
44
+
45
+ - **Handoff Cap**: To prevent infinite autonomous loops, the server tracks consecutive handoffs in the session `metadata`. If `handoffCount` exceeds `maxHandoffs` (default 5), the server stops and marks the status as `intervention_required` instead of starting a new run.
46
+ - **Handoff Reset**: `handoffCount` resets to `0` whenever the server receives a new user message for that session, since this implies human review.
47
+
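The handoff cap and reset rules above might look like this in code. The function names are hypothetical, and the sketch assumes `handoffCount` is incremented once per handoff before it is compared with `maxHandoffs`; the document does not say when the increment happens.

```javascript
// Decide what to do after the wrap-up call returned a checkpoint.
// Assumption: the counter is incremented first, then compared to the cap.
function planHandoff(metadata, checkpoint, maxHandoffs = 5) {
  const handoffCount = (metadata.handoffCount ?? 0) + 1;
  const next = { ...metadata, handoffCount };
  if (handoffCount > maxHandoffs) {
    // Cap exceeded: stop and wait for a human.
    return { status: 'intervention_required', metadata: next, nextPrompt: null };
  }
  // Otherwise start a fresh run from what is still left to do.
  return { status: 'checkpoint_reached', metadata: next, nextPrompt: checkpoint.remaining };
}

// A new user message implies human review, so the counter starts over.
function resetOnUserMessage(metadata) {
  return { ...metadata, handoffCount: 0 };
}
```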
48
+ **Completion Logic**: The agent stops when the model returns a final text response instead of tool calls. This response is then returned to the client as the final answer.
49
+
50
+ Tool calls use the provider tool-calling API for reliability. The final user-facing response is structured JSON so we can store a human-readable summary without an extra call.
51
+
52
+ Notes:
53
+
54
+ - The loop is bounded (default 10 iterations) to avoid runaway behavior and context bloat.
55
+ - If a tool fails, its error is recorded and the model still receives the error result.
56
+
57
+ ## Triggering and Execution
58
+
59
+ Jarvis runs only when explicitly triggered. In v1, a run starts when a user sends a message to the `POST /api/chat` endpoint. Each user message creates a single agent run:
60
+
61
+ 1. Receive the user message and `sessionId`.
62
+ 2. **Session Lookup**: Load the full conversation history for the `sessionId` from `~/.jarvis/data/conversations/<sessionId>.json`. If the session doesn't exist, initialize a new history with the system prompt.
63
+ 3. **User Info Injection**: Before sending to the model, replace the `{{user_info}}` placeholder in the system prompt with the current contents of `user-info.json`. If no user info exists, replace with `(none yet)`. This resolved system prompt is sent to the model but never written back to disk — the placeholder is always preserved in the stored history.
64
+ 4. **Append Message**: Append the new user message to the loaded history.
65
+ 5. **Execute Core Loop**: Run the agent loop using the full conversation history.
66
+ 6. **Return Response**: Return the final response and log the summary.
67
+ 7. **Persistence**: Save the updated history (including any tool calls/results) back to disk.
68
+
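The user info injection in step 3 can be sketched as a pure function that resolves the placeholder on a copy of the history, so the stored messages keep `{{user_info}}`. The function name and the choice to render the items as a JSON string are assumptions made for illustration.

```javascript
const PLACEHOLDER = '{{user_info}}';

// Returns a new message array for the provider request.
// The stored messages are not mutated, so the placeholder survives on disk.
function resolveSystemPrompt(storedMessages, userInfo) {
  const items = userInfo && Array.isArray(userInfo.items) ? userInfo.items : [];
  // Assumption: items are rendered as JSON; the real format may differ.
  const rendered = items.length > 0 ? JSON.stringify(items) : '(none yet)';
  return storedMessages.map((message) =>
    message.role === 'system'
      ? { ...message, content: message.content.split(PLACEHOLDER).join(rendered) }
      : message
  );
}
```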
69
+ There are no automatic background runs unless we add a scheduler later.
70
+
71
+ ## Entry Points (v1)
72
+
73
+ Jarvis accepts messages only via HTTP:
74
+
75
+ - `POST /api/chat` with a JSON body
76
+
77
+ Port: `18008` (default)
78
+
79
+ Request contract (v1):
80
+
81
+ ```json
82
+ {
83
+ "sessionId": "string (optional)",
84
+ "message": "string"
85
+ }
86
+ ```
87
+
88
+ **Notes on Request**:
89
+ - The client sends only the **latest** message.
90
+ - The server is responsible for maintaining and loading the full history via the `sessionId`.
91
+ - `sessionId` is optional. If omitted, the server creates a new session and generates a UUID v4 session ID via `crypto.randomUUID()`.
92
+ - If a `sessionId` is provided, the server loads the existing session. If no session is found for that ID, a new one is created.
93
+
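A request check matching the contract above could look like the sketch below. The function name and return shape are hypothetical, and two rules are assumptions rather than documented behavior: an empty `message` is rejected, and a `null` `sessionId` is treated like an omitted one.

```javascript
// Validates a parsed POST /api/chat body.
// Returns either a normalized request or a 400-style error description.
function validateChatRequest(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return { ok: false, httpStatus: 400, error: 'Body must be a JSON object.' };
  }
  if (typeof body.message !== 'string' || body.message.trim() === '') {
    return { ok: false, httpStatus: 400, error: '`message` must be a non-empty string.' };
  }
  const { sessionId } = body;
  if (sessionId !== undefined && sessionId !== null && typeof sessionId !== 'string') {
    return { ok: false, httpStatus: 400, error: '`sessionId` must be a string when provided.' };
  }
  // sessionId === null means "create a new session" in this sketch.
  return { ok: true, sessionId: sessionId ?? null, message: body.message };
}
```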
94
+ Response contract (v1):
95
+
96
+ ```json
97
+ {
98
+ "sessionId": "string",
99
+ "response": "string",
100
+ "logSummary": "string",
101
+ "toolCalls": [
102
+ {
103
+ "name": "string",
104
+ "args": {},
105
+ "status": "ok | error",
106
+ "result": "string"
107
+ }
108
+ ]
109
+ }
110
+ ```
111
+
112
+ `toolCalls` is an array of all tool calls made during the run, in execution order. It is always present (empty array if no tools were called). The data is already collected during the agent loop for JSONL logging, so this adds no extra work.
113
+
114
+ The `sessionId` is always returned so the client can use it for follow-up messages. On the first message (no `sessionId` sent), the client must read this value and store it to continue the session.
115
+
116
+ **Channel adapter pattern**: External channels (e.g. a Telegram bot) act as thin adapters. They store a mapping of their own identifier (e.g. Telegram `chat_id`) to a Jarvis `sessionId`. On the first message they omit `sessionId` and store the one returned; on subsequent messages they pass it through. This keeps session management centralized on the server — no adapter needs to implement its own ID generation or session logic.
117
+
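The adapter pattern described above might be sketched like this. `postChat` is a hypothetical function that sends the body to `POST /api/chat` and resolves with the parsed response; the in-memory `Map` stands in for whatever storage a real adapter would use.

```javascript
// Builds a handler that maps a channel-specific ID (e.g. a Telegram chat_id)
// to a Jarvis sessionId. `postChat` is injected so no HTTP client is assumed.
function createChannelAdapter(postChat) {
  const sessions = new Map();
  return async function handleIncoming(channelId, text) {
    const sessionId = sessions.get(channelId);
    // First message: omit sessionId so the server creates the session.
    const body = sessionId ? { sessionId, message: text } : { message: text };
    const reply = await postChat(body);
    // Always store the returned ID; the server is the source of truth.
    sessions.set(channelId, reply.sessionId);
    return reply.response;
  };
}
```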
118
+ Error responses:
119
+
120
+ - `400 Bad Request` for invalid input
121
+ - `500 Internal Server Error` for runtime failures
122
+
123
+ ```json
124
+ {
125
+ "error": "string",
126
+ "status": "model_error | format_error | tool_failed"
127
+ }
128
+ ```
129
+
130
+ Interaction flow:
131
+
132
+ 1. Client sends `POST /api/chat` with an optional `sessionId` and a `message`.
133
+ 2. Server creates or loads the session, then runs the agent loop.
134
+ 3. Server returns `sessionId`, `response`, `logSummary`, and `toolCalls` as JSON.
135
+ 4. Server appends a JSONL log entry for the run.
136
+
137
+ ## System Prompt (v1)
138
+
139
+ The authoritative system prompt text lives in [docs/system-prompt.md](./system-prompt.md). It is sent as the first message (`role: "system"`) in every session and stored verbatim in the conversation history.
140
+
141
+ ## Tools
142
+
143
+ All tools — built-in and user-defined — live in a single registry file (`tools.json`) and are executed via the same `new Function()` path. There is no separate execution mechanism for built-ins.
144
+
145
+ **Built-in tools** (seeded into `tools.json` on first server start if missing):
146
+
147
+ - `get_recent_sessions` — returns the most recent sessions (default: last 2, configurable via `limit`)
148
+ - `read_session_log` — returns JSONL log entries for a given session; the agent-accessible way to inspect failures and previous run summaries
149
+ - `save_user_info` — persists user facts to `user-info.json`
150
+ - `read_user_info` — returns all stored user facts
151
+ - `exec` — runs arbitrary shell commands as the server user; no safeguards
152
+
153
+ If a built-in entry is missing from `tools.json` at startup, the server re-seeds it from its default definition. This means built-ins can be inspected and edited in place, and will be restored if accidentally deleted.
154
+
155
+ Tools are powerful (file access, network, shell) and run with the same permissions as the server.
156
+
157
+ ## Tool Registry Format
158
+
159
+ Jarvis uses the same simple tool registry pattern as dai: a JSON file that maps tool names to `definition` and `code`.
160
+
161
+ Path:
162
+
163
+ - `~/.jarvis/data/tools/tools.json`
164
+
165
+ Schema example:
166
+
167
+ ```json
168
+ {
169
+ "read_user_info": {
170
+ "definition": {
171
+ "type": "function",
172
+ "function": {
173
+ "name": "read_user_info",
174
+ "description": "Read all stored user facts.",
175
+ "parameters": { "type": "object", "properties": {}, "required": [] }
176
+ }
177
+ },
178
+ "code": "const raw = await fs.promises.readFile(path.join(process.env.HOME, '.jarvis/data/user-info.json'), 'utf8').catch(() => '{\"items\":[]}'); const { items } = JSON.parse(raw); return { status: 'ok', items };"
179
+ }
180
+ }
181
+ ```
182
+
183
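+ At runtime, the server loads each tool and compiles `code` into an async function using `new Function('args', 'fs', 'path', 'process', 'require', code)`, called as `fn(args, fs, path, process, require)`. Note that a plain `new Function` produces a synchronous function whose body cannot contain `await`. Because tool code such as `read_user_info` uses `await`, the compilation step has to use the `AsyncFunction` constructor with the same parameter list, or wrap `code` in an async function before compiling.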
184
+
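One way to compile a registry entry so that `await` works inside `code` is the `AsyncFunction` constructor. The sketch below is illustrative: `compileTool` and `runCompiledTool` are hypothetical names, and the error result shape is an assumption.

```javascript
// AsyncFunction is not a global; it is reached through an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Same parameter list as in the registry description above.
function compileTool(code) {
  return new AsyncFunction('args', 'fs', 'path', 'process', 'require', code);
}

// Runs a tool and converts both compile-time and runtime failures into an
// error result, so the model still receives something for the tool call.
async function runCompiledTool(code, args, deps) {
  try {
    const fn = compileTool(code);
    return await fn(args, deps.fs, deps.path, deps.process, deps.require);
  } catch (err) {
    return { status: 'error', error: String(err && err.message ? err.message : err) };
  }
}
```

Tool code compiled this way runs with full server permissions, as the Tools section notes, so the registry file has to be treated as trusted input.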
185
+ **Tool descriptions are critical.** The system prompt does not list available tools — the model discovers them exclusively via the `tools` field in each API request. The `description` field in every tool definition is therefore the only guidance the model has for deciding when and how to use a tool. Descriptions must be specific about the tool's purpose, when to call it, and what its output means.
186
+
187
+ **Concurrency**: The server supports concurrent sessions. While tools are executed serially *within* a single run, multiple sessions can be processed in parallel by the server.
188
+
189
+ Seed tool included for sanity checks:
190
+
191
+ - `list_dir`
192
+ - Purpose: list directory contents similar to `ls -la`
193
+ - Input: `{ "path": "." }` (optional; defaults to current working directory)
194
+ - Output should include the resolved path that was actually listed
195
+
196
+ ## Tool Call Contract
197
+
198
+ Jarvis uses the provider tool-calling API:
199
+
200
+ 1. The model returns an assistant message containing a `tool_calls` array.
201
+ 2. Jarvis appends that assistant message to the conversation history as-is.
202
+ 3. Jarvis executes those tools in order, serially.
203
+ 4. Each tool result is appended to the conversation as a `role: "tool"` message with a matching `tool_call_id`.
204
+ 5. Jarvis calls the model again with the updated conversation.
205
+ 6. When no tool calls are returned, the model must respond with JSON `{ response, logSummary }`.
206
+
207
+ Assistant message shape (appended to history when the model requests tools):
208
+
209
+ ```json
210
+ {
211
+ "role": "assistant",
212
+ "content": null,
213
+ "tool_calls": [
214
+ {
215
+ "id": "call_abc123",
216
+ "type": "function",
217
+ "function": {
218
+ "name": "tool_name",
219
+ "arguments": "{\"key\":\"value\"}"
220
+ }
221
+ }
222
+ ]
223
+ }
224
+ ```
225
+
226
+ Tool result message shape (appended to history after execution):
227
+
228
+ ```json
229
+ {
230
+ "role": "tool",
231
+ "tool_call_id": "call_abc123",
232
+ "content": "{\"status\":\"ok\",\"result\":\"...\"}"
233
+ }
234
+ ```
235
+
236
+ The `tool_call_id` in the tool message must match the `id` of the corresponding entry in the assistant's `tool_calls` array. Without this pairing, the provider will reject the request.
237
+
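The pairing rule can be checked before a request is sent. The helpers below are a hypothetical sketch: one builds the `role: "tool"` message for a call, the other lists tool call IDs in a history that have no matching tool message.

```javascript
// Builds the tool result message for one entry of `tool_calls`.
function toToolMessage(toolCall, result) {
  return { role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result) };
}

// Returns the IDs of tool calls that have no matching tool message yet.
function findUnansweredToolCalls(messages) {
  const answered = new Set(
    messages.filter((m) => m.role === 'tool').map((m) => m.tool_call_id)
  );
  return messages
    .filter((m) => m.role === 'assistant' && Array.isArray(m.tool_calls))
    .flatMap((m) => m.tool_calls)
    .filter((call) => !answered.has(call.id))
    .map((call) => call.id);
}
```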
238
+ ## Tool Argument Validation
239
+
240
+ Jarvis does not validate tool arguments against schemas in v1. Tool code receives the raw args and is responsible for handling missing or invalid inputs.
241
+
242
+ ## Exec Tool
243
+
244
+ `exec` runs a shell command as the server user with no safeguards.
245
+
246
+ Input:
247
+
248
+ ```json
249
+ { "cmd": "string" }
250
+ ```
251
+
252
+ Output (stringified JSON in tool message content):
253
+
254
+ ```json
255
+ {
256
+ "status": "ok | error",
257
+ "exitCode": 0,
258
+ "stdout": "...",
259
+ "stderr": "..."
260
+ }
261
+ ```
262
+
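The output shape above could be produced from a finished child process roughly as follows. The helper name is hypothetical, and deriving `status` from the exit code is an assumption; the document does not say how `status` is chosen.

```javascript
// Maps a finished command to the exec tool output shape.
// Assumption: exit code 0 means "ok", anything else means "error".
function toExecResult({ exitCode, stdout = '', stderr = '' }) {
  return {
    status: exitCode === 0 ? 'ok' : 'error',
    exitCode,
    stdout,
    stderr,
  };
}

// The tool message content is the stringified result.
function toExecToolContent(processResult) {
  return JSON.stringify(toExecResult(processResult));
}
```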
263
+ ## Conversation Storage
264
+
265
+ Full conversation history is stored per session ID so the agent can keep context across messages. This is separate from the human log.
266
+
267
+ Path:
268
+
269
+ - `~/.jarvis/data/conversations/<sessionId>.json`
270
+
271
+ To support persistent tracking (like `handoffCount`), each file contains a JSON object with `metadata` and an ordered list of `messages`:
272
+
273
+ ```json
274
+ {
275
+ "metadata": {
276
+ "handoffCount": 0,
277
+ "createdAt": "...",
278
+ "updatedAt": "..."
279
+ },
280
+ "messages": [
281
+ { "role": "system", "content": "..." },
282
+ { "role": "user", "content": "What do you know about me?" },
283
+ {
284
+ "role": "assistant",
285
+ "content": null,
286
+ "tool_calls": [
287
+ {
288
+ "id": "call_abc123",
289
+ "type": "function",
290
+ "function": { "name": "read_user_info", "arguments": "{}" }
291
+ }
292
+ ]
293
+ },
294
+ {
295
+ "role": "tool",
296
+ "tool_call_id": "call_abc123",
297
+ "content": "{\"status\":\"ok\",\"items\":[]}"
298
+ },
299
+ {
300
+ "role": "assistant",
301
+ "content": "{\"response\":\"...\",\"logSummary\":\"...\"}"
302
+ }
303
+ ]
304
+ }
305
+ ```
306
+
307
+ The system prompt is stored as the first message in the `messages` array. The full turn sequence — user → assistant (with tool_calls) → tool → assistant (final) — is stored verbatim so that subsequent requests can be sent to the provider without any transformation.
308
+
309
+ ## Provider Message Format
310
+
311
+ When sending the conversation to OpenRouter, messages must follow the OpenAI-compatible chat format.
312
+
313
+ Example tool-call flow:
314
+
315
+ **Step 1 — Initial request:**
316
+
317
+ ```json
318
+ {
319
+ "model": "openrouter/model-name",
320
+ "messages": [
321
+ { "role": "system", "content": "..." },
322
+ { "role": "user", "content": "What do you know about me?" }
323
+ ]
324
+ }
325
+ ```
326
+
327
+ **Step 2 — Provider response (contains tool call):**
328
+
329
+ ```json
330
+ {
331
+ "role": "assistant",
332
+ "content": null,
333
+ "tool_calls": [
334
+ {
335
+ "id": "call_abc123",
336
+ "type": "function",
337
+ "function": {
338
+ "name": "read_user_info",
339
+ "arguments": "{}"
340
+ }
341
+ }
342
+ ]
343
+ }
344
+ ```
345
+
346
+ **Step 3 — Follow-up request (after tool execution):**
347
+
348
+ Jarvis appends the assistant message from Step 2 and then the tool result to the history and sends:
349
+
350
+ ```json
351
+ {
352
+ "model": "openrouter/model-name",
353
+ "messages": [
354
+ { "role": "system", "content": "..." },
355
+ { "role": "user", "content": "What do you know about me?" },
356
+ {
357
+ "role": "assistant",
358
+ "content": null,
359
+ "tool_calls": [
360
+ {
361
+ "id": "call_abc123",
362
+ "type": "function",
363
+ "function": { "name": "read_user_info", "arguments": "{}" }
364
+ }
365
+ ]
366
+ },
367
+ {
368
+ "role": "tool",
369
+ "tool_call_id": "call_abc123",
370
+ "content": "{\"status\":\"ok\",\"items\":[{\"key\":\"timezone\",\"value\":\"Europe/Berlin\"}]}"
371
+ }
372
+ ]
373
+ }
374
+ ```
375
+
376
+ **Step 4 — Final model response (JSON):**
377
+
378
+ ```json
379
+ {
380
+ "response": "I know your timezone is Europe/Berlin.",
381
+ "logSummary": "Read user info and reported timezone."
382
+ }
383
+ ```
384
+
385
+ Internal flow summary:
386
+
387
+ 1. Send request to provider.
388
+ 2. Receive assistant message — if it contains `tool_calls`, append it to the conversation history.
389
+ 3. Execute tool calls locally, in order.
390
+ 4. Append each tool result as a `role: "tool"` message with matching `tool_call_id`.
391
+ 5. Call the model again with the updated conversation.
392
+ 6. Repeat until no tool calls are returned.
393
+
394
+ ## Logging
395
+
396
+ We store a minimal, append-only JSONL log per session for human readability. Each line is one request/response cycle.
397
+
398
+ Path:
399
+
400
+ - `~/.jarvis/logs/session-<sessionId>.jsonl`
401
+
402
+ Each log entry includes only the essentials (no full message history). The model provides a concise `logSummary` alongside the user-facing response.
403
+
404
+ **Logging Philosophy**:
405
+ - **Transparency**: The `logSummary` must be written for a *human observer*. It should explain not just what tools were called, but the reasoning behind them.
406
+ - **Understandability**: A developer should be able to follow the agent's intent and identify where a plan went off-track just by reading the `logSummary` entries.
407
+
408
+ ```json
409
+ {
410
+ "ts": "2026-02-13T12:34:56.789Z",
411
+ "sessionId": "abc123",
412
+ "iteration": 1,
413
+ "model": "openrouter/model-name",
414
+ "userInput": "...",
415
+ "toolCalls": [
416
+ {
417
+ "name": "tool_name",
418
+ "args": { "key": "value" },
419
+ "status": "ok",
420
+ "result": "..."
421
+ }
422
+ ],
423
+ "response": "...",
424
+ "logSummary": "...",
425
+ "status": "ok"
426
+ }
427
+ ```
428
+
429
+ Status values:
430
+
431
+ - `ok`: normal completion
432
+ - `tool_failed`: at least one tool failed
433
+ - `model_error`: model request failed
434
+ - `checkpoint_reached`: max iterations hit; task handed off or paused
435
+ - `intervention_required`: max handoffs reached; human input needed to proceed
436
+ - `format_error`: malformed JSON response
437
+
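A log line with the fields shown above might be assembled like this. The function names are hypothetical. The sketch assumes `tool_failed` is derived from the collected tool calls for runs that otherwise completed, while the other statuses are set by their own code paths.

```javascript
// Derives the status of a run that finished without model or format errors.
function deriveRunStatus(toolCalls) {
  return toolCalls.some((call) => call.status === 'error') ? 'tool_failed' : 'ok';
}

// Builds one JSONL line (a single JSON object followed by a newline).
function toLogLine(run, now = new Date()) {
  const entry = {
    ts: now.toISOString(),
    sessionId: run.sessionId,
    iteration: run.iteration,
    model: run.model,
    userInput: run.userInput,
    toolCalls: run.toolCalls,
    response: run.response,
    logSummary: run.logSummary,
    status: run.status ?? deriveRunStatus(run.toolCalls),
  };
  return JSON.stringify(entry) + '\n';
}
```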
438
+ This log is meant to be readable without digging through raw prompts.
439
+
440
+ Tool inputs/outputs:
441
+
442
+ - `read_session_log`
443
+ - Input: `{ "sessionId": "string", "limit": 20 }`
444
+ - Output: `{ "status": "ok", "entries": [...] }`
445
+
446
+ ## Error Handling and Retries
447
+
448
+ - Model call failures: try the selected model once, then one fallback model attempt. If both fail, end the run with a `500` error and a clear message.
449
+ - Tool failures: pass the error result back to the model and continue the loop. Ideally, the next model response includes another tool call that corrects the failed one. All tool errors (especially `exec` failures) must be reported in the `logSummary` with enough detail for a human to understand the cause.
450
+ - Malformed JSON on final response: log the failure and stop the run with a formatted error message.
451
+
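The malformed-JSON case above can be detected with a small parser for the final assistant content. This is a sketch under assumptions: the function name and return shape are hypothetical, and both `response` and `logSummary` are required to be strings.

```javascript
// Parses the final assistant content into { response, logSummary }.
// Anything that is not a JSON object with both string fields is a format error.
function parseFinalResponse(content) {
  let parsed;
  try {
    parsed = JSON.parse(content);
  } catch {
    return { status: 'format_error', error: 'Final response is not valid JSON.' };
  }
  const valid =
    parsed !== null &&
    typeof parsed === 'object' &&
    typeof parsed.response === 'string' &&
    typeof parsed.logSummary === 'string';
  if (!valid) {
    return {
      status: 'format_error',
      error: 'Final response must contain string `response` and `logSummary` fields.',
    };
  }
  return { status: 'ok', response: parsed.response, logSummary: parsed.logSummary };
}
```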
452
+ **Error Payload Structure**:
453
+
454
+ ```json
455
+ {
456
+ "error": "Short, human-readable description of the error",
457
+ "details": "Optional stack trace or additional context"
458
+ }
459
+ ```
460
+
461
+ - Use `400 Bad Request` for invalid client inputs.
462
+ - Use `500 Internal Server Error` for API failures, tool runtime errors, or model communication issues.
463
+ - Always append a log entry on failure so the outcome is visible in the session log.
464
+
465
+ Model configuration:
466
+
467
+ - Selected model ID is stored in the same config file created during setup.
468
+ - If the first model call fails, retry once using `fallbackModel`.
469
+
470
+ Config file:
471
+
472
+ - `~/.jarvis/data/config/settings.json`
473
+
474
+ Schema:
475
+
476
+ ```json
477
+ {
478
+ "selectedModel": "openrouter/...",
479
+ "fallbackModel": "openrouter/free"
480
+ }
481
+ ```
482
+
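The retry rule (selected model once, then one fallback attempt) might be sketched as follows. `send` is a hypothetical function that performs one provider request; the fields attached to the thrown error are assumptions chosen to line up with the `model_error` status and the `500` response.

```javascript
// Tries selectedModel, then fallbackModel, each exactly once.
// `send(request)` is injected and should reject when the provider call fails.
async function callWithFallback(settings, request, send) {
  const models = [settings.selectedModel, settings.fallbackModel].filter(Boolean);
  let lastError = new Error('No model configured.');
  for (const model of models) {
    try {
      const message = await send({ ...request, model });
      return { model, message };
    } catch (err) {
      lastError = err;
    }
  }
  const failure = new Error(`All model attempts failed: ${lastError.message}`);
  failure.status = 'model_error';
  failure.httpStatus = 500;
  throw failure;
}
```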
483
+ ## Limits and Timeouts
484
+
485
+ - Max iterations per run: 10 (default).
486
+ - `checkpoint_reached` status is used when the limit is hit to trigger a handoff.
487
+ - **Handoff Safety**: `maxHandoffs` (default 5) limits the number of autonomous restarts for a single user trigger. If `handoffCount` exceeds it, the status changes to `intervention_required`.
488
+ - No separate wall-clock timeout in v1; iterations are the only limiter.
489
+ - No additional per-run tool-call cap beyond iterations.
490
+ - No token limit enforcement in v1.
491
+
492
+ ## User Info
493
+
494
+ User info is stored as a small JSON file in the Jarvis data directory: `~/.jarvis/data/user-info.json`. `save_user_info` adds new items and overwrites existing items that share a `key`, and `read_user_info` returns the full set.
495
+
496
+ Schema:
497
+
498
+ ```json
499
+ {
500
+ "items": [
501
+ { "key": "string", "value": "string", "ts": "2026-02-13T12:34:56.789Z" }
502
+ ]
503
+ }
504
+ ```
505
+
506
+ Tool inputs/outputs:
507
+
508
+ - `save_user_info`
509
+ - Input: `{ "items": [{ "key": "string", "value": "string" }] }`
510
+ - Behavior: overwrite existing items with the same `key` and update `ts`.
511
+ - Output: `{ "status": "ok", "saved": <number> }`
512
+
513
+ - `read_user_info`
514
+ - Input: `{}`
515
+ - Output: `{ "status": "ok", "items": [...] }`
516
+
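The overwrite-by-key behavior of `save_user_info` can be sketched as a pure function over the stored object. The function name is hypothetical, and treating `saved` as the number of items in the call is an assumption.

```javascript
// Merges incoming items into the store, overwriting entries with the same key
// and stamping every written item with the current time.
function upsertUserInfo(store, incoming, now = new Date()) {
  const ts = now.toISOString();
  const byKey = new Map(store.items.map((item) => [item.key, item]));
  for (const { key, value } of incoming) {
    byKey.set(key, { key, value, ts });
  }
  return {
    store: { items: [...byKey.values()] },
    // Assumption: `saved` counts the items passed in this call.
    result: { status: 'ok', saved: incoming.length },
  };
}
```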
517
+ ## Session Titles
518
+
519
+ Session titles are derived from the first `logSummary` in the session log, truncated for display. `get_recent_sessions` should return these titles alongside session IDs to keep results human-readable.
520
+
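Title derivation could be as small as the sketch below. The function name, the 60-character limit, and the `(untitled)` fallback are assumptions; the document only says the first `logSummary` is truncated for display.

```javascript
// Derives a display title from the first non-empty logSummary in a session log.
function deriveSessionTitle(logEntries, maxLength = 60) {
  const first = logEntries.find(
    (entry) => typeof entry.logSummary === 'string' && entry.logSummary.trim() !== ''
  );
  if (!first) return '(untitled)';
  const summary = first.logSummary.trim();
  if (summary.length <= maxLength) return summary;
  // Reserve one character for the ellipsis.
  return summary.slice(0, maxLength - 1).trimEnd() + '…';
}
```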
521
+ Tool inputs/outputs:
522
+
523
+ - `get_recent_sessions`
524
+ - Input: `{ "limit": 2 }`
525
+ - Output: `{ "status": "ok", "sessions": [{ "sessionId": "...", "title": "...", "lastTs": "..." }] }`
package/docs/cli.md ADDED
@@ -0,0 +1,49 @@
1
+ # CLI and Server Lifecycle
2
+
3
+ This document specifies the requirements for the `jarvis` CLI and its relationship with the Jarvis server.
4
+
5
+ ## Architectural Separation
6
+
7
+ Jarvis consists of two distinct components:
8
+
9
+ 1. **The Server**: The core agent system that handles chat requests, triggers tools, and manages persistent memory.
10
+ 2. **The CLI**: A management tool used for configuration (onboarding) and controlling the server process.
11
+
12
+ ## Global Installation Requirements
13
+
14
+ The project must be prepared for global installation via `npm i -g`. This requires the `package.json` to define appropriate `bin` entries:
15
+
16
+ ```json
17
+ {
18
+ "name": "jarvis",
19
+ "bin": {
20
+ "jarvis": "./src/index.js",
21
+ "jarvis-setup": "./src/scripts/onboarding.js"
22
+ }
23
+ }
24
+ ```
25
+
26
+ ## Server Lifecycle Commands
27
+
28
+ Lifecycle management is handled by the CLI using the **programmatic PM2 API** for process stability and fine-grained control.
29
+
30
+ ### `jarvis start`
31
+ - **Pre-flight Check**: Verifies that `.env` and `settings.json` exist. If either is missing, it prints an error message ("Please run `jarvis setup` first") and exits with code 1. This prevents PM2 from entering an infinite restart loop when no configuration is present.
32
+ - Starts the server as a background process using the PM2 API.
33
+ - The process is named `jarvis-server`.
34
+ - Enables `autorestart` on crash.
35
+ - Merges logs into a single file in the user's data directory.
36
+
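The pre-flight check for `jarvis start` can be sketched without touching PM2. `preflightCheck` is a hypothetical helper, and `exists` is injected; in a real CLI it would probably be `fs.existsSync`.

```javascript
// Checks that both config files exist before the server is handed to PM2.
// `paths` is { env, settings }; `exists(file)` returns true when the file is present.
function preflightCheck(paths, exists) {
  const missing = [paths.env, paths.settings].filter((file) => !exists(file));
  if (missing.length > 0) {
    return { ok: false, exitCode: 1, message: 'Please run `jarvis setup` first', missing };
  }
  return { ok: true, exitCode: 0, message: '', missing: [] };
}
```

The caller would print `message` and exit with `exitCode` when `ok` is false, and only then skip the PM2 start.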
37
+ ### `jarvis stop`
38
+ - Stops the background process named `jarvis-server` using PM2.
39
+
40
+ ### `jarvis status`
41
+ - (Planned) Displays the current status of the `jarvis-server` process.
42
+
43
+ ## Local Development
44
+
45
+ For development in the repository, the server can be run in the foreground:
46
+
47
+ ### `npm run dev`
48
+ - Starts the server directly without PM2 or daemonization.
49
+ - Uses the same environment loading logic as production.
@@ -0,0 +1,44 @@
1
+ # Evaluation
2
+
3
+ This document exists so that an AI coding agent (e.g. Claude Code) can evaluate how well Jarvis is working and propose concrete improvements — to the system prompt, tool definitions, or server code.
4
+
5
+ ## How to Use This Document
6
+
7
+ When asked to evaluate Jarvis, an AI coding agent should:
8
+
9
+ 1. Read the relevant files listed below
10
+ 2. Identify signals that indicate problems
11
+ 3. Propose specific improvements (to code, `tools.json`, or the system prompt)
12
+
13
+ The agent should not make changes without user approval.
14
+
15
+ ## Relevant Files
16
+
17
+ - `~/.jarvis/logs/session-*.jsonl` — one log file per session; each line is one agent run
18
+ - `~/.jarvis/data/tools/tools.json` — all tool definitions and their code
19
+ - `~/.jarvis/data/conversations/*.json` — full conversation histories
20
+ - `docs/system-prompt.md` — the system prompt sent to the model on every session
21
+
22
+ ## What "Good" Looks Like
23
+
24
+ - Tasks complete within a small number of iterations (well under the limit of 10)
25
+ - `intervention_required` status is rare
26
+ - `checkpoint_reached` status is occasional but not frequent
27
+ - `logSummary` entries are specific and explain reasoning, not just actions
28
+ - Tool errors (`tool_failed`) are rare and the agent recovers from them
29
+
30
+ ## Problem Signals
31
+
32
+ | Signal | Possible Cause |
33
+ |---|---|
34
+ | Frequent `intervention_required` | Agent loops without making progress; possibly a prompt or tool issue |
35
+ | Frequent `checkpoint_reached` | Tasks are too complex for the iteration limit, or the agent is inefficient |
36
+ | Frequent `tool_failed` | Tool code is broken, or the agent is calling tools with wrong arguments |
37
+ | Vague `logSummary` (e.g. "Called exec.") | System prompt guidance for logSummary is not being followed |
38
+ | Frequent `format_error` | Model is not following the JSON response format; system prompt may need adjustment |
39
+
40
+ ## What Can Be Improved
41
+
42
+ - **System prompt** (`docs/system-prompt.md`) — if the agent is behaving incorrectly or producing poor logSummaries
43
+ - **Tool definitions** (`~/.jarvis/data/tools/tools.json`) — if tools are failing or their descriptions are causing misuse
44
+ - **Server code** (`src/`) — if there are bugs in the agent loop, error handling, or session management
package/docs/setup.md ADDED
@@ -0,0 +1,81 @@
1
+ # Setup and Start
2
+
3
+ This document is the implementation spec for how Jarvis is configured and started.
4
+ Another LLM should be able to implement the CLI and startup behavior from this alone.
5
+
6
+ ## Dependencies
7
+
8
+ Jarvis relies on the following key libraries:
9
+
10
+ - `commander`: CLI framework for handling commands and arguments.
11
+ - `pm2`: For process management and daemonization of the Jarvis server.
12
+ - `dotenv`: To load environment variables from `.env`.
13
+ - `openai`: Official SDK for interacting with the OpenRouter (OpenAI-compatible) API.
14
+ - `inquirer`: For interactive CLI prompts during onboarding.
15
+ - `chalk`: For terminal styling and colored output.
16
+
17
+ ## Commands
18
+
19
+ ### `jarvis setup` (interactive onboarding)
20
+
21
+ Purpose: collect API key and model selection and persist them on disk.
22
+
23
+ Behavior:
24
+
25
+ 1. API key step
26
+ - Read existing `.env` from the Jarvis config directory (see Data Layout).
27
+ - If `OPENROUTER_API_KEY` exists, prompt to keep or replace.
28
+ - If missing, prompt for a new key (password input, min length 10).
29
+ - Write/update `OPENROUTER_API_KEY=...` in `.env`.
30
+
31
+ 2. Model selection step
32
+ - Read current model from `data/config/settings.json` under key `selectedModel`.
33
+ - If a model exists, prompt to keep or change it.
34
+ - If changing, offer two paths:
35
+ - Manual entry of model ID.
36
+ - Browse models from OpenRouter:
37
+ - Fetch `https://openrouter.ai/api/v1/models` with Bearer auth.
38
+ - Sort models with free models first, then alphabetical.
39
+ - Show a paginated list (page size ~20).
40
+
41
+ - Persist the chosen model as `selectedModel` in `data/config/settings.json`.
42
+ - Set `fallbackModel` to `openrouter/free` only if it is not set yet.
43
+
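The sort and pagination rules for the model browser might look like this. Several details are assumptions: each model object is taken to have an `id` and a `pricing` object with string `prompt` and `completion` prices, a model counts as free when both are zero, and "alphabetical" is applied to `id`.

```javascript
// Assumption: a model is free when both prompt and completion prices are 0.
function isFreeModel(model) {
  const pricing = model.pricing ?? {};
  return Number(pricing.prompt) === 0 && Number(pricing.completion) === 0;
}

// Free models first, then alphabetical by id. Returns a new array.
function sortModels(models) {
  return [...models].sort((a, b) => {
    const freeDiff = Number(isFreeModel(b)) - Number(isFreeModel(a));
    return freeDiff !== 0 ? freeDiff : a.id.localeCompare(b.id);
  });
}

// Zero-based pages of about 20 entries each.
function paginate(items, page, pageSize = 20) {
  const start = page * pageSize;
  return {
    items: items.slice(start, start + pageSize),
    totalPages: Math.max(1, Math.ceil(items.length / pageSize)),
  };
}
```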
44
+ Notes:
45
+
46
+ - The setup command can be run as many times as desired.
47
+ - The setup command must be usable even if the server is not running.
48
+ - For local development without a global install, run the setup script directly
49
+ (example: `npm run setup` in the server directory).
50
+
51
+ ## Data Layout
52
+
53
+ Jarvis data is always stored in the user home directory and never in the repo.
54
+ There is no environment variable override for this path.
55
+
56
+ Base directory:
57
+
58
+ - `~/.jarvis/`
59
+
60
+ Within this directory:
61
+
62
+ - `.env` (OpenRouter API key)
63
+ - `data/`
64
+ - `config/settings.json`:
65
+ ```json
66
+ {
67
+ "selectedModel": "openrouter/anthropic/claude-3.5-sonnet",
68
+ "fallbackModel": "openrouter/free",
69
+ "maxIterations": 10,
70
+ "maxHandoffs": 5,
71
+ "port": 18008
72
+ }
73
+ ```
74
+ - `logs/`
75
+ - `server.log`
76
+
77
+ ## First Run Flow
78
+
79
+ 1. User installs Jarvis globally.
80
+ 2. User runs `jarvis setup` to configure API keys and model.
81
+ 3. User runs `jarvis start` to launch the background server.
@@ -0,0 +1,53 @@
1
+ # System Prompt (v1)
2
+
3
+ This is the authoritative system prompt sent to the model at the start of every session. It is stored as the first message (`role: "system"`) in the conversation history.
4
+
5
+ Before sending to the model, the server replaces the `{{user_info}}` placeholder with the current contents of `user-info.json`. This happens at runtime on every request. The resolved text is never written back to disk, so the stored conversation history always keeps the placeholder.
6
+
7
+ ---
8
+
9
+ ```
10
+ You are Jarvis, a fully autonomous agent running on a local server. You have access to tools and can execute shell commands on the machine you run on.
11
+
12
+ ## Known User Context
13
+
14
+ {{user_info}}
15
+
16
+ ## Response Format
17
+
18
+ There are two types of responses depending on whether you need to use tools:
19
+
20
+ **While using tools**: respond using the tool-calling protocol. No text content is expected or required — your tool calls speak for themselves.
21
+
22
+ **Final response** (when you have no more tool calls to make): your text content MUST be a JSON object and nothing else:
23
+
24
+ {
25
+ "response": "Your message to the user, in plain text.",
26
+ "logSummary": "A concise explanation of what you did and why, written for a human reading the logs."
27
+ }
28
+
29
+ Never include markdown code fences, preamble, or any text outside this JSON object. If you cannot complete a task, explain why in the `response` field — still as valid JSON.
30
+
31
+ ## Tool Use
32
+
33
+ You have access to a set of tools. Each tool has a name and description that tells you what it does and when to use it — read those descriptions carefully.
34
+
35
+ - Always use a tool to perform an action. Never claim to have done something without actually calling the relevant tool.
36
+ - Call tools one at a time. You will receive the result before deciding on the next step.
37
+ - After a tool call, verify the result before declaring the task done.
38
+ - Stop as soon as the task is complete and verified. Do not do extra work that was not asked for.
39
+ - If a tool fails, record the error in `logSummary` and decide whether to retry with a corrected call or explain the failure to the user.
40
+ - If the user shares personal information, persist it using the appropriate tool.
41
+ - Prefer using tools over making assumptions about the state of the system.
42
+
43
+ ## logSummary Guidelines
44
+
45
+ The `logSummary` is written for a human observer, not for the user. It must:
46
+ - Explain the reasoning behind tool calls, not just list what was called.
47
+ - Include enough detail that a developer can understand your intent and pinpoint where things went wrong.
48
+ - Report any tool errors with the relevant output (exit code, stderr snippet, etc.).
49
+
50
+ Example of a bad logSummary: "Called exec."
51
+ Example of a good logSummary: "User asked for their timezone. Found Europe/Berlin in the injected user context. No tool call needed."
52
+
53
+ ```
package/docs/ui.md ADDED
@@ -0,0 +1,71 @@
1
+ # UI
2
+
3
+ A minimal chat interface to interact with the Jarvis agent. The goal is function over form — just enough UI to send messages, read responses, and inspect tool calls.
4
+
5
+ ## Stack
6
+
7
+ - Vite + React + Tailwind
8
+ - Lives in a `ui/` folder at the project root (separate from server code)
9
+
10
+ ## Layout
11
+
12
+ Single page, three regions:
13
+
14
+ 1. **Header** — app name ("Jarvis") on the left, "New Session" button on the right
15
+ 2. **Message area** — scrollable list of messages, newest at the bottom
16
+ 3. **Input area** — textarea + send button, pinned to the bottom
17
+
18
+ Light mode only. Minimal styling — no shadows, gradients, or animations.
19
+
20
+ ## Message Types
21
+
22
+ **User message** — right-aligned, plain text.
23
+
24
+ **Assistant message** — left-aligned. Shows the `response` field from the API.
25
+
26
+ **Tool call block** — rendered inline inside the assistant turn, above the final response text. Each tool call shows:
27
+ - Tool name
28
+ - Args (JSON, collapsed by default)
29
+ - Status (`ok` or `error`)
30
+ - Result (truncated if long, expandable)
31
+
32
+ Tool call blocks use a monospace font and a light gray background to visually separate them from chat text.
33
+
34
+ ## Session Management
35
+
36
+ - On load: no `sessionId` — the first message creates a new session
37
+ - After the first response: store the returned `sessionId` in React state and pass it on every subsequent request
38
+ - "New Session" button: clears messages and resets `sessionId` to null
39
+
40
+ ## API Communication
41
+
42
+ All requests go to `POST /api/chat` on the same host/port.
43
+
44
+ Request:
45
+ ```json
46
+ { "sessionId": "string | null", "message": "string" }
47
+ ```
48
+
49
+ Response fields used by the UI: `sessionId`, `response`, `toolCalls`.
50
+
51
+ `logSummary` is not displayed in the UI.
52
+
53
+ While waiting for a response, the input is disabled and a simple loading indicator is shown in the message area.
54
+
55
+ ## Development
56
+
57
+ Vite dev server runs on its default port (5173) and proxies `/api` requests to `http://localhost:18008`. Configure this in `vite.config.js`:
58
+
59
+ ```js
60
+ export default {
61
+ server: {
62
+ proxy: {
63
+ '/api': 'http://localhost:18008'
64
+ }
65
+ }
66
+ }
67
+ ```
68
+
69
+ ## Production
70
+
71
+ The server serves the built UI (`ui/dist/`) as static files at `/`. The Express static middleware is added in `src/server/app.js`. No separate process needed.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ducci/jarvis",
3
- "version": "1.0.6",
3
+ "version": "1.0.7",
4
4
  "description": "A fully automated agent system that lives on a server.",
5
5
  "main": "./src/index.js",
6
6
  "type": "module",
@@ -10,6 +10,7 @@
10
10
  },
11
11
  "files": [
12
12
  "src",
13
+ "docs",
13
14
  "ui/dist"
14
15
  ],
15
16
  "scripts": {