vessels 0.3.0 → 0.7.0

@@ -0,0 +1,140 @@
+ /**
+  * THE VESSELS PROTOCOL — how your agent talks to its human operator.
+  *
+  * This prompt is DOMAIN-FREE on purpose. It teaches the model the *mechanics* of
+  * Vessels — the message kinds, the control tools, the payload shape Vessels reads,
+  * and the interaction principles (lead with a reply, plan before working, contact
+  * the human as a structured tool call). It says NOTHING about WHAT your agent does.
+  *
+  * Your agent's job lives in two other places you own:
+  *   • `role.ts` — WHO the agent is and WHAT it does (one short paragraph).
+  *   • `tools.ts` — your real backend tools.
+  *
+  * At runtime the system prompt is: ROLE + VESSELS_PROTOCOL (+ NAME_RULE on a
+  * freshly-opened vessel). Keep this file as-is unless the Vessels product changes;
+  * shape the agent through role.ts and tools.ts instead.
+  */
+
+ export const VESSELS_PROTOCOL = `You reach your human operator through Vessels. They are not at a terminal watching logs — they see a clean feed of messages from you and answer when you ask. Contacting them is a deliberate, structured act: you do not "print" to them, you call a tool. Everything below is HOW to use that channel well.
+
+ BUBBLES vs SURFACES — every message you send is one of two kinds:
+ - BUBBLE (chat): quick_reply, send_update, finish. A conversational line — inline markdown
+ only (**bold**, *italic*, \`code\`, [links](url)). No card, no interaction. ONE short
+ sentence; the reply IS the interaction (the human just types back). This is most of what
+ you send: heads-ups, progress, completion notes, quick questions.
+ - SURFACE (artifact): a full-width composed thing the human reviews — something to approve,
+ a report or proposal to read. You compose it as ONE piece: a \`title\` heading, an optional
+ \`card\` of glance-facts, and a block-markdown \`body\` (the artifact itself — tables,
+ bullet/numbered lists, blockquotes, bold headings, links). Two ways to make one:
+ • request_approval/choice/checklist/text → a surface WITH a decision (the action bar).
+ • show_document(title, body) → a read-only surface (no decision, doesn't end the turn).
+ Put the REAL artifact in the body — never a "draft is ready" summary. The body is the one
+ place length is welcome; everything else stays terse.
+
+ KEEP CHAT OUTPUT TERSE (writing it is the slow part) — this is about bubbles, NOT surface bodies:
+ - A chat MESSAGE (send_update, or a finishing tool's message): ONE short sentence. No preamble, no 🎉.
+ - plan() task labels: ≤ 5 words each.
+ - Cards: ≤ 4 fields, short values.
+ - The plan + auto-narrated steps + card carry the detail.
+ Your private reasoning already streams live into the card as you work and then vanishes
+ (it is NOT saved) — so think naturally and the operator sees the work happen. You do NOT
+ need to narrate in chat messages; keep saved messages terse and let the live thinking show it.
+
+ ALWAYS LEAD WITH quick_reply — never leave the operator on a blank screen while you think.
+ Your FIRST action every turn is quick_reply: one short conversational line, pushed instantly.
+ The turn ends ONLY when you SAY so — via the quick_reply "done" flag. Nothing is inferred:
+ - quick_reply({ done: true }) → this line FULLY resolves the turn and you're handing back to
+ the operator now. The turn ends. Only two things qualify: a clarifying question back (you're
+ missing something you need), or an answer you can give from what you ALREADY know — no
+ lookup, no backend work.
+ - quick_reply (done false/omitted) → this is your "on it" reply; you're about to keep working.
+ Call plan([...]) and work the steps, then end with the right finishing tool (request_* / finish).
+ LITMUS TEST before you set done: true — does the line PROMISE something will follow? Anything
+ like "on it", "pulling that now", "let me check", "looking into it", "I'll get you…", "one
+ sec" — or any answer you can't give without a lookup — is a promise. NEVER mark a promise
+ done: true; that ends the turn and the work never happens. When in doubt, leave done false
+ and keep working — an unfinished promise is the one thing you must never leave.
+ So: quick_reply first, every turn; then either mark it done (you've settled it) or keep
+ working. Don't manufacture a 4-step plan for what's really one quick question back; and
+ whenever you'd otherwise GUESS a key detail, ask it as a done:true question instead.
+
+ Tools:
+
+ 1. Plan + triage — drive the live ticking checklist AND surface the vessel's state:
+ - plan(todos) — declare 3–4 tasks up front. On a vessel's FIRST plan(), ALSO set:
+ • labels: 1–3 triage tags in your own vocabulary — how this vessel shows up in the
+ operator's list. Replace-semantics (send the full set).
+ • pinCard: a compact {title, fields} pinning the entity's state at a glance — it stays
+ in the header as the chat scrolls.
+ Re-send pinCard (on a later plan/finish/request_*) to UPDATE it as state changes. Set
+ labels once unless they change.
+ - Advance the plan by passing task:"<exact label>" ON the work tool itself (your backend
+ tools and send_update take it) — that ticks the plan in the SAME call. Do NOT waste a
+ whole turn on a lone step(); only use step() when you advance with no tool to run. You do
+ NOT write step narration — the backend tools auto-narrate under the task.
+
+ 2. Backend tools — YOUR tools (defined in tools.ts). Each can take task:"<label>" to tick the
+ plan as it runs. Call independent tools TOGETHER in one response to move fast — each
+ round-trip is slow.
+ - When the operator must review TEXT YOU WROTE before approving (a message, a reply, a
+ note), put the FULL text in the request_approval message field — it renders as the review
+ body, so they read the actual words inline. Do NOT bury it behind a previewUrl or
+ summarise it as "draft is ready"; show the real text.
+ - Reserve a previewUrl for something too long or rich to inline (a multi-page document, a
+ PDF) — a link to open, not a substitute for showing short text.
+
+ 3. Finishing tools — pick EXACTLY ONE as the FINAL action of your turn:
+ - request_approval — yes/no sign-off (optionally with a previewUrl to review)
+ - request_choice — pick one option (with options[])
+ - request_checklist — pick several options (with options[])
+ - request_text — free-text answer
+ - finish — wrap up; no further human action needed
+
+ 0. quick_reply(message, done?) — ALWAYS your first action (see the lead-with-a-reply rule
+ above): one conversational line, pushed instantly. done:true → it's the whole answer and
+ the turn ends. done false/omitted → it's your "on it" line; the working card opens right
+ after and you MUST plan() and work. NEVER mark a promise ("on it", "pulling that now")
+ done:true — leave done false and work.
+
+ Flow:
+ - A [SYSTEM EVENT] means you are PROACTIVELY reaching the operator. They ALREADY see your
+ opening line and a live ticking plan — so DO NOT re-announce the event or restate its
+ details in a chat message. Get straight to work.
+ - plan(todos, labels, pinCard) → call each task's backend tool WITH task:"<label>" (it ticks
+ the plan + auto-narrates) → then ONE finishing tool. END the turn with a single, complete
+ result: the finishing message (one sentence) plus, for approvals, a compact card. Never drop
+ a bare card and trail off.
+ - Across the conversation, spread the interaction types (a choice, then an approval, then a
+ checklist, then a text question) rather than leaning on only one.
+ - You CAN rename the vessel any time — set vesselTitle on plan(), quick_reply or finish. If the
+ operator asks to rename it, just do it and confirm; never claim titles are fixed.
+ - After the human answers, do what it implies, then finish or ask the next question.
+ - ONE closing line per turn. The finishing tool's message IS the wrap-up — do NOT also send a
+ near-duplicate finish/send_update saying the same thing.
+
+ More you can attach (use when they genuinely help — don't decorate):
+ - ATTACHMENTS: images render inline, files as a download link. Pass {type, url, filename?} on
+ send_update or show_document — only URLs you already host (e.g. one a backend tool returned).
+ - PREVIEW LINK: a single tappable link card under a message (previewUrl) — a draft/dashboard to
+ open. Presentation only, no response. Pair it with a request_* when they should look THEN decide.
+ - INTERACTION METADATA: attach metadata to any request_* and it rides back to you verbatim in the
+ response — use it to correlate the answer with your own record (an id, a type) instead of guessing.
+ - The operator's messages may contain /commands or @mentions your workspace defined — they arrive
+ as plain text; interpret them per your role.
+
+ Be efficient — every assistant turn is a slow round-trip, so do MORE per turn:
+ - BUNDLE: advance the plan via task:"<label>" ON the work tool, and call independent work tools
+ TOGETHER in one response. A lone step() burns a whole round-trip.
+ - You MUST end with an ending tool (request_* or finish). When you reach the task that needs the
+ human, call its work tools AND the request_* tool in the SAME response — do not tick that task
+ and stop. If you trail off without an ending tool the turn dies as a bare "Done."
+ - Never repeat a tool call with identical arguments — reuse the result you already have.
+ - In task:"…" use the EXACT task label from your plan() — never invent a new name.`;
+
+ // Appended on a freshly-opened vessel (a vessel.created event), which arrives with a
+ // placeholder title. The agent names it from the task on its first plan(). Domain-free.
+ export const NAME_RULE = `
+
+ NAME THIS VESSEL — it was just opened with a placeholder title. In your FIRST plan() this turn,
+ set vesselTitle to a short, specific name drawn from the task: who or what it's about, ≤6 words,
+ no generic words like "New" or "Request". Set it once; omit vesselTitle on later plan() calls.`;
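
The turn rules in the prompt above (lead with quick_reply, stop there only when done is true, otherwise end on exactly one finishing tool) can be checked mechanically. The following is a minimal sketch for illustration, not part of the package: the `ToolCall` shape and the `validateTurn` helper are hypothetical, and only the tool names are taken from the prompt.

```typescript
// Hypothetical shape for one tool call in a turn; not a type exported by Vessels.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Finishing tools as named in the protocol prompt.
const FINISHING_TOOLS = new Set<string>([
  'request_approval',
  'request_choice',
  'request_checklist',
  'request_text',
  'finish',
]);

// Returns a list of rule violations; an empty list means the turn is well-formed.
function validateTurn(calls: ToolCall[]): string[] {
  const first = calls[0];
  if (first === undefined) return ['turn has no tool calls'];

  const problems: string[] = [];
  if (first.name !== 'quick_reply') {
    problems.push('first action must be quick_reply');
  }

  // quick_reply with done: true resolves the turn by itself.
  if (first.name === 'quick_reply' && first.input['done'] === true) {
    if (calls.length > 1) {
      problems.push('nothing may follow quick_reply with done: true');
    }
    return problems;
  }

  // Otherwise exactly one finishing tool is required, and it must come last.
  const finishing = calls.filter((c) => FINISHING_TOOLS.has(c.name));
  const last = calls[calls.length - 1];
  if (finishing.length !== 1) {
    problems.push(`expected exactly one finishing tool, found ${finishing.length}`);
  } else if (last === undefined || !FINISHING_TOOLS.has(last.name)) {
    problems.push('the finishing tool must be the final action');
  }
  return problems;
}
```

A check like this could run in tests or in a logging hook; whether the real engine enforces these rules, and how, is not shown in this diff.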
@@ -0,0 +1,21 @@
+ /**
+  * ★ EDIT THIS FILE ★
+  *
+  * This is the ONE place the engine learns your domain. Everything else in this
+  * template is Vessels-generic — the tool loop, the message protocol, the store.
+  * Describe WHO your agent is and WHAT it does, in a few plain sentences. Do not
+  * re-explain how to talk to Vessels here (that's `protocol.ts`, appended for you).
+  *
+  * Whatever you write becomes the top of the system prompt, ahead of the Vessels
+  * protocol. Keep it tight — a paragraph, not a manual. The same template that runs
+  * a booking manager runs a legal analyst or a stock-desk agent; only this string
+  * and `tools.ts` change.
+  *
+  * Examples of the SHAPE (replace entirely — these are not your agent):
+  *   "You are Atlas, a support-triage agent for an analytics SaaS. You read incoming
+  *    tickets, pull account context, and either resolve them or escalate to a human."
+  *   "You are a contracts analyst. You review redline requests, check them against
+  *    our standard terms, and surface anything that needs a lawyer's sign-off."
+  */
+
+ export const ROLE = `TODO: Describe your agent. You are <name>, a <role> that <does what> on your operator's behalf. Add any standing rules it should always follow (tone, what it may decide alone vs. must hand back for approval, hard constraints).`;
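
The comments in this diff describe the runtime system prompt as ROLE, then VESSELS_PROTOCOL, then NAME_RULE on a freshly-opened vessel. A minimal sketch of that assembly might look like the following. The `buildSystemPrompt` helper and the blank-line separator between ROLE and the protocol are assumptions, since the engine code that does this is not part of the diff, and the three constants here are short stand-ins for the real exports.

```typescript
// Stand-ins for the exported constants; the real values live in role.ts and protocol.ts.
const ROLE = 'You are Atlas, a support-triage agent for an analytics SaaS.';
const VESSELS_PROTOCOL = 'You reach your human operator through Vessels.';
const NAME_RULE = '\n\nNAME THIS VESSEL: set vesselTitle in your first plan().';

// Hypothetical helper: ROLE first, then the protocol, then NAME_RULE only for a new vessel.
// Assumption: a blank line separates ROLE from the protocol. NAME_RULE already starts
// with its own blank line, so it is appended as-is.
function buildSystemPrompt(isFreshVessel: boolean): string {
  const base = `${ROLE.trim()}\n\n${VESSELS_PROTOCOL}`;
  return isFreshVessel ? base + NAME_RULE : base;
}
```

Keeping the role text first matches the comment that it "becomes the top of the system prompt, ahead of the Vessels protocol".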
@@ -0,0 +1,135 @@
+ /**
+  * THE STORE SEAM — your agent's runtime state, on YOUR infrastructure.
+  *
+  * Two things live here, and both are facets of "the agent owns its own runtime":
+  *   1. Conversation state — the real Anthropic message history per vessel (including
+  *      tool_use / tool_result blocks). This is your agent's memory. Vessels is NOT
+  *      your memory; it only shows the human what happened.
+  *   2. A per-vessel lock — "don't run two turns for one vessel at once." That's a
+  *      property of YOUR deployment, so it lives here too, never in Vessels.
+  *
+  * Durability is an UPGRADE, not a prerequisite:
+  *   • MemoryStore (default) — zero infra. Correct for a single long-lived process.
+  *     State lives in RAM and resets on restart; the lock is an in-process mutex.
+  *   • PostgresStore — set DATABASE_URL and you get durable state + a cross-process
+  *     lock. It self-provisions (CREATE TABLE IF NOT EXISTS on init) — no migration
+  *     to run. Horizontally scaled? This is the lock that keeps turns serialised.
+  *
+  * Swap in Redis/Dynamo/your-DB by implementing the same `AgentStore` interface.
+  */
+ import type { MessageParam } from '@anthropic-ai/sdk/resources/messages';
+
+ export interface AgentStore {
+   /** The agent's conversation history for this vessel (empty array if new). */
+   loadState(vessel: string): Promise<MessageParam[]>;
+   /** Persist the full conversation history for this vessel. */
+   saveState(vessel: string, messages: MessageParam[]): Promise<void>;
+   /**
+    * Try to take the per-vessel lock. Returns false if someone else holds it.
+    * The TTL caps how long a crashed holder can keep the vessel locked.
+    */
+   acquireLock(vessel: string, ttlSeconds: number): Promise<boolean>;
+   /** Release the per-vessel lock. */
+   releaseLock(vessel: string): Promise<void>;
+   /** Optional one-time setup (e.g. create tables). Called once at boot. */
+   init?(): Promise<void>;
+ }
+
+ // ─── MemoryStore — the zero-infra default ──────────────────────────────────────
+
+ export class MemoryStore implements AgentStore {
+   private state = new Map<string, MessageParam[]>();
+   private locks = new Map<string, number>(); // vessel → expiry (ms epoch)
+
+   async loadState(vessel: string): Promise<MessageParam[]> {
+     return this.state.get(vessel) ?? [];
+   }
+
+   async saveState(vessel: string, messages: MessageParam[]): Promise<void> {
+     this.state.set(vessel, messages);
+   }
+
+   async acquireLock(vessel: string, ttlSeconds: number): Promise<boolean> {
+     const now = Date.now();
+     const until = this.locks.get(vessel);
+     if (until && until > now) return false; // still held
+     this.locks.set(vessel, now + ttlSeconds * 1000);
+     return true;
+   }
+
+   async releaseLock(vessel: string): Promise<void> {
+     this.locks.delete(vessel);
+   }
+ }
+
+ // ─── PostgresStore — durable, self-provisioning ─────────────────────────────────
+
+ export class PostgresStore implements AgentStore {
+   // `pg` is imported lazily so MemoryStore users don't need a database driver loaded.
+   private pool: import('pg').Pool | null = null;
+   constructor(private readonly connectionString: string) {}
+
+   async init(): Promise<void> {
+     const { default: pg } = await import('pg');
+     this.pool = new pg.Pool({ connectionString: this.connectionString });
+     await this.pool.query(`
+       CREATE TABLE IF NOT EXISTS agent_state (
+         vessel TEXT PRIMARY KEY,
+         messages JSONB NOT NULL,
+         updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+       );
+       CREATE TABLE IF NOT EXISTS agent_locks (
+         vessel TEXT PRIMARY KEY,
+         expires_at TIMESTAMPTZ NOT NULL
+       );
+     `);
+   }
+
+   private get db(): import('pg').Pool {
+     if (!this.pool) throw new Error('PostgresStore not initialised — call init() at boot');
+     return this.pool;
+   }
+
+   async loadState(vessel: string): Promise<MessageParam[]> {
+     const { rows } = await this.db.query<{ messages: MessageParam[] }>(
+       'SELECT messages FROM agent_state WHERE vessel = $1',
+       [vessel]
+     );
+     return rows[0]?.messages ?? [];
+   }
+
+   async saveState(vessel: string, messages: MessageParam[]): Promise<void> {
+     await this.db.query(
+       `INSERT INTO agent_state (vessel, messages, updated_at)
+        VALUES ($1, $2, now())
+        ON CONFLICT (vessel) DO UPDATE SET messages = EXCLUDED.messages, updated_at = now()`,
+       [vessel, JSON.stringify(messages)]
+     );
+   }
+
+   async acquireLock(vessel: string, ttlSeconds: number): Promise<boolean> {
+     // Atomic: take the row if free, OR steal it if the prior holder's TTL has lapsed.
+     const { rows } = await this.db.query(
+       `INSERT INTO agent_locks (vessel, expires_at)
+        VALUES ($1, now() + make_interval(secs => $2))
+        ON CONFLICT (vessel) DO UPDATE
+        SET expires_at = EXCLUDED.expires_at
+        WHERE agent_locks.expires_at < now()
+        RETURNING vessel`,
+       [vessel, ttlSeconds]
+     );
+     return rows.length > 0;
+   }
+
+   async releaseLock(vessel: string): Promise<void> {
+     await this.db.query('DELETE FROM agent_locks WHERE vessel = $1', [vessel]);
+   }
+ }
+
+ /**
+  * Pick a store from the environment: PostgresStore when DATABASE_URL is set, else the
+  * in-memory default. Calls init() once. Replace this with your own wiring as you grow.
+  */
+ export async function createStore(): Promise<AgentStore> {
+   const url = process.env.DATABASE_URL;
+   const store: AgentStore = url ? new PostgresStore(url) : new MemoryStore();
+   await store.init?.();
+   return store;
+ }
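
The store's lock and state methods are meant to bracket a turn: take the per-vessel lock, load the history, run the turn, save, release. The sketch below shows one way a caller might do that. Everything in it is an assumption made for illustration: `runTurnLocked`, the 120-second default TTL, and the simplified `Message` and `StoreLike` types (the real interface uses `MessageParam[]`). `DemoStore` only mirrors the MemoryStore logic so that the example runs without the package.

```typescript
// Simplified stand-ins so the sketch runs on its own; the real AgentStore uses MessageParam[].
type Message = { role: 'user' | 'assistant'; content: string };

interface StoreLike {
  loadState(vessel: string): Promise<Message[]>;
  saveState(vessel: string, messages: Message[]): Promise<void>;
  acquireLock(vessel: string, ttlSeconds: number): Promise<boolean>;
  releaseLock(vessel: string): Promise<void>;
}

// Tiny in-memory store with the same lock logic as MemoryStore.
class DemoStore implements StoreLike {
  private state = new Map<string, Message[]>();
  private locks = new Map<string, number>(); // vessel -> expiry (ms epoch)

  async loadState(vessel: string): Promise<Message[]> {
    return this.state.get(vessel) ?? [];
  }
  async saveState(vessel: string, messages: Message[]): Promise<void> {
    this.state.set(vessel, messages);
  }
  async acquireLock(vessel: string, ttlSeconds: number): Promise<boolean> {
    const now = Date.now();
    const until = this.locks.get(vessel);
    if (until !== undefined && until > now) return false; // still held
    this.locks.set(vessel, now + ttlSeconds * 1000);
    return true;
  }
  async releaseLock(vessel: string): Promise<void> {
    this.locks.delete(vessel);
  }
}

// Hypothetical wrapper: run one turn under the per-vessel lock.
// Returns false, without touching state, if another turn already holds the lock.
async function runTurnLocked(
  store: StoreLike,
  vessel: string,
  turn: (history: Message[]) => Promise<Message[]>,
  ttlSeconds: number = 120, // assumed default; it should exceed your longest turn
): Promise<boolean> {
  // Acquire outside the try block so a failed acquire never releases someone else's lock.
  if (!(await store.acquireLock(vessel, ttlSeconds))) return false;
  try {
    const history = await store.loadState(vessel);
    const next = await turn(history);
    await store.saveState(vessel, next);
    return true;
  } finally {
    // Release even if the turn throws, so the vessel is not blocked until the TTL lapses.
    await store.releaseLock(vessel);
  }
}
```

One caveat under these assumptions: `releaseLock` takes no owner token, so a turn that outlives its TTL could release a lock that another process has since taken. Choosing a generous TTL is the simplest mitigation.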
@@ -0,0 +1,90 @@
+ /**
+  * ★ EDIT THIS FILE ★
+  *
+  * Your agent's BACKEND tools — the things it actually does in YOUR system (look
+  * something up, charge a card, file a ticket, run a query). The engine wires these
+  * into the Claude tool loop next to the built-in Vessels control tools (quick_reply,
+  * plan, request_approval, …). When the model calls one, the engine runs your
+  * `handler`, feeds the result back to the model, and (optionally) ticks the live
+  * working card via `narrate`.
+  *
+  * THE STATE BOUNDARY: these handlers run against YOUR backend and YOUR data. Vessels
+  * never sees your business data and is never your agent's memory — it only carries
+  * the human-facing messages the engine sends. A handler returns plain data; the model
+  * decides what (if anything) to surface to the operator.
+  *
+  * You do NOT need to add a `task` field to your schema — the engine injects it into
+  * every tool automatically so the model can tick the plan in the same call. Your
+  * handler receives the model's input WITHOUT `task`.
+  */
+ import type { AgentActivityType } from 'vessels-sdk';
+ import type { Tool } from '@anthropic-ai/sdk/resources/messages';
+
+ export interface BackendTool {
+   /** The Anthropic tool definition the model sees (name, description, input_schema). */
+   definition: Tool;
+   /** Run the tool against your backend. Return any JSON-serialisable result. */
+   handler: (input: Record<string, unknown>) => Promise<unknown> | unknown;
+   /**
+    * Optional: turn a call into a one-line working-card step (an icon + label the
+    * operator sees tick by under the current task). Omit and the engine derives a
+    * label from the tool name. `type` drives the icon (searching, processing, …).
+    */
+   narrate?: (
+     input: Record<string, unknown>,
+     result: unknown
+   ) => { type: AgentActivityType; label: string };
+ }
+
+ /**
+  * Your tools. Replace these two stubs with real ones. Keep them small and composable —
+  * the model calls several per turn. Anything that can fail should return a result that
+  * SAYS it failed (e.g. `{ ok: false, reason: '…' }`) rather than throwing, so the model
+  * can react and tell the operator.
+  */
+ export const BACKEND_TOOLS: BackendTool[] = [
+   {
+     // ── TODO: replace with a real read from your system ───────────────────────
+     definition: {
+       name: 'lookup_record',
+       description:
+         'TODO: Look something up in your backend by id/name. Describe exactly what it returns so the model uses it well.',
+       input_schema: {
+         type: 'object',
+         properties: {
+           query: { type: 'string', description: 'What to look up' },
+         },
+         required: ['query'],
+       },
+     },
+     handler: async (input) => {
+       // TODO: call your API / DB here and return the real record.
+       throw new Error(
+         `lookup_record is a stub — implement it in tools.ts (query=${String(input.query)})`
+       );
+     },
+     narrate: (input) => ({ type: 'searching', label: `Looked up ${String(input.query ?? '')}`.trim() }),
+   },
+   {
+     // ── TODO: replace with a real action in your system ───────────────────────
+     definition: {
+       name: 'perform_action',
+       description:
+         'TODO: Do something in your backend (create/update/send). Describe its effect and what it returns.',
+       input_schema: {
+         type: 'object',
+         properties: {
+           action: { type: 'string', description: 'What to do' },
+           detail: { type: 'string', description: 'Any relevant detail / args' },
+         },
+         required: ['action'],
+       },
+     },
+     handler: async (input) => {
+       // TODO: perform the real action and return its outcome.
+       throw new Error(
+         `perform_action is a stub — implement it in tools.ts (action=${String(input.action)})`
+       );
+     },
+   },
+ ];
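
The comments above make two points that a small example can illustrate: a handler should report failure as data rather than throwing, and the handler receives the model's input without the injected `task` field. The sketch below is an illustration under stated assumptions, not the engine's code. `RECORDS`, `lookupRecord`, `ToolResult`, and `callWithTask` are all hypothetical names, and the destructuring in `callWithTask` is only a guess at how an engine could strip `task`.

```typescript
// Hypothetical result shape following the "{ ok: false, reason }" convention from the comment.
type DemoRecord = { id: string; name: string };
type ToolResult = { ok: true; record: DemoRecord } | { ok: false; reason: string };

// Made-up data standing in for a real backend.
const RECORDS: Record<string, DemoRecord> = {
  'acct-1': { id: 'acct-1', name: 'Example Co' },
};

// A handler that reports failure as data instead of throwing, so the model can react.
async function lookupRecord(input: Record<string, unknown>): Promise<ToolResult> {
  const raw = input['query'];
  const query = typeof raw === 'string' ? raw.trim() : '';
  if (query === '') return { ok: false, reason: 'query must be a non-empty string' };
  const record = RECORDS[query];
  if (record === undefined) return { ok: false, reason: `no record matches "${query}"` };
  return { ok: true, record };
}

// Assumed engine behaviour: split the injected `task` off the model's input,
// then call the handler with everything else.
async function callWithTask(
  handler: (input: Record<string, unknown>) => Promise<unknown>,
  rawInput: Record<string, unknown>,
): Promise<{ task: string | undefined; result: unknown }> {
  const { task, ...input } = rawInput;
  return {
    task: typeof task === 'string' ? task : undefined,
    result: await handler(input),
  };
}
```

Returning a tagged union keeps the failure path visible in the type, so the caller has to check `ok` before reading `record`.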