@damn-dev/cli 0.10.0 → 0.13.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@damn-dev/cli",
3
- "version": "0.10.0",
3
+ "version": "0.13.3",
4
4
  "description": "damn.dev — self-hosted workspace OS for human + AI agent collaboration.",
5
5
  "license": "Apache-2.0",
6
6
  "homepage": "https://damn.dev",
@@ -0,0 +1,238 @@
1
+ ---
2
+ name: "browser-builtin"
3
+ description: "Browse and interact with live websites as a real user would — reading JS-rendered pages, dashboards behind login, forms, interactive apps, and visually-rich content. Reach for it when `web_extract` returns the loading skeleton, when a site is behind Cloudflare/DataDome, when the user asks about visual design, or when an action has to be taken on the page (click, type, submit)."
4
+ version: "1.3.4"
5
+ required_secrets: []
6
+ required_tools: []
7
+ tools:
8
+ - name: browser_navigate
9
+ description: "Open a URL in the agent's browser. ALWAYS the first call in any browser task — every other browser_* tool acts on the currently-loaded page. MUST pass `url` as an absolute http(s) URL — the tool has NO default and cannot infer the URL from conversation context on its own; YOU (the agent) must derive it from the user's intent and pass it explicitly. Intent-to-URL mapping: (a) Full URL given ('https://apple.com') → pass verbatim. (b) Bare domain ('apple.com', 'duckduckgo.com') → prepend 'https://'. (c) Brand / site name ('Twitter', 'Hacker News', 'GitHub') → infer the canonical domain (x.com, news.ycombinator.com, github.com) and navigate there — trust common knowledge. (d) Search intent with no site ('find articles about X', 'look up Y') → navigate to a search engine (https://duckduckgo.com) first, then use browser_type to enter the query. (e) Context reference ('go back', 'reopen that page') → use browser_back/forward instead; don't re-navigate from scratch. If genuinely ambiguous (multiple plausible domains for the user's request), ask ONE concise clarifying question before navigating — don't ask for what you can reasonably infer. Returns {urlRequested, urlBefore, urlAfter, title, status, challenge, navigated, at}. Use `urlBefore`/`urlAfter` to verify navigation actually happened — if they match, nothing landed. If `challenge` is non-null (cloudflare-turnstile, recaptcha, hcaptcha, cloudflare-managed, datadome, perimeterx), the page is gated behind anti-bot; DO NOT assume content loaded. Try an RSS feed, a cached mirror, or tell the user — don't loop retrying."
10
+ parameters:
11
+ - name: url
12
+ type: string
13
+ description: "Absolute http(s) URL with scheme (e.g. https://example.com). REQUIRED — omitting returns 'invalid_url: Missing required parameter: url'. If the user gives a bare domain or brand name, derive the canonical URL and prepend 'https://' before calling."
14
+ required: true
15
+ - name: waitUntil
16
+ type: string
17
+ description: "When navigation is 'complete': 'load' | 'domcontentloaded' (default) | 'networkidle' | 'commit'. Use 'networkidle' for heavy SPAs that keep fetching after first paint."
18
+ required: false
19
+ - name: timeoutMs
20
+ type: number
21
+ description: Navigation timeout in ms. Default 30000.
22
+ required: false
23
+ endpoint: inprocess://browser-builtin/browser_navigate
24
+ method: POST
25
+ requires_approval: true
26
+
27
+ - name: browser_snapshot
28
+ description: "Read the current page as a structured accessibility tree — headings, links, buttons, inputs, table cells, paragraphs — in ~10KB of model-friendly text. Use when the user's intent is textual: reading, summarizing, extracting, listing, or finding information. Also the prerequisite for browser_click / browser_type / browser_extract — they need refs from a prior snapshot. Snapshot returns WORDS; it does not return pixels. If the user wants to see the page, use browser_vision instead."
29
+ parameters: []
30
+ endpoint: inprocess://browser-builtin/browser_snapshot
31
+ method: POST
32
+ requires_approval: true
33
+
34
+ - name: browser_vision
35
+ description: "Screenshot the current page. Call this whenever the user wants to SEE the page — asking for a visual, a screenshot, a rendering, wanting to look at the design, asking how something appears, or any request whose answer is pixels rather than text. Also call it when the user's question is visual in nature (layout, design, color, typography, imagery, brand feel, visual hierarchy, chart/canvas/image content) even if not phrased as 'show me'. Set `inline: true` to route the screenshot into the agent's context as a native vision input — this is the ONLY way the agent can actually see the image. Set `fullPage: true` to capture the entire scrollable document. CRITICAL: vision is NOT a fallback for snapshot, and snapshot is NOT a substitute for vision. Snapshot answers 'what text is on the page'; vision answers 'what does the page look like'. If the user asks to see the page, don't describe it from snapshot DOM text — that's guessing dressed up as observation. Actually look."
36
+ parameters:
37
+ - name: inline
38
+ type: boolean
39
+ description: "Routes the screenshot into the agent's context as a native vision input — the ONLY way the agent actually SEES the image. DEFAULT TRUE. Set to false ONLY when you need a file on disk for a human to open later AND the agent itself does NOT need to analyze the pixels. For any request where the agent needs to describe visual content (layout, design, colors, imagery), leave this as the default or pass true explicitly."
40
+ required: false
41
+ - name: fullPage
42
+ type: boolean
43
+ description: Capture the entire scrollable document (true) or just the current viewport (false, default). Use true for "describe the whole page" and false for "what's above the fold".
44
+ required: false
45
+ endpoint: inprocess://browser-builtin/browser_vision
46
+ method: POST
47
+ requires_approval: true
48
+
49
+ - name: browser_extract
50
+ description: "Pull ONE specific field or attribute from the page via CSS selector. Use AFTER browser_snapshot has shown you the DOM shape, when you know exactly what you want and don't need the whole tree. Examples: `h1` for title, `[data-testid=\"price\"]` for a price, `a.next[href]` for a pagination URL. Faster than another snapshot when fishing for a single value."
51
+ parameters:
52
+ - name: selector
53
+ type: string
54
+ description: "CSS selector. Examples: 'h1', '.price', 'a[href*=\"pricing\"]', '[data-testid=\"hero-title\"]'."
55
+ required: true
56
+ - name: attr
57
+ type: string
58
+ description: If set, return this attribute instead of text content (e.g. "href", "src", "data-id", "aria-label").
59
+ required: false
60
+ - name: all
61
+ type: boolean
62
+ description: If true, return up to 100 matches (useful for lists, link collections, row scrapes). Default false returns just the first match.
63
+ required: false
64
+ endpoint: inprocess://browser-builtin/browser_extract
65
+ method: POST
66
+ requires_approval: true
67
+
68
+ - name: browser_console
69
+ description: "Read the browser's JavaScript console (last 100 entries). Use when a page appears broken, an API call is failing silently, a form won't submit, the user asks 'why isn't this working', or you're debugging a JS-heavy SPA. Also useful after browser_click/type to catch client-side errors triggered by the action."
70
+ parameters:
71
+ - name: level
72
+ type: string
73
+ description: Filter to one severity (log | warn | error | info | debug). Omit for all.
74
+ required: false
75
+ endpoint: inprocess://browser-builtin/browser_console
76
+ method: POST
77
+ requires_approval: true
78
+
79
+ - name: browser_click
80
+ description: "Click a button, link, or interactive element. Use when the user's goal requires ACTION on the page: submit, open, dismiss, add to cart, expand, filter, navigate via UI. Pass the CSS `ref` from a recent browser_snapshot — snapshot first so you know what's clickable. Interactive — requires approval. INVOKE THIS TOOL to perform the click — narration does not. Never claim you clicked without a browser_click tool_result visible in this turn. Returns {clicked, urlBefore, urlAfter, navigated, at}. Use `navigated` (true when `urlBefore !== urlAfter`) to distinguish a link that changed the page from an in-page control (dropdown, toggle)."
81
+ parameters:
82
+ - name: ref
83
+ type: string
84
+ description: CSS selector (typically copied from a browser_snapshot node's `ref` field).
85
+ required: true
86
+ - name: timeoutMs
87
+ type: number
88
+ description: Wait up to this long for the element to be clickable. Default 30000.
89
+ required: false
90
+ endpoint: inprocess://browser-builtin/browser_click
91
+ method: POST
92
+ requires_approval: true
93
+
94
+ - name: browser_type
95
+ description: "Fill an input or textarea with text. Use when the user needs data entered: search query, message body, login credentials (only if profile isn't already logged in), form fields, filters. Pass `submit: true` to press Enter after filling — that's the shortcut for submitting most forms and triggering search. Interactive — requires approval. INVOKE THIS TOOL to perform the typing — narration does not. Never claim you typed without a browser_type tool_result visible in this turn. Returns {ref, text, resultValue, submitted, urlBefore, urlAfter, navigated, at}. ALWAYS check `resultValue === text` to verify the write actually landed — React-controlled inputs can silently reject `.fill()` and return an empty `resultValue`. If they differ, report that honestly to the user rather than claiming success."
96
+ parameters:
97
+ - name: ref
98
+ type: string
99
+ description: CSS selector for the input or textarea.
100
+ required: true
101
+ - name: text
102
+ type: string
103
+ description: Text to fill. Replaces any existing value.
104
+ required: true
105
+ - name: submit
106
+ type: boolean
107
+ description: If true, press Enter after filling (submits most forms). Default false.
108
+ required: false
109
+ - name: timeoutMs
110
+ type: number
111
+ description: Fill timeout in ms. Default 30000.
112
+ required: false
113
+ endpoint: inprocess://browser-builtin/browser_type
114
+ method: POST
115
+ requires_approval: true
116
+
117
+ - name: browser_scroll
118
+ description: "Scroll the page. Use when content you need is below the fold, when snapshot returned less than expected, or when a page uses infinite scroll and more items load on scroll. Call browser_snapshot again after scrolling to see the newly-revealed content. INVOKE THIS TOOL to scroll — narration does not. Returns {direction, amount, scrollYBefore, scrollYAfter, moved, at}. Check `moved` (true when `scrollYBefore !== scrollYAfter`) to confirm the scroll landed — some pages trap scroll inside a modal or infinite container and the window itself doesn't move."
119
+ parameters:
120
+ - name: direction
121
+ type: string
122
+ description: "'down' (default) | 'up' | 'top' | 'bottom'."
123
+ required: false
124
+ - name: amount
125
+ type: number
126
+ description: Pixels to scroll for up/down (ignored for top/bottom). Default 500.
127
+ required: false
128
+ endpoint: inprocess://browser-builtin/browser_scroll
129
+ method: POST
130
+ requires_approval: true
131
+
132
+ - name: browser_press
133
+ description: "Send a single keystroke or key chord to the focused element. Use for dismissing modals (`Escape`), navigating menus (`ArrowDown`, `Enter`), keyboard shortcuts (`Control+F`, `Control+A`), or tabbing between inputs (`Tab`). Interactive — requires approval. INVOKE THIS TOOL to press the key — narration does not. Never claim you pressed without a browser_press tool_result visible in this turn. Returns {key, urlBefore, urlAfter, navigated, at}. Use `navigated` to detect if the key triggered navigation (Enter submitting a form, for example)."
134
+ parameters:
135
+ - name: key
136
+ type: string
137
+ description: "Key name or chord. Examples: 'Enter', 'Escape', 'Tab', 'ArrowDown', 'Control+A', 'Meta+K'."
138
+ required: true
139
+ endpoint: inprocess://browser-builtin/browser_press
140
+ method: POST
141
+ requires_approval: true
142
+
143
+ - name: browser_back
144
+ description: "Navigate back one step in the browser's history. Use to undo a navigation without losing the previous page's state (form values, scroll position), or when the user says 'go back', 'previous page', 'undo that nav'. INVOKE THIS TOOL to navigate back — narration does not. Returns {urlBefore, urlAfter, title, navigated, at}. Check `navigated` (true when `urlBefore !== urlAfter`) — if false, there was no prior entry in history and nothing changed."
145
+ parameters: []
146
+ endpoint: inprocess://browser-builtin/browser_back
147
+ method: POST
148
+ requires_approval: true
149
+
150
+ - name: browser_forward
151
+ description: "Navigate forward one step in history (re-enter a page after browser_back). Use when the user says 'go forward', 'next page', 'redo that'. INVOKE THIS TOOL to navigate forward — narration does not. Returns {urlBefore, urlAfter, title, navigated, at}. Check `navigated` (true when `urlBefore !== urlAfter`) — if false, there was no forward entry in history."
152
+ parameters: []
153
+ endpoint: inprocess://browser-builtin/browser_forward
154
+ method: POST
155
+ requires_approval: true
156
+
157
+ - name: browser_cdp
158
+ description: "Escape hatch: send a raw Firefox DevTools Protocol command. Use ONLY when the other browser_* tools genuinely cannot do what's needed — network interception, custom headers, cookie manipulation, full-control screenshots. 99% of tasks are covered by the other tools. Always requires approval. INVOKE THIS TOOL to send the CDP command — narration does not. Never claim you sent a CDP command without a browser_cdp tool_result visible in this turn. Returns {method, params, result, urlBefore, urlAfter, at}. The `result` field is the raw CDP response — structure varies by method."
159
+ parameters:
160
+ - name: method
161
+ type: string
162
+ description: "CDP method (e.g. 'Network.setUserAgentOverride', 'Page.captureScreenshot', 'Network.setCookie')."
163
+ required: true
164
+ - name: params
165
+ type: object
166
+ description: Method-specific params object.
167
+ required: false
168
+ endpoint: inprocess://browser-builtin/browser_cdp
169
+ method: POST
170
+ requires_approval: true
171
+ ---
172
+
173
+ # browser-builtin
174
+
175
+ A real Firefox with anti-detection (camoufox) for when the user's request needs a live webpage — reading it, looking at it, or doing something on it.
176
+
177
+ ## Decide by intent, not by tool name
178
+
179
+ The user will describe what they need in plain language. Match their intent to the right first tool:
180
+
181
+ | What the user wants | Start with |
182
+ |-------------------------------------------------------------------------|--------------------------------------------|
183
+ | **Find** a page, product, article, or company | `web_search` (not this skill) |
184
+ | Read a **static article** (Wikipedia, blog posts, news) | `web_extract` (not this skill) |
185
+ | Read or summarize a **JS-heavy** page, SPA, dashboard, or logged-in UI | `browser_navigate` → `browser_snapshot` |
186
+ | Describe the **visual design / layout / look** of a page | `browser_navigate` → `browser_vision({inline: true})` |
187
+ | Get **one specific field** from a known DOM (price, headline, link) | `browser_navigate` → `browser_snapshot` → `browser_extract` |
188
+ | **Click / fill / submit** something on a page | `browser_navigate` → `browser_snapshot` → `browser_click` / `browser_type` |
189
+ | Page **won't load** or something is broken | `browser_navigate` → `browser_console` |
190
+ | Compare two pages **visually** | `browser_navigate` → `browser_vision` (for each) |
191
+ | `web_extract` returned a **loading skeleton** or challenge page | `browser_navigate` (this skill) |
192
+
193
+ **Do not use the browser as a default.** It's slower than `web_search` / `web_extract` and uses ~150MB per agent. Reach for it when a simpler tool has failed OR when the task is inherently interactive/visual.
194
+
195
+ ## Snapshot vs. Vision — the decision that matters most
196
+
197
+ Snapshot and vision are NOT redundant. They answer different questions:
198
+ - **Snapshot** tells you what text is on the page (headings, links, labels, prices, articles, form fields).
199
+ - **Vision** shows you what the page actually looks like (layout, colors, imagery, visual hierarchy, brand feel).
200
+
201
+ **Use `browser_snapshot`** when the user's intent is textual: reading, summarizing, listing, extracting, finding information. Anything a screen reader could answer.
202
+
203
+ **Use `browser_vision({inline: true})`** when the user wants a visual answer — wants to see the page, asks how it looks, asks about design/layout/color/imagery, wants you to compare pages visually, or when the information is inherently pictorial (charts, diagrams, canvas/WebGL, image-heavy marketing). Also use it when snapshot returned suspiciously little on a rich-looking page.
204
+
205
+ **When the user wants to SEE, call vision — even after snapshot already ran.** Visual intent is not satisfied by describing pixels from DOM text. That's guessing dressed up as observation. If you already have snapshot data and the user then asks to see the page, `browser_vision` is the next call — not another snapshot and not a prose description.
206
+
207
+ **Do NOT call vision when the user's intent is text.** "Screenshot this and tell me the article content" is a text intent — use snapshot. Vision costs more tokens and takes longer; save it for when pixels actually matter.
208
+
209
+ ## The minimal flow
210
+
211
+ 1. `browser_navigate({url})` — inspect the `challenge` field. If non-null, STOP: anti-bot is in play. Don't proceed as if the page loaded.
212
+ 2. Pick your second tool by intent (see table above). Usually `browser_snapshot`.
213
+ 3. Take action if needed: `browser_extract` for a specific value, `browser_vision` for a visual answer, `browser_click` / `browser_type` for interaction.
214
+ 4. If something looks wrong, `browser_console` to see JS errors.
215
+
216
+ ## Session persistence
217
+
218
+ Each agent has its own private Firefox profile at `~/.damn-dev/browser-profiles/{agentId}/`. Cookies, localStorage, and session data survive across backend restarts:
219
+
220
+ - Log in once; subsequent `browser_navigate` calls start logged in.
221
+ - Profile is wiped when the agent is deleted.
222
+ - Workspace admin can reset a profile from Settings → Browser.
223
+
224
+ ## Domain policy
225
+
226
+ The workspace may have an allowlist or denylist. Blocked navigations return `{ok: false, code: 'blocked_host'}`. Policy lives in Settings → Browser.
227
+
228
+ ## When things go wrong
229
+
230
+ - **`challenge` non-null** after navigate → anti-bot active. Try an RSS feed, an archive mirror (archive.org), or tell the user. Don't loop retrying — fingerprinting gets worse each attempt.
231
+ - **Snapshot looks sparse** on a visibly rich page → likely canvas, WebGL, or render-gated. Try `browser_scroll` + snapshot again, or fall through to `browser_vision`.
232
+ - **`capability_not_ready` error** → the 363MB Firefox artifact isn't downloaded on this host. User fixes it at Settings → Browser → Download now.
233
+ - **Agent says "the screenshot is too large"** → don't read the saved file. Pass `inline: true` to browser_vision; the screenshot is routed as a native vision content block (~1600 tokens), not stuffed into text.
234
+ - **`blocked_host` error** → workspace policy denies this domain. Ask the user if they want to add it to the allowlist.
235
+
236
+ ## Escape hatch
237
+
238
+ `browser_cdp` exposes the Firefox DevTools Protocol directly. Use ONLY when the other browser_* tools genuinely can't do the job (network interception, custom headers, cookie manipulation). Always approval-gated.
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: "obsidian-builtin"
3
3
  description: "Read, search, and write notes in your Obsidian vault via plain file operations on the mounted vault directory."
4
- version: "1.0.1"
4
+ version: "1.0.4"
5
5
  required_secrets: []
6
6
  ---
7
7
 
@@ -17,14 +17,21 @@ If `$OBSIDIAN_VAULT_PATH` is not set, check `/host/obsidian` (sandbox mount fall
17
17
 
18
18
  [ -d /host/obsidian ] && VAULT=/host/obsidian
19
19
 
20
- Notes are plain Markdown files. Use shell via `shell_exec_guarded`:
20
+ Notes are plain Markdown files. Dates come from the filesystem `mtime` (updated
21
+ automatically on every save) — no need for dates in filenames or frontmatter.
22
+ Run commands via the `shell-exec` skill:
21
23
  - **Search filenames:** `find "$VAULT" -name '*.md' -type f | head -20`
22
24
  - **Search content:** `grep -rl 'query' "$VAULT" --include='*.md'`
23
25
  - **Read:** `cat "$VAULT/path/to/note.md"`
24
- - **List recent:** `find "$VAULT" -name '*.md' -type f -mtime -7 | head -20`
26
+ - **List recent (filenames only):** `find "$VAULT" -name '*.md' -type f -mtime -7 | head -20`
27
+ - **List recent with dates (preferred for "most recent N"):**
28
+ `find "$VAULT" -name '*.md' -type f -exec stat -f '%m %N' {} + | sort -rn | head -5`
29
+ — `%m` is epoch mtime, `%N` is the path; no `2>/dev/null` (keeps classification safe).
25
30
  - **Create:** write a new `.md` file. Obsidian picks it up automatically.
26
31
 
27
32
  Always quote `"$VAULT"` in every command — vault paths commonly contain spaces.
33
+ Avoid `2>/dev/null` and other stderr redirects when not needed — they add noise
34
+ and the classifier will ask for approval unnecessarily.
28
35
 
29
36
  ## Advanced — obsidian-cli (host-only)
30
37