npm - @damn-dev/cli - Versions diffs - 0.10.0 → 0.13.3 - Mend

@damn-dev/cli 0.10.0 → 0.13.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@damn-dev/cli",
-  "version": "0.10.0",
+  "version": "0.13.3",
   "description": "damn.dev — self-hosted workspace OS for human + AI agent collaboration.",
   "license": "Apache-2.0",
   "homepage": "https://damn.dev",

package/runtime/apps/backend/dist/resources/skills/browser-builtin/SKILL.md ADDED Viewed

@@ -0,0 +1,238 @@
+---
+name: "browser-builtin"
+description: "Browse and interact with live websites as a real user would — reading JS-rendered pages, dashboards behind login, forms, interactive apps, and visually-rich content. Reach for it when `web_extract` returns the loading skeleton, when a site is behind Cloudflare/DataDome, when the user asks about visual design, or when an action has to be taken on the page (click, type, submit)."
+version: "1.3.4"
+required_secrets: []
+required_tools: []
+tools:
+  - name: browser_navigate
+    description: "Open a URL in the agent's browser. ALWAYS the first call in any browser task — every other browser_* tool acts on the currently-loaded page. MUST pass `url` as an absolute http(s) URL — the tool has NO default and cannot infer the URL from conversation context on its own; YOU (the agent) must derive it from the user's intent and pass it explicitly. Intent-to-URL mapping: (a) Full URL given ('https://apple.com') → pass verbatim. (b) Bare domain ('apple.com', 'duckduckgo.com') → prepend 'https://'. (c) Brand / site name ('Twitter', 'Hacker News', 'GitHub') → infer the canonical domain (x.com, news.ycombinator.com, github.com) and navigate there — trust common knowledge. (d) Search intent with no site ('find articles about X', 'look up Y') → navigate to a search engine (https://duckduckgo.com) first, then use browser_type to enter the query. (e) Context reference ('go back', 'reopen that page') → use browser_back/forward instead; don't re-navigate from scratch. If genuinely ambiguous (multiple plausible domains for the user's request), ask ONE concise clarifying question before navigating — don't ask for what you can reasonably infer. Returns {urlRequested, urlBefore, urlAfter, title, status, challenge, navigated, at}. Use `urlBefore`/`urlAfter` to verify navigation actually happened — if they match, nothing landed. If `challenge` is non-null (cloudflare-turnstile, recaptcha, hcaptcha, cloudflare-managed, datadome, perimeterx), the page is gated behind anti-bot; DO NOT assume content loaded. Try an RSS feed, a cached mirror, or tell the user — don't loop retrying."
+    parameters:
+      - name: url
+        type: string
+        description: "Absolute http(s) URL with scheme (e.g. https://example.com). REQUIRED — omitting returns 'invalid_url: Missing required parameter: url'. If the user gives a bare domain or brand name, derive the canonical URL and prepend 'https://' before calling."
+        required: true
+      - name: waitUntil
+        type: string
+        description: "When navigation is 'complete': 'load' | 'domcontentloaded' (default) | 'networkidle' | 'commit'. Use 'networkidle' for heavy SPAs that keep fetching after first paint."
+        required: false
+      - name: timeoutMs
+        type: number
+        description: Navigation timeout in ms. Default 30000.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_navigate
+    method: POST
+    requires_approval: true
+  - name: browser_snapshot
+    description: "Read the current page as a structured accessibility tree — headings, links, buttons, inputs, table cells, paragraphs — in ~10KB of model-friendly text. Use when the user's intent is textual: reading, summarizing, extracting, listing, or finding information. Also the prerequisite for browser_click / browser_type / browser_extract — they need refs from a prior snapshot. Snapshot returns WORDS; it does not return pixels. If the user wants to see the page, use browser_vision instead."
+    parameters: []
+    endpoint: inprocess://browser-builtin/browser_snapshot
+    method: POST
+    requires_approval: true
+  - name: browser_vision
+    description: "Screenshot the current page. Call this whenever the user wants to SEE the page — asking for a visual, a screenshot, a rendering, wanting to look at the design, asking how something appears, or any request whose answer is pixels rather than text. Also call it when the user's question is visual in nature (layout, design, color, typography, imagery, brand feel, visual hierarchy, chart/canvas/image content) even if not phrased as 'show me'. Set `inline: true` to route the screenshot into the agent's context as a native vision input — this is the ONLY way the agent can actually see the image. Set `fullPage: true` to capture the entire scrollable document. CRITICAL: vision is NOT a fallback for snapshot, and snapshot is NOT a substitute for vision. Snapshot answers 'what text is on the page'; vision answers 'what does the page look like'. If the user asks to see the page, don't describe it from snapshot DOM text — that's guessing dressed up as observation. Actually look."
+    parameters:
+      - name: inline
+        type: boolean
+        description: "Routes the screenshot into the agent's context as a native vision input — the ONLY way the agent actually SEES the image. DEFAULT TRUE. Set to false ONLY when you need a file on disk for a human to open later AND the agent itself does NOT need to analyze the pixels. For any request where the agent needs to describe visual content (layout, design, colors, imagery), leave this as the default or pass true explicitly."
+        required: false
+      - name: fullPage
+        type: boolean
+        description: Capture the entire scrollable document (true) or just the current viewport (false, default). Use true for "describe the whole page" and false for "what's above the fold".
+        required: false
+    endpoint: inprocess://browser-builtin/browser_vision
+    method: POST
+    requires_approval: true
+  - name: browser_extract
+    description: "Pull ONE specific field or attribute from the page via CSS selector. Use AFTER browser_snapshot has shown you the DOM shape, when you know exactly what you want and don't need the whole tree. Examples: `h1` for title, `[data-testid=\"price\"]` for a price, `a.next[href]` for a pagination URL. Faster than another snapshot when fishing for a single value."
+    parameters:
+      - name: selector
+        type: string
+        description: "CSS selector. Examples: 'h1', '.price', 'a[href*=\"pricing\"]', '[data-testid=\"hero-title\"]'."
+        required: true
+      - name: attr
+        type: string
+        description: If set, return this attribute instead of text content (e.g. "href", "src", "data-id", "aria-label").
+        required: false
+      - name: all
+        type: boolean
+        description: If true, return up to 100 matches (useful for lists, link collections, row scrapes). Default false returns just the first match.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_extract
+    method: POST
+    requires_approval: true
+  - name: browser_console
+    description: "Read the browser's JavaScript console (last 100 entries). Use when a page appears broken, an API call is failing silently, a form won't submit, the user asks 'why isn't this working', or you're debugging a JS-heavy SPA. Also useful after browser_click/type to catch client-side errors triggered by the action."
+    parameters:
+      - name: level
+        type: string
+        description: Filter to one severity (log | warn | error | info | debug). Omit for all.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_console
+    method: POST
+    requires_approval: true
+  - name: browser_click
+    description: "Click a button, link, or interactive element. Use when the user's goal requires ACTION on the page: submit, open, dismiss, add to cart, expand, filter, navigate via UI. Pass the CSS `ref` from a recent browser_snapshot — snapshot first so you know what's clickable. Interactive — requires approval. INVOKE THIS TOOL to perform the click — narration does not. Never claim you clicked without a browser_click tool_result visible in this turn. Returns {clicked, urlBefore, urlAfter, navigated, at}. Use `navigated` (true when `urlBefore !== urlAfter`) to distinguish a link that changed the page from an in-page control (dropdown, toggle)."
+    parameters:
+      - name: ref
+        type: string
+        description: CSS selector (typically copied from a browser_snapshot node's `ref` field).
+        required: true
+      - name: timeoutMs
+        type: number
+        description: Wait up to this long for the element to be clickable. Default 30000.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_click
+    method: POST
+    requires_approval: true
+  - name: browser_type
+    description: "Fill an input or textarea with text. Use when the user needs data entered: search query, message body, login credentials (only if profile isn't already logged in), form fields, filters. Pass `submit: true` to press Enter after filling — that's the shortcut for submitting most forms and triggering search. Interactive — requires approval. INVOKE THIS TOOL to perform the typing — narration does not. Never claim you typed without a browser_type tool_result visible in this turn. Returns {ref, text, resultValue, submitted, urlBefore, urlAfter, navigated, at}. ALWAYS check `resultValue === text` to verify the write actually landed — React-controlled inputs can silently reject `.fill()` and return an empty `resultValue`. If they differ, report that honestly to the user rather than claiming success."
+    parameters:
+      - name: ref
+        type: string
+        description: CSS selector for the input or textarea.
+        required: true
+      - name: text
+        type: string
+        description: Text to fill. Replaces any existing value.
+        required: true
+      - name: submit
+        type: boolean
+        description: If true, press Enter after filling (submits most forms). Default false.
+        required: false
+      - name: timeoutMs
+        type: number
+        description: Fill timeout in ms. Default 30000.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_type
+    method: POST
+    requires_approval: true
+  - name: browser_scroll
+    description: "Scroll the page. Use when content you need is below the fold, when snapshot returned less than expected, or when a page uses infinite scroll and more items load on scroll. Call browser_snapshot again after scrolling to see the newly-revealed content. INVOKE THIS TOOL to scroll — narration does not. Returns {direction, amount, scrollYBefore, scrollYAfter, moved, at}. Check `moved` (true when `scrollYBefore !== scrollYAfter`) to confirm the scroll landed — some pages trap scroll inside a modal or infinite container and the window itself doesn't move."
+    parameters:
+      - name: direction
+        type: string
+        description: "'down' (default) | 'up' | 'top' | 'bottom'."
+        required: false
+      - name: amount
+        type: number
+        description: Pixels to scroll for up/down (ignored for top/bottom). Default 500.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_scroll
+    method: POST
+    requires_approval: true
+  - name: browser_press
+    description: "Send a single keystroke or key chord to the focused element. Use for dismissing modals (`Escape`), navigating menus (`ArrowDown`, `Enter`), keyboard shortcuts (`Control+F`, `Control+A`), or tabbing between inputs (`Tab`). Interactive — requires approval. INVOKE THIS TOOL to press the key — narration does not. Never claim you pressed without a browser_press tool_result visible in this turn. Returns {key, urlBefore, urlAfter, navigated, at}. Use `navigated` to detect if the key triggered navigation (Enter submitting a form, for example)."
+    parameters:
+      - name: key
+        type: string
+        description: "Key name or chord. Examples: 'Enter', 'Escape', 'Tab', 'ArrowDown', 'Control+A', 'Meta+K'."
+        required: true
+    endpoint: inprocess://browser-builtin/browser_press
+    method: POST
+    requires_approval: true
+  - name: browser_back
+    description: "Navigate back one step in the browser's history. Use to undo a navigation without losing the previous page's state (form values, scroll position), or when the user says 'go back', 'previous page', 'undo that nav'. INVOKE THIS TOOL to navigate back — narration does not. Returns {urlBefore, urlAfter, title, navigated, at}. Check `navigated` (true when `urlBefore !== urlAfter`) — if false, there was no prior entry in history and nothing changed."
+    parameters: []
+    endpoint: inprocess://browser-builtin/browser_back
+    method: POST
+    requires_approval: true
+  - name: browser_forward
+    description: "Navigate forward one step in history (re-enter a page after browser_back). Use when the user says 'go forward', 'next page', 'redo that'. INVOKE THIS TOOL to navigate forward — narration does not. Returns {urlBefore, urlAfter, title, navigated, at}. Check `navigated` (true when `urlBefore !== urlAfter`) — if false, there was no forward entry in history."
+    parameters: []
+    endpoint: inprocess://browser-builtin/browser_forward
+    method: POST
+    requires_approval: true
+  - name: browser_cdp
+    description: "Escape hatch: send a raw Firefox DevTools Protocol command. Use ONLY when the other browser_* tools genuinely cannot do what's needed — network interception, custom headers, cookie manipulation, full-control screenshots. 99% of tasks are covered by the other tools. Always requires approval. INVOKE THIS TOOL to send the CDP command — narration does not. Never claim you sent a CDP command without a browser_cdp tool_result visible in this turn. Returns {method, params, result, urlBefore, urlAfter, at}. The `result` field is the raw CDP response — structure varies by method."
+    parameters:
+      - name: method
+        type: string
+        description: "CDP method (e.g. 'Network.setUserAgentOverride', 'Page.captureScreenshot', 'Network.setCookie')."
+        required: true
+      - name: params
+        type: object
+        description: Method-specific params object.
+        required: false
+    endpoint: inprocess://browser-builtin/browser_cdp
+    method: POST
+    requires_approval: true
+---
+# browser-builtin
+A real Firefox with anti-detection (camoufox) for when the user's request needs a live webpage — reading it, looking at it, or doing something on it.
+## Decide by intent, not by tool name
+The user will describe what they need in plain language. Match their intent to the right first tool:
+| What the user wants                                                     | Start with                                 |
+|-------------------------------------------------------------------------|--------------------------------------------|
+| **Find** a page, product, article, or company                           | `web_search` (not this skill)              |
+| Read a **static article** (Wikipedia, blog posts, news)                 | `web_extract` (not this skill)             |
+| Read or summarize a **JS-heavy** page, SPA, dashboard, or logged-in UI  | `browser_navigate` → `browser_snapshot`    |
+| Describe the **visual design / layout / look** of a page                | `browser_navigate` → `browser_vision({inline: true})` |
+| Get **one specific field** from a known DOM (price, headline, link)     | `browser_navigate` → `browser_snapshot` → `browser_extract` |
+| **Click / fill / submit** something on a page                           | `browser_navigate` → `browser_snapshot` → `browser_click` / `browser_type` |
+| Page **won't load** or something is broken                              | `browser_navigate` → `browser_console`     |
+| Compare two pages **visually**                                          | `browser_navigate` → `browser_vision` (for each) |
+| `web_extract` returned a **loading skeleton** or challenge page         | `browser_navigate` (this skill)            |
+**Do not use the browser as a default.** It's slower than `web_search` / `web_extract` and uses ~150MB per agent. Reach for it when a simpler tool has failed OR when the task is inherently interactive/visual.
+## Snapshot vs. Vision — the decision that matters most
+Snapshot and vision are NOT redundant. They answer different questions:
+- **Snapshot** tells you what text is on the page (headings, links, labels, prices, articles, form fields).
+- **Vision** shows you what the page actually looks like (layout, colors, imagery, visual hierarchy, brand feel).
+**Use `browser_snapshot`** when the user's intent is textual: reading, summarizing, listing, extracting, finding information. Anything a screen reader could answer.
+**Use `browser_vision({inline: true})`** when the user wants a visual answer — wants to see the page, asks how it looks, asks about design/layout/color/imagery, wants you to compare pages visually, or when the information is inherently pictorial (charts, diagrams, canvas/WebGL, image-heavy marketing). Also use it when snapshot returned suspiciously little on a rich-looking page.
+**When the user wants to SEE, call vision — even after snapshot already ran.** Visual intent is not satisfied by describing pixels from DOM text. That's guessing dressed up as observation. If you already have snapshot data and the user then asks to see the page, `browser_vision` is the next call — not another snapshot and not a prose description.
+**Do NOT call vision when the user's intent is text.** "Screenshot this and tell me the article content" is a text intent — use snapshot. Vision costs more tokens and takes longer; save it for when pixels actually matter.
+## The minimal flow
+1. `browser_navigate({url})` — inspect the `challenge` field. If non-null, STOP: anti-bot is in play. Don't proceed as if the page loaded.
+2. Pick your second tool by intent (see table above). Usually `browser_snapshot`.
+3. Take action if needed: `browser_extract` for a specific value, `browser_vision` for a visual answer, `browser_click` / `browser_type` for interaction.
+4. If something looks wrong, `browser_console` to see JS errors.
+## Session persistence
+Each agent has its own private Firefox profile at `~/.damn-dev/browser-profiles/{agentId}/`. Cookies, localStorage, and session data survive across backend restarts:
+- Log in once; subsequent `browser_navigate` calls start logged in.
+- Profile is wiped when the agent is deleted.
+- Workspace admin can reset a profile from Settings → Browser.
+## Domain policy
+The workspace may have an allowlist or denylist. Blocked navigations return `{ok: false, code: 'blocked_host'}`. Policy lives in Settings → Browser.
+## When things go wrong
+- **`challenge` non-null** after navigate → anti-bot active. Try an RSS feed, an archive mirror (archive.org), or tell the user. Don't loop retrying — fingerprinting gets worse each attempt.
+- **Snapshot looks sparse** on a visibly rich page → likely canvas, WebGL, or render-gated. Try `browser_scroll` + snapshot again, or fall through to `browser_vision`.
+- **`capability_not_ready` error** → the 363MB Firefox artifact isn't downloaded on this host. User fixes it at Settings → Browser → Download now.
+- **Agent says "the screenshot is too large"** → don't read the saved file. Pass `inline: true` to browser_vision; the screenshot is routed as a native vision content block (~1600 tokens), not stuffed into text.
+- **`blocked_host` error** → workspace policy denies this domain. Ask the user if they want to add it to the allowlist.
+## Escape hatch
+`browser_cdp` exposes the Firefox DevTools Protocol directly. Use ONLY when the other browser_* tools genuinely can't do the job (network interception, custom headers, cookie manipulation). Always approval-gated.

package/runtime/apps/backend/dist/resources/skills/obsidian-builtin/SKILL.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: "obsidian-builtin"
 description: "Read, search, and write notes in your Obsidian vault via plain file operations on the mounted vault directory."
-version: "1.0.1"
+version: "1.0.4"
 required_secrets: []
 ---
@@ -17,14 +17,21 @@ If `$OBSIDIAN_VAULT_PATH` is not set, check `/host/obsidian` (sandbox mount fall
     [ -d /host/obsidian ] && VAULT=/host/obsidian
-Notes are plain Markdown files. Use shell via `shell_exec_guarded`:
+Notes are plain Markdown files. Dates come from the filesystem `mtime` (updated
+automatically on every save) — no need for dates in filenames or frontmatter.
+Run commands via the `shell-exec` skill:
 - **Search filenames:** `find "$VAULT" -name '*.md' -type f | head -20`
 - **Search content:** `grep -rl 'query' "$VAULT" --include='*.md'`
 - **Read:** `cat "$VAULT/path/to/note.md"`
-- **List recent:** `find "$VAULT" -name '*.md' -type f -mtime -7 | head -20`
+- **List recent (filenames only):** `find "$VAULT" -name '*.md' -type f -mtime -7 | head -20`
+- **List recent with dates (preferred for "most recent N"):**
+  `find "$VAULT" -name '*.md' -type f -exec stat -f '%m %N' {} + | sort -rn | head -5`
+  — `%m` is epoch mtime, `%N` is the path; no `2>/dev/null` (keeps classification safe).
 - **Create:** write a new `.md` file. Obsidian picks it up automatically.
 Always quote `"$VAULT"` in every command — vault paths commonly contain spaces.
+Avoid `2>/dev/null` and other stderr redirects when not needed — they add noise
+and the classifier will ask for approval unnecessarily.
 ## Advanced — obsidian-cli (host-only)