@mindstudio-ai/remy 0.1.34 → 0.1.35

Files changed (54)
  1. package/dist/headless.js +578 -393
  2. package/dist/index.js +652 -385
  3. package/dist/prompt/sources/llms.txt +1618 -0
  4. package/dist/prompt/static/instructions.md +1 -1
  5. package/dist/prompt/static/team.md +1 -1
  6. package/dist/subagents/.notes-background-agents.md +60 -48
  7. package/dist/subagents/browserAutomation/prompt.md +14 -11
  8. package/dist/subagents/designExpert/data/sources/dev/index.html +901 -0
  9. package/dist/subagents/designExpert/data/sources/dev/serve.mjs +244 -0
  10. package/dist/subagents/designExpert/data/sources/dev/specimens-fonts.html +126 -0
  11. package/dist/subagents/designExpert/data/sources/dev/specimens-pairings.html +114 -0
  12. package/dist/subagents/designExpert/data/{fonts.json → sources/fonts.json} +0 -97
  13. package/dist/subagents/designExpert/data/sources/inspiration.json +392 -0
  14. package/dist/subagents/designExpert/prompt.md +36 -12
  15. package/dist/subagents/designExpert/prompts/animation.md +14 -6
  16. package/dist/subagents/designExpert/prompts/color.md +25 -5
  17. package/dist/subagents/designExpert/prompts/{icons.md → components.md} +17 -5
  18. package/dist/subagents/designExpert/prompts/frontend-design-notes.md +17 -122
  19. package/dist/subagents/designExpert/prompts/identity.md +15 -61
  20. package/dist/subagents/designExpert/prompts/images.md +35 -10
  21. package/dist/subagents/designExpert/prompts/layout.md +14 -9
  22. package/dist/subagents/designExpert/prompts/typography.md +39 -0
  23. package/package.json +2 -2
  24. package/dist/actions/buildFromInitialSpec.md +0 -15
  25. package/dist/actions/publish.md +0 -12
  26. package/dist/actions/sync.md +0 -19
  27. package/dist/compiled/README.md +0 -100
  28. package/dist/compiled/auth.md +0 -77
  29. package/dist/compiled/design.md +0 -251
  30. package/dist/compiled/dev-and-deploy.md +0 -69
  31. package/dist/compiled/interfaces.md +0 -238
  32. package/dist/compiled/manifest.md +0 -107
  33. package/dist/compiled/media-cdn.md +0 -51
  34. package/dist/compiled/methods.md +0 -225
  35. package/dist/compiled/msfm.md +0 -222
  36. package/dist/compiled/platform.md +0 -105
  37. package/dist/compiled/scenarios.md +0 -103
  38. package/dist/compiled/sdk-actions.md +0 -146
  39. package/dist/compiled/tables.md +0 -263
  40. package/dist/static/authoring.md +0 -101
  41. package/dist/static/coding.md +0 -29
  42. package/dist/static/identity.md +0 -1
  43. package/dist/static/instructions.md +0 -31
  44. package/dist/static/intake.md +0 -44
  45. package/dist/static/lsp.md +0 -4
  46. package/dist/static/projectContext.ts +0 -160
  47. package/dist/static/team.md +0 -39
  48. package/dist/subagents/designExpert/data/inspiration.json +0 -392
  49. package/dist/subagents/designExpert/prompts/instructions.md +0 -18
  50. /package/dist/subagents/designExpert/data/{compile-font-descriptions.sh → sources/compile-font-descriptions.sh} +0 -0
  51. /package/dist/subagents/designExpert/data/{compile-inspiration.sh → sources/compile-inspiration.sh} +0 -0
  52. /package/dist/subagents/designExpert/data/{inspiration.raw.json → sources/inspiration.raw.json} +0 -0
  53. /package/dist/subagents/designExpert/{prompts/tool-prompts → data/sources/prompts}/design-analysis.md +0 -0
  54. /package/dist/subagents/designExpert/{prompts/tool-prompts → data/sources/prompts}/font-analysis.md +0 -0
@@ -18,7 +18,7 @@
18
18
  ## Communication
19
19
  The user can already see your tool calls, so most of your work is visible without narration. Focus text output on three things:
20
20
  - **Decisions that need input.** Questions, tradeoffs, ambiguity that blocks progress.
21
- - **Milestones.** What you built, what it looks like, what changed. Summarize in plain language rather than listing a per-file changelog.
21
+ - **Milestones.** What you built, what changed. Summarize in plain language rather than listing a per-file changelog.
22
22
  - **Errors or blockers.** Something failed or the approach needs to shift.
23
23
 
24
24
  Skip the rest: narrating what you're about to do, restating what the user asked, explaining tool calls they can already see.
@@ -8,7 +8,7 @@ Note: when you talk about the team to the user, refer to them by their name or a
8
8
 
9
9
  ### Design Expert (`visualDesignExpert`)
10
10
 
11
- Your designer. Consult for any visual decision — choosing a color, picking fonts, proposing a layout, generating images, reviewing whether something looks good. Not just during intake or big design moments. If you're about to write CSS and you're not sure about a color, ask. If you just built a page and want a gut check, take a screenshot and send it over. If the user says "I don't like how this looks," ask the design expert what to change rather than guessing yourself, or if they say "I want a different image," that's the designer's problem, not yours.
11
+ Your designer. Consult for any visual decision — choosing a color, picking fonts, proposing a layout, generating images, reviewing whether something looks good. Not just during intake or big design moments. If you're about to write CSS and you're not sure about a color, ask. If you just built a page and want a gut check, ask the designer to take a quick look. If the user says "I don't like how this looks," ask the design expert what to change rather than guessing yourself, or if they say "I want a different image," that's the designer's problem, not yours.
12
12
 
13
13
  The design expert cannot see your conversation with the user, so include all relevant context and requirements in your task. It can take screenshots of the app preview on its own — just ask it to review what's been built.
14
14
 
@@ -1,80 +1,92 @@
1
1
  # Background Agent Execution — Design Doc
2
2
 
3
- Draft design for allowing sub-agents to return early and continue working in the background.
3
+ Draft design for allowing sub-agents to run in the background without blocking Remy's turn.
4
4
 
5
5
  ## The problem
6
6
 
7
7
  Some sub-agent tasks don't need to block Remy's turn. Product vision seeding roadmap items, for example — Remy needs the high-level plan to continue, but doesn't need to wait for all 15 files to be written. Currently, Remy blocks until the sub-agent finishes completely.
8
8
 
9
- ## Design
9
+ ## Design principles
10
10
 
11
- ### Two new tools available to sub-agents
11
+ - **The parent decides.** Remy chooses at dispatch time whether a sub-agent runs in foreground or background. The sub-agent doesn't know or care — it just runs normally to completion. This avoids sub-agents misjudging urgency and keeps the complexity out of sub-agent prompts/tools.
12
+ - **Simple result delivery.** When a background agent finishes, it delivers results via a synthetic user message. No silent/non-silent distinction — all completions use the same mechanism, just with smart timing.
13
+ - **v1 keeps it minimal.** No checkpointing, no speculative execution, no resource budgets. Those can come later if needed.
12
14
 
13
- **`returnAndContinueInBackground`**
14
- - Input: `{ response: string }` — the text to return to Remy immediately
15
- - Called mid-loop by the sub-agent when it has enough to unblock Remy
16
- - Resolves the parent tool promise with the response text
17
- - The sub-agent loop continues running in the background
18
- - All subsequent events emitted with `background: true` flag
15
+ ## How it works
19
16
 
20
- **`finishBackgroundWork`**
21
- - Input: `{ result: string, silent: boolean }` — final outcome report
22
- - Called at the end of background work
23
- - `silent: true` — queue a notification for Remy's next turn (hidden message)
24
- - `silent: false` — trigger an automated message to wake Remy immediately
25
- - Failures should generally use `silent: false` so Remy can address them
17
+ ### Parent dispatches with background flag
26
18
 
27
- ### Runner changes
19
+ The parent agent's tool call includes a signal that this should run in background. Two options (TBD which is cleaner):
28
20
 
29
- The runner needs to support a split lifecycle:
21
+ 1. **Per-tool input field** `visualDesignExpert({ task: "...", background: true })`
22
+ 2. **Runner-level config** — the tool's `execute()` decides based on context and passes `background: true` to `runSubAgent()`
30
23
 
31
- 1. Normal loop execution until `returnAndContinueInBackground` is called
32
- 2. At that point, resolve the outer promise with `{ text: response, messages: [...so far] }`
33
- 3. Continue the loop in a detached async context (own AbortController, not tied to Remy's turn)
34
- 4. The `emit` wrapper adds `background: true` to all events after the split point
35
- 5. When the sub-agent finishes (naturally or via `finishBackgroundWork`):
36
- - Update `subAgentMessages` on the original tool block in `state.messages`
37
- - Save the session
38
- - If not silent, inject an automated message to trigger a new Remy turn
24
+ Either way, the sub-agent's prompt and tools are identical to foreground. It doesn't know it's backgrounded.
25
+
26
+ ### Runner split-lifecycle
27
+
28
+ When `background: true` is set on the sub-agent config:
29
+
30
+ 1. Runner resolves the parent's promise immediately with a short acknowledgment (e.g., "Working on design recommendations in background...")
31
+ 2. The sub-agent loop continues in a detached async context with its own AbortController (not tied to Remy's turn signal)
32
+ 3. Events after the split point are emitted with `background: true` so the frontend can render them differently (collapsed, subtle indicator)
33
+ 4. When the sub-agent finishes naturally, the result is handed to the notification queue
34
+
35
+ ### Result delivery
36
+
37
+ A single mechanism: synthetic user message, delivered at the right time.
38
+
39
+ - **If Remy is idle** (between turns) — deliver immediately as an automated message that triggers a new turn
40
+ - **If Remy is mid-turn** — queue the result, deliver immediately after the current turn completes
41
+ - **Multiple completions** — batch into a single message (e.g., "Background work completed:\n\n**Design expert:** ...\n\n**Product vision:** ...")
42
+
43
+ This means the sub-agent's result always reaches Remy in a natural way — as a user message that kicks off a new turn where Remy can react to it.
39
44
 
40
45
  ### AgentEvent changes
41
46
 
42
- Add optional `background?: boolean` to all event types that have `parentToolId`. The frontend uses this to render background work differently (collapsed, subtle indicator, etc.).
47
+ Add optional `background?: boolean` to all event types that have `parentToolId`. The frontend uses this to render background work differently.
43
48
 
44
49
  ### History / subAgentMessages
45
50
 
46
51
  The `subAgentMessages` array on the tool content block gets updated in two phases:
47
- 1. At `returnAndContinueInBackground` time — messages so far are attached (captured in the early return)
52
+ 1. At dispatch time — empty or partial messages attached (the early return acknowledgment)
48
53
  2. At background completion — the full message array replaces the partial one, session is saved
49
54
 
50
- A `backgroundStartIndex` on the tool content block marks where the early return happened in the messages array, so the frontend knows which messages were "live" vs "background."
55
+ A `backgroundStartIndex` on the tool content block marks where the early return happened, so the frontend knows which messages were "live" vs "background."
51
56
 
52
- ### Notification queue
57
+ ### Notification queue (headless layer)
53
58
 
54
- The headless layer maintains a notification queue:
55
- - Background agents push to it when they finish (via `finishBackgroundWork`)
56
- - On next `runTurn`, headless flushes queued notifications as prepended hidden messages
57
- - If `silent: false`, headless also sends an automated message to trigger a new turn immediately
59
+ The headless layer maintains a simple queue:
60
+ - Background agents push `{ agentId, name, result, completedAt }` when they finish
61
+ - After each `turn_done`, headless checks the queue and flushes as a single synthetic user message
62
+ - If Remy is idle when a result arrives, headless sends the message immediately
58
63
 
59
- ### Process management
64
+ ### Process management (headless layer)
60
65
 
61
66
  The headless layer tracks active background agents:
62
67
  - `get_background_agents` action → returns list with id, name, startedAt, status
63
- - `cancel_background_agent` action → aborts a specific background agent
64
- - The frontend can show active background work and let users kill dangling agents
68
+ - `cancel_background_agent` action → aborts a specific background agent via its AbortController
69
+ - The frontend can show active background work and let users cancel dangling agents
70
+
71
+ ## Which sub-agents would use this?
72
+
73
+ - **productVision** — return lane summary immediately, write roadmap files in background
74
+ - **designExpert** — return font/color/layout recommendations immediately, generate images in background
75
+ - **codeSanityCheck** — NOT a candidate, Remy needs the advice before proceeding
76
+ - **browserAutomation** — NOT a candidate, results inform Remy's next action
65
77
 
66
- ### Which sub-agents would use this?
78
+ ## What to build (ordered)
67
79
 
68
- - **productVision** return lane summary immediately, write roadmap files in background (silent)
69
- - **designExpert** could return font/color recommendations immediately, generate images in background (silent)
70
- - **codeSanityCheck** probably NOT a candidate, Remy needs the advice before proceeding
71
- - **browserAutomation** probably NOT a candidate, results inform Remy's next action
80
+ 1. Runner split-lifecycle support (`background` flag on SubAgentConfig, detached async continuation)
81
+ 2. `background: true` flag on AgentEvent types
82
+ 3. Notification queue in headless layer (with idle-vs-busy delivery logic)
83
+ 4. Background agent process tracking in headless layer
84
+ 5. Wire up parent agent tools (add `background` input field to candidate sub-agent tools)
85
+ 6. Update parent agent prompt to teach Remy when to use background dispatch
72
86
 
73
- ### What to build (ordered)
87
+ ## Future considerations (not v1)
74
88
 
75
- 1. `returnAndContinueInBackground` and `finishBackgroundWork` tool definitions
76
- 2. Runner split-lifecycle support (detached async continuation)
77
- 3. `background: true` flag on AgentEvent types
78
- 4. Notification queue in headless layer
79
- 5. Background agent process tracking in headless layer
80
- 6. Update productVision prompt to use `returnAndContinueInBackground`
89
+ - **Resource budgets** token/cost ceilings for background agents running unattended
90
+ - **Checkpoint/resume** serialized state for surviving process restarts
91
+ - **Speculative execution** start work optimistically, cancel if the parent's reasoning goes a different direction
92
+ - **Fan-out** dispatch multiple background agents in parallel, collect results
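
The dispatch and delivery flow in the design doc above can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not code from the package: `createNotificationQueue`, `runSubAgentSketch`, `turnStarted`, `turnDone`, `SubAgentSketch`, and the `deliver` callback are hypothetical names, and routing failures through the same queue is an assumption. Only the `background` flag, the use of an AbortController, and the `{ agentId, name, result, completedAt }` entry shape come from the doc.

```typescript
// Queue entry shape as listed in the design doc.
interface BackgroundResult {
  agentId: string;
  name: string;
  result: string;
  completedAt: number;
}

// Hypothetical queue: `deliver` stands in for "send a synthetic user message".
function createNotificationQueue(deliver: (message: string) => void) {
  let pending: BackgroundResult[] = [];
  let busy = false; // true while Remy is mid-turn

  const flush = (): void => {
    if (pending.length === 0) return;
    // Batch every queued completion into one message.
    const body = pending.map((r) => `**${r.name}:** ${r.result}`).join("\n\n");
    pending = [];
    deliver(`Background work completed:\n\n${body}`);
  };

  return {
    turnStarted(): void {
      busy = true;
    },
    // Corresponds to the check after each `turn_done`.
    turnDone(): void {
      busy = false;
      flush();
    },
    // Idle: deliver right away. Mid-turn: hold until turnDone().
    push(entry: BackgroundResult): void {
      pending.push(entry);
      if (!busy) flush();
    },
  };
}

// Hypothetical agent description; `run` stands in for the sub-agent loop.
interface SubAgentSketch {
  agentId: string;
  name: string;
  background?: boolean;
  run: (signal: AbortSignal) => Promise<string>;
}

// Foreground: await the loop. Background: acknowledge now, finish detached.
async function runSubAgentSketch(
  agent: SubAgentSketch,
  queue: { push(entry: BackgroundResult): void },
): Promise<{ text: string; cancel: () => void }> {
  const controller = new AbortController(); // not tied to the parent's turn signal
  const cancel = (): void => controller.abort();
  if (!agent.background) {
    return { text: await agent.run(controller.signal), cancel };
  }
  const report = (result: string): void =>
    queue.push({ agentId: agent.agentId, name: agent.name, result, completedAt: Date.now() });
  // Deliberately not awaited, so the parent's promise resolves immediately.
  void agent.run(controller.signal).then(report, (err) => report(`Failed: ${String(err)}`));
  return { text: `Working on ${agent.name} in background...`, cancel };
}
```

In this sketch the caller is expected to call `turnStarted()` whenever a delivered message kicks off a new turn, so that later completions are held until the matching `turnDone()`. The sub-agent's `run` function receives no hint that it was backgrounded, which mirrors the "the parent decides" principle.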
@@ -1,9 +1,10 @@
1
1
  You are a browser smoke test agent. You verify that features work end to end by interacting with the live preview. Focus on outcomes: does the feature work? Did the expected content appear? Just do the thing and see if it worked.
2
2
 
3
- ## Testiner Persona
4
- The user is watching the automation happen on their screen in real-time. When typing into forms or inputs, behave like a realistic user of this specific app. Use the app context (if provided) to understand the audience and tone. Type the way that audience would actually type — not formal, not robotic. The coding agent's name is Remy, so use that and the email remy@mindstudio.ai as the basis for any testing that requires a persona.
3
+ ## Tester Persona
4
+ The user is watching the automation happen on their screen in real-time. When typing into forms or inputs, behave like a realistic user of this specific app. Use the app context (if provided) to understand the audience and tone. Type the way that audience would actually type — not formal, not robotic. The app developer's name is Remy, so use that and the email remy@mindstudio.ai as the basis for any testing that requires a persona.
5
5
 
6
- ## Snapshot format
6
+ ## Browser Commands
7
+ ### Snapshot format
7
8
 
8
9
  The snapshot command returns a compact accessibility tree:
9
10
 
@@ -17,7 +18,7 @@ paragraph "No results found"
17
18
 
18
19
  Each interactive element has a `[ref=eN]` you can use to target it.
19
20
 
20
- ## Commands
21
+ ### Commands
21
22
 
22
23
  - `snapshot`: Get the current page state. Always do this first and after action batches to verify results. Waits for network requests to settle.
23
24
  - `click`: Click an element. The cursor animates to it, then dispatches full pointer/mouse/click events.
@@ -27,9 +28,9 @@ Each interactive element has a `[ref=eN]` you can use to target it.
27
28
  - `navigate`: Navigate to a new URL within the app. Waits for the new page to load before continuing with subsequent steps. Use this instead of evaluate with `window.location.href` when you need to navigate and then continue interacting with the new page. Steps after navigate execute on the new page automatically.
28
29
  - `evaluate`: Run arbitrary JavaScript in the page and return the result.
29
30
  - `styles`: Read computed CSS styles from page elements. Pass a `properties` array with camelCase CSS property names (e.g., `["backgroundColor", "borderRadius", "fontSize"]`). Omit `properties` for a default set covering colors, typography, spacing, borders, shadows, dimensions, and layout. Uses the same targeting as click/type (ref, text, role, label, selector). Omit the target to get styles for all elements from the last snapshot.
30
- - `screenshot`: Full-page viewport-stitched screenshot. Returns base64 JPEG with dimensions. Available both as a browserCommand step (useful at the end of an action batch) and as a separate tool call (returns a CDN URL).
31
+ - `screenshotViewport`: Take a screenshot of the current viewport. Returns a CDN URL with full text analysis and dimensions. Useful at the end of an action batch to visually check things like layout shift or overflow. Do not use it if you can get what you need with other tools; only use it when you need to visually see the viewport.
31
32
 
32
- ## Element targeting (tried in order)
33
+ ### Element targeting (tried in order)
33
34
 
34
35
  1. `ref`: From the last snapshot. Most reliable.
35
36
  2. `text`: Match by accessible name or visible text.
@@ -39,7 +40,7 @@ Each interactive element has a `[ref=eN]` you can use to target it.
39
40
 
40
41
  Prefer ref when available. Use text/role for elements that are stable across snapshots.
41
42
 
42
- ## Result format
43
+ ### Result format
43
44
 
44
45
  Each browserCommand returns:
45
46
  - `steps`: array with each step's result (or error if it failed)
@@ -49,7 +50,7 @@ Each browserCommand returns:
49
50
 
50
51
  On error, the failing step has an `error` field and execution stops. Remaining steps are skipped.
51
52
 
52
- ## Workflow
53
+ ### Workflow
53
54
 
54
55
  1. Take a snapshot to see the current state
55
56
  2. Batch as many steps as you can into each browserCommand call. If you know the full sequence, do it all in one call. If you need to see intermediate state (e.g., what's inside a modal after it opens), that's fine, just don't make a separate call for every single action.
@@ -87,7 +88,7 @@ Select a dropdown option and screenshot the result:
87
88
  {
88
89
  "steps": [
89
90
  { "command": "select", "label": "Country", "option": "United States" },
90
- { "command": "screenshot" }
91
+ { "command": "screenshotViewport" }
91
92
  ]
92
93
  }
93
94
  ```
@@ -99,7 +100,6 @@ Navigate to a sub-page and interact with it:
99
100
  { "command": "navigate", "url": "/quiz" },
100
101
  { "command": "wait", "text": "what's your aura?", "timeout": 8000 },
101
102
  { "command": "type", "ref": "e3", "text": "blue" },
102
- { "command": "screenshot" }
103
103
  ]
104
104
  }
105
105
  ```
@@ -123,11 +123,14 @@ Check a count with evaluate:
123
123
  ```
124
124
  </examples>
125
125
 
126
+ ### Full Page Screenshot
127
+ You can use the `screenshotFullPage` tool to take a full-height screenshot of the current page. It returns the screenshot URL, as well as a full-text description of everything on the page.
128
+
126
129
  <rules>
127
130
  - Always batch steps into a single browserCommand call. Don't send one step per turn. Type + click + wait should be one call, not three separate turns.
128
131
  - Every response includes a fresh snapshot automatically in the `snapshot` field. You don't need explicit snapshot steps between actions.
129
132
  - Prefer text and ref for targeting, not selector. CSS selectors are brittle with styled-components and CSS-in-JS. Refs are stable within a session as long as the DOM hasn't changed.
130
- - Use generous timeouts for wait after actions that trigger API calls. Method executions can take several seconds. Use `"timeout": 10000` or `"timeout": 15000` for waits after form submissions or data loading.
133
+ - Use generous timeouts for wait after actions that trigger API calls. Method executions can take several seconds. Use `"timeout": 5000` or `"timeout": 10000` for waits after form submissions or data loading.
131
134
  - wait uses the same targeting fields as click. You can wait for text, role, ref, label, or selector.
132
135
  - evaluate auto-returns simple expressions. `"script": "document.title"` works directly. For multi-statement scripts, use explicit return.
133
136
  - The snapshot in the response is always the most current page state. Even if a wait times out, check the snapshot field; the content you were waiting for may have appeared by then.
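
The batching rule above (type + click + wait in one browserCommand call, with a generous wait timeout) can be illustrated with a small TypeScript helper. This is a sketch only: the `Step` and `Target` types and the `submitAndWait` function are hypothetical, inferred from the JSON examples in the prompt rather than taken from the package, and the button and result text in the usage line are made up.

```typescript
// Hypothetical targeting fields, mirroring the ones the prompt lists.
type Target = { ref?: string; text?: string; role?: string; label?: string; selector?: string };

// Hypothetical step types covering only the commands used in this sketch.
type Step =
  | { command: "type"; ref?: string; label?: string; text: string }
  | ({ command: "click" } & Target)
  | ({ command: "wait"; timeout?: number } & Target)
  | { command: "navigate"; url: string }
  | { command: "screenshotViewport" };

// Builds one batched payload: type + click + wait in a single call.
function submitAndWait(
  inputRef: string,
  value: string,
  submitText: string,
  expectedText: string,
  timeoutMs: number = 10000, // upper end of the range suggested for form submissions
): { steps: Step[] } {
  return {
    steps: [
      { command: "type", ref: inputRef, text: value },
      { command: "click", text: submitText },
      { command: "wait", text: expectedText, timeout: timeoutMs },
    ],
  };
}

// Example usage with made-up values.
const payload = submitAndWait("e3", "blue", "Submit", "Your aura is");
console.log(JSON.stringify(payload, null, 2));
```

Because every response already carries a fresh snapshot, the helper adds no explicit `snapshot` step between actions.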