npm - thoth-agents - Versions diffs - 0.1.5 → 0.1.6 - Mend

thoth-agents 0.1.5 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +8 -8
package/dist/cli/index.js +25 -25
package/dist/harness/core/sdd.d.ts +20 -0
package/dist/harness/core/skills.d.ts +2 -2
package/dist/index.js +23 -23
package/package.json +1 -1
package/src/skills/sdd-design/SKILL.md +12 -2

package/README.md

@@ -43,16 +43,16 @@ provide the same hard runtime controls.
 | Harness | Status | Setup path | Notes |
 | --- | --- | --- | --- |
-| OpenCode | Stable default | `pnpm dlx thoth-agents@latest install` or `install --agent=opencode` | Native plugin config, native `task` delegation, optional tmux panes, OpenCode provider auth. |
-| Codex | Supported explicit path | `pnpm dlx thoth-agents@latest install --agent=codex` | Installs ambient/root guidance, six role subagents, and a Personal plugin source. Requires `/plugins` and `/hooks` trust review. Some governance remains instruction-level. |
+| OpenCode | Stable default | `npx thoth-agents@latest install` or `install --agent=opencode` | Native plugin config, native `task` delegation, optional tmux panes, OpenCode provider auth. |
+| Codex | Supported explicit path | `npx thoth-agents@latest install --agent=codex` | Installs ambient/root guidance, six role subagents, and a Personal plugin source. Requires `/plugins` and `/hooks` trust review. Some governance remains instruction-level. |
 ## Quick Start
 ### OpenCode
 ```bash
-pnpm dlx thoth-agents@latest install
-pnpm dlx thoth-agents@latest install --agent=opencode
+npx thoth-agents@latest install
+npx thoth-agents@latest install --agent=opencode
 opencode auth login
 opencode
 ```
@@ -66,7 +66,7 @@ ping all agents
 For non-interactive setup:
 ```bash
-pnpm dlx thoth-agents@latest install --no-tui --tmux=no --skills=yes
+npx thoth-agents@latest install --no-tui --tmux=no --skills=yes
 ```
 ### Codex
@@ -74,8 +74,8 @@ pnpm dlx thoth-agents@latest install --no-tui --tmux=no --skills=yes
 Review the plan first, then install explicitly:
 ```bash
-pnpm dlx thoth-agents@latest install --agent=codex --dry-run
-pnpm dlx thoth-agents@latest install --agent=codex
+npx thoth-agents@latest install --agent=codex --dry-run
+npx thoth-agents@latest install --agent=codex
 ```
 Restart Codex and review plugin/hook trust:
@@ -92,7 +92,7 @@ runtime guarantees unless Codex exposes those controls.
 ### Reset Generated Config
 ```bash
-pnpm dlx thoth-agents@latest install --reset
+npx thoth-agents@latest install --reset
 ```
 ## Seven-Agent Roster

package/dist/cli/index.js

@@ -215,7 +215,7 @@ Push back when context, risk, or assumptions are weak. Avoid verbosity.
 - Load \`thoth-mem-agents\` and \`requirements-interview\`.
 - You MUST NOT read or write any file in the workspace except \`openspec/\` coordination artifacts for the SDD pipeline.
 - Delegate all inspection, writing, searching, debugging, and verification.
-- Own the thinking: analyze the request, choose the approach, synthesize facts, make decisions, ask \`{{userQuestionTool}}\`, manage progress, and own root-session memory.
+- Own the thinking: analyze, choose approach, handle task sequencing, synthesize facts, decide, ask \`{{userQuestionTool}}\` for blocking user input, manage progress, own root-session memory, and write the final report.
 - Use sub-agents for evidence and action, not to outsource architecture or planning.
 - Never request raw file dumps from sub-agents; ask for findings, paths, line anchors, diffs, verification, and blockers.
 - Use openspec/ for coordination artifacts, especially
@@ -223,6 +223,7 @@ Push back when context, risk, or assumptions are weak. Avoid verbosity.
 - Visual or UX work and screenshots always go to {{role.designer}}.
 - Verify through delegation, not inline.
 - Verification should follow the user's project instructions and use the smallest sufficient delegated checks: typecheck, lint, focused tests, or build when appropriate.
+- When a harness cannot enforce a rule directly, preserve the rule as instruction-only guidance and disclose the enforcement gap instead of weakening the contract.
 </core-rules>
 <session-bootstrap>
@@ -255,18 +256,11 @@ Tiebreakers:
 <internal-handoff>
 Before dispatching {{role.designer}}, {{role.quick}}, or {{role.deep}} after discovery, synthesize a compact internal handoff. This is an implementation detail between you and sub-agents, not a user-facing step or artifact.
-Internal handoff fields:
-- Goal: the specific outcome for this task.
-- Decision: the chosen approach and why it is the right next move.
-- Evidence: relevant files, symbols, line anchors, docs, constraints, and known invariants from {{role.explorer}}/{{role.librarian}}.
-- Scope: exact files/areas to change and non-goals to avoid.
-- Steps: ordered implementation instructions, including what to preserve.
-- Verification: smallest sufficient checks or visual QA required.
-- Uncertainty: remaining unknowns the implementer may resolve locally, plus what should be escalated instead of guessed.
+Internal handoff fields: Goal, Decision, Evidence, Scope, Steps, Verification, and Uncertainty. Include relevant files, symbols, anchors, constraints, non-goals, and what to escalate instead of guessing.
 Never mention the internal handoff to the user, ask the user to prepare it, or present handoff preparation as the recommended next step. To the user, describe the actual work: discovery, design, implementation, verification, or the concrete decision needed.
-For {{role.explorer}}/{{role.librarian}}, ask narrow fact-finding questions that will fill missing internal handoff fields: likely files, symbols, call sites, constraints, examples, versioned API facts, and verification targets. Require decision-ready findings, not raw context.
+For {{role.explorer}}/{{role.librarian}}, ask narrow fact-finding questions for likely files, symbols, call sites, constraints, examples, versioned API facts, and verification targets. Require decision-ready findings, not raw context.
 </internal-handoff>
 <dispatch>
@@ -342,10 +336,11 @@ function createReadOnlySpecialistPromptSections(role) {
       mode: "read-only",
       dispatch: "task",
       scope: "local repository discovery",
-      responsibility: "Find workspace facts fast. Return decision-ready evidence for internal handoffs: paths, lines, symbols, constraints, edit targets, and conclusions.",
+      responsibility: "Find workspace facts fast. Return decision-ready evidence for internal handoffs: paths, lines, symbols, candidate files, constraints, edit targets, verification targets, and conclusions.",
       rules: [
         "- Questions should be rare; exhaust local evidence first.",
         "- Prefer paths, lines, symbols, and concise summaries over dumps.",
+        "- Do not implement, edit files, mutate the repository, or own durable session memory.",
         "- When full content is explicitly requested, reproduce it faithfully."
       ],
       memoryAccess: "readonly",
@@ -361,11 +356,11 @@ FINDINGS: bullets with claim, evidence type [direct|inferred|assumed], confidenc
 ALTERNATIVES CONSIDERED: ranked candidates when more than one plausible match exists. Omit if only one candidate.
-UNRESOLVED QUESTIONS: what remains ambiguous. State what additional context would unblock the search.
+UNRESOLVED QUESTIONS: ambiguity and what context would unblock it.
 UNCHECKED AREAS: what you did not inspect that could change the answer. Omit if nothing notable.
-SHORT EVIDENCE: at most one short excerpt per key finding, max 2 lines each. Skip if citations are self-explanatory.
+SHORT EVIDENCE: at most one 2-line excerpt per key finding.
 Lead with STATUS. Stay under 40 lines total when possible. If the schema forces more lines, exceed the budget rather than drop required fields.`
     });
@@ -375,12 +370,13 @@ Lead with STATUS. Stay under 40 lines total when possible. If the schema forces
       role,
       mode: "read-only",
       dispatch: "task",
-      scope: "external research plus local confirmation when needed",
-      responsibility: "Gather authoritative external evidence that helps the orchestrator make implementation decisions. Prefer official docs first, then high-signal public examples. Every substantive claim must carry a source URL.",
+      scope: "external docs and research plus local confirmation when needed",
+      responsibility: "Gather authoritative external evidence that helps the orchestrator make implementation decisions. Prefer official docs first, include version sensitivity, then high-signal public examples. Every substantive claim must carry a source URL.",
       rules: [
         "- Questions should be rare; exhaust available sources first.",
         "- Prefer official documentation over commentary when both answer the same point.",
-        "- Distinguish clearly between official guidance and community examples."
+        "- Distinguish clearly between official guidance and community examples.",
+        "- Do not mutate the repository, invent undocumented APIs, or perform broad implementation work."
       ],
       memoryAccess: "readonly",
       output: `- Organize by finding. Include a source URL for every claim.
@@ -394,11 +390,12 @@ Lead with STATUS. Stay under 40 lines total when possible. If the schema forces
     mode: "read-only",
     dispatch: "synchronous task only",
     scope: "advice, diagnosis, architecture, code review, and plan review",
-    responsibility: "Provide strategic technical guidance anchored to evidence. Use systematic-debugging for bugs, plan-reviewer for SDD plans, and web-assisted research when deeper diagnosis needs it.",
+    responsibility: "Provide read-only review and strategic technical guidance anchored to evidence, including findings, risks, assumptions, and decision-ready conclusions. Use systematic-debugging for bugs, plan-reviewer for SDD plans, and web-assisted research when deeper diagnosis needs it.",
     rules: [
       "- Cite exact files and lines for local claims.",
       "- Separate observations, risks, and recommendations.",
-      "- Ask only when tradeoffs, risk tolerance, or approval materially change the recommendation."
+      "- Ask only when tradeoffs, risk tolerance, or approval materially change the recommendation.",
+      "- Do not produce SDD artifacts, implement edits, or mutate the workspace."
     ],
     memoryAccess: "readonly",
     output: `- Cite exact files and lines \u2014 do not quote large code blocks.
@@ -414,13 +411,14 @@ function createWriteCapableSpecialistPromptSections(role) {
       mode: "write-capable",
       dispatch: "synchronous task only",
       scope: "UI/UX decisions, implementation, and visual verification",
-      responsibility: "Own the user-facing solution end to end: choose the UX approach, implement it, and verify it visually. Use playwright-cli only in non-interactive, single-run mode (for example `playwright test`), never with persistent UI or watcher flags.\nWhen dispatched for QA-only tasks (no implementation), take screenshots, inspect the UI, and return a structured visual QA report: what looks correct, what has issues, and recommended fixes.",
+      responsibility: "Own the user-facing solution: choose the UX approach, implement it, and verify it visually across responsive states when screens change. Use the harness-available visual verification surface in a non-blocking, single-run mode and capture evidence that supports your findings.\nFor visual QA-only tasks, inspect the UI, summarize what looks correct, note issues, and recommend fixes.",
       rules: [
         "- Treat the orchestrator's internal handoff as the handoff; do not rediscover settled scope or constraints.",
         "- Own UX decisions instead of bouncing them back unless a real user preference is required.",
-        "- Verify visually when feasible; do not stop at code that merely compiles.",
+        "- Verify visually and check responsive behavior when feasible; do not stop at code that merely compiles.",
         "- Keep changes focused on the user-facing outcome.",
-        "- NEVER run blocking or long-running commands: no `playwright test --ui`, `playwright show-report`, `--headed --debug`, dev servers, or watchers. Use single-run variants and capture screenshots/traces as artifacts."
+        "- Preserve unrelated working-tree changes.",
+        "- Avoid interactive, blocking, or persistent visual verification modes unless explicitly requested; keep verification single-run and evidence-driven."
       ],
       memoryAccess: "writable",
       output: `For SDD tasks: use the Task Result envelope (Status, Task, What was done, Files changed, Verification, Issues).
@@ -435,10 +433,11 @@ For non-SDD work: state what was implemented, verification status, and remaining
       mode: "write-capable",
       dispatch: "synchronous task only",
       scope: "fast bounded implementation",
-      responsibility: "Implement well-defined changes quickly. Favor speed over exhaustive analysis when the task is narrow and the path is clear.",
+      responsibility: "Implement well-defined changes quickly. Favor speed over exhaustive analysis when the task is narrow, low-risk, mechanical, and the path is clear.",
       rules: [
         "- Optimize for fast execution on narrow, clear tasks.",
         "- Treat the orchestrator's internal handoff as the starting point; follow its file anchors, scope, non-goals, and verification target.",
+        "- Preserve unrelated working-tree changes.",
         "- Read only the context you need.",
         "- Do not redo broad discovery. If the handoff lacks essential anchors, surface the missing context instead of turning the task into open-ended exploration.",
         "- Avoid multi-step planning; if the task stops being bounded, surface it.",
@@ -461,6 +460,7 @@ For non-SDD work: status + summary + files changed + issues. Nothing more.
       "- Treat the orchestrator's internal handoff as the architecture handoff; validate it against nearby code, but do not restart upstream discovery unless evidence contradicts it.",
       "- Do not skip verification \u2014 thoroughness is your value proposition.",
       "- Investigate related files, types, and call sites before changing shared behavior, prioritizing the anchors and constraints in the handoff.",
+      "- Preserve unrelated working-tree changes.",
       "- Ask when a real architecture or implementation tradeoff blocks correct execution."
     ],
     memoryAccess: "writable",
@@ -1549,7 +1549,7 @@ var BUNDLED_SKILL_REGISTRY = [
   },
   {
     name: "sdd-design",
-    description: "Create technical design artifacts for changes",
+    description: "Create technical solution design artifacts for changes",
     allowedRoles: ORCHESTRATOR_ONLY,
     sourcePath: "src/skills/sdd-design",
     kind: "skill",
@@ -2671,7 +2671,7 @@ function codexRoleInstructions(role) {
     `- Responsibility: ${role.responsibility}`,
     "- Use request_user_input for local blocking decisions.",
     "- Permissions, memory governance, runtime hooks, and provider-per-agent controls are instruction-level unless the active Codex runtime documents stronger enforcement.",
-    `- ${role.name} runs as a Codex custom agent TOML; there is no selectable orchestrator TOML.`,
+    `- ${role.name} runs as a Codex custom-agent TOML entry; the orchestrator remains the ambient Codex root session, not a generated role TOML.`,
     ...role.toolGovernance.map((rule) => `- ${rule}`),
     ...role.verification.map((rule) => `- ${rule}`),
     "</role-operational-contract>"
@@ -2704,7 +2704,7 @@ function renderCodexRootInstructions(config) {
     "- After receiving a delegated subagent response, close that subagent session unless you will retry or intentionally keep using that exact same session; explorer and librarian sessions must always be closed immediately after their response, and retry sessions must be closed after the retry result unless explicit same-session reuse is still required.",
     "- Use packaged thoth-agents plugin capabilities through Codex plugin, skill, MCP, and hook review surfaces after enabling them with /plugins and /hooks.",
     "- For blocking user decisions in Codex Default mode, use request_user_input after features.default_mode_request_user_input is enabled; do not ask those questions in plain prose.",
-    "- Permissions, memory policy, provider-per-agent controls, and hooks are instruction-only unless the active Codex runtime documents stronger enforcement.",
+    "- Memory governance, role permissions, provider-per-agent controls, and hooks are instruction-level unless the active Codex runtime documents stronger enforcement.",
     "</codex-runtime>",
     CODEX_ROOT_END,
     ""

package/dist/harness/core/sdd.d.ts

@@ -1,3 +1,4 @@
+import type { AgentRoleName } from './agent-pack';
 export type SddPipelineType = 'direct' | 'accelerated' | 'full';
 export type SddPhaseId = 'requirements-interview' | 'proposal' | 'spec' | 'design' | 'tasks' | 'plan-review' | 'implementation-confirmation' | 'apply' | 'verify' | 'archive';
 export interface SddPhaseContract {
@@ -8,6 +9,10 @@ export interface SddPhaseContract {
     producesArtifact: boolean;
     gate?: 'oracle-review' | 'user-confirmation';
     owner: 'orchestrator' | 'write-capable-agent' | 'oracle' | 'user';
+    artifactSkill?: string;
+    artifactMeaning?: string;
+    defaultAgentRole?: AgentRoleName;
+    alternateAgentRoles?: AgentRoleName[];
 }
 export interface SddWorkflowContract {
     phases: SddPhaseContract[];
@@ -30,6 +35,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["requirements-interview"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-propose";
+    readonly defaultAgentRole: "deep";
 }, {
     readonly id: "spec";
     readonly order: 2;
@@ -37,6 +44,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["proposal"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-spec";
+    readonly defaultAgentRole: "deep";
 }, {
     readonly id: "design";
     readonly order: 3;
@@ -44,6 +53,9 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["proposal", "spec"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-design";
+    readonly artifactMeaning: "technical-solution-design";
+    readonly defaultAgentRole: "deep";
 }, {
     readonly id: "tasks";
     readonly order: 4;
@@ -51,6 +63,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["proposal", "spec", "design"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-tasks";
+    readonly defaultAgentRole: "deep";
 }, {
     readonly id: "plan-review";
     readonly order: 5;
@@ -74,6 +88,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["implementation-confirmation"];
     readonly producesArtifact: false;
     readonly owner: "write-capable-agent";
+    readonly defaultAgentRole: "deep";
+    readonly alternateAgentRoles: ["quick", "designer"];
 }, {
     readonly id: "verify";
     readonly order: 8;
@@ -81,6 +97,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["apply"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-verify";
+    readonly defaultAgentRole: "deep";
 }, {
     readonly id: "archive";
     readonly order: 9;
@@ -88,6 +106,8 @@ export declare const SDD_PHASES: readonly [{
     readonly prerequisites: ["verify"];
     readonly producesArtifact: true;
     readonly owner: "write-capable-agent";
+    readonly artifactSkill: "sdd-archive";
+    readonly defaultAgentRole: "deep";
 }];
 export declare const SDD_WORKFLOW_CONTRACT: SddWorkflowContract;
 export declare function getSddWorkflowContract(): SddWorkflowContract;
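
The optional fields added to `SddPhaseContract` attach skill and role metadata to each phase. The following is a minimal sketch of how a consumer might read that metadata, not the package's own code. It assumes a reduced local mirror of the declarations (`PhaseMetadata`), a three-member `AgentRoleName` union limited to the role names visible in this diff, and a hypothetical helper `artifactSkillByPhase`. The sample values are copied from the `design`, `apply`, and `verify` entries above.

```typescript
// Assumption: a reduced local mirror of the 0.1.6 declarations. The real
// AgentRoleName union in './agent-pack' may contain more members.
type AgentRoleName = 'deep' | 'quick' | 'designer';

interface PhaseMetadata {
  id: string;
  producesArtifact: boolean;
  artifactSkill?: string;
  artifactMeaning?: string;
  defaultAgentRole?: AgentRoleName;
  alternateAgentRoles?: readonly AgentRoleName[];
}

// Values copied from the design, apply, and verify entries in this diff.
const samplePhases: readonly PhaseMetadata[] = [
  {
    id: 'design',
    producesArtifact: true,
    artifactSkill: 'sdd-design',
    artifactMeaning: 'technical-solution-design',
    defaultAgentRole: 'deep',
  },
  {
    id: 'apply',
    producesArtifact: false,
    defaultAgentRole: 'deep',
    alternateAgentRoles: ['quick', 'designer'],
  },
  {
    id: 'verify',
    producesArtifact: true,
    artifactSkill: 'sdd-verify',
    defaultAgentRole: 'deep',
  },
];

// Hypothetical helper: map each phase id to its artifact skill, skipping
// phases (such as apply) that do not declare one.
function artifactSkillByPhase(
  phases: readonly PhaseMetadata[],
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const phase of phases) {
    if (phase.artifactSkill !== undefined) {
      result[phase.id] = phase.artifactSkill;
    }
  }
  return result;
}

const skills = artifactSkillByPhase(samplePhases);
console.log(skills);
```

Because every new field is optional, a consumer written against 0.1.5 keeps type-checking; only code that wants the routing metadata needs to handle the `undefined` case.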

package/dist/harness/core/skills.d.ts

@@ -53,7 +53,7 @@ export declare const BUNDLED_SKILL_REGISTRY: readonly [{
     readonly purpose: "sdd";
 }, {
     readonly name: "sdd-design";
-    readonly description: "Create technical design artifacts for changes";
+    readonly description: "Create technical solution design artifacts for changes";
     readonly allowedRoles: AgentRoleName[];
     readonly sourcePath: "src/skills/sdd-design";
     readonly kind: "skill";
@@ -138,7 +138,7 @@ export declare const SKILL_REGISTRY: readonly [{
     readonly purpose: "sdd";
 }, {
     readonly name: "sdd-design";
-    readonly description: "Create technical design artifacts for changes";
+    readonly description: "Create technical solution design artifacts for changes";
     readonly allowedRoles: AgentRoleName[];
     readonly sourcePath: "src/skills/sdd-design";
     readonly kind: "skill";

package/dist/index.js

@@ -511,7 +511,7 @@ Push back when context, risk, or assumptions are weak. Avoid verbosity.
 - Load \`thoth-mem-agents\` and \`requirements-interview\`.
 - You MUST NOT read or write any file in the workspace except \`openspec/\` coordination artifacts for the SDD pipeline.
 - Delegate all inspection, writing, searching, debugging, and verification.
-- Own the thinking: analyze the request, choose the approach, synthesize facts, make decisions, ask \`{{userQuestionTool}}\`, manage progress, and own root-session memory.
+- Own the thinking: analyze, choose approach, handle task sequencing, synthesize facts, decide, ask \`{{userQuestionTool}}\` for blocking user input, manage progress, own root-session memory, and write the final report.
 - Use sub-agents for evidence and action, not to outsource architecture or planning.
 - Never request raw file dumps from sub-agents; ask for findings, paths, line anchors, diffs, verification, and blockers.
 - Use openspec/ for coordination artifacts, especially
@@ -519,6 +519,7 @@ Push back when context, risk, or assumptions are weak. Avoid verbosity.
 - Visual or UX work and screenshots always go to {{role.designer}}.
 - Verify through delegation, not inline.
 - Verification should follow the user's project instructions and use the smallest sufficient delegated checks: typecheck, lint, focused tests, or build when appropriate.
+- When a harness cannot enforce a rule directly, preserve the rule as instruction-only guidance and disclose the enforcement gap instead of weakening the contract.
 </core-rules>
 <session-bootstrap>
@@ -551,18 +552,11 @@ Tiebreakers:
 <internal-handoff>
 Before dispatching {{role.designer}}, {{role.quick}}, or {{role.deep}} after discovery, synthesize a compact internal handoff. This is an implementation detail between you and sub-agents, not a user-facing step or artifact.
-Internal handoff fields:
-- Goal: the specific outcome for this task.
-- Decision: the chosen approach and why it is the right next move.
-- Evidence: relevant files, symbols, line anchors, docs, constraints, and known invariants from {{role.explorer}}/{{role.librarian}}.
-- Scope: exact files/areas to change and non-goals to avoid.
-- Steps: ordered implementation instructions, including what to preserve.
-- Verification: smallest sufficient checks or visual QA required.
-- Uncertainty: remaining unknowns the implementer may resolve locally, plus what should be escalated instead of guessed.
+Internal handoff fields: Goal, Decision, Evidence, Scope, Steps, Verification, and Uncertainty. Include relevant files, symbols, anchors, constraints, non-goals, and what to escalate instead of guessing.
 Never mention the internal handoff to the user, ask the user to prepare it, or present handoff preparation as the recommended next step. To the user, describe the actual work: discovery, design, implementation, verification, or the concrete decision needed.
-For {{role.explorer}}/{{role.librarian}}, ask narrow fact-finding questions that will fill missing internal handoff fields: likely files, symbols, call sites, constraints, examples, versioned API facts, and verification targets. Require decision-ready findings, not raw context.
+For {{role.explorer}}/{{role.librarian}}, ask narrow fact-finding questions for likely files, symbols, call sites, constraints, examples, versioned API facts, and verification targets. Require decision-ready findings, not raw context.
 </internal-handoff>
 <dispatch>
@@ -638,10 +632,11 @@ function createReadOnlySpecialistPromptSections(role) {
       mode: "read-only",
       dispatch: "task",
       scope: "local repository discovery",
-      responsibility: "Find workspace facts fast. Return decision-ready evidence for internal handoffs: paths, lines, symbols, constraints, edit targets, and conclusions.",
+      responsibility: "Find workspace facts fast. Return decision-ready evidence for internal handoffs: paths, lines, symbols, candidate files, constraints, edit targets, verification targets, and conclusions.",
       rules: [
         "- Questions should be rare; exhaust local evidence first.",
         "- Prefer paths, lines, symbols, and concise summaries over dumps.",
+        "- Do not implement, edit files, mutate the repository, or own durable session memory.",
         "- When full content is explicitly requested, reproduce it faithfully."
       ],
       memoryAccess: "readonly",
@@ -657,11 +652,11 @@ FINDINGS: bullets with claim, evidence type [direct|inferred|assumed], confidenc
 ALTERNATIVES CONSIDERED: ranked candidates when more than one plausible match exists. Omit if only one candidate.
-UNRESOLVED QUESTIONS: what remains ambiguous. State what additional context would unblock the search.
+UNRESOLVED QUESTIONS: ambiguity and what context would unblock it.
 UNCHECKED AREAS: what you did not inspect that could change the answer. Omit if nothing notable.
-SHORT EVIDENCE: at most one short excerpt per key finding, max 2 lines each. Skip if citations are self-explanatory.
+SHORT EVIDENCE: at most one 2-line excerpt per key finding.
 Lead with STATUS. Stay under 40 lines total when possible. If the schema forces more lines, exceed the budget rather than drop required fields.`
     });
@@ -671,12 +666,13 @@ Lead with STATUS. Stay under 40 lines total when possible. If the schema forces
       role,
       mode: "read-only",
       dispatch: "task",
-      scope: "external research plus local confirmation when needed",
-      responsibility: "Gather authoritative external evidence that helps the orchestrator make implementation decisions. Prefer official docs first, then high-signal public examples. Every substantive claim must carry a source URL.",
+      scope: "external docs and research plus local confirmation when needed",
+      responsibility: "Gather authoritative external evidence that helps the orchestrator make implementation decisions. Prefer official docs first, include version sensitivity, then high-signal public examples. Every substantive claim must carry a source URL.",
       rules: [
         "- Questions should be rare; exhaust available sources first.",
         "- Prefer official documentation over commentary when both answer the same point.",
-        "- Distinguish clearly between official guidance and community examples."
+        "- Distinguish clearly between official guidance and community examples.",
+        "- Do not mutate the repository, invent undocumented APIs, or perform broad implementation work."
       ],
       memoryAccess: "readonly",
       output: `- Organize by finding. Include a source URL for every claim.
@@ -690,11 +686,12 @@ Lead with STATUS. Stay under 40 lines total when possible. If the schema forces
     mode: "read-only",
     dispatch: "synchronous task only",
     scope: "advice, diagnosis, architecture, code review, and plan review",
-    responsibility: "Provide strategic technical guidance anchored to evidence. Use systematic-debugging for bugs, plan-reviewer for SDD plans, and web-assisted research when deeper diagnosis needs it.",
+    responsibility: "Provide read-only review and strategic technical guidance anchored to evidence, including findings, risks, assumptions, and decision-ready conclusions. Use systematic-debugging for bugs, plan-reviewer for SDD plans, and web-assisted research when deeper diagnosis needs it.",
     rules: [
       "- Cite exact files and lines for local claims.",
       "- Separate observations, risks, and recommendations.",
-      "- Ask only when tradeoffs, risk tolerance, or approval materially change the recommendation."
+      "- Ask only when tradeoffs, risk tolerance, or approval materially change the recommendation.",
+      "- Do not produce SDD artifacts, implement edits, or mutate the workspace."
     ],
     memoryAccess: "readonly",
     output: `- Cite exact files and lines \u2014 do not quote large code blocks.
@@ -710,13 +707,14 @@ function createWriteCapableSpecialistPromptSections(role) {
       mode: "write-capable",
       dispatch: "synchronous task only",
       scope: "UI/UX decisions, implementation, and visual verification",
-      responsibility: "Own the user-facing solution end to end: choose the UX approach, implement it, and verify it visually. Use playwright-cli only in non-interactive, single-run mode (for example `playwright test`), never with persistent UI or watcher flags.\nWhen dispatched for QA-only tasks (no implementation), take screenshots, inspect the UI, and return a structured visual QA report: what looks correct, what has issues, and recommended fixes.",
+      responsibility: "Own the user-facing solution: choose the UX approach, implement it, and verify it visually across responsive states when screens change. Use the harness-available visual verification surface in a non-blocking, single-run mode and capture evidence that supports your findings.\nFor visual QA-only tasks, inspect the UI, summarize what looks correct, note issues, and recommend fixes.",
       rules: [
         "- Treat the orchestrator's internal handoff as the handoff; do not rediscover settled scope or constraints.",
         "- Own UX decisions instead of bouncing them back unless a real user preference is required.",
-        "- Verify visually when feasible; do not stop at code that merely compiles.",
+        "- Verify visually and check responsive behavior when feasible; do not stop at code that merely compiles.",
         "- Keep changes focused on the user-facing outcome.",
-        "- NEVER run blocking or long-running commands: no `playwright test --ui`, `playwright show-report`, `--headed --debug`, dev servers, or watchers. Use single-run variants and capture screenshots/traces as artifacts."
+        "- Preserve unrelated working-tree changes.",
+        "- Avoid interactive, blocking, or persistent visual verification modes unless explicitly requested; keep verification single-run and evidence-driven."
       ],
       memoryAccess: "writable",
       output: `For SDD tasks: use the Task Result envelope (Status, Task, What was done, Files changed, Verification, Issues).
@@ -731,10 +729,11 @@ For non-SDD work: state what was implemented, verification status, and remaining
       mode: "write-capable",
       dispatch: "synchronous task only",
       scope: "fast bounded implementation",
-      responsibility: "Implement well-defined changes quickly. Favor speed over exhaustive analysis when the task is narrow and the path is clear.",
+      responsibility: "Implement well-defined changes quickly. Favor speed over exhaustive analysis when the task is narrow, low-risk, mechanical, and the path is clear.",
       rules: [
         "- Optimize for fast execution on narrow, clear tasks.",
         "- Treat the orchestrator's internal handoff as the starting point; follow its file anchors, scope, non-goals, and verification target.",
+        "- Preserve unrelated working-tree changes.",
         "- Read only the context you need.",
         "- Do not redo broad discovery. If the handoff lacks essential anchors, surface the missing context instead of turning the task into open-ended exploration.",
         "- Avoid multi-step planning; if the task stops being bounded, surface it.",
@@ -757,6 +756,7 @@ For non-SDD work: status + summary + files changed + issues. Nothing more.
       "- Treat the orchestrator's internal handoff as the architecture handoff; validate it against nearby code, but do not restart upstream discovery unless evidence contradicts it.",
       "- Do not skip verification \u2014 thoroughness is your value proposition.",
       "- Investigate related files, types, and call sites before changing shared behavior, prioritizing the anchors and constraints in the handoff.",
+      "- Preserve unrelated working-tree changes.",
       "- Ask when a real architecture or implementation tradeoff blocks correct execution."
     ],
     memoryAccess: "writable",
@@ -2896,7 +2896,7 @@ var BUNDLED_SKILL_REGISTRY = [
   },
   {
     name: "sdd-design",
-    description: "Create technical design artifacts for changes",
+    description: "Create technical solution design artifacts for changes",
     allowedRoles: ORCHESTRATOR_ONLY,
     sourcePath: "src/skills/sdd-design",
     kind: "skill",
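
The `<internal-handoff>` hunk above collapses the seven handoff fields (Goal, Decision, Evidence, Scope, Steps, Verification, Uncertainty) into a single sentence. As an illustration only, the fields could be modeled as a typed record with a completeness check. The field names come from the prompt text; the `InternalHandoff` interface, its property types, and the `missingHandoffFields` helper are assumptions and do not appear in the package.

```typescript
// Hypothetical shape: field names follow the prompt text in this diff,
// but the property types are assumptions made for illustration.
interface InternalHandoff {
  goal: string;
  decision: string;
  evidence: string[]; // files, symbols, anchors, constraints
  scope: { change: string[]; nonGoals: string[] };
  steps: string[];
  verification: string[];
  uncertainty: { resolveLocally: string[]; escalate: string[] };
}

const REQUIRED_FIELDS: (keyof InternalHandoff)[] = [
  'goal',
  'decision',
  'evidence',
  'scope',
  'steps',
  'verification',
  'uncertainty',
];

// Hypothetical helper: list the fields a draft handoff has not filled in,
// in the same order the prompt names them.
function missingHandoffFields(draft: Partial<InternalHandoff>): string[] {
  return REQUIRED_FIELDS.filter((field) => draft[field] === undefined);
}

const draft: Partial<InternalHandoff> = {
  goal: 'Rename the sdd-design skill description',
  decision: 'Edit the registry entry and the SKILL.md front matter together',
};

console.log(missingHandoffFields(draft));
```

A check like this would only confirm that each field is present; whether the evidence and scope are adequate remains a judgment the prompt leaves to the orchestrator.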

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "thoth-agents",
-  "version": "0.1.5",
+  "version": "0.1.6",
   "description": "Delegate-first OpenCode plugin with seven agents, thoth-mem persistence, and bundled SDD skills.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

package/src/skills/sdd-design/SKILL.md

@@ -1,11 +1,13 @@
 ---
 name: sdd-design
-description: Create `design.md` with architecture decisions and file changes.
+description: Create `design.md` as a technical solution design with architecture decisions and file changes.
 ---
 # SDD Design Skill
-Create the technical design that explains how the approved spec will be built.
+Create the technical solution design that explains how the approved spec will
+be built. OpenSpec `design.md` is a technical approach artifact covering
+implementation architecture, tradeoffs, and repository patterns.
 ## Shared Conventions
@@ -28,6 +30,10 @@ The orchestrator passes the artifact store mode (`thoth-mem`, `openspec`, or
 - Proposal and specs exist and implementation planning needs technical depth
 - A prior design needs to be revised after spec changes
+This phase is not a UI/UX design task. Do not route this phase to the designer
+agent because it is named `design`; the default implementation owner is a
+technical write-capable role such as `deep`.
 ## Prerequisites
 - `change-name`
@@ -86,6 +92,10 @@ Return:
 ## Rules
 - Base the design on the actual codebase, not generic assumptions.
+- Do not route this phase to the designer agent. `sdd-design` itself always
+  stays with the technical write-capable agent.
+- Later `sdd-apply` tasks may route to the designer agent when the work is
+  specifically user-facing UI, visual work, screenshots, or visual QA.
 - Every architecture decision must include rationale.
 - Use concrete file paths and interfaces.
 - Keep implementation details aligned with the spec and repository patterns.
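
The routing rule added to `SKILL.md` can be read as a small decision function: the `design` phase always stays with a technical write-capable role, and only `apply` tasks that are specifically user-facing may go to the designer. The sketch below is one way to express that, assuming a hypothetical `SddTaskKind` classification and a hypothetical `routeSddPhase` function; neither exists in the package, and the package may implement the rule purely as prompt guidance.

```typescript
// Hypothetical task classification; not part of thoth-agents.
type SddTaskKind = 'technical' | 'ui' | 'visual-qa';
type RoutedRole = 'deep' | 'quick' | 'designer';

// Hypothetical router mirroring the SKILL.md rule.
function routeSddPhase(phaseId: string, kind: SddTaskKind): RoutedRole {
  // The design phase produces a technical solution design, so it never
  // goes to the designer, whatever the task kind looks like.
  if (phaseId === 'design') {
    return 'deep';
  }
  // Only apply-phase work that is user-facing may route to the designer.
  if (phaseId === 'apply' && kind !== 'technical') {
    return 'designer';
  }
  // Everything else falls back to the default technical role.
  return 'deep';
}

console.log(routeSddPhase('design', 'ui'));
console.log(routeSddPhase('apply', 'visual-qa'));
```

The sketch ignores the `quick` alternate that the `apply` phase also lists; choosing between `deep` and `quick` depends on task size and risk, which this diff leaves to the orchestrator.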