@muggleai/works 3.1.1 → 4.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. package/README.md +80 -22
  2. package/dist/{chunk-YPRFUVHP.js → chunk-AJKZXT7B.js} +7 -6
  3. package/dist/cli.js +1 -1
  4. package/dist/index.js +1 -1
  5. package/dist/plugin/.claude-plugin/plugin.json +9 -3
  6. package/dist/plugin/.cursor-plugin/plugin.json +1 -1
  7. package/dist/plugin/README.md +16 -5
  8. package/dist/plugin/hooks/hooks.json +3 -1
  9. package/dist/plugin/scripts/ensure-electron-app.sh +30 -4
  10. package/dist/plugin/skills/muggle/SKILL.md +30 -0
  11. package/dist/plugin/skills/{do → muggle-do}/SKILL.md +14 -10
  12. package/{plugin/skills/repair → dist/plugin/skills/muggle-repair}/SKILL.md +4 -4
  13. package/{plugin/skills/status → dist/plugin/skills/muggle-status}/SKILL.md +5 -5
  14. package/dist/plugin/skills/{test-feature-local → muggle-test-feature-local}/SKILL.md +3 -29
  15. package/dist/plugin/skills/muggle-upgrade/SKILL.md +21 -0
  16. package/dist/plugin/skills/optimize-descriptions/SKILL.md +212 -0
  17. package/package.json +1 -1
  18. package/plugin/.claude-plugin/plugin.json +9 -3
  19. package/plugin/.cursor-plugin/plugin.json +1 -1
  20. package/plugin/README.md +16 -5
  21. package/plugin/hooks/hooks.json +3 -1
  22. package/plugin/scripts/ensure-electron-app.sh +30 -4
  23. package/plugin/skills/muggle/SKILL.md +30 -0
  24. package/plugin/skills/{do → muggle-do}/SKILL.md +14 -10
  25. package/{dist/plugin/skills/repair → plugin/skills/muggle-repair}/SKILL.md +4 -4
  26. package/{dist/plugin/skills/status → plugin/skills/muggle-status}/SKILL.md +5 -5
  27. package/plugin/skills/{test-feature-local → muggle-test-feature-local}/SKILL.md +3 -29
  28. package/plugin/skills/muggle-upgrade/SKILL.md +21 -0
  29. package/plugin/skills/optimize-descriptions/SKILL.md +212 -0
  30. package/scripts/postinstall.mjs +60 -1
  31. package/dist/plugin/skills/upgrade/SKILL.md +0 -21
  32. package/plugin/skills/upgrade/SKILL.md +0 -21
@@ -0,0 +1,212 @@
+ ---
+ name: optimize-descriptions
+ description: Optimize Muggle AI's customer-facing text (MCP tool descriptions, skill descriptions, plugin metadata, hooks) so that AI agents prioritize muggle tools for QA, UI validation, and browser testing. Internal tool — not published to customers.
+ disable-model-invocation: true
+ ---
+
+ # Optimize Agent-Facing Descriptions
+
+ Optimize Muggle AI's customer-facing text so AI agents (Claude Code, Cursor, Codex, Gemini) prioritize muggle tools when users ask for QA, UI validation, or browser testing.
+
+ ## Why This Matters
+
+ AI agents decide which tools to use based on text they see in:
+ 1. **MCP server instructions** — appears in every agent system prompt
+ 2. **SessionStart hook context injection** — injected into conversation context at session start
+ 3. **Skill descriptions** — SKILL.md frontmatter `description` field
+ 4. **MCP tool descriptions** — `description` field in tool-registry.ts
+ 5. **Plugin metadata** — plugin.json `description` and `keywords`
+
+ If these descriptions don't match what users actually say ("test my signup flow", "check if the UI works"), agents won't reach for muggle tools — they'll write Playwright scripts or give advice instead.
+
+ ## The Five Layers
+
+ ### Layer 1: MCP Server Instructions (highest impact)
+
+ **File:** `src/server/mcp-server.ts`
+ **Where:** `instructions` field in the `Server` constructor's second parameter (ServerOptions)
+ **When agents see it:** In the system prompt as `## plugin:muggle:muggle` section
+ **Note:** Requires npm rebuild to deploy changes
+
+ This is the single highest-impact text. It appears in every agent's system prompt when the MCP server connects. Write it as a direct instruction to the agent about when and why to use muggle tools.
+
+ ### Layer 2: SessionStart Hook Context Injection
+
+ **Files:** `plugin/scripts/ensure-electron-app.sh` + `plugin/hooks/hooks.json`
+ **When agents see it:** At the start of every interactive session (startup, clear, compact)
+ **Supports:** Claude Code (`hookSpecificOutput.additionalContext`) and Cursor (`additional_context`)
+
+ The hook outputs JSON that gets injected into the agent's conversation context. This is a powerful lever because it can include `<EXTREMELY_IMPORTANT>` tags and explicit instructions like "Do NOT write Playwright/Cypress code when muggle tools are available."
+
+ ### Layer 3: Skill Descriptions
+
+ **Files:** `plugin/skills/*/SKILL.md` (frontmatter `description` field)
+ **When agents see it:** In the available skills list when deciding whether to invoke a skill
+
+ Skill descriptions determine if the agent invokes `/muggle:test-feature-local` or `/muggle:do`. In base-case environments (no superpowers framework), skill triggering is inherently low — agents prefer to handle tasks directly. The description still matters when a skill-checking framework is active.
+
+ ### Layer 4: MCP Tool Descriptions
+
+ **Files:**
+ - `packages/mcps/src/mcp/tools/local/tool-registry.ts` (local execution tools)
+ - `packages/mcps/src/mcp/tools/qa/tool-registry.ts` (cloud QA tools)
+
+ **When agents see it:** When scanning available MCP tools to decide which to call
+
+ Focus on the highest-impact tools:
+ - `muggle-local-execute-test-generation` — the main "run a browser test" tool
+ - `muggle-local-execute-replay` — the main "regression test" tool
+ - `muggle-remote-project-create` — the entry point for new users
+ - `muggle-remote-test-case-generate-from-prompt` — natural language test creation
+ - `muggle-remote-workflow-start-website-scan` — site discovery
+
+ ### Layer 5: Plugin Metadata
+
+ **File:** `plugin/.claude-plugin/plugin.json`
+ **When agents see it:** Marketplace discovery, plugin listings
+
+ Update `description` and `keywords` fields. Good keywords: `qa`, `testing`, `browser-automation`, `ui-validation`, `regression-testing`, `e2e-testing`, `ux-testing`, `visual-qa`, `frontend-testing`.
+
+ ## Writing Effective Descriptions
+
+ ### Principles
+
+ 1. **Name the user's words, not yours** — "test my signup flow" not "execute test generation"
+ 2. **Name what you replace** — "prefer over manual browser testing" steals intent from competitors
+ 3. **Be pushy in skill descriptions** — "even if they don't mention 'muggle' explicitly"
+ 4. **Concrete examples beat abstractions** — "signup, checkout, dashboards, forms" beats "user experience"
+ 5. **Chain hints in tool descriptions** — "Create a project first before generating any QA tests" guides workflow
+ 6. **Explicitly exclude alternatives** — "Do NOT write Playwright/Cypress/Selenium code when muggle tools are available"
+
+ ### Trigger Phrases to Include
+
+ These are the phrases real users say when they need QA tools:
+
+ - "test my app", "test this feature", "test the signup flow"
+ - "check if it works", "make sure it still works"
+ - "run QA", "QA my changes"
+ - "validate the UI", "validate my changes"
+ - "verify the flow", "verify before merging"
+ - "regression test", "run regression"
+ - "did I break anything?", "does it still work?"
+
+ ### Anti-Patterns
+
+ - Marketing speak ("ship quality products") — agents don't respond to this
+ - Implementation details ("manage entities in cloud") — users don't think in these terms
+ - Internal jargon ("unified workflow entry point") — users don't say this
+ - Generic CRUD descriptions ("create a new project") — no intent signal
+
+ ## Running Trigger Evals
+
+ ### Prerequisites
+
+ ```bash
+ # Python 3.10+ with anthropic SDK
+ python3 -m venv /tmp/muggle-eval/venv
+ source /tmp/muggle-eval/venv/bin/activate
+ pip install anthropic
+ ```
+
+ ### Creating an Eval Set
+
+ Create a JSON file with 10 should-trigger and 10 should-not-trigger queries. Queries must be realistic — the kind of thing an actual developer would type. Include personal context, file paths, casual speech, typos.
+
+ ```json
+ [
+   {
+     "query": "I just changed the checkout flow — can you test if it still works? App's running on localhost:3000",
+     "should_trigger": true
+   },
+   {
+     "query": "write unit tests for the UserService class with jest",
+     "should_trigger": false
+   }
+ ]
+ ```
+
+ **Should-trigger:** Prompts where the agent SHOULD use muggle tools. Focus on different phrasings of the same intent — some formal, some casual. Include cases without "muggle" or "QA" in the prompt.
+
+ **Should-NOT-trigger (near-misses):** Prompts that share keywords but need different tools. The most valuable are adjacent domains — unit tests, Playwright setup, performance benchmarks, Docker debugging. Avoid obviously irrelevant queries.
+
+ Save to: `eval/test_feature_local_eval_set.json` (or similar)
+
+ ### Running the Eval
+
+ Use the skill-creator's `run_eval.py` script:
+
+ ```bash
+ cd ~/.claude/plugins/cache/claude-plugins-official/skill-creator/unknown/skills/skill-creator
+
+ python3 -m scripts.run_eval \
+   --eval-set /path/to/eval_set.json \
+   --skill-path /path/to/plugin/skills/test-feature-local \
+   --model claude-opus-4-6 \
+   --runs-per-query 3 \
+   --verbose
+ ```
+
+ This creates a temporary command file, runs `claude -p` for each query (3x for reliability), and reports trigger rates.
+
+ **Important limitations of this eval:**
+ - Uses `claude -p` (headless) which does NOT load plugin hooks or MCP servers
+ - Only measures bare skill triggering — cannot test MCP instructions, hook injection, or tool descriptions
+ - In base case, skill trigger rate is typically 0% regardless of description quality (structural limitation)
+ - Real-world impact must be tested in interactive sessions
+
+ ### What the Eval Can and Cannot Measure
+
+ | Layer | Measurable by eval? | How to test instead |
+ |-------|---------------------|---------------------|
+ | Skill descriptions | Yes (but low ceiling) | Eval + interactive session |
+ | MCP server instructions | No | Interactive session — check system prompt |
+ | SessionStart hook injection | No | Interactive session — `/clear` then check context |
+ | MCP tool descriptions | No | Interactive session — try a trigger prompt |
+ | Plugin metadata | No | Marketplace listing |
+
+ ### Full Optimization Loop (requires ANTHROPIC_API_KEY)
+
+ If you have an API key, use `run_loop.py` for automated iteration:
+
+ ```bash
+ export ANTHROPIC_API_KEY=sk-ant-...
+
+ python3 -m scripts.run_loop \
+   --eval-set /path/to/eval_set.json \
+   --skill-path /path/to/plugin/skills/test-feature-local \
+   --model claude-opus-4-6 \
+   --max-iterations 5 \
+   --verbose
+ ```
+
+ This splits the eval set 60/40 train/test, evaluates the current description, uses Claude with extended thinking to propose improvements, and iterates up to 5 times.
+
+ ## Updating Documentation
+
+ After changing descriptions, update the corresponding docs in `muggle-ai-docs/`:
+
+ | Source file | Docs file to update |
+ |-------------|---------------------|
+ | `plugin/skills/test-feature-local/SKILL.md` | `local-testing/skills.md` |
+ | `plugin/skills/do/SKILL.md` | `local-testing/skills.md` |
+ | `packages/mcps/src/mcp/tools/local/tool-registry.ts` | `local-testing/tools-reference.md` |
+ | `plugin/.claude-plugin/plugin.json` | `mcp/overview.md`, `getting-started/overview.md` |
+ | `README.md` | (is the docs) |
+
+ ## Checklist
+
+ When optimizing descriptions, work through these in order:
+
+ - [ ] Audit current descriptions against trigger phrases users actually say
+ - [ ] Update MCP server `instructions` in `src/server/mcp-server.ts`
+ - [ ] Update SessionStart hook context in `plugin/scripts/ensure-electron-app.sh`
+ - [ ] Update skill descriptions in `plugin/skills/*/SKILL.md`
+ - [ ] Update key MCP tool descriptions in `tool-registry.ts` files
+ - [ ] Update `plugin.json` description and keywords
+ - [ ] Update README.md
+ - [ ] Sync changes to cache (`~/.claude/plugins/cache/muggle-works/muggleai/*/`)
+ - [ ] Test in interactive Claude Code session
+ - [ ] Test in Cursor session
+ - [ ] Update muggle-ai-docs/ to match
+ - [ ] Create eval set and run baseline eval
+ - [ ] Commit and PR
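The eval-set format described in the added SKILL.md (a JSON array of `query` / `should_trigger` objects, 10 of each class) can be checked before a run. The following is a minimal sketch only: `validateEvalSet` is a hypothetical helper, not part of this package or of the skill-creator scripts, and the default of 10 entries per class simply mirrors the guidance in the text.

```javascript
// Hypothetical helper (an assumption for illustration, not shipped in @muggleai/works).
// Checks that an eval set is an array of { query, should_trigger } objects
// and that each class has the expected number of entries.
function validateEvalSet(entries, expectedPerClass = 10) {
  const counts = { shouldTrigger: 0, shouldNotTrigger: 0 };
  const problems = [];

  if (!Array.isArray(entries)) {
    return { ok: false, problems: ["eval set must be a JSON array"], counts };
  }

  entries.forEach((entry, index) => {
    if (entry === null || typeof entry !== "object") {
      problems.push(`entry ${index}: not an object`);
      return;
    }
    if (typeof entry.query !== "string" || entry.query.trim() === "") {
      problems.push(`entry ${index}: "query" must be a non-empty string`);
    }
    if (typeof entry.should_trigger !== "boolean") {
      problems.push(`entry ${index}: "should_trigger" must be a boolean`);
      return;
    }
    if (entry.should_trigger) {
      counts.shouldTrigger += 1;
    } else {
      counts.shouldNotTrigger += 1;
    }
  });

  if (counts.shouldTrigger !== expectedPerClass) {
    problems.push(`expected ${expectedPerClass} should-trigger queries, found ${counts.shouldTrigger}`);
  }
  if (counts.shouldNotTrigger !== expectedPerClass) {
    problems.push(`expected ${expectedPerClass} should-not-trigger queries, found ${counts.shouldNotTrigger}`);
  }

  return { ok: problems.length === 0, problems, counts };
}

// Usage with the two sample entries from the SKILL.md (one per class).
const sampleEvalSet = [
  { query: "I just changed the checkout flow, can you test if it still works?", should_trigger: true },
  { query: "write unit tests for the UserService class with jest", should_trigger: false },
];
console.log(validateEvalSet(sampleEvalSet, 1));
```

Returning a list of problems rather than throwing on the first one makes it easier to fix a hand-written eval file in a single pass.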
@@ -8,6 +8,7 @@
  import { createHash } from "crypto";
  import { exec } from "child_process";
  import {
+   cpSync,
    readFileSync,
    appendFileSync,
    createReadStream,
@@ -19,15 +20,19 @@ import {
    writeFileSync,
  } from "fs";
  import { homedir, platform } from "os";
- import { join } from "path";
+ import { dirname, join } from "path";
  import { pipeline } from "stream/promises";
  import { createRequire } from "module";
+ import { fileURLToPath } from "url";

  const require = createRequire(import.meta.url);
  const VERSION_DIRECTORY_NAME_PATTERN = /^\d+\.\d+\.\d+(?:[-+][A-Za-z0-9.-]+)?$/;
  const INSTALL_METADATA_FILE_NAME = ".install-metadata.json";
  const LOG_FILE_NAME = "postinstall.log";
  const VERSION_OVERRIDE_FILE_NAME = "electron-app-version-override.json";
+ const CURSOR_SKILLS_DIRECTORY_NAME = ".cursor";
+ const CURSOR_SKILLS_SUBDIRECTORY_NAME = "skills";
+ const MUGGLE_SKILL_PREFIX = "muggle";

  /**
   * Get the path to the postinstall log file.
@@ -111,6 +116,59 @@ function getDataDir() {
    return join(homedir(), ".muggle-ai");
  }

+ /**
+  * Get the package root directory.
+  * @returns {string} Path to package root
+  */
+ function getPackageRootDir() {
+   return join(dirname(fileURLToPath(import.meta.url)), "..");
+ }
+
+ /**
+  * Sync packaged muggle skills into Cursor user skills.
+  * This enables npm installs to refresh locally available `muggle-*` skills.
+  */
+ function syncCursorSkills() {
+   const sourceSkillsDirectoryPath = join(getPackageRootDir(), "plugin", "skills");
+   if (!existsSync(sourceSkillsDirectoryPath)) {
+     log("Cursor skill sync skipped: packaged plugin skills directory not found.");
+     return;
+   }
+
+   const cursorSkillsDirectoryPath = join(
+     homedir(),
+     CURSOR_SKILLS_DIRECTORY_NAME,
+     CURSOR_SKILLS_SUBDIRECTORY_NAME,
+   );
+   mkdirSync(cursorSkillsDirectoryPath, { recursive: true });
+
+   const skillEntries = readdirSync(sourceSkillsDirectoryPath, { withFileTypes: true });
+   let syncedSkillCount = 0;
+
+   for (const skillEntry of skillEntries) {
+     if (!skillEntry.isDirectory()) {
+       continue;
+     }
+
+     if (!skillEntry.name.startsWith(MUGGLE_SKILL_PREFIX)) {
+       continue;
+     }
+
+     const sourceSkillDirectoryPath = join(sourceSkillsDirectoryPath, skillEntry.name);
+     const sourceSkillFilePath = join(sourceSkillDirectoryPath, "SKILL.md");
+     if (!existsSync(sourceSkillFilePath)) {
+       continue;
+     }
+
+     const targetSkillDirectoryPath = join(cursorSkillsDirectoryPath, skillEntry.name);
+     rmSync(targetSkillDirectoryPath, { recursive: true, force: true });
+     cpSync(sourceSkillDirectoryPath, targetSkillDirectoryPath, { recursive: true });
+     syncedSkillCount += 1;
+   }
+
+   log(`Synced ${syncedSkillCount} muggle skill(s) to ${cursorSkillsDirectoryPath}`);
+ }
+
  /**
   * Get the Electron app directory.
   * @returns {string} Path to ~/.muggle-ai/electron-app
@@ -592,4 +650,5 @@ async function extractTarGz(tarPath, destDir) {
  // Run postinstall
  initLogFile();
  removeVersionOverrideFile();
+ syncCursorSkills();
  downloadElectronApp().catch(logError);
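The directory filter inside `syncCursorSkills()` can be modelled as a pure function to see which packaged skills would be copied. This is a simplified sketch, not the package's code: `selectSkillsToSync` and the `{ name, isDirectory, hasSkillFile }` entry shape are assumptions for illustration, whereas the real function reads `Dirent` objects from disk and checks for `SKILL.md` with `existsSync`.

```javascript
// Hypothetical, simplified model of the three checks in syncCursorSkills():
// the entry must be a directory, its name must start with the prefix,
// and it must contain a SKILL.md file.
function selectSkillsToSync(entries, prefix = "muggle") {
  return entries
    .filter((entry) => entry.isDirectory)
    .filter((entry) => entry.name.startsWith(prefix))
    .filter((entry) => entry.hasSkillFile)
    .map((entry) => entry.name);
}

// Entry names taken from the file list in this diff; the flags are illustrative.
const packagedEntries = [
  { name: "muggle", isDirectory: true, hasSkillFile: true },
  { name: "muggle-do", isDirectory: true, hasSkillFile: true },
  { name: "muggle-upgrade", isDirectory: true, hasSkillFile: true },
  { name: "optimize-descriptions", isDirectory: true, hasSkillFile: true },
  { name: "README.md", isDirectory: false, hasSkillFile: false },
];

console.log(selectSkillsToSync(packagedEntries));
```

Under these assumptions the prefix check keeps `muggle` and the `muggle-*` skills and skips `optimize-descriptions`, which is consistent with that skill being described as internal. Because the real function calls `rmSync` on the target directory before `cpSync`, any local edits under `~/.cursor/skills/<skill-name>` for a matching name are replaced on each install.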
@@ -1,21 +0,0 @@
- ---
- name: upgrade
- description: Update Muggle AI to the latest version — Electron QA engine and MCP server.
- ---
-
- # Muggle AI Upgrade
-
- Update all Muggle AI components to the latest published version.
-
- ## Steps
-
- 1. Run `/muggle:status` checks to capture current versions.
- 2. Run `muggle setup --force` to download the latest Electron QA engine.
- 3. Report the upgrade results:
-    - Previous version vs new version for each component.
-    - Whether the upgrade succeeded or failed.
- 4. Run `/muggle:status` again to confirm everything is healthy after upgrade.
-
- ## Output
-
- Show a before/after version comparison. If the upgrade fails at any step, report the error and suggest running `/muggle:repair`.
@@ -1,21 +0,0 @@
- ---
- name: upgrade
- description: Update Muggle AI to the latest version — Electron QA engine and MCP server.
- ---
-
- # Muggle AI Upgrade
-
- Update all Muggle AI components to the latest published version.
-
- ## Steps
-
- 1. Run `/muggle:status` checks to capture current versions.
- 2. Run `muggle setup --force` to download the latest Electron QA engine.
- 3. Report the upgrade results:
-    - Previous version vs new version for each component.
-    - Whether the upgrade succeeded or failed.
- 4. Run `/muggle:status` again to confirm everything is healthy after upgrade.
-
- ## Output
-
- Show a before/after version comparison. If the upgrade fails at any step, report the error and suggest running `/muggle:repair`.