kushi-agents 5.0.1 → 5.0.3
- package/README.md +47 -7
- package/bin/cli.mjs +73 -45
- package/package.json +55 -51
- package/plugin/agents/kushi.agent.md +1 -1
- package/plugin/instructions/multi-host-install.instructions.md +125 -0
- package/plugin/instructions/skill-evals.instructions.md +130 -0
- package/plugin/skills/aggregate-project/evals/evals.json +33 -0
- package/plugin/skills/apply-ado-update/evals/evals.json +33 -0
- package/plugin/skills/ask-project/evals/evals.json +34 -0
- package/plugin/skills/bootstrap-project/evals/evals.json +34 -0
- package/plugin/skills/build-state/evals/evals.json +31 -0
- package/plugin/skills/consolidate-evidence/evals/evals.json +33 -0
- package/plugin/skills/dashboard/evals/evals.json +33 -0
- package/plugin/skills/emit-vertex/evals/evals.json +33 -0
- package/plugin/skills/eval/SKILL.md +90 -0
- package/plugin/skills/eval/evals.schema.json +73 -0
- package/plugin/skills/eval/run-evals.ps1 +372 -0
- package/plugin/skills/fde-intake/evals/evals.json +33 -0
- package/plugin/skills/fde-report/evals/evals.json +33 -0
- package/plugin/skills/fde-triage/evals/evals.json +33 -0
- package/plugin/skills/intro/evals/evals.json +33 -0
- package/plugin/skills/link-entities/evals/evals.json +31 -0
- package/plugin/skills/project-status/evals/evals.json +33 -0
- package/plugin/skills/propose-ado-update/evals/evals.json +33 -0
- package/plugin/skills/pull-ado/evals/evals.json +35 -0
- package/plugin/skills/pull-crm/evals/evals.json +35 -0
- package/plugin/skills/pull-email/evals/evals.json +35 -0
- package/plugin/skills/pull-loop/evals/evals.json +35 -0
- package/plugin/skills/pull-meetings/evals/evals.json +35 -0
- package/plugin/skills/pull-misc/evals/evals.json +35 -0
- package/plugin/skills/pull-onenote/evals/evals.json +35 -0
- package/plugin/skills/pull-sharepoint/evals/evals.json +35 -0
- package/plugin/skills/pull-teams/evals/evals.json +35 -0
- package/plugin/skills/refresh-project/evals/evals.json +31 -0
- package/plugin/skills/self-check/SKILL.md +2 -0
- package/plugin/skills/self-check/evals/evals.json +28 -0
- package/plugin/skills/self-check/run.ps1 +174 -1
- package/plugin/skills/setup/evals/evals.json +33 -0
- package/plugin/skills/tour/evals/evals.json +33 -0
- package/plugin/skills/vertex-link/evals/evals.json +33 -0
- package/src/constants.mjs +39 -1
- package/src/eval-aggregator.mjs +209 -0
- package/src/eval-aggregator.test.mjs +64 -0
- package/src/eval-runner.test.mjs +69 -0
- package/src/multi-host-install.test.mjs +170 -0
- package/src/multi-host.mjs +277 -0
package/README.md
CHANGED
|
@@ -121,20 +121,38 @@ See [Quickstart](https://gim-home.github.io/kushi/getting-started/quickstart/) f
|
|
|
121
121
|
|
|
122
122
|
## Install
|
|
123
123
|
|
|
124
|
-
|
|
124
|
+
Kushi supports **two host surfaces** as first-class peers (v5.0.2+):
|
|
125
125
|
|
|
126
|
-
|
|
126
|
+
| Host | Install path | Best for |
|
|
127
|
+
|---------------------------------------|------------------------------------|-----------------------------------------|
|
|
128
|
+
| **Clawpilot CLI** | `~/.copilot/m-skills/kushi/` | Scheduled / overnight runs (e.g. `kushi refresh <project>` at 6 AM via automation) |
|
|
129
|
+
| **VS Code Chat** ("GitHub Copilot Chat") | `~/.vscode/chat/skills/kushi/` | Interactive use (`@kushi what's the MACC for X?`) |
|
|
130
|
+
|
|
131
|
+
Both hosts read the **same** Evidence/ tree on disk, so a refresh from one is immediately visible from the other — the same user routinely lives in both. SKILL content is host-agnostic (no per-host branching, enforced by self-check `D32.multi-host`).
|
|
127
132
|
|
|
128
133
|
```bash
|
|
129
|
-
|
|
130
|
-
npx kushi-agents --
|
|
134
|
+
# Install to a single host
|
|
135
|
+
npx kushi-agents --clawpilot # Clawpilot only
|
|
136
|
+
npx kushi-agents --vscode # VS Code Chat only
|
|
137
|
+
|
|
138
|
+
# Install to BOTH at once (auto-detects what's present + targets both)
|
|
139
|
+
npx kushi-agents --all-hosts
|
|
140
|
+
|
|
141
|
+
# Uninstall
|
|
142
|
+
npx kushi-agents --uninstall # all detected hosts
|
|
143
|
+
npx kushi-agents --uninstall --clawpilot # Clawpilot only
|
|
144
|
+
npx kushi-agents --uninstall --vscode # VS Code Chat only
|
|
145
|
+
npx kushi-agents --uninstall --all # both
|
|
146
|
+
|
|
147
|
+
# Legacy workspace install (per-project .kushi/ in cwd)
|
|
148
|
+
npx kushi-agents # default = standard profile
|
|
149
|
+
npx kushi-agents --profile core # aggregator only
|
|
131
150
|
```
|
|
132
151
|
|
|
133
|
-
|
|
152
|
+
The 2-host matrix is a deliberate cap — see [`plugin/instructions/multi-host-install.instructions.md`](plugin/instructions/multi-host-install.instructions.md) for the rationale + per-host layout details.
|
|
134
153
|
|
|
135
154
|
```bash
|
|
136
|
-
npx kushi-agents --clawpilot
|
|
137
|
-
npx kushi-agents --clawpilot --profile full
|
|
155
|
+
npx kushi-agents --clawpilot --profile full # everything
|
|
138
156
|
```
|
|
139
157
|
|
|
140
158
|
To switch profiles later, re-run with `--force` (cleanly handles downgrades):
|
|
@@ -217,6 +235,28 @@ npm pack --dry-run
|
|
|
217
235
|
|
|
218
236
|
The self-check validates frontmatter, agent inventory, prompt → skill routing, profile manifest, reference packs, cross-links, the verbs table in this README, and the layout diagram in `docs/reference/where-things-live.md`. Full reference: [docs/reference/self-check.md](docs/reference/self-check.md).
|
|
219
237
|
|
|
238
|
+
## Evaluating skills (v5.0.3+)
|
|
239
|
+
|
|
240
|
+
Every skill ships per-case evals at `plugin/skills/<name>/evals/evals.json`, aligned with the [agentskills.io evaluating-skills spec](https://agentskills.io/skill-creation/evaluating-skills). Doctrine: [`plugin/instructions/skill-evals.instructions.md`](plugin/instructions/skill-evals.instructions.md).
|
|
241
|
+
|
|
242
|
+
Quickstart:
|
|
243
|
+
|
|
244
|
+
```powershell
|
|
245
|
+
npm run eval:canary # ~6 skills, runs in seconds — what PRs run
|
|
246
|
+
npm run eval:all # full suite (every plugin/skills/<name>/)
|
|
247
|
+
npm run eval -- ask-project # one skill
|
|
248
|
+
npm run eval:baseline # maintainer-only: refresh evals/baseline.json
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
Outputs:
|
|
252
|
+
|
|
253
|
+
- `Evidence/_evals/<utc-ts>.json` — per-run JSON (pass/fail + duration + tokens per case).
|
|
254
|
+
- `Evidence/_evals/benchmark.json` — per-skill mean/stddev for `pass_rate`, `duration_ms`, `tokens_total` + regression flags vs `evals/baseline.json`.
|
|
255
|
+
|
|
256
|
+
Regressions are flagged at a ≥10pp pass-rate drop or a ≥50% latency or token increase. The canary subset is `ask-project`, `bootstrap-project`, `refresh-project`, `link-entities`, `build-state`, `self-check`.
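The per-skill summary described under "Outputs" can be pictured with a small aggregation sketch. This is an illustration under stated assumptions, not the shipped aggregator: `summarize` and `benchmarkSkill` are hypothetical names, the layout of `benchmark.json` is not shown in this diff, and population standard deviation is assumed. The sketch also computes a single `pass_rate` for one run rather than a mean across runs.

```javascript
// Hypothetical aggregation over per-case results. Field names follow the README
// text, but the real benchmark.json layout is not shown in the diff.
function summarize(values) {
  const n = values.length;
  if (n === 0) return { mean: 0, stddev: 0 };
  const mean = values.reduce((a, b) => a + b, 0) / n;
  // Population variance (divide by n); this choice is an assumption.
  const variance = values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / n;
  return { mean, stddev: Math.sqrt(variance) };
}

function benchmarkSkill(cases) {
  const passed = cases.filter((c) => c.pass).length;
  return {
    pass_rate: cases.length ? passed / cases.length : 0,
    duration_ms: summarize(cases.map((c) => c.duration_ms)),
    tokens_total: summarize(cases.map((c) => c.tokens_in + c.tokens_out)),
  };
}

const sample = [
  { pass: true, duration_ms: 100, tokens_in: 10, tokens_out: 20 },
  { pass: false, duration_ms: 300, tokens_in: 30, tokens_out: 40 },
];
console.log(benchmarkSkill(sample));
```

Keeping `summarize` separate from `benchmarkSkill` means the same helper could serve any numeric metric the runner records.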
|
|
257
|
+
|
|
258
|
+
**Privacy:** fixtures under `evals/fixtures/` are synthetic. NEVER copy real customer data into the evals tree.
|
|
259
|
+
|
|
220
260
|
## License
|
|
221
261
|
|
|
222
262
|
See [LICENSE](LICENSE).
|
package/bin/cli.mjs
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
#!/usr/bin/env node
|
|
2
2
|
|
|
3
3
|
import { main } from '../src/main.mjs';
|
|
4
|
+
import { runMultiHost } from '../src/multi-host.mjs';
|
|
4
5
|
|
|
5
6
|
const args = process.argv.slice(2);
|
|
6
7
|
|
|
@@ -10,72 +11,99 @@ if (args.includes('--help') || args.includes('-h')) {
|
|
|
10
11
|
|
|
11
12
|
Installs the Kushi multi-source project-evidence + Q&A agent.
|
|
12
13
|
|
|
13
|
-
|
|
14
|
-
--
|
|
15
|
-
--
|
|
16
|
-
--
|
|
14
|
+
Host installs (v5.0.2+ — install into a host's user-global skill folder):
|
|
15
|
+
--clawpilot Install to ~/.copilot/m-skills/kushi/
|
|
16
|
+
--vscode Install to ~/.vscode/chat/skills/kushi/ (a.k.a. GitHub Copilot Chat)
|
|
17
|
+
--all-hosts Install to BOTH hosts
|
|
18
|
+
--uninstall [--clawpilot|--vscode|--all]
|
|
19
|
+
Cleanly remove the kushi install + skills-metadata.json entry
|
|
20
|
+
from the chosen host(s). Default = all detected hosts.
|
|
21
|
+
|
|
22
|
+
Workspace install (legacy / default when no host flag is given):
|
|
23
|
+
--target vscode Install to <cwd>/.kushi/ + update .vscode/settings.json [default]
|
|
24
|
+
--target clawpilot Alias for --clawpilot (kept for back-compat)
|
|
17
25
|
|
|
18
26
|
Profile (controls what gets installed):
|
|
19
|
-
--profile core Aggregator only (pull + consolidate + ask).
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
--profile standard Core + State/ rollup (Kushi's default opinion). [DEFAULT]
|
|
23
|
-
--profile full Standard + report packs (FDE weekly/customer/handoff).
|
|
27
|
+
--profile core Aggregator only (pull + consolidate + ask).
|
|
28
|
+
--profile standard Core + State/ rollup. [DEFAULT]
|
|
29
|
+
--profile full Standard + report packs.
|
|
24
30
|
|
|
25
31
|
Options:
|
|
26
32
|
--force Overwrite existing destination without asking
|
|
27
|
-
--yes, -y Skip the project-root check
|
|
28
|
-
|
|
29
|
-
--no-
|
|
30
|
-
--no-instructions Skip .github/copilot-instructions.md merge (vscode target only)
|
|
33
|
+
--yes, -y Skip the project-root check
|
|
34
|
+
--no-settings Skip .vscode/settings.json update (vscode workspace target only)
|
|
35
|
+
--no-instructions Skip .github/copilot-instructions.md merge (vscode workspace target only)
|
|
31
36
|
|
|
32
37
|
WorkIQ (REQUIRED — Kushi cannot pull evidence without it):
|
|
33
38
|
--with-workiq Auto-install WorkIQ via winget (Windows) / brew (macOS)
|
|
34
39
|
--workiq-path <abs> Use this explicit path to the workiq binary
|
|
35
|
-
--skip-workiq-check Bypass the WorkIQ pre-flight check
|
|
36
|
-
bootstrap/refresh will block until WorkIQ is installed)
|
|
40
|
+
--skip-workiq-check Bypass the WorkIQ pre-flight check
|
|
37
41
|
|
|
38
42
|
--help, -h Show this help
|
|
39
43
|
|
|
40
44
|
After install, talk to Kushi:
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
ask <project> <q> Cited Q&A over Evidence/ (auto-routes) — all profiles
|
|
45
|
+
bootstrap <project> First-time setup
|
|
46
|
+
refresh <project> Incremental refresh + rebuild State/
|
|
47
|
+
state <project> Re-render State/ from existing Evidence
|
|
48
|
+
consolidate <project> Merge per-user evidence
|
|
49
|
+
status <project> Show run-log
|
|
50
|
+
ask <project> <q> Cited Q&A over Evidence/ (auto-routes)
|
|
48
51
|
|
|
49
52
|
In VS Code Chat the prefix is "@Kushi". In Clawpilot just say "kushi <verb>".
|
|
50
53
|
`);
|
|
51
54
|
process.exit(0);
|
|
52
55
|
}
|
|
53
56
|
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
57
|
+
// ── multi-host mode (v5.0.2+) ───────────────────────────────────────────────
|
|
58
|
+
// Trigger when the user passes any of: --vscode, --all-hosts, --uninstall.
|
|
59
|
+
// --clawpilot ALONE continues to route through the legacy main.mjs path so
|
|
60
|
+
// the existing target=clawpilot flow stays byte-identical.
|
|
61
|
+
const wantsVscode = args.includes('--vscode');
|
|
62
|
+
const wantsAllHosts = args.includes('--all-hosts');
|
|
63
|
+
const wantsUninstall = args.includes('--uninstall');
|
|
64
|
+
|
|
65
|
+
if (wantsVscode || wantsAllHosts || wantsUninstall) {
|
|
66
|
+
const hosts = [];
|
|
67
|
+
if (args.includes('--clawpilot')) hosts.push('clawpilot');
|
|
68
|
+
if (wantsVscode) hosts.push('vscode');
|
|
69
|
+
const all = wantsAllHosts || args.includes('--all');
|
|
70
|
+
|
|
71
|
+
runMultiHost({
|
|
72
|
+
hosts,
|
|
73
|
+
all,
|
|
74
|
+
uninstall: wantsUninstall,
|
|
75
|
+
profile: getFlag('--profile'),
|
|
76
|
+
}).catch((err) => {
|
|
77
|
+
console.error(`\n ${err.message}\n`);
|
|
58
78
|
process.exit(1);
|
|
79
|
+
});
|
|
80
|
+
} else {
|
|
81
|
+
let target = getFlag('--target');
|
|
82
|
+
if (args.includes('--clawpilot')) {
|
|
83
|
+
if (target && target !== 'clawpilot') {
|
|
84
|
+
console.error(`\n Conflicting flags: --target ${target} and --clawpilot.\n`);
|
|
85
|
+
process.exit(1);
|
|
86
|
+
}
|
|
87
|
+
target = 'clawpilot';
|
|
59
88
|
}
|
|
60
|
-
target = 'clawpilot';
|
|
61
|
-
}
|
|
62
89
|
|
|
63
|
-
const options = {
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
};
|
|
74
|
-
|
|
75
|
-
main(options).catch((err) => {
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
});
|
|
90
|
+
const options = {
|
|
91
|
+
force: args.includes('--force'),
|
|
92
|
+
yes: args.includes('--yes') || args.includes('-y'),
|
|
93
|
+
noSettings: args.includes('--no-settings'),
|
|
94
|
+
noInstructions: args.includes('--no-instructions'),
|
|
95
|
+
target,
|
|
96
|
+
profile: getFlag('--profile'),
|
|
97
|
+
withWorkiq: args.includes('--with-workiq'),
|
|
98
|
+
workiqPath: getFlag('--workiq-path'),
|
|
99
|
+
skipWorkiqCheck: args.includes('--skip-workiq-check'),
|
|
100
|
+
};
|
|
101
|
+
|
|
102
|
+
main(options).catch((err) => {
|
|
103
|
+
console.error(`\n ${err.message}\n`);
|
|
104
|
+
process.exit(1);
|
|
105
|
+
});
|
|
106
|
+
}
|
|
79
107
|
|
|
80
108
|
function getFlag(flag) {
|
|
81
109
|
const idx = args.indexOf(flag);
|
|
@@ -85,4 +113,4 @@ function getFlag(flag) {
|
|
|
85
113
|
const prefix = flag + '=';
|
|
86
114
|
const match = args.find((a) => a.startsWith(prefix));
|
|
87
115
|
return match ? match.slice(prefix.length) : undefined;
|
|
88
|
-
}
|
|
116
|
+
}
|
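The routing added to `bin/cli.mjs` above can be summarized as a pure function. This is a minimal sketch rather than the shipped code: `routeArgs` and its return shape are hypothetical names introduced for illustration, and the `--target` parsing and conflict check from the legacy branch are left out. Only the flag names and branch conditions come from the diff.

```javascript
// Hypothetical helper that mirrors the branch conditions visible in the diff of
// bin/cli.mjs. It only decides which code path handles the arguments.
function routeArgs(args) {
  const wantsVscode = args.includes('--vscode');
  const wantsAllHosts = args.includes('--all-hosts');
  const wantsUninstall = args.includes('--uninstall');
  const wantsClawpilot = args.includes('--clawpilot');

  if (wantsVscode || wantsAllHosts || wantsUninstall) {
    const hosts = [];
    if (wantsClawpilot) hosts.push('clawpilot');
    if (wantsVscode) hosts.push('vscode');
    return {
      mode: 'multi-host',
      hosts,
      all: wantsAllHosts || args.includes('--all'),
      uninstall: wantsUninstall,
    };
  }

  // --clawpilot alone stays on the legacy path with target = 'clawpilot'.
  return { mode: 'legacy', target: wantsClawpilot ? 'clawpilot' : undefined };
}

console.log(routeArgs(['--clawpilot']));
console.log(routeArgs(['--uninstall', '--vscode']));
```

Note that `--all` only has an effect once one of the three trigger flags is present, which matches how the diff reads it inside the multi-host branch.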
package/package.json
CHANGED
|
@@ -1,52 +1,56 @@
|
|
|
1
|
-
{
|
|
2
|
-
"name": "kushi-agents",
|
|
3
|
-
"version": "5.0.
|
|
4
|
-
"description": "Install Kushi — multi-source project evidence agent with Comprehensive Structured Capture (CSC) into weekly-only files across Email, Teams, OneNote, Loop, SharePoint, Meetings, CRM, ADO. Meetings retain a sibling verbatim/ audit folder. WorkIQ-only for M365 sources (Graph / m365_* FORBIDDEN as fallbacks; user-paste is first-class). Host-agnostic.",
|
|
5
|
-
"type": "module",
|
|
6
|
-
"bin": {
|
|
7
|
-
"kushi-agents": "./bin/cli.mjs"
|
|
8
|
-
},
|
|
9
|
-
"files": [
|
|
10
|
-
"bin/",
|
|
11
|
-
"src/",
|
|
12
|
-
"plugin/",
|
|
13
|
-
".github/copilot-instructions.kushi.md"
|
|
14
|
-
],
|
|
15
|
-
"engines": {
|
|
16
|
-
"node": ">=18.0.0"
|
|
17
|
-
},
|
|
18
|
-
"dependencies": {
|
|
19
|
-
"@mozilla/readability": "^0.6.0",
|
|
20
|
-
"jsdom": "^29.1.1",
|
|
21
|
-
"jsonc-parser": "^3.3.1"
|
|
22
|
-
},
|
|
23
|
-
"keywords": [
|
|
24
|
-
"vscode",
|
|
25
|
-
"copilot",
|
|
26
|
-
"agents",
|
|
27
|
-
"kushi",
|
|
28
|
-
"project-evidence",
|
|
29
|
-
"workiq",
|
|
30
|
-
"m365",
|
|
31
|
-
"ai",
|
|
32
|
-
"cli"
|
|
33
|
-
],
|
|
34
|
-
"repository": {
|
|
35
|
-
"type": "git",
|
|
36
|
-
"url": "git+https://github.com/gim-home/kushi.git"
|
|
37
|
-
},
|
|
38
|
-
"homepage": "https://gim-home.github.io/kushi/",
|
|
39
|
-
"bugs": {
|
|
40
|
-
"url": "https://github.com/gim-home/kushi/issues"
|
|
41
|
-
},
|
|
42
|
-
"license": "MIT",
|
|
43
|
-
"scripts": {
|
|
44
|
-
"test": "node --test src/check-workiq.test.mjs src/seed-config.test.mjs src/sanitize-workiq-input.test.mjs src/detect-vertex-repo.test.mjs src/vertex-validate.test.mjs src/emit-vertex.e2e.test.mjs src/config-root-resolve.test.mjs src/forbidden-workiq-phrasings.test.mjs",
|
|
45
|
-
"test:integration:bootstrap": "node src/bootstrap-dryrun.integration.test.mjs",
|
|
46
|
-
"smoke": "node scripts/smoke.mjs",
|
|
47
|
-
"
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
"
|
|
51
|
-
|
|
1
|
+
{
|
|
2
|
+
"name": "kushi-agents",
|
|
3
|
+
"version": "5.0.3",
|
|
4
|
+
"description": "Install Kushi — multi-source project evidence agent with Comprehensive Structured Capture (CSC) into weekly-only files across Email, Teams, OneNote, Loop, SharePoint, Meetings, CRM, ADO. Meetings retain a sibling verbatim/ audit folder. WorkIQ-only for M365 sources (Graph / m365_* FORBIDDEN as fallbacks; user-paste is first-class). Host-agnostic.",
|
|
5
|
+
"type": "module",
|
|
6
|
+
"bin": {
|
|
7
|
+
"kushi-agents": "./bin/cli.mjs"
|
|
8
|
+
},
|
|
9
|
+
"files": [
|
|
10
|
+
"bin/",
|
|
11
|
+
"src/",
|
|
12
|
+
"plugin/",
|
|
13
|
+
".github/copilot-instructions.kushi.md"
|
|
14
|
+
],
|
|
15
|
+
"engines": {
|
|
16
|
+
"node": ">=18.0.0"
|
|
17
|
+
},
|
|
18
|
+
"dependencies": {
|
|
19
|
+
"@mozilla/readability": "^0.6.0",
|
|
20
|
+
"jsdom": "^29.1.1",
|
|
21
|
+
"jsonc-parser": "^3.3.1"
|
|
22
|
+
},
|
|
23
|
+
"keywords": [
|
|
24
|
+
"vscode",
|
|
25
|
+
"copilot",
|
|
26
|
+
"agents",
|
|
27
|
+
"kushi",
|
|
28
|
+
"project-evidence",
|
|
29
|
+
"workiq",
|
|
30
|
+
"m365",
|
|
31
|
+
"ai",
|
|
32
|
+
"cli"
|
|
33
|
+
],
|
|
34
|
+
"repository": {
|
|
35
|
+
"type": "git",
|
|
36
|
+
"url": "git+https://github.com/gim-home/kushi.git"
|
|
37
|
+
},
|
|
38
|
+
"homepage": "https://gim-home.github.io/kushi/",
|
|
39
|
+
"bugs": {
|
|
40
|
+
"url": "https://github.com/gim-home/kushi/issues"
|
|
41
|
+
},
|
|
42
|
+
"license": "MIT",
|
|
43
|
+
"scripts": {
|
|
44
|
+
"test": "node --test src/check-workiq.test.mjs src/seed-config.test.mjs src/sanitize-workiq-input.test.mjs src/detect-vertex-repo.test.mjs src/vertex-validate.test.mjs src/emit-vertex.e2e.test.mjs src/config-root-resolve.test.mjs src/forbidden-workiq-phrasings.test.mjs src/multi-host-install.test.mjs src/eval-aggregator.test.mjs src/eval-runner.test.mjs",
|
|
45
|
+
"test:integration:bootstrap": "node src/bootstrap-dryrun.integration.test.mjs",
|
|
46
|
+
"smoke": "node scripts/smoke.mjs",
|
|
47
|
+
"eval": "pwsh plugin/skills/eval/run-evals.ps1 -Skill",
|
|
48
|
+
"eval:all": "pwsh plugin/skills/eval/run-evals.ps1 -All",
|
|
49
|
+
"eval:canary": "pwsh plugin/skills/eval/run-evals.ps1 -Canary",
|
|
50
|
+
"eval:baseline": "pwsh plugin/skills/eval/run-evals.ps1 -All -UpdateBaseline",
|
|
51
|
+
"prepublishOnly": "npm test && npm run smoke"
|
|
52
|
+
},
|
|
53
|
+
"publishConfig": {
|
|
54
|
+
"access": "public"
|
|
55
|
+
}
|
|
52
56
|
}
|
package/plugin/agents/kushi.agent.md
CHANGED
|
@@ -16,7 +16,7 @@ Kushi ships in three profiles. The installed profile is recorded in `kushi-insta
|
|
|
16
16
|
|
|
17
17
|
| Profile | What's installed | Verbs available |
|
|
18
18
|
|---|---|---|
|
|
19
|
-
| `core` | Aggregator only: `setup`, `pull-*`, `consolidate-evidence`, `aggregate-project`, `ask-project`, `project-status`, `vertex-link`, `emit-vertex`, `self-check`, `intro` | `setup`, `aggregate`, `consolidate`, `status`, `pull`, `ask`, `vertex-link`, `emit-vertex` |
|
|
19
|
+
| `core` | Aggregator only: `setup`, `pull-*`, `consolidate-evidence`, `aggregate-project`, `ask-project`, `project-status`, `vertex-link`, `emit-vertex`, `self-check`, `eval`, `intro` | `setup`, `aggregate`, `consolidate`, `status`, `pull`, `ask`, `vertex-link`, `emit-vertex` |
|
|
20
20
|
| `standard` *(default)* | core + `bootstrap-project`, `refresh-project`, `fde-intake`, `fde-report`, `fde-triage` + FDE reference pack | core + `bootstrap`, `refresh`, `fde-intake`, `fde-report`, `fde-triage` |
|
|
21
21
|
| `full` | standard + `build-state` | standard + `state` |
|
|
22
22
|
| **`preview`** *(opt-in)* | standard + `propose-ado-update`, `apply-ado-update` | standard + `propose-ado`, `apply-ado` |
|
package/plugin/instructions/multi-host-install.instructions.md
ADDED
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: "multi-host-install"
|
|
3
|
+
version: "1.0.0"
|
|
4
|
+
description: "USE WHEN installing kushi to a user-global host (Clawpilot or VS Code Chat), uninstalling, or wondering why kushi ships to two hosts (not three or N) and how the layouts differ. DO NOT USE for workspace-local .kushi/ install — that is the legacy --target vscode flow."
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Multi-host install doctrine (v5.0.2+)
|
|
8
|
+
|
|
9
|
+
Kushi ships as a single host-agnostic skill bundle, installed into a host's
|
|
10
|
+
user-global skill folder by `npx kushi-agents`. **Exactly two hosts are
|
|
11
|
+
supported.** No third host. No N-host generality.
|
|
12
|
+
|
|
13
|
+
## The 2-host matrix
|
|
14
|
+
|
|
15
|
+
| Host ID | Display name | Skill dir | Metadata file |
|
|
16
|
+
|--------------|-----------------------------------------|------------------------------------|----------------------------------------------|
|
|
17
|
+
| `clawpilot` | Clawpilot CLI | `~/.copilot/m-skills/kushi/` | `~/.copilot/m-skills/skills-metadata.json` |
|
|
18
|
+
| `vscode` | VS Code Chat ("GitHub Copilot Chat") | `~/.vscode/chat/skills/kushi/` | `~/.vscode/chat/skills/skills-metadata.json` |
|
|
19
|
+
|
|
20
|
+
> The VS Code Chat path is the canonical user-global skill folder location
|
|
21
|
+
> assumed by this doctrine. If a future VS Code Chat release ships a
|
|
22
|
+
> different canonical path, update `VSCODE_CHAT_DEST_SUBPATH` in
|
|
23
|
+
> `src/constants.mjs` — every other surface reads from there.
|
|
24
|
+
|
|
25
|
+
## Why only these two
|
|
26
|
+
|
|
27
|
+
- **Clawpilot** — primary scheduled / overnight surface (e.g. `kushi refresh <project>`
|
|
28
|
+
run from a Clawpilot automation at 6 AM).
|
|
29
|
+
- **VS Code Chat** — primary interactive surface (`@kushi what's the MACC for X?`
|
|
30
|
+
asked the next morning).
|
|
31
|
+
|
|
32
|
+
The same user routinely lives in both — automation in Clawpilot, follow-up
|
|
33
|
+
questions in VS Code Chat. Both hosts read the **same** Evidence/ tree on disk,
|
|
34
|
+
so a refresh in one is visible from the other immediately.
|
|
35
|
+
|
|
36
|
+
A third host (e.g. a hypothetical web UI) would force per-host SKILL surgery
|
|
37
|
+
or a content-branching shim. The 2-host matrix is the deliberate stopping
|
|
38
|
+
point — it covers ≥ 99 % of the actual usage pattern and keeps cross-host
|
|
39
|
+
parity trivially enforceable.
|
|
40
|
+
|
|
41
|
+
## Per-host layout
|
|
42
|
+
|
|
43
|
+
Both hosts get **byte-identical** content under their skill dir:
|
|
44
|
+
|
|
45
|
+
```text
|
|
46
|
+
<host-skill-dir>/
|
|
47
|
+
├── SKILL.md (mirrored from agents/kushi.agent.md)
|
|
48
|
+
├── agents/kushi.agent.md
|
|
49
|
+
├── instructions/<name>.instructions.md
|
|
50
|
+
├── prompts/<name>.prompt.md
|
|
51
|
+
├── skills/<name>/SKILL.md (+ references/, run.ps1, etc.)
|
|
52
|
+
├── templates/<...>
|
|
53
|
+
├── lib/<...>
|
|
54
|
+
├── reference-packs/<...>
|
|
55
|
+
├── config/
|
|
56
|
+
│ ├── shared/ (team-owned, safe to commit)
|
|
57
|
+
│ └── user/ (per-contributor, gitignored)
|
|
58
|
+
└── kushi-install.json (profile manifest)
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
And at the **parent** dir, `skills-metadata.json` carries the host's skill
|
|
62
|
+
registry. The installer upserts a single `{"name": "kushi", ...}` entry that
|
|
63
|
+
points `instructions` at the host's own `SKILL.md` (absolute path).
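The upsert behavior described above can be sketched as follows. The schema of `skills-metadata.json` is not shown in this diff, so the registry is assumed here to be a plain array of `{ name, instructions }` objects; `upsertSkillEntry` and `removeSkillEntry` are hypothetical names, not functions from `src/multi-host.mjs`.

```javascript
// Hypothetical sketch: the registry is assumed to be an array of
// { name, instructions } objects. The real file layout may differ.
function upsertSkillEntry(registry, entry) {
  const idx = registry.findIndex((e) => e.name === entry.name);
  if (idx === -1) return [...registry, entry];
  // Merge into the existing entry so a re-install never duplicates "kushi".
  return registry.map((e, i) => (i === idx ? { ...e, ...entry } : e));
}

// Counterpart for uninstall: drop the entry and leave other skills untouched.
function removeSkillEntry(registry, name) {
  return registry.filter((e) => e.name !== name);
}

const before = [{ name: 'other-skill', instructions: '/abs/other/SKILL.md' }];
const once = upsertSkillEntry(before, { name: 'kushi', instructions: '/abs/kushi/SKILL.md' });
const twice = upsertSkillEntry(once, { name: 'kushi', instructions: '/abs/kushi/SKILL.md' });
console.log(twice.length); // 2: the second install updates rather than appends
```

Both helpers return new arrays instead of mutating their input, which keeps a failed write from leaving a half-edited registry in memory.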
|
|
64
|
+
|
|
65
|
+
There is **no per-host content branching**. A SKILL.md that opens with `WHEN
|
|
66
|
+
running on Clawpilot, do X; WHEN running on VS Code Chat, do Y` is a defect.
|
|
67
|
+
|
|
68
|
+
## Detection rules
|
|
69
|
+
|
|
70
|
+
A host is **detected** when its parent dir exists on the file system:
|
|
71
|
+
|
|
72
|
+
| Host | Detection probe |
|
|
73
|
+
|------------|---------------------------------------|
|
|
74
|
+
| clawpilot | `~/.copilot/m-skills/` exists |
|
|
75
|
+
| vscode | `~/.vscode/chat/skills/` exists |
|
|
76
|
+
|
|
77
|
+
`bin/cli.mjs` runs detection before any install/uninstall and prints
|
|
78
|
+
`Detected hosts: …` so the user sees which targets will be touched by the
|
|
79
|
+
default behavior.
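The detection rule above amounts to one existence check per host. A minimal sketch, assuming the two parent directories from the table: `detectHosts` and `HOST_PARENT_DIRS` are hypothetical names, and the existence check is injected so the logic can run without touching disk.

```javascript
// Hypothetical sketch of the detection rule: a host counts as detected when its
// parent skills dir exists. Subpaths are taken from the table in this document.
const HOST_PARENT_DIRS = {
  clawpilot: '.copilot/m-skills',
  vscode: '.vscode/chat/skills',
};

function detectHosts(homeDir, exists) {
  return Object.entries(HOST_PARENT_DIRS)
    .filter(([, subpath]) => exists(`${homeDir}/${subpath}`))
    .map(([hostId]) => hostId);
}

// Example with a fake file system in which only the Clawpilot dir is present.
const present = new Set(['/home/dev/.copilot/m-skills']);
const detected = detectHosts('/home/dev', (p) => present.has(p));
console.log(`Detected hosts: ${detected.join(', ') || '(none)'}`);
```

In a real installer the predicate would likely be `fs.existsSync` and the home directory `os.homedir()`, with paths built through `path.join` rather than string templates.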
|
|
80
|
+
|
|
81
|
+
## Install / uninstall flags
|
|
82
|
+
|
|
83
|
+
| Flag | Effect |
|
|
84
|
+
|--------------------------------------|--------|
|
|
85
|
+
| `--clawpilot` | Install to Clawpilot host (alone: also legacy `--target clawpilot` path) |
|
|
86
|
+
| `--vscode` | Install to VS Code Chat host |
|
|
87
|
+
| `--all-hosts` | Install to BOTH hosts (regardless of detection) |
|
|
88
|
+
| `--uninstall` | Uninstall from all *detected* hosts |
|
|
89
|
+
| `--uninstall --clawpilot` | Uninstall from Clawpilot only |
|
|
90
|
+
| `--uninstall --vscode` | Uninstall from VS Code Chat only |
|
|
91
|
+
| `--uninstall --all` | Uninstall from BOTH hosts |
|
|
92
|
+
| (no flag) | Workspace install (legacy `.kushi/` in cwd) |
|
|
93
|
+
|
|
94
|
+
## `kushi refresh <project>` is host-agnostic
|
|
95
|
+
|
|
96
|
+
`refresh` is just a SKILL — the same `plugin/skills/refresh-project/SKILL.md`
|
|
97
|
+
content runs under both hosts. It writes Evidence to disk at the user-
|
|
98
|
+
configured engagement root (`~/Documents/Engagements/<project>/` by default),
|
|
99
|
+
which lives **outside** the host skill dir. That is why a Clawpilot-driven
|
|
100
|
+
refresh is immediately visible to a VS Code Chat `ask` — neither host owns
|
|
101
|
+
the data.
|
|
102
|
+
|
|
103
|
+
## Cross-host parity rule
|
|
104
|
+
|
|
105
|
+
Every SKILL.md, prompt, instruction, template, and lib asset MUST work
|
|
106
|
+
identically under both hosts. Concretely:
|
|
107
|
+
|
|
108
|
+
- No `if (HOST === 'clawpilot') ...` style branching in any markdown body.
|
|
109
|
+
- No host-specific paths in skill content (always use `<engagement-root>` /
|
|
110
|
+
`<project>` placeholders that resolve at runtime via the standard
|
|
111
|
+
`engagement-root-resolution.instructions.md` chain).
|
|
112
|
+
- The two `SKILL.md` files (one per host) are required to be **byte-identical**
|
|
113
|
+
— enforced by `self-check` `D32.multi-host` in deep mode (temp-dir dry-run).
|
|
114
|
+
|
|
115
|
+
If a future feature genuinely requires host-specific behavior (e.g. a UI affordance
|
|
116
|
+
only one host provides), it belongs in a host-specific *helper* outside `plugin/`
|
|
117
|
+
— not inside the shared skill bundle.
|
|
118
|
+
|
|
119
|
+
## See also
|
|
120
|
+
|
|
121
|
+
- `host-portability.instructions.md` — older doctrine on per-host portability of
|
|
122
|
+
individual skill calls (still in force for runtime tool selection).
|
|
123
|
+
- `D32.multi-host` self-check (deep) — validates installer + temp-dir layout.
|
|
124
|
+
- `src/multi-host.mjs` — the installer/uninstaller implementation.
|
|
125
|
+
- `src/multi-host-install.test.mjs` — node:test coverage.
|
package/plugin/instructions/skill-evals.instructions.md
ADDED
|
@@ -0,0 +1,130 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: "v5.0.3 — Skill evals doctrine, adapted from https://agentskills.io/skill-creation/evaluating-skills. Every skill MUST ship an evals/ folder with at least 2 deterministic cases plus structured assertions; a per-skill pass-rate is the objective regression signal. Canary subset runs on every PR; full suite runs on demand. Real customer data is FORBIDDEN in fixtures — use synthetic data only."
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# Skill evals — doctrine
|
|
6
|
+
|
|
7
|
+
> Inspired by **<https://agentskills.io/skill-creation/evaluating-skills>**. Adapted to kushi's PowerShell + Node test stack and to our 2-host install matrix.
|
|
8
|
+
|
|
9
|
+
## Why
|
|
10
|
+
|
|
11
|
+
Skills are prompts plus a runner. Prompts drift silently. Without an objective per-skill regression signal, every change is a gamble. Evals make that signal cheap:
|
|
12
|
+
|
|
13
|
+
- **Per-skill pass-rate** is the headline metric.
|
|
14
|
+
- **Latency** and **tokens** are secondary metrics. A run is flagged as a regression against the baseline when pass-rate drops by ≥ 10 percentage points, or when mean latency or token usage rises by ≥ 50 %.
|
|
15
|
+
- A **canary subset** runs on every PR (target: < 60s wall clock); the **full suite** runs on demand (`npm run eval:all`).
|
|
16
|
+
|
|
17
|
+
## Where evals live
|
|
18
|
+
|
|
19
|
+
```text
|
|
20
|
+
plugin/skills/<name>/
|
|
21
|
+
├── SKILL.md
|
|
22
|
+
└── evals/
|
|
23
|
+
├── evals.json ← REQUIRED — case list + assertions
|
|
24
|
+
└── fixtures/ ← OPTIONAL per-skill fixtures
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
Cross-skill fixtures live at the repo root:
|
|
28
|
+
|
|
29
|
+
```text
|
|
30
|
+
evals/
|
|
31
|
+
├── baseline.json ← Committed; maintainer updates with `npm run eval:baseline`
|
|
32
|
+
└── fixtures/ ← Tiny synthetic evidence trees, ADO fixtures, etc.
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
Per-run output goes to `Evidence/_evals/<timestamp>.json` (gitignored; not customer data).
|
|
36
|
+
|
|
37
|
+
## Case schema
|
|
38
|
+
|
|
39
|
+
```jsonc
|
|
40
|
+
{
|
|
41
|
+
"skill": "<skill-name>",
|
|
42
|
+
"cases": [
|
|
43
|
+
{
|
|
44
|
+
"id": "ap-citations-format",
|
|
45
|
+
"name": "ask-project emits weekly-csc citation form",
|
|
46
|
+
"input": "what was decided about MACC for fixture-acme?",
|
|
47
|
+
"fixture": "evals/fixtures/fixture-acme", // optional
|
|
48
|
+
"canary": true,
|
|
49
|
+
"grader_type": "script", // "script" | "llm"
|
|
50
|
+
"expected_assertions": [
|
|
51
|
+
{ "type": "regex-match", "pattern": "\\[source:\\s*fixture-acme/email/weekly/" },
|
|
52
|
+
{ "type": "regex-match", "pattern": "Source-layout:\\s*weekly-csc" }
|
|
53
|
+
]
|
|
54
|
+
}
|
|
55
|
+
]
|
|
56
|
+
}
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
### Required fields per case
|
|
60
|
+
|
|
61
|
+
- `id` — unique within the skill; kebab-case.
|
|
62
|
+
- `name` — human-readable.
|
|
63
|
+
- `input` — what gets passed to the skill (string OR object).
|
|
64
|
+
- `expected_assertions` — array, **≥ 1** entry (enforced by `D33.evals-have-assertions`).
|
|
65
|
+
- `grader_type` — `"script"` for deterministic graders, `"llm"` for rubric-based.
|
|
66
|
+
|
|
67
|
+
### Optional fields
|
|
68
|
+
|
|
69
|
+
- `fixture` — repo-relative path to the fixture to point the skill at.
|
|
70
|
+
- `canary` — `true` to include in the fast CI subset.
|
|
71
|
+
- `args` — extra args forwarded to the skill script (e.g. `{ "DryRun": true }`).
|
|
72
|
+
- `skip` — `true` to skip (must include `skip_reason`).
|
|
73
|
+
- `timeout_ms` — override the runner default (30 000 ms).
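The field rules above can be checked mechanically. The following is a sketch under stated assumptions, not the project's JSON-schema validation: `validateCase` and its error messages are hypothetical, and uniqueness of `id` within a skill is not checked because that needs the whole case list.

```javascript
// Hypothetical validator for one eval case, based only on the field rules
// listed in this document. Names and messages are illustrative.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function validateCase(c) {
  const errors = [];
  if (typeof c.id !== 'string' || !KEBAB_CASE.test(c.id)) errors.push('id must be kebab-case');
  if (typeof c.name !== 'string' || c.name.length === 0) errors.push('name is required');
  const inputOk = typeof c.input === 'string' || (c.input !== null && typeof c.input === 'object');
  if (!inputOk) errors.push('input must be a string or object');
  if (!Array.isArray(c.expected_assertions) || c.expected_assertions.length < 1) {
    errors.push('expected_assertions needs at least one entry');
  }
  if (c.grader_type !== 'script' && c.grader_type !== 'llm') {
    errors.push('grader_type must be "script" or "llm"');
  }
  // A skipped case must say why it is skipped.
  if (c.skip === true && !c.skip_reason) errors.push('skip requires skip_reason');
  return errors;
}

const smokeCase = {
  id: 'aggregate-project-smoke-1',
  name: 'aggregate-project produces a non-empty response',
  input: 'synthetic aggregate-project probe',
  grader_type: 'script',
  expected_assertions: [{ type: 'regex-match', pattern: '.+' }],
};
console.log(validateCase(smokeCase)); // []
```

Returning a list of messages instead of throwing on the first problem lets a caller report every defect in a case at once.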
|
|
74
|
+
|
|
75
|
+
## Assertion types
|
|
76
|
+
|
|
77
|
+
| Type | Shape | Passes when |
|
|
78
|
+
|---|---|---|
|
|
79
|
+
| `file-exists` | `{ "type": "file-exists", "path": "..." }` | Path exists post-run (relative to fixture or evidence dir). |
|
|
80
|
+
| `file-contains` | `{ "type": "file-contains", "path": "...", "needle": "..." }` | File exists and substring is present. |
|
|
81
|
+
| `json-path-equals` | `{ "type": "json-path-equals", "path": "...", "json_path": "$.foo.bar", "equals": "v" }` | JSON file parses; dotted path value === expected. |
|
|
82
|
+
| `regex-match` | `{ "type": "regex-match", "pattern": "...", "flags": "i" }` | Captured stdout matches the regex. |
|
|
83
|
+
| `llm-rubric` | `{ "type": "llm-rubric", "rubric": "...", "min_score": 4 }` | LLM grader scores ≥ min on a 1–5 rubric. |
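Two of the assertion types in the table lend themselves to a short illustration. This is a hedged sketch, not the logic in `run-evals.ps1`: `checkRegexMatch` and `checkJsonPathEquals` are hypothetical names, and only simple dotted paths such as `$.foo.bar` are handled.

```javascript
// Hypothetical graders for two assertion types from the table.
// `stdout` is the captured output; `doc` is an already-parsed JSON document.
function checkRegexMatch(assertion, stdout) {
  // A fresh RegExp per call avoids stale lastIndex state with the "g" flag.
  const re = new RegExp(assertion.pattern, assertion.flags || '');
  return re.test(stdout);
}

function checkJsonPathEquals(assertion, doc) {
  // Strip the leading "$." and walk the remaining dotted keys.
  const keys = assertion.json_path.replace(/^\$\.?/, '').split('.').filter(Boolean);
  let value = doc;
  for (const key of keys) {
    if (value === null || typeof value !== 'object' || !(key in value)) return false;
    value = value[key];
  }
  // Strict equality, matching the "===" wording in the table.
  return value === assertion.equals;
}

console.log(checkRegexMatch(
  { type: 'regex-match', pattern: 'Source-layout:\\s*weekly-csc' },
  'Source-layout: weekly-csc',
)); // true
console.log(checkJsonPathEquals(
  { type: 'json-path-equals', json_path: '$.foo.bar', equals: 'v' },
  { foo: { bar: 'v' } },
)); // true
```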
|
|
84
|
+
|
|
85
|
+
## Run modes
|
|
86
|
+
|
|
87
|
+
The runner (`plugin/skills/eval/run-evals.ps1`) supports three dispatch modes:
|
|
88
|
+
|
|
89
|
+
1. **Direct invocation** (default for `script` graders). Runs the skill's executable artifact (`run.ps1`, `*.mjs`, or a small probe stub) with the given input and fixture. Fully deterministic.
|
|
90
|
+
2. **Sub-agent dispatch** (optional, gated by `-Live`). Forwards the case to a sub-agent. Used only for `llm-rubric` cases. Skipped in canary mode.
|
|
91
|
+
3. **Recorded fixture replay** (for `pull-*` skills). Reads a recorded `--cached` output of a real pull and asserts against that, so no live M365 calls are needed.
|
|
92
|
+
|
|
93
|
+
For each case the runner records: `pass`, `duration_ms`, `tokens_in`, `tokens_out`, `stdout`, `stderr`, per-assertion `pass`/`reason`. The aggregate is a JSON file under `Evidence/_evals/` plus a one-line `benchmark.json` summary.
|
|
94
|
+
|
|
95
|
+
## Canary set
|
|
96
|
+
|
|
97
|
+
Marked with `"canary": true`. Kept tiny so PRs stay fast.
|
|
98
|
+
|
|
99
|
+
Default canary set (v5.0.3):
|
|
100
|
+
|
|
101
|
+
- `ask-project`
|
|
102
|
+
- `bootstrap-project`
|
|
103
|
+
- `refresh-project`
|
|
104
|
+
- `link-entities`
|
|
105
|
+
- `build-state`
|
|
106
|
+
- `self-check`
|
|
107
|
+
|
|
108
|
+
## Baseline + regression detection
|
|
109
|
+
|
|
110
|
+
- `evals/baseline.json` is **committed**.
|
|
111
|
+
- Each per-skill record carries the last green `pass_rate`, `mean_duration_ms`, and `mean_tokens_total`.
|
|
112
|
+
- `src/eval-aggregator.mjs` flags **regressions**:
|
|
113
|
+
- `pass_rate` drop ≥ 10 percentage points
|
|
114
|
+
- `mean_duration_ms` increase ≥ 50 %
|
|
115
|
+
- `mean_tokens_total` increase ≥ 50 %
|
|
116
|
+
- Maintainers refresh the baseline with `npm run eval:baseline` after deliberate behavior changes.
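The three thresholds above translate into a few comparisons. The sketch below is illustrative only and is not the code in `src/eval-aggregator.mjs`: `flagRegressions` is a hypothetical name, and `pass_rate` is assumed to be stored as a fraction in the range 0 to 1.

```javascript
// Hypothetical sketch of the three regression thresholds from this document.
// Assumes pass_rate is a fraction in [0, 1], so 10 points equals 0.10.
function flagRegressions(baseline, current) {
  const flags = [];
  // Compare in tenths of a percentage point to sidestep float noise
  // (in floating point, 0.9 - 0.8 is slightly less than 0.1).
  const dropTenthsPp = Math.round((baseline.pass_rate - current.pass_rate) * 1000);
  if (dropTenthsPp >= 100) flags.push('pass_rate');
  for (const metric of ['mean_duration_ms', 'mean_tokens_total']) {
    const base = baseline[metric];
    // Skip metrics with no usable baseline instead of comparing against zero.
    if (base > 0 && current[metric] >= base * 1.5) flags.push(metric);
  }
  return flags;
}

const baselineRecord = { pass_rate: 0.9, mean_duration_ms: 1000, mean_tokens_total: 2000 };
const currentRecord = { pass_rate: 0.8, mean_duration_ms: 1499, mean_tokens_total: 3000 };
console.log(flagRegressions(baselineRecord, currentRecord));
```

The rounding step matters at the boundary: a drop from 0.9 to 0.8 is exactly 10 points on paper, and a naive `>= 0.1` comparison would miss it.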
|
|
117
|
+
|
|
118
|
+
## Privacy + safety
|
|
119
|
+
|
|
120
|
+
- **No real customer data** in any fixture. Use `fixture-acme`-style synthetic names.
|
|
121
|
+
- `Evidence/_evals/` is in `.gitignore`.
|
|
122
|
+
- `pull-*` evals NEVER hit live M365 endpoints in canary mode. Use recorded `--cached` payloads or `--dry-run`.
|
|
123
|
+
- Tenant IDs / GUIDs in fixtures must be obviously fake (e.g. `00000000-...`).
|
|
124
|
+
|
|
125
|
+
## References
|
|
126
|
+
|
|
127
|
+
- [agentskills.io — evaluating skills](https://agentskills.io/skill-creation/evaluating-skills) (source of truth)
|
|
128
|
+
- `plugin/skills/eval/SKILL.md` (the runner skill)
|
|
129
|
+
- `plugin/skills/eval/evals.schema.json` (JSON schema; self-check D33.evals-schema)
|
|
130
|
+
- `plugin/instructions/agentskills-compliance.instructions.md` (sibling doctrine — size + section caps)
|
package/plugin/skills/aggregate-project/evals/evals.json
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
{
|
|
2
|
+
"skill": "aggregate-project",
|
|
3
|
+
"version": "1.0.0",
|
|
4
|
+
"description": "Auto-seeded evals for aggregate-project. Replace with real cases as the skill matures.",
|
|
5
|
+
"cases": [
|
|
6
|
+
{
|
|
7
|
+
"id": "aggregate-project-smoke-1",
|
|
8
|
+
"name": "aggregate-project produces a non-empty response",
|
|
9
|
+
"input": "synthetic aggregate-project probe — canary smoke",
|
|
10
|
+
"canary": false,
|
|
11
|
+
"grader_type": "script",
|
|
12
|
+
"expected_assertions": [
|
|
13
|
+
{
|
|
14
|
+
"type": "regex-match",
|
|
15
|
+
"pattern": ".+"
|
|
16
|
+
}
|
|
17
|
+
]
|
|
18
|
+
},
|
|
19
|
+
{
|
|
20
|
+
"id": "aggregate-project-smoke-2",
|
|
21
|
+
"name": "aggregate-project echoes case id",
|
|
22
|
+
"input": "case-id aggregate-project-smoke-2",
|
|
23
|
+
"canary": false,
|
|
24
|
+
"grader_type": "script",
|
|
25
|
+
"expected_assertions": [
|
|
26
|
+
{
|
|
27
|
+
"type": "regex-match",
|
|
28
|
+
"pattern": "aggregate-project-smoke-2"
|
|
29
|
+
}
|
|
30
|
+
]
|
|
31
|
+
}
|
|
32
|
+
]
|
|
33
|
+
}
|