kushi-agents 5.0.0 → 5.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. package/README.md +30 -7
  2. package/bin/cli.mjs +73 -45
  3. package/package.json +51 -51
  4. package/plugin/instructions/agentskills-compliance.instructions.md +144 -0
  5. package/plugin/instructions/multi-host-install.instructions.md +125 -0
  6. package/plugin/instructions/plan-validate-execute.instructions.md +75 -0
  7. package/plugin/instructions/release-genealogy.instructions.md +52 -0
  8. package/plugin/skills/aggregate-project/SKILL.md +11 -2
  9. package/plugin/skills/apply-ado-update/SKILL.md +11 -2
  10. package/plugin/skills/ask-project/SKILL.md +1 -1
  11. package/plugin/skills/bootstrap-project/SKILL.md +39 -127
  12. package/plugin/skills/bootstrap-project/references/discovery-sweep.md +40 -0
  13. package/plugin/skills/bootstrap-project/references/pull-dispatch.md +50 -0
  14. package/plugin/skills/bootstrap-project/references/registry-persistence.md +55 -0
  15. package/plugin/skills/build-state/SKILL.md +50 -2
  16. package/plugin/skills/consolidate-evidence/SKILL.md +11 -2
  17. package/plugin/skills/dashboard/SKILL.md +20 -1
  18. package/plugin/skills/emit-vertex/SKILL.md +10 -1
  19. package/plugin/skills/fde-intake/SKILL.md +10 -1
  20. package/plugin/skills/fde-report/SKILL.md +10 -1
  21. package/plugin/skills/fde-triage/SKILL.md +10 -1
  22. package/plugin/skills/intro/SKILL.md +1 -1
  23. package/plugin/skills/link-entities/SKILL.md +43 -1
  24. package/plugin/skills/project-status/SKILL.md +1 -1
  25. package/plugin/skills/propose-ado-update/SKILL.md +11 -2
  26. package/plugin/skills/pull-ado/SKILL.md +26 -9
  27. package/plugin/skills/pull-crm/SKILL.md +39 -125
  28. package/plugin/skills/pull-crm/references/dataverse-doctrine.md +108 -0
  29. package/plugin/skills/pull-crm/references/legacy-shape.md +28 -0
  30. package/plugin/skills/pull-email/SKILL.md +33 -79
  31. package/plugin/skills/pull-email/references/retrieval-order.md +43 -0
  32. package/plugin/skills/pull-email/references/two-pass-pull.md +41 -0
  33. package/plugin/skills/pull-loop/SKILL.md +194 -177
  34. package/plugin/skills/pull-meetings/SKILL.md +35 -72
  35. package/plugin/skills/pull-meetings/references/legacy-stream.md +15 -0
  36. package/plugin/skills/pull-meetings/references/verbatim-capture.md +61 -0
  37. package/plugin/skills/pull-misc/SKILL.md +24 -7
  38. package/plugin/skills/pull-onenote/SKILL.md +207 -555
  39. package/plugin/skills/pull-onenote/references/playwright-fallback.md +111 -0
  40. package/plugin/skills/pull-onenote/references/preflight.md +85 -0
  41. package/plugin/skills/pull-onenote/references/runtime-contract.md +118 -0
  42. package/plugin/skills/pull-sharepoint/SKILL.md +26 -9
  43. package/plugin/skills/pull-teams/SKILL.md +26 -9
  44. package/plugin/skills/refresh-project/SKILL.md +24 -2
  45. package/plugin/skills/self-check/SKILL.md +9 -1
  46. package/plugin/skills/self-check/run.ps1 +216 -4
  47. package/plugin/skills/setup/SKILL.md +14 -120
  48. package/plugin/skills/setup/references/onedrive-pin-sync.md +60 -0
  49. package/plugin/skills/setup/references/recovery-prompts.md +81 -0
  50. package/plugin/skills/tour/SKILL.md +18 -1
  51. package/plugin/skills/vertex-link/SKILL.md +1 -1
  52. package/src/constants.mjs +39 -1
  53. package/src/multi-host-install.test.mjs +170 -0
  54. package/src/multi-host.mjs +277 -0
@@ -1,560 +1,212 @@
1
- ---
2
- name: "pull-onenote"
3
- version: "3.0.0"
4
- description: "v3.0.0 (kushi v4.9.0): Pull OneNote evidence as Comprehensive Structured Capture (CSC) blocks written to weekly/YYYY-MM-DD_onenote-csc.md. One block per page touched that week, upserted in _index/entities.yml. WorkIQ natural-language CSC prompts are the PRIMARY path (per workiq-onenote-query-shape.instructions.md). Playwright remains OPT-IN recovery-only fallback; its output now feeds CSC bullets (no snapshot/pages/<title>.md verbatim files). Per-page retry registry now lives as `low_signal`/`status` in `_index/entities.yml`. No snapshot/+stream/ split. Per-contributor _index/. See weekly-csc + comprehensive-structured-capture doctrines."
5
- ---
6
-
7
- # Skill: pull-onenote
8
-
9
- > **v3.0.0 contract change (kushi v4.9.0, 2026-05-26):** Output shape is now **weekly CSC** (`weekly/<YYYY-MM-DD>_onenote-csc.md` + `_index/entities.yml`). WorkIQ is still PRIMARY, Playwright is still OPT-IN recovery-only — but their outputs feed CSC bullets, not verbatim `snapshot/pages/<title>.md` files. The per-page retry registry now lives as `status` / `low_signal` rows in `_index/entities.yml`. See `weekly-csc.instructions.md` and `comprehensive-structured-capture.instructions.md`.
10
-
11
- > **v2.7.0 contract change (kushi v4.7.3, 2026-05-26):** Playwright is **demoted from PRIMARY to opt-in recovery-only fallback**. WorkIQ natural-language queries by display name are now the primary path. See `..\..\instructions\workiq-onenote-query-shape.instructions.md` for the working query shapes and the empirical record. The v3.8.0 pivot to Playwright-primary was driven by HCA body-retrieval rate (1/18 on 2026-05-14) — but per-page retry registries + multi-refresh accumulation close that gap without forcing every contributor onto Edge + Conditional Access + bootstrap-profile complexity.
12
-
13
- > **v4.9.0 contracts** — This skill operates under these HARD-rule doctrines:
14
- > - **`comprehensive-structured-capture.instructions.md`** — CSC block shape (canonical sections, bullets only, no prose).
15
- > - **`weekly-csc.instructions.md`** — weekly/ + _index/ writer contract; multi-user safety.
16
- > - `workiq-onenote-query-shape.instructions.md` — **PRIMARY**. The only working WorkIQ phrasings for OneNote. CSC canonical prompts adapt these query shapes per `workiq-only.instructions.md` § "CSC canonical prompts (kushi v4.9.0+)".
17
- > - `m365-id-registry.instructions.md` — discover-once / consume-deterministically (section IDs, dual page-id schema when both observed).
18
- > - `evidence-thoroughness.instructions.md` (v2.0.0) — per-source minimum bullet thresholds; `low_signal: true` flag.
19
- > - `citation-ledger.instructions.md` — citation format.
20
- > - `capture-learnings.instructions.md` — every fix/discovery is logged to `plugin/learnings/<source>.md`.
21
- > - `cleanup-on-resolution.instructions.md` — stale unavailable-markers are upgraded in the same turn.
22
- > - `run-reports.instructions.md` — every refresh writes a per-user report under `Evidence/<alias>/refresh-reports/`.
23
- > - `evidence-layout-canonical.instructions.md` — `weekly/` + `_index/` are canonical in v4.9.0+.
24
- > - ~~`verbatim-by-default.instructions.md`~~ (LEGACY — superseded by CSC).
25
- > - ~~`snapshot-vs-stream.instructions.md`~~ (LEGACY — superseded by weekly-csc).
26
-
27
- ## v4.9.0 — Comprehensive Structured Capture (CSC) + weekly/ layout (HARD RULE; supersedes all snapshot/+stream/ guidance below)
28
-
29
- Per `comprehensive-structured-capture.instructions.md` and `weekly-csc.instructions.md`:
30
-
31
- - **Output target** (single file per source per ISO week):
32
- `<engagement-root>/<project>/Evidence/<alias>/onenote/weekly/<YYYY-MM-DD>_onenote-csc.md`
33
- where `<YYYY-MM-DD>` is the **Monday** of the ISO week the page was touched.
34
- Empty weeks produce no file.
35
-
36
- - **Block shape (per page touched that week)**: one CSC block under `## <Page title> {#<entity-anchor>}` with canonical section order: Source basis → Coverage window → Last touched → Participants → Topics Discussed → Decisions → Dates & Numbers Shared → Action Items → Next Steps → Open Questions → Risks/Blockers/Dependencies → Customer Asks → Artifacts/Links → Coverage Notes. (Q&A and Who Said What omitted per CSC per-source applicability.) Bullets only — no prose paragraphs.
37
-
38
- - **Entity id (canonical for this source)**: `onenote://page/wdpartid=<id>` (preferred) or `onenote://page/webPageId=<id>`.
39
-
40
- - **WorkIQ prompts**: REPLACED by the CSC canonical prompts in `workiq-only.instructions.md` § "CSC canonical prompts (kushi v4.9.0+)". The Tier C per-page body prompt becomes the OneNote CSC per-page prompt. Do NOT use the v4.8 verbatim prompts.
41
-
42
- - **_index/entities.yml upsert** (per-contributor, per-source): on every write, upsert one row per page:
43
- ```yaml
44
- - id: 'onenote://page/wdpartid=<id>'
45
- display_name: '<page title>'
46
- entity_anchor: '<slug>'
47
- latest_csc_file: 'weekly/<YYYY-MM-DD>_onenote-csc.md'
48
- latest_csc_block_offset: <line>
49
- last_touched: <ISO>
50
- first_seen: <ISO>
51
- weeks_touched: [<YYYY-MM-DD>, ...]
52
- status: captured | body-not-exposed | unavailable | deferred # was last_status pre-v4.9.0
53
- low_signal: false | true # set true when entity cannot meet evidence-thoroughness thresholds
54
- ```
55
- Path: `<engagement-root>/<project>/Evidence/<alias>/onenote/_index/entities.yml`.
56
-
57
- - **Idempotency**: same page, same week, two runs → REPLACE the block in place. Same page, different weeks → each week's file gets its own block.
58
-
59
- - **No body-fetch loop**: drop the AI Narrative Summary requirement and the dedicated verbatim `## Body (verbatim)` section. CSC bullets sourced directly from WorkIQ CSC output (or Playwright fallback output, rendered as CSC).
60
-
61
- - **Legacy `snapshot/` and `stream/` folders**: NOT written by this skill in v4.9.0. Pre-v4.9.0 folders (especially `snapshot/pages/<safe-title>.md`) on disk are left alone; readers fall back to them when `weekly/` is empty.
62
-
63
- ### Source-specific notes (onenote)
64
-
65
- - WorkIQ Tier C per-page body prompt becomes the OneNote CSC per-page prompt (see `workiq-only.instructions.md` § CSC canonical prompts).
66
- - **KEEP Playwright as opt-in recovery-only fallback (unchanged)** — but its output now feeds CSC bullets, not `snapshot/pages/<title>.md` verbatim files. The runner produces structured page content that the skill renders into the canonical CSC block sections.
67
- - **Per-page retry registry stays** but it now lives as `low_signal` / `status` fields in `_index/entities.yml` (not as a separate `one_pages[]` block). Pages that return `body-not-exposed` for 2 consecutive refreshes still escalate to Playwright when opted in.
68
- - **Drop `snapshot/pages/<safe-title>.md` write target.** All output goes to `weekly/<Monday>_onenote-csc.md`.
69
-
70
- Pulls **onenote** evidence in two shapes per `snapshot-vs-stream.instructions.md`:
71
-
72
- - **snapshot/** — full page bodies — one file per page with last-modified + verbatim body
73
- - **stream/** page-edit events with diff summary
74
-
75
- ## Tools (in order)
76
-
77
- 1. **WorkIQ (natural-language by display name)** — **PRIMARY**. Use the approved query shapes from `workiq-onenote-query-shape.instructions.md`: one section per query, display names only, no enumeration verbs, no filter-syntax, no ID-lookup questions. Drives both section/page discovery and body retrieval. Pages that return `BODY-NOT-EXPOSED` are logged into `one_pages[].last_status` for retry on the next refresh — this multi-refresh accumulation is the thoroughness mechanism.
78
- 2. **Playwright (browser-scrape, persisted profile)** **OPT-IN RECOVERY-ONLY FALLBACK**. Only invoked when ALL of: (a) `m365Auth.oneNote.playwrightFallback: true`, (b) `one_pages[]` shows ≥ N pages with `last_status: BODY-NOT-EXPOSED` for ≥ 2 consecutive refreshes (N default 5, configurable via `m365-mutable.json#pullOnenote.playwrightThreshold`), (c) WorkIQ has been attempted in the current run. Profile lives at `~/.kushi/playwright-profile/onenote/`. Implementation: `plugin/skills/pull-onenote/runner.mjs`. See "Playwright fallback (optional)" section below.
79
- 3. **Host (m365_*)** — not used (Graph `Notes.Read.All` denied admin consent in this tenant; also forbidden by kushi's WorkIQ-first doctrine).
80
-
81
- ## Canonical CLI invocations
82
-
83
- ### Primary path: WorkIQ natural-language (kushi v4.7.3+, default for all installs)
84
-
85
- Per `..\..\instructions\workiq-onenote-query-shape.instructions.md`. Resolves section IDs and pulls page bodies via display-name queries:
86
-
87
- ```pwsh
88
- # Per-section discovery + page enumeration (also extracts wdpartid + wdsectionfileid from URL fragments):
89
- workiq ask -q "In the OneNote notebook '<NOTEBOOK DISPLAY NAME>', show me the pages in the section named '<SECTION DISPLAY NAME>'. Return a flat table with: page title, last modified, web URL. No commentary. Do not truncate."
90
-
91
- # Per-page body pull (one page at a time — narrow scope is mandatory):
92
- workiq ask -q "Open the OneNote page titled '<PAGE TITLE>' in section '<SECTION>' of notebook '<NOTEBOOK>'. Return the verbatim page body, no summary, no truncation."
93
-
94
- # Edit-event stream:
95
- workiq ask -q "Show me OneNote page edits in section '<SECTION>' of notebook '<NOTEBOOK>' since <ISO-DATE>. For each edit: page title, edited by, edited at, brief change summary."
96
- ```
97
-
98
- Driver behavior:
99
- 1. Run discovery query parse `wdsectionfileid`, `wdpartid`, `sourcedoc` GUIDs out of the URL fragments in the response.
100
- 2. Persist into `m365-mutable.json#knownSections.<projectKey>` per `m365-id-registry.instructions.md`. `webPageId` is left empty it is only populated if/when Playwright fallback runs.
101
- 3. For each enumerated page, run the body-pull query. Write snapshot files to the canonical layout. Populate `one_pages[].last_status`: `captured` on verbatim body return, `BODY-NOT-EXPOSED` on empty/partial, `workiq-degraded` on classified WorkIQ failure (per `fallback-status-reporting.instructions.md`).
102
- 4. Pages with `BODY-NOT-EXPOSED` are surfaced in the run report and queued for retry on the next refresh. The per-page retry registry is the thoroughness mechanism — over multiple refreshes, transient non-determinism is absorbed.
103
-
104
- ### Playwright fallback (OPT-IN, RECOVERY-ONLY — kushi v4.7.3+)
105
-
106
- **Do NOT bootstrap a Playwright profile on a fresh install.** This path exists only as a recovery valve for contributors whose WorkIQ body-retrieval rate stays poor across multiple refreshes. Enable explicitly:
107
-
108
- ```jsonc
109
- // In <kushi-config-root>/user/m365-auth.json:
110
- "oneNote": {
111
- "enabled": true,
112
- "defaultNotebookName": "...",
113
- "playwrightFallback": true // OPT-IN — default false
114
- }
115
- ```
116
-
117
- Once opted in, the driver auto-escalates to Playwright when `one_pages[]` shows ≥ N pages with `last_status: BODY-NOT-EXPOSED` for ≥ 2 consecutive refreshes (N default 5; configurable via `m365-mutable.json#pullOnenote.playwrightThreshold`). Until both thresholds are met, the driver MUST continue using WorkIQ. Never auto-bootstrap a profile — that is always a user-initiated action.
118
-
119
- The Playwright invocations below remain valid (cookie surfaces, channel requirements, preflight three-way classification are all empirically unchanged) — they are simply the **fallback** path now, not the default.
120
-
121
- These are the empirically validated invocations as of kushi v3.11.3 (still in force for the fallback path). The runner has surprising defaults (visible browser by default, 60s timeout, preflight required) that can derail a run. Use these recipes; do not improvise flags.
122
-
123
- ### Bootstrap the Playwright profile (FALLBACK PATH ONLY — opt-in required)
124
-
125
- ```pwsh
126
- cd <kushi-repo-root>
127
- node plugin/skills/pull-onenote/runner.mjs --bootstrap
128
- ```
129
-
130
- Expected console flow (any deviation = defect):
131
-
132
- ```
133
- [bootstrap] Step 1/2: Sign in to OneNote-for-Web. ...
134
- [bootstrap] Sign-in detected (OneNote chrome rendered). ← MUST appear before Step 2/2
135
- [bootstrap] Step 2/2: Seeding SharePoint cookies at https://microsoft-my.sharepoint.com/
136
- [bootstrap] Step 2/2: Seeding SharePoint cookies at https://microsoft-my.sharepoint-df.com/
137
- [bootstrap] Both surfaces visited. Close the browser window when ready.
138
- [bootstrap] Profile saved at: ...
139
- ```
140
-
141
- If `Sign-in detected` is missing → the v3.11.1-or-earlier `waitForURL` bug is back; the bootstrap is invalid; cookies are not minted; every subsequent scrape will return `auth-required`. Re-bootstrap with v3.11.2+.
142
-
143
- The visible Edge window stays open until you close it. Sign in fully, wait for OneNote to render, then close the window.
144
-
145
- ### Scrape a section (production) use the wrapper, not the bare runner
146
-
147
- The bare `runner.mjs` emits JSON to stdout only. The driver (PowerShell or agent) is then responsible for writing snapshot files, upserting the registry, and emitting a run report. That hand-rolled wiring is brittle (PowerShell's default `Out-File` mangles UTF-8 NBSP becomes ` `) and almost always violates the canonical layout in [`snapshot-vs-stream.instructions.md`](../../instructions/snapshot-vs-stream.instructions.md).
148
-
149
- **HARD rule (kushi v3.11.5+):** drivers MUST call `write-snapshot.mjs`, which invokes the runner internally (child_process, no shell pipe), writes the snapshot in the canonical layout, upserts `m365Mutable.knownSections.<project>.one_pages[]`, and emits a run report. Never hand-write snapshot files from runner JSON.
150
-
151
- ```pwsh
152
- cd <kushi-repo-root>
153
- $url = '<exact-section-url-from-m365-mutable.json#one_sectionWebUrl>'
154
- node plugin/skills/pull-onenote/write-snapshot.mjs `
155
- --section-url $url `
156
- --project "<project>" `
157
- --engagement-root "<engagement-root>" `
158
- --alias <your-alias>
159
- ```
160
-
161
- Output structure (per `snapshot-vs-stream.instructions.md`):
162
-
163
- ```
164
- <engagement-root>/<project>/Evidence/<alias>/onenote/
165
- snapshot/pages/<safe-title>.md ← one file per page (HARD)
166
- refresh-reports/<YYYYMMDD-HHMM>-onenote.md
167
- stream/ ← populated by WorkIQ stream pass (not by this wrapper)
168
- ```
169
-
170
- The wrapper also writes back to `m365-mutable.json`:
171
- - `one_pages[]` — per-page retry registry (title, wdpartid, webPageId, lastModified, last_status, attempts, snapshot_path, captured_at)
172
- - `one_lastPullAt`, `one_lastPullRunStatus`, `one_lastPullKushiVersion`
173
-
174
- #### Bare runner (advanced — only when you need raw JSON)
175
-
176
- If you genuinely need the runner output without writing files (e.g. for diagnostics), call it directly — but DO NOT then pipe through PowerShell `Out-File` (UTF-8 corruption). Either use `node -e` to parse stdout in-process, or use the wrapper's `--json` mode to feed a pre-captured file written with `[IO.File]::WriteAllText` (always UTF-8).
177
-
178
- Flags the wrapper sets for you (all load-bearing):
179
-
180
- | Flag | Why required |
181
- |---|---|
182
- | `--skip-preflight` | Standalone preflight at `https://onenote.cloud.microsoft/` is unreliable post-bootstrap (OneNote SPA forces `prompt=select_account` even when SPO cookies are valid). The Doc.aspx URL uses different cookies and works fine. |
183
- | `--headless` | Without it, a visible browser opens — if the user accidentally closes it during scrape, run fails with `Target page, context or browser has been closed`. |
184
- | `--timeout 120000` | Default 60s is too short under tenant load. 120s is the empirical safety margin for sections up to ~25 pages. Override with `--timeout` on the wrapper. |
185
-
186
- ### Diagnostics (when scrape produces `pages: []`)
187
-
188
- If the scrape returns `pages: []` with `runStatus: "partial"` and `error: "frame.waitForFunction: Timeout"`, the cause is almost always one of:
189
-
190
- 1. **Single-page section regex mismatch** — fixed in v3.11.3. Verify runner is v3.11.3+.
191
- 2. **Page-rail genuinely slow to render** — bump `--timeout` to 180000 and retry once.
192
- 3. **Section URL stale** — re-run `recapture-section-url.mjs` (see C.1).
193
-
194
- Quick in-place diagnostic to see what aria-labels the page is actually emitting (drop this in `plugin/skills/pull-onenote/diag.mjs`, run from kushi root):
195
-
196
- ```js
197
- import { chromium } from 'playwright';
198
- import os from 'os';
199
- import path from 'path';
200
- const ctx = await chromium.launchPersistentContext(
201
- path.join(os.homedir(), '.copilot', 'playwright-profile', 'onenote'),
202
- { headless: true, channel: 'msedge', viewport: { width: 1400, height: 900 } }
203
- );
204
- const page = ctx.pages()[0] || await ctx.newPage();
205
- await page.goto(process.argv[2], { timeout: 60000 });
206
- await page.waitForTimeout(60000);
207
- for (const f of page.frames()) {
208
- try {
209
- const labels = await f.evaluate(() =>
210
- Array.from(document.querySelectorAll('[aria-label]'))
211
- .map(n => n.getAttribute('aria-label'))
212
- .filter(x => x && (x.includes('page') || x.includes('Page')))
213
- );
214
- if (labels.length) console.log(f.url().slice(0, 80), labels.slice(0, 40));
215
- } catch (e) {}
216
- }
217
- await ctx.close();
218
- ```
219
-
220
- Run: `node plugin/skills/pull-onenote/diag.mjs "<section-url>"`. If you see `, Page. Selected.` but not `, page N of M, Page.` → single-page section, runner must be v3.11.3+.
221
-
222
- ## Empirical contract (what is true, validated 2026-05-13/14 + reframed 2026-05-26)
223
-
224
- These facts are HARD-rule and supersede any earlier doctrine in this skill or in older learnings:
225
-
226
- 1. **WorkIQ natural-language by display name IS the primary, working path for OneNote.** Reframed v4.7.3 (2026-05-26) — see `..\..\instructions\workiq-onenote-query-shape.instructions.md`. The only WorkIQ phrasings that work are: (a) narrow, single-section-by-display-name discovery queries; (b) single-page-by-title body pulls; (c) section-scoped edit-event stream queries. Forbidden: enumeration verbs, structured-field filter syntax, ID-lookup questions, broad "list my notebooks" probes. Those all empirically fail (WorkIQ punts to Graph Explorer or routes to summary mode).
227
- 2. **WorkIQ body retrieval is non-deterministic, BUT per-page retry registries close the gap over multiple refreshes.** The v3.8.0 pivot to Playwright was driven by HCA's 1-of-18 body-retrieval rate on 2026-05-14. That rate is the snapshot of one refresh — the retry registry (`one_pages[].last_status`) is the durable mechanism that accumulates captures across refreshes. A page that returns `BODY-NOT-EXPOSED` today is retried tomorrow; over 3-5 refreshes the cumulative capture rate converges. This is acceptable for engagement evidence (which is time-windowed, not real-time). Playwright remains as the opt-in recovery valve for contributors who need faster convergence and have accepted the Edge + Conditional Access + bootstrap complexity.
228
- 3. **The OneNote-for-Web `pageid` and the WorkIQ `wdpartid` are different identifiers.** Both should be persisted per page when both are observed (`webPageId` for browser navigation; `wdpartid` for WorkIQ correlation and stream events). For WorkIQ-only operation, `wdpartid` alone is sufficient — `webPageId` is only populated if/when Playwright fallback runs.
229
- 4. **Browser-scrape via OneNote-for-Web returns higher-fidelity verbatim bodies when it works** — tested against HCA on 2026-05-14: 16 of 16 pages captured (~120KB) in one run. This is why Playwright remains as the opt-in fallback. It is NOT the default because: (a) it requires Edge + Conditional Access compliance — not all contributors have that; (b) it requires a per-machine bootstrap that expires every 3-5 days; (c) most engagements do not need real-time full-fidelity capture — multi-refresh WorkIQ converges adequately.
230
-
231
- ## Pre-flight
232
-
233
- Before any retrieval, validate the Playwright profile and (only if used as fallback) the WorkIQ EULA.
234
-
235
- ### A. Playwright profile (PRIMARY)
236
-
237
- ```pwsh
238
- $prof = "$env:USERPROFILE\.kushi\playwright-profile\onenote"
239
- if (-not (Test-Path $prof)) {
240
- Write-Host "[pull-onenote] Playwright profile not yet seeded."
241
- Write-Host "[pull-onenote] Run plugin/skills/pull-onenote/runner.mjs --bootstrap once interactively to sign in."
242
- # Skill MUST NOT fall through silently. Surface as run-report 'auth-required' on every pending page.
243
- }
244
- ```
245
-
246
- **A.1 Browser channel — HARD rule (kushi v3.10.1+):** the runner MUST launch via Playwright's `channel: 'msedge'` (the user's installed Edge), NOT vanilla Playwright Chromium. The Microsoft tenant's Conditional Access policy denies vanilla Chromium with **"You can't get there from here — this application contains sensitive information and can only be accessed from devices or client applications that meet Microsoft management compliance policy"**. Edge is Intune-trusted; vanilla Chromium is not. This is codified in `runner.mjs` at `chromium.launchPersistentContext(...)`. If the user does not have Edge installed, the runner fails fast with a clear message — do not fall back to Chromium.
247
-
248
- **A.2 Two-surface bootstrap — HARD rule (kushi v3.10.1+, sign-in wait fixed in v3.11.2):** `runner.mjs --bootstrap` MUST visit BOTH cookie domains in sequence within the same persisted session:
249
-
250
- 1. `https://onenote.cloud.microsoft/` — sign-in surface for OneNote-for-Web.
251
- 2. `https://microsoft-my.sharepoint.com/` AND `https://microsoft-my.sharepoint-df.com/` — the actual SPO hosts where Doc.aspx URLs live.
252
-
253
- Cookie domains do not share between `*.cloud.microsoft` and `*.sharepoint(-df).com`. Signing into OneNote alone is insufficient — the runner's section URLs are SPO URLs and need separate cookies. The bootstrap script handles this automatically; do not modify the order.
254
-
255
- **HARD rule (kushi v3.11.2+ post-auth wait):** the bootstrap MUST wait for a real OneNote post-auth UI signal — NOT for the URL to match `onenote.cloud.microsoft` (which is a no-op because we just navigated TO that URL, collapsing the wait to ~0 ms). Required selector set:
256
-
257
- ```
258
- [aria-label*="Account manager" i], [data-automationid="NotebookList"], button[aria-label*="notebook" i], iframe[src*="onenoteframe.aspx"]
259
- ```
260
-
261
- These are the same selectors `preflightOneNoteWeb` uses. The bootstrap log MUST emit `Sign-in detected (OneNote chrome rendered).` between Step 1/2 and Step 2/2 — its absence is the defect signature of the v3.11.1-and-earlier bug (sign-in was silently skipped, and every subsequent scrape returned `auth-required` regardless of how thoroughly the user signed in). Same anti-pattern applies to any future bootstrap surface (SharePoint, Loop, M365 admin).
262
-
263
- **A.3 Profile reset on channel switch:** if migrating from a Chromium profile to Edge (or vice versa), delete `~/.kushi/playwright-profile/onenote/` first. Cookie/cache formats differ.
264
-
265
- **A.4 OneNote-for-Web reachability pre-flight — HARD rule (kushi v3.11.0+):** before navigating to any section URL, the runner MUST probe `https://onenote.cloud.microsoft/` and classify the end-state into exactly one of three buckets:
266
-
267
- | End-state | Detection | runStatus | Driver action |
268
- |---|---|---|---|
269
- | `ok` | Account chrome / notebook list / `onenoteframe.aspx` iframe rendered within `--preflight-timeout` (default 25s) | n/a (proceed) | Navigate to section URL as normal. |
270
- | `auth-required` | URL bounced to `login.microsoftonline.com` or `login.live.com` | `auth-required` | Same as today — surface in run report; next refresh re-attempts. |
271
- | `onenote-web-unavailable` | "Sorry, we ran into a problem" / "Something went wrong" / "We couldn't open" / "This notebook can't be opened" / "There was a problem" detected in any frame's body text, OR pre-flight timeout with no chrome rendered | `notebook-unavailable` | **Do NOT retry blindly.** Surface a clear diagnostic to the user with the recovery checklist below. Mark this run's pages `last_status: notebook-unavailable`. |
272
-
273
- The same three-way classification ALSO applies after navigating to the section URL — if OneNote-for-Web pops the "Sorry" dialog on the section page (rather than rendering the canvas), the runner emits `runStatus: notebook-unavailable`, NOT `auth-required`. The two are distinct failure modes and must be reported as such; conflating them sends the user down the wrong recovery path (re-bootstrap auth when the real fix is to recover the notebook).
274
-
275
- **Standalone pre-flight CLI** (for scripts and gate drivers):
276
-
277
- ```pwsh
278
- node plugin/skills/pull-onenote/runner.mjs --preflight
279
- # Exit 0 = ok, 4 = auth-required, 3 = onenote-web-unavailable, 1 = unexpected
280
- # stdout: { "preflight": { "ok": bool, "reason"?, "detail"? } }
281
- ```
282
-
283
- **Recovery checklist for `notebook-unavailable`** (surface verbatim to the user; do NOT auto-retry):
284
-
285
- 1. Hard-refresh `https://onenote.cloud.microsoft/` — if the root page shows the same error, the issue is service-side or notebook-side.
286
- 2. Open the notebook in **OneNote desktop** and let it fully sync.
287
- 3. If the notebook opens in desktop but not web, wait 10–15 minutes (web index lag) and retry.
288
- 4. If the section specifically fails (root loads, section errors), re-capture `one_sectionWebUrl` from the address bar via `recapture-section-url.mjs` — the persisted URL may be stale or for a moved section.
289
- 5. Only after the user can manually open the notebook in OneNote-for-Web, re-run the kushi pull.
290
-
291
- The runner detects the absence of valid cookies and the redirect to `login.microsoftonline.com`. When that happens:
292
-
293
- - Mark every queued page in this run as `last_status: auth-required`.
294
- - Write a refresh-report entry naming the project and counting affected pages.
295
- - DO NOT re-attempt with WorkIQ for the body — WorkIQ has been empirically proven to make up the ratio with non-bodies. Instead, the next refresh re-attempts the browser path.
296
- - DO use WorkIQ for any stream-only items (page-edit events) — those don't depend on body retrieval.
297
-
298
- ### B. WorkIQ EULA (FALLBACK)
299
-
300
- ```pwsh
301
- workiq accept-eula # idempotent; required once per machine for fallback queries
302
- ```
303
-
304
- ### C. Browser-URL completeness gate (HARD, kushi v3.10.0+)
305
-
306
- The Playwright runner navigates to `one_sectionWebUrl`. That URL is composed from FIVE registry fields, and ALL must be present and accurate:
307
-
308
- | Key | Source | Notes |
309
- |---|---|---|
310
- | `one_sectionName` | OneNote section file name | e.g. `AGCO.one` |
311
- | `one_sectionFileId` | `wd=target(<name>\|<GUID>/)` fragment | Same GUID as WorkIQ's `wdsectionfileid` |
312
- | `one_notebookSourceDoc` | `sourcedoc={<GUID>}` query param | Identifies the parent notebook |
313
- | `one_notebookSpoBaseUrl` | `<scheme>://<host>/personal/<upn>` segment | Tenant-specific |
314
- | `one_sectionWebUrl` | the canonical Doc.aspx URL | **MUST be user-pasted from address bar — see C.1** |
315
-
316
- **C.1 No-synthesis rule (HARD, kushi v3.10.2+):** `one_sectionWebUrl` MUST be the URL the user actually copied from OneNote-for-Web's address bar. There is NO reliable formula to synthesize it from the four sub-fields. Two formulas were tried and BOTH failed in production:
317
-
318
- - `wd=target(<name>|<fileId>/)` → silent "Sorry, we ran into a problem" dialog.
319
- - `wd=target(/<name>/)` → silent "Sorry, we ran into a problem" dialog (even when this exact form works for sibling sections in the same notebook).
320
-
321
- OneNote's routing depends on internal session/tenant tokens we cannot reverse-engineer. Any URL constructed by string concatenation MUST be considered invalid. The recapture script's `tryAutoHeal()` may inherit notebook-level fields (`one_notebookSourceDoc`, `one_notebookSpoBaseUrl`, `one_notebookName`) from a sibling project sharing the same notebook — but it MUST fall through to the interactive paste prompt for the section URL itself.
322
-
323
- **C.2 Auto-heal scope:** `tryAutoHeal()` returns `{ stillNeedsPaste: true }` whenever the section URL is missing, even if all notebook fields were inherited. The gate driver MUST honor this signal and prompt for paste.
324
-
325
- **Rule:** A URL synthesized by template (i.e. one the user has not actually opened in a browser) is NOT acceptable. Common failure mode: an older bootstrap wrote `sharepoint-df.com` (dogfood) when the real tenant is `sharepoint.com`, or wrote a `sourcedoc` GUID that does not match the user's actual notebook. The result is OneNote-for-Web's silent error dialog "Sorry, we ran into a problem" — which the runner cannot distinguish from auth-required.
326
-
327
- **Gate:** Before dispatching the runner for a project, check completeness with:
328
-
329
- ```pwsh
330
- node plugin/skills/pull-onenote/scripts/recapture-section-url.mjs `
331
- --project <name> `
332
- --engagement-root <engagement-root> `
333
- --check
334
- ```
335
-
336
- Exit 0 + `{"status":"ok"}` → proceed to Step A.
337
- Exit 1 + `{"status":"incomplete","missing":[...]}` → invoke recapture (interactive paste prompt):
338
-
339
- ```pwsh
340
- node plugin/skills/pull-onenote/scripts/recapture-section-url.mjs `
341
- --project <name> `
342
- --engagement-root <engagement-root>
343
- ```
344
-
345
- The script prompts the user to open OneNote-for-Web, click into the section, copy the address-bar URL, and paste it. It parses the URL, validates structure, and persists all five fields to `m365-mutable.json`. **This gate runs in BOTH bootstrap and refresh** — if a refresh discovers the gate is open, it MUST self-heal via this script before invoking the runner. This is NOT re-discovery (which is forbidden by `m365-id-registry.instructions.md`); it is a one-time backfill of pre-doctrine or stale registry fields, with the resolved IDs cached forever.
346
-
347
- If the user declines to paste a URL (script exits non-zero with no fields persisted), mark the OneNote source as `disabled: true, reason: section-url-not-captured` in `<project>/integrations.yml#boundaries.onenote`, log a refresh-report entry, and skip the runner. Do NOT fall back to WorkIQ-only — empirical proof shows WorkIQ returns BODY-NOT-EXPOSED for most pages.
348
-
349
- ## Bootstrap discovery (one-time per project)
350
-
351
- For a project we have not pulled OneNote for before, capture into `m365-mutable.json#knownSections.<projectKey>`:
352
-
353
- | Key | Source | Example (HCA) |
354
- |---|---|---|
355
- | `one_sectionName` | OneNote section file name | `HCA.one` |
356
- | `one_sectionFileId` | OneNote-for-Web wd= URL fragment AND WorkIQ wdsectionfileid (they match) | `3d0ad388-fd6b-45a8-8619-60dd709b7ade` |
357
- | `one_notebookSourceDoc` | SharePoint sourcedoc GUID for the parent notebook | `2036c7b1-db1b-47fd-a14f-d8ee94ddd9bc` |
358
- | `one_notebookName` | OneNote notebook display name | `ISE Work` |
359
- | `one_notebookSpoBaseUrl` | the host segment of the SharePoint URL before `/_layouts/15/Doc.aspx` | `https://microsoft-my.sharepoint-df.com/personal/<upn>` |
360
-
361
- Discover by:
362
-
363
- 1. Open OneNote-for-Web at `https://onenote.cloud.microsoft/`. Sign in.
364
- 2. Open the project's notebook → click the project's section.
365
- 3. Read the URL — it contains both `sourcedoc={<notebookSourceDoc>}` and `wd=target(<sectionName>|<sectionFileId>/...)`.
366
- 4. Persist all five values to `m365-mutable.json` per `m365-id-registry.instructions.md`.
367
-
368
- If the section already exists in `knownSections.<projectKey>` from a prior WorkIQ run, only the browser-specific fields (`one_notebookSourceDoc`, `one_notebookName`, `one_notebookSpoBaseUrl`) need to be added. The `one_sectionFileId` is the same for both paths.
369
-
370
- ## Step A — enumerate pages (browser, primary)
371
-
372
- The runner navigates to the section's deep-link URL and waits for the OneNote canvas to render inside the nested `ffc-onenote.officeapps.live.com/onenoteframe.aspx` frame. Page enumeration uses the accessibility tree and MUST handle BOTH aria-label formats (kushi v3.11.3+):
373
-
374
- | Section state | aria-label format | Parsed as |
375
- |---|---|---|
376
- | Multi-page | `<title>, page N of M, Page.` | `{ title, pos: N, total: M }` |
377
- | Single-page | `<title>, Page. Selected.` | `{ title, pos: 1, total: 1 }` |
378
-
379
- ```js
380
- // runner.mjs excerpt (v3.11.3)
381
- const pages = await wac.evaluate(() => {
382
- const out = [];
383
- const seen = new Set();
384
- for (const n of document.querySelectorAll('[aria-label]')) {
385
- const label = n.getAttribute('aria-label') || '';
386
- let m = label.match(/^(.*?), page (\d+) of (\d+), Page/);
387
- if (m) {
388
- const key = `${m[1]}|${m[2]}`;
389
- if (!seen.has(key)) { seen.add(key); out.push({ title: m[1], pos: parseInt(m[2]), total: parseInt(m[3]) }); }
390
- continue;
391
- }
392
- m = label.match(/^(.*?), Page\. Selected\./);
393
- if (m) {
394
- const key = `${m[1]}|1`;
395
- if (!seen.has(key)) { seen.add(key); out.push({ title: m[1], pos: 1, total: 1 }); }
396
- }
397
- }
398
- return out;
399
- });
400
- ```
401
-
402
- **HARD rule (kushi v3.11.3+):** any aria-label-driven enumerator MUST handle the N=1 special-case format. Single-page sections do NOT render `, page N of M, Page.` — assuming they do is the defect signature `frame.waitForFunction: Timeout` with `pages: []` on a section that visibly has one page in the rail. Same anti-pattern applies to `waitForFunction` — its predicate must accept either format.
403
-
404
- This returns the canonical, ordered, complete page list as OneNote itself sees it — no SharePoint search-index residue.
405
-
406
- After clicking each page, the runner reads the URL to capture the `webPageId` from the `wd=target(...|<title>|<webPageId>/)` segment. Persist both `wdpartid` (from any prior WorkIQ enumeration, if available) and `webPageId` per entry.
407
-
408
- ## Step B — fetch verbatim body (browser, primary)
409
-
410
- For each page, click its `aria-label` entry in the page rail, wait 2.5s for the canvas to settle, then:
411
-
412
- ```js
413
- const body = await wac.evaluate(() => {
414
- const node = document.querySelector('#PageContentWrapper')
415
- || document.querySelector('.Page')
416
- || document.querySelector('[role="main"]');
417
- return node ? node.innerText : '';
418
- });
419
- ```
420
-
421
- Acceptance check (HARD rule):
422
-
423
- - A captured body must contain at minimum the page title line and one non-whitespace non-toolbar paragraph.
424
- - A body shorter than 50 chars is acceptable IF the page genuinely is sparse (verify by visiting in OneNote desktop), otherwise mark `last_status: short-suspect` and retry next refresh.
425
- - If the page redirected to `login.microsoftonline.com` instead of rendering, mark `last_status: auth-required` and exit the run loop early — do not continue with stale auth.
426
-
427
- ## Step B' — fallback: WorkIQ verbatim probe
428
-
429
- Used ONLY when:
430
-
431
- - The Playwright profile is absent on this machine, OR
432
- - The previous run hit auth-required and retry-cooldown has not elapsed (24h), OR
433
- - The user explicitly requests WorkIQ-only for diagnostic comparison.
434
-
435
- Canonical WorkIQ Step B query (natural-language pattern):
436
-
437
- > Return the FULL readable content verbatim of the page titled `<title>` in my OneNote section `<one_sectionName>` in notebook `<one_notebookName>`. Do not summarize. Do not paraphrase. If the body is not exposed, say so explicitly with the literal phrase `BODY-NOT-EXPOSED` on its own line.
438
-
439
- Acceptance: real body OR literal `BODY-NOT-EXPOSED`. Anything else is rejected and counted as `last_status: workiq-degraded`.
440
-
441
- ## Per-page retry registry (dual-ID schema)
442
-
443
- Persisted at `m365-mutable.json#knownSections.<projectKey>.one_pages[]`. Each entry:
444
-
445
- ```json
446
- {
447
- "title": "4/3 - HCA with Jay and Martin",
448
- "wdpartid": "2233ac5a-007a-4b70-9d93-d6113f318ba3",
449
- "webPageId": "78aac9b5-0629-4daa-ada3-f2436cb2381c",
450
- "lastModified": "April 3rd",
451
- "last_status": "captured",
452
- "captured_via": "browser",
453
- "attempts": 2,
454
- "last_attempt_at": "2026-05-14T03:55:00Z",
455
- "snapshot_path": "Evidence/ushak/onenote/snapshot/pages/4-3---HCA-with-Jay-and-Martin.md",
456
- "captured_at": "2026-05-14T03:55:00Z"
457
- }
458
- ```
459
-
460
- `last_status` is one of:
461
-
462
- | Status | Meaning | Next-run action |
463
- |---|---|---|
464
- | `captured` | Browser scrape returned full body | None unless `lastModified` advanced |
465
- | `user-pasted` | Human pasted body into snapshot file | Treat as captured; do not overwrite |
466
- | `auth-required` | Browser hit MFA / sign-in page | Retry browser; surface in run report |
467
- | `notebook-unavailable` | OneNote-for-Web error dialog ("Sorry, we ran into a problem" / "We couldn't open") instead of rendering — service- or notebook-side, NOT auth | Do NOT auto-retry. Surface recovery checklist (SKILL §A.4); next run re-checks pre-flight. |
468
- | `workiq-degraded` | Browser unavailable AND WorkIQ returned BODY-NOT-EXPOSED | Retry browser when profile valid again |
469
- | `BODY-NOT-EXPOSED` | WorkIQ-fallback explicitly returned the literal marker | Retry browser next run |
470
- | `short-suspect` | Body < 50 chars, may not be genuinely sparse | Verify in OneNote desktop, then retry |
471
- | `enumeration-only` | Page enumerated but Step B not yet attempted | Attempt Step B on next run |
472
-
473
- Retry doctrine: every refresh re-runs Step A and re-runs Step B for any page where `last_status NOT IN ('captured', 'user-pasted')`. Pages with `last_status='captured'` are re-fetched only if `lastModified` differs from the registry's last-known value.
474
-
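The retry doctrine above can be sketched as a single predicate. This is a minimal illustration, not the runner's code: the function name `needsBodyFetch` and the `observedLastModified` parameter are hypothetical, and the `notebook-unavailable` exclusion is an assumption drawn from the "Do NOT auto-retry" row of the status table.

```javascript
// Sketch of the per-page retry decision, under the assumptions stated above.
// `entry` is one one_pages[] record; `observedLastModified` is the value seen
// during the current Step A enumeration (hypothetical parameter name).
function needsBodyFetch(entry, observedLastModified) {
  // Status table: notebook-unavailable is surfaced to the user, not auto-retried.
  if (entry.last_status === 'notebook-unavailable') return false;
  // user-pasted bodies are treated as captured and must never be overwritten.
  if (entry.last_status === 'user-pasted') return false;
  // captured pages are re-fetched only when lastModified has changed.
  if (entry.last_status === 'captured') {
    return entry.lastModified !== observedLastModified;
  }
  // Every other status (auth-required, short-suspect, enumeration-only, ...) retries.
  return true;
}
```

A driver would call this once per enumerated page and run Step B only for the entries that return `true`.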
475
- ## Step C — stream events
476
-
477
- > **LEGACY (pre-v4.9.0).** Superseded by v4.9.0 weekly/ + CSC writer above. Page-edit events now surface as the `Last touched` line and Coverage Notes in the per-page CSC block. Kept for historical reference; not executed.
478
-
479
- Stream still uses WorkIQ — page-edit events are surfaced through search-index activity, not through page bodies, and that surface IS deterministic. No change vs v2.5.0.
480
-
481
- ## Snapshot file shape
482
-
483
- > **LEGACY (pre-v4.9.0).** Superseded by v4.9.0 weekly/ + CSC writer above. `snapshot/pages/<safe-title>.md` is no longer written; per-page content lands as a CSC block in `weekly/<Monday>_onenote-csc.md`. The page metadata previously kept in front-matter (page_title, section, wdpartid, webPageId, last_modified, last_status, captured_via, etc.) now lives as fields on the `_index/entities.yml` row for the page. Kept below for historical reference; not executed.
484
-
485
- Front-matter (yaml) at top of every `snapshot/pages/<safe-title>.md`:
486
-
487
- ```yaml
488
- ---
489
- page_title: "4/3 - HCA with Jay and Martin"
490
- section: "HCA.one"
491
- notebook: "ISE Work"
492
- section_id: "3d0ad388-fd6b-45a8-8619-60dd709b7ade"
493
- wdpartid: "2233ac5a-007a-4b70-9d93-d6113f318ba3"
494
- webPageId: "78aac9b5-0629-4daa-ada3-f2436cb2381c"
495
- last_modified: "April 3rd"
496
- last_status: "captured"
497
- captured_via: "browser"
498
- attempts: 2
499
- last_attempt: "2026-05-14T03:55:00Z"
500
- captured_at: "2026-05-14T03:55:00Z"
501
- ---
502
- ```
503
-
504
- Below the front-matter:
505
-
506
- - IF `last_status` ∈ {`captured`, `user-pasted`}: a `## AI Narrative Summary` section (3+ paragraphs) AND a `## Body (verbatim)` section with the exact text the runner extracted (or the user pasted). NEVER paraphrase a captured body.
507
- - IF `last_status ∈ {auth-required, workiq-degraded, BODY-NOT-EXPOSED}`: an explicit unavailable-marker block, a `### next_step` block, and a metadata table. Do NOT fabricate body content from emails or chats.
508
- - IF `last_status == enumeration-only`: a transient marker only. Do NOT include any body content.
509
- - IF `last_status == short-suspect`: include whatever was captured plus an `### unconfirmed` note.
510
-
511
- ## Depth bar (per `evidence-thoroughness.instructions.md` v2.0.0)
512
-
513
- Per-source minimum bullet thresholds replace the v4.8 AI Narrative Summary depth bar. For OneNote pages: **≥ 10 material bullets, ≥ 4 sections populated** (must include Topics, Dates & Numbers or Decisions, Action Items or Next Steps, Artifacts).
514
-
515
- An entity that cannot meet the threshold is flagged `low_signal: true` in `_index/entities.yml` and rendered as heading + Coverage Notes only — do NOT pad.
516
-
517
- For pages with `status: body-not-exposed` / `unavailable` / `deferred`, the CSC block is the heading + `Source basis: <status>` + `Coverage Notes` explaining the failure mode + `_None surfaced._` for every other applicable section. Do NOT infer content from adjacent pages, emails, or chats.
518
-
519
- ## Depth bar (legacy)
520
-
521
- > **LEGACY (pre-v4.9.0).** Superseded by the v4.9.0 depth bar above. Kept for historical reference.
522
-
523
- For any captured page, the `## AI Narrative Summary` must be self-contained — the reader must not need to consult the verbatim body to understand what the page is about. Cover: meeting/topic, who participated, what was decided, what's open, dates referenced. No fabrication: nothing in the summary or verbatim body section may originate from sources outside the page itself.
524
-
525
- ## Run report
526
-
527
- Every refresh writes `Evidence/<alias>/refresh-reports/<YYYY-MM-DD>-<HHMM>-onenote.md` with:
528
-
529
- - Total pages enumerated (browser-authoritative count)
530
- - Per-status counts (captured / auth-required / workiq-degraded / etc.)
531
- - List of pages whose `last_modified` advanced since prior run
532
- - Whether the runner exited early due to `auth-required`
533
- - Any short-suspect entries needing user verification
534
-
535
-
1
+ ---
2
+ name: "pull-onenote"
3
+ version: "3.1.0"
4
+ description: "USE WHEN refresh-project / bootstrap-project dispatches OneNote source AND project boundaries.onenote.section_ids list is non-empty, OR the user says \"pull OneNote for <X>\". DO NOT USE for global OneNote search or non-kushi note capture. Capability: pulls OneNote evidence as CSC blocks written to weekly/YYYY-MM-DD_onenote-csc.md, one block per page touched, upserted in _index/entities.yml. WorkIQ-primary; Playwright opt-in recovery-only fallback."
5
+ ---
6
+
7
+ # Skill: pull-onenote
8
+
9
+ > **v3.0.0 contract change (kushi v4.9.0, 2026-05-26):** Output shape is now **weekly CSC** (`weekly/<YYYY-MM-DD>_onenote-csc.md` + `_index/entities.yml`). WorkIQ is still PRIMARY, Playwright is still OPT-IN recovery-only — but their outputs feed CSC bullets, not verbatim `snapshot/pages/<title>.md` files. The per-page retry registry now lives as `status` / `low_signal` rows in `_index/entities.yml`. See `weekly-csc.instructions.md` and `comprehensive-structured-capture.instructions.md`.
10
+
11
+ > **v2.7.0 contract change (kushi v4.7.3, 2026-05-26):** Playwright is **demoted from PRIMARY to opt-in recovery-only fallback**. WorkIQ natural-language queries by display name are now the primary path. See `..\..\instructions\workiq-onenote-query-shape.instructions.md` for the working query shapes and the empirical record. The v3.8.0 pivot to Playwright-primary was driven by HCA body-retrieval rate (1/18 on 2026-05-14) — but per-page retry registries + multi-refresh accumulation close that gap without forcing every contributor onto Edge + Conditional Access + bootstrap-profile complexity.
12
+
13
+ > **v4.9.0 contracts** — This skill operates under these HARD-rule doctrines:
14
+ > - **`comprehensive-structured-capture.instructions.md`** — CSC block shape (canonical sections, bullets only, no prose).
15
+ > - **`weekly-csc.instructions.md`** — weekly/ + _index/ writer contract; multi-user safety.
16
+ > - `workiq-onenote-query-shape.instructions.md` — **PRIMARY**. The only working WorkIQ phrasings for OneNote. CSC canonical prompts adapt these query shapes per `workiq-only.instructions.md` § "CSC canonical prompts (kushi v4.9.0+)".
17
+ > - `m365-id-registry.instructions.md` — discover-once / consume-deterministically (section IDs, dual page-id schema when both observed).
18
+ > - `evidence-thoroughness.instructions.md` (v2.0.0) — per-source minimum bullet thresholds; `low_signal: true` flag.
19
+ > - `citation-ledger.instructions.md` — citation format.
20
+ > - `capture-learnings.instructions.md` — every fix/discovery is logged to `plugin/learnings/<source>.md`.
21
+ > - `cleanup-on-resolution.instructions.md` — stale unavailable-markers are upgraded in the same turn.
22
+ > - `run-reports.instructions.md` — every refresh writes a per-user report under `Evidence/<alias>/refresh-reports/`.
23
+ > - `evidence-layout-canonical.instructions.md` — `weekly/` + `_index/` are canonical in v4.9.0+.
24
+ > - ~~`verbatim-by-default.instructions.md`~~ (LEGACY — superseded by CSC).
25
+ > - ~~`snapshot-vs-stream.instructions.md`~~ (LEGACY — superseded by weekly-csc).
26
+
27
+ ## Gotchas
28
+
29
+ - **Moved pages keep original wdsectionfileid** → if a section returns `page-body-unavailable` or zero `wdpartid` entries, the pages were authored elsewhere and moved. Open OneNote desktop, drag pages back into the target section, force-sync, retry.
30
+ - **WorkIQ Tier C requires ONE page per call** — batching multiple pages into a single ask returns empty results or summary-mode refusal. Loop over pageIds, one WorkIQ call per page.
31
+ - **Notebook-unavailable masquerades as auth-required** — OneNote-for-Web error dialogs ("Sorry, we ran into a problem") look like sign-in failures to the runner. Run `--preflight` first; the three-way classifier distinguishes `ok` / `auth-required` / `notebook-unavailable`. Never re-bootstrap auth for `notebook-unavailable`.
32
+ - **Forbidden bulk-enumerate phrasings** — bulk "list my … notebooks", notebook-ID lookups, structured-field section searches, and `wdsectionfileid` filter syntax all route WorkIQ to summary mode or Graph Explorer. Only display-name content queries work. See `workiq-onenote-query-shape.instructions.md` for the canonical forbidden list.
33
+ - **Conditional Access requires Edge channel for Playwright fallback** → if `playwrightFallback: true` and bootstrap runs on Chromium/Firefox, cookies mint but every scrape returns `auth-required`. Use `--channel msedge`. Cookie domains do NOT transfer between SPO surfaces; bootstrap MUST visit both `microsoft-my.sharepoint.com` AND `microsoft-my.sharepoint-df.com`.
34
+
35
+ ## v4.9.0 — Comprehensive Structured Capture (CSC) + weekly/ layout (HARD RULE; supersedes all snapshot/+stream/ guidance below)
36
+
37
+ Per `comprehensive-structured-capture.instructions.md` and `weekly-csc.instructions.md`:
38
+
39
+ - **Output target** (single file per source per ISO week):
40
+ `<engagement-root>/<project>/Evidence/<alias>/onenote/weekly/<YYYY-MM-DD>_onenote-csc.md`
41
+ where `<YYYY-MM-DD>` is the **Monday** of the ISO week the page was touched.
42
+ Empty weeks produce no file.
43
+
44
+ - **Block shape (per page touched that week)**: one CSC block under `## <Page title> {#<entity-anchor>}` with canonical section order: Source basis → Coverage window → Last touched → Participants → Topics Discussed → Decisions → Dates & Numbers Shared → Action Items → Next Steps → Open Questions → Risks/Blockers/Dependencies → Customer Asks → Artifacts/Links → Coverage Notes. (Q&A and Who Said What omitted per CSC per-source applicability.) Bullets only — no prose paragraphs.
45
+
46
+ - **Entity id (canonical for this source)**: `onenote://page/wdpartid=<id>` (preferred) or `onenote://page/webPageId=<id>`.
47
+
48
+ - **WorkIQ prompts**: REPLACED by the CSC canonical prompts in `workiq-only.instructions.md` § "CSC canonical prompts (kushi v4.9.0+)". The Tier C per-page body prompt becomes the OneNote CSC per-page prompt. Do NOT use the v4.8 verbatim prompts.
49
+
50
+ - **_index/entities.yml upsert** (per-contributor, per-source): on every write, upsert one row per page:
51
+ ```yaml
52
+ - id: 'onenote://page/wdpartid=<id>'
53
+ display_name: '<page title>'
54
+ entity_anchor: '<slug>'
55
+ latest_csc_file: 'weekly/<YYYY-MM-DD>_onenote-csc.md'
56
+ latest_csc_block_offset: <line>
57
+ last_touched: <ISO>
58
+ first_seen: <ISO>
59
+ weeks_touched: [<YYYY-MM-DD>, ...]
60
+ status: captured | body-not-exposed | unavailable | deferred # was last_status pre-v4.9.0
61
+ low_signal: false | true # set true when entity cannot meet evidence-thoroughness thresholds
62
+ ```
63
+ Path: `<engagement-root>/<project>/Evidence/<alias>/onenote/_index/entities.yml`.
64
+
65
+ - **Idempotency**: same page, same week, two runs → REPLACE the block in place. Same page, different weeks → each week's file gets its own block.
66
+
67
+ - **No body-fetch loop**: drop the AI Narrative Summary requirement and the dedicated verbatim `## Body (verbatim)` section. CSC bullets sourced directly from WorkIQ CSC output (or Playwright fallback output, rendered as CSC).
68
+
69
+ - **Legacy `snapshot/` and `stream/` folders**: NOT written by this skill in v4.9.0. Pre-v4.9.0 folders (especially `snapshot/pages/<safe-title>.md`) on disk are left alone; readers fall back to them when `weekly/` is empty.
70
+
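The output-target rule above (one file per ISO week, named for that week's Monday) can be sketched as follows. This is an illustration only: the helper names `mondayOfIsoWeek` and `weeklyCscPath` are hypothetical, and anchoring the week in UTC is an assumption, since the doctrine does not state a timezone.

```javascript
// Sketch: resolve the weekly CSC file a page touch belongs to.
// Assumption: the ISO week is computed in UTC (not stated in the source).
function mondayOfIsoWeek(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday ... 6 = Saturday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

// Builds the path shape given in the "Output target" bullet.
function weeklyCscPath(engagementRoot, project, alias, touchedAt) {
  const monday = mondayOfIsoWeek(new Date(touchedAt));
  return `${engagementRoot}/${project}/Evidence/${alias}/onenote/weekly/${monday}_onenote-csc.md`;
}
```

Because the file name depends only on the touch date, two runs in the same week resolve to the same file, which is what lets the writer replace a block in place.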
71
+ ### Source-specific notes (onenote)
72
+
73
+ - WorkIQ Tier C per-page body prompt becomes the OneNote CSC per-page prompt (see `workiq-only.instructions.md` § CSC canonical prompts).
74
+ - **KEEP Playwright as opt-in recovery-only fallback (unchanged)** — but its output now feeds CSC bullets, not `snapshot/pages/<title>.md` verbatim files. The runner produces structured page content that the skill renders into the canonical CSC block sections.
75
+ - **Per-page retry registry stays** — but it now lives as `low_signal` / `status` fields in `_index/entities.yml` (not as a separate `one_pages[]` block). Pages that return `body-not-exposed` for ≥ 2 consecutive refreshes still escalate to Playwright when opted in.
76
+ - **Drop `snapshot/pages/<safe-title>.md` write target.** All output goes to `weekly/<Monday>_onenote-csc.md`.
77
+
78
+ Pulls **onenote** evidence in two shapes per `snapshot-vs-stream.instructions.md`:
79
+
80
+ - **snapshot/** — full page bodies — one file per page with last-modified + verbatim body
81
+ - **stream/**: page-edit events with diff summary
82
+
83
+ ## Tools (in order)
84
+
85
+ 1. **WorkIQ (natural-language by display name)** — **PRIMARY**. Use the approved query shapes from `workiq-onenote-query-shape.instructions.md`: one section per query, display names only, no enumeration verbs, no filter-syntax, no ID-lookup questions. Drives both section/page discovery and body retrieval. Pages that return `BODY-NOT-EXPOSED` are logged into `one_pages[].last_status` for retry on the next refresh — this multi-refresh accumulation is the thoroughness mechanism.
86
+ 2. **Playwright (browser-scrape, persisted profile)** — **OPT-IN RECOVERY-ONLY FALLBACK**. Only invoked when ALL of: (a) `m365Auth.oneNote.playwrightFallback: true`, (b) `one_pages[]` shows ≥ N pages with `last_status: BODY-NOT-EXPOSED` for ≥ 2 consecutive refreshes (N default 5, configurable via `m365-mutable.json#pullOnenote.playwrightThreshold`), (c) WorkIQ has been attempted in the current run. Profile lives at `~/.kushi/playwright-profile/onenote/`. Implementation: `plugin/skills/pull-onenote/runner.mjs`. See "Playwright fallback (optional)" section below.
87
+ 3. **Host (m365_*)** — not used (Graph `Notes.Read.All` denied admin consent in this tenant; also forbidden by kushi's WorkIQ-first doctrine).
88
+
89
+ ## Canonical CLI invocations
90
+
91
+ ### Primary path: WorkIQ natural-language (kushi v4.7.3+, default for all installs)
92
+
93
+ Per `..\..\instructions\workiq-onenote-query-shape.instructions.md`. Resolves section IDs and pulls page bodies via display-name queries:
94
+
95
+ ```pwsh
96
+ # Per-section discovery + page enumeration (also extracts wdpartid + wdsectionfileid from URL fragments):
97
+ workiq ask -q "In the OneNote notebook '<NOTEBOOK DISPLAY NAME>', show me the pages in the section named '<SECTION DISPLAY NAME>'. Return a flat table with: page title, last modified, web URL. No commentary. Do not truncate."
98
+
99
+ # Per-page body pull (one page at a time; narrow scope is mandatory):
100
+ workiq ask -q "Open the OneNote page titled '<PAGE TITLE>' in section '<SECTION>' of notebook '<NOTEBOOK>'. Return the verbatim page body, no summary, no truncation."
101
+
102
+ # Edit-event stream:
103
+ workiq ask -q "Show me OneNote page edits in section '<SECTION>' of notebook '<NOTEBOOK>' since <ISO-DATE>. For each edit: page title, edited by, edited at, brief change summary."
104
+ ```
105
+
106
+ Driver behavior:
107
+ 1. Run discovery query → parse `wdsectionfileid`, `wdpartid`, `sourcedoc` GUIDs out of the URL fragments in the response.
108
+ 2. Persist into `m365-mutable.json#knownSections.<projectKey>` per `m365-id-registry.instructions.md`. `webPageId` is left empty — it is only populated if/when Playwright fallback runs.
109
+ 3. For each enumerated page, run the body-pull query. Write snapshot files to the canonical layout. Populate `one_pages[].last_status`: `captured` on verbatim body return, `BODY-NOT-EXPOSED` on empty/partial, `workiq-degraded` on classified WorkIQ failure (per `fallback-status-reporting.instructions.md`).
110
+ 4. Pages with `BODY-NOT-EXPOSED` are surfaced in the run report and queued for retry on the next refresh. The per-page retry registry is the thoroughness mechanism — over multiple refreshes, transient non-determinism is absorbed.
111
+
112
+ ### Playwright fallback (OPT-IN, RECOVERY-ONLY — kushi v4.7.3+)
113
+
114
+ **Do NOT bootstrap a Playwright profile on a fresh install.** This path exists only as a recovery valve for contributors whose WorkIQ body-retrieval rate stays poor across multiple refreshes. Enable explicitly:
115
+
116
+ ```jsonc
117
+ // In <kushi-config-root>/user/m365-auth.json:
118
+ "oneNote": {
119
+ "enabled": true,
120
+ "defaultNotebookName": "...",
121
+ "playwrightFallback": true // OPT-IN; default false
122
+ }
123
+ ```
124
+
125
+ Once opted in, the driver auto-escalates to Playwright when `one_pages[]` shows ≥ N pages with `last_status: BODY-NOT-EXPOSED` for ≥ 2 consecutive refreshes (N default 5; configurable via `m365-mutable.json#pullOnenote.playwrightThreshold`). Until both thresholds are met, the driver MUST continue using WorkIQ. Never auto-bootstrap a profile — that is always a user-initiated action.
126
+
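The escalation gate described above can be sketched as a pure function. This is a minimal sketch under stated assumptions: the function name and the per-page `consecutive_misses` counter are hypothetical (the source does not say how consecutive refreshes are tracked), and `workiqAttempted` stands in for condition (c) from the Tools list.

```javascript
// Sketch of the Playwright escalation gate, not the driver's actual code.
// Assumptions: `consecutive_misses` is a hypothetical per-page counter of
// back-to-back refreshes that ended in BODY-NOT-EXPOSED.
function shouldEscalateToPlaywright({ playwrightFallback, workiqAttempted, pages, threshold = 5 }) {
  // Opt-in flag and a WorkIQ attempt in the current run are both mandatory.
  if (!playwrightFallback || !workiqAttempted) return false;
  const stuck = pages.filter(
    (p) => p.last_status === 'BODY-NOT-EXPOSED' && (p.consecutive_misses ?? 0) >= 2
  );
  // Escalate only when at least N pages have been stuck for two or more refreshes.
  return stuck.length >= threshold;
}
```

Returning `false` here means the driver keeps using WorkIQ; the function never bootstraps a profile, which stays a user-initiated action.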
127
+ The Playwright invocations below remain valid (cookie surfaces, channel requirements, preflight three-way classification are all empirically unchanged) — they are simply the **fallback** path now, not the default.
128
+
129
+ These are the empirically validated invocations as of kushi v3.11.3 (still in force for the fallback path). The runner has surprising defaults (visible browser by default, 60s timeout, preflight required) that can derail a run. Use these recipes; do not improvise flags.
130
+
131
+ > **Load `references/playwright-fallback.md`** for the runner bootstrap procedure, the `write-snapshot.mjs` scrape wrapper, channel/cookie HARD rules, and diagnostics for `pages: []` runs. Load ONLY when the Playwright fallback gate (above) has tripped — do NOT preload on fresh installs.
132
+
133
+
134
+ ## Empirical contract + Pre-flight gates
135
+
136
+ > **Load `references/preflight.md`** when constructing or debugging the pre-flight gate (Playwright profile check, OneNote-for-Web three-way reachability classifier, browser-URL completeness gate, no-synthesis rule).
137
+ >
138
+ > **Load `references/runtime-contract.md`** for the four HARD-rule empirical facts (WorkIQ-primary doctrine, non-determinism + retry registry, dual page-id schema, when browser-scrape wins).
139
+
140
+
141
+ ## Bootstrap discovery + Runner steps
142
+
143
+ > **Load `references/runtime-contract.md`** for the one-time-per-project bootstrap discovery table, runner Step A (page enumeration), Step B (verbatim body fetch), Step B' (WorkIQ fallback), the per-page retry registry status enum, and the legacy `snapshot/pages/<title>.md` file shape (reader fallback only). Implementer-only — readers should not need this file.
144
+
145
+ ## Depth bar (per `evidence-thoroughness.instructions.md` v2.0.0)
146
+
147
+ Per-source minimum bullet thresholds replace the v4.8 AI Narrative Summary depth bar. For OneNote pages: **≥ 10 material bullets, ≥ 4 sections populated** (must include Topics, Dates & Numbers or Decisions, Action Items or Next Steps, Artifacts).
148
+
149
+ An entity that cannot meet the threshold is flagged `low_signal: true` in `_index/entities.yml` and rendered as heading + Coverage Notes only; do NOT pad.
150
+
151
+ For pages with `status: body-not-exposed` / `unavailable` / `deferred`, the CSC block is the heading + `Source basis: <status>` + `Coverage Notes` explaining the failure mode + `_None surfaced._` for every other applicable section. Do NOT infer content from adjacent pages, emails, or chats.
152
+
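The depth bar above can be checked mechanically once a CSC block has been parsed. The sketch below is illustrative: `meetsDepthBar` is a hypothetical name, the input is assumed to be a map from CSC section heading to its bullet array, and every bullet is counted as "material" because the source gives no machine-checkable definition of that term.

```javascript
// Sketch of the OneNote depth-bar check, under the assumptions stated above.
// `sections` maps CSC section headings to arrays of bullet strings.
function meetsDepthBar(sections) {
  const has = (name) => (sections[name] || []).length > 0;
  const bulletLists = Object.values(sections);
  const totalBullets = bulletLists.reduce((n, bullets) => n + bullets.length, 0);
  const populated = bulletLists.filter((bullets) => bullets.length > 0).length;
  // Required coverage: Topics, (Dates & Numbers or Decisions),
  // (Action Items or Next Steps), and Artifacts.
  const requiredPresent =
    has('Topics Discussed') &&
    (has('Dates & Numbers Shared') || has('Decisions')) &&
    (has('Action Items') || has('Next Steps')) &&
    has('Artifacts/Links');
  return totalBullets >= 10 && populated >= 4 && requiredPresent;
}
```

A writer could then set `low_signal: !meetsDepthBar(sections)` on the entity row and render heading plus Coverage Notes only when the check fails.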
153
+ ## Depth bar (legacy)
154
+
155
+ > **LEGACY (pre-v4.9.0).** Superseded by the v4.9.0 depth bar above. Kept for historical reference.
156
+
157
+ For any captured page, the `## AI Narrative Summary` must be self-contained — the reader must not need to consult the verbatim body to understand what the page is about. Cover: meeting/topic, who participated, what was decided, what's open, dates referenced. No fabrication: nothing in the summary or verbatim body section may originate from sources outside the page itself.
158
+
159
+ ## Run report
160
+
161
+ Every refresh writes `Evidence/<alias>/refresh-reports/<YYYY-MM-DD>-<HHMM>-onenote.md` with:
162
+
163
+ - Total pages enumerated (browser-authoritative count)
164
+ - Per-status counts (captured / auth-required / workiq-degraded / etc.)
165
+ - List of pages whose `last_modified` advanced since prior run
166
+ - Whether the runner exited early due to `auth-required`
167
+ - Any short-suspect entries needing user verification
168
+
169
+
536
170
 
537
171
  ## References (v4.4.7)
538
172
 
539
173
  - Name → ID resolution follows ..\..\instructions\fuzzy-disambiguation.instructions.md (universal fuzzy contract).
540
174
  - After this pull completes, the per-source verification gate runs: ..\..\instructions\per-source-verification-gate.instructions.md (retry once, then write FOLLOW-UPS.md).
541
-
542
-
543
- ## Issue Recovery
544
-
545
- When this skill exposes a reusable defect (auth pattern, doctrine gap, layout mismatch), apply the [Issue Recovery Rule](../../instructions/issue-recovery.instructions.md): fix the smallest correct repo-owned artifact first, prefer durable fixes over per-run workarounds, then re-run the narrowest failed check. Do NOT use memory as a substitute for correcting the workflow surface.
546
-
547
- ## Tools (v4.9.0 update)
548
-
549
- WorkIQ (PRIMARY) — use **CSC canonical prompts** from `workiq-only.instructions.md` § 'CSC canonical prompts (kushi v4.9.0+)'. The pre-v4.9.0 verbatim prompts are LEGACY (kept only for pull-meetings Half A). Playwright fallback unchanged in invocation, but its output is now rendered into CSC sections rather than written as a verbatim `snapshot/pages/<title>.md` file.
550
-
551
- ## Changelog
552
-
553
- - **v3.0.0 (kushi v4.9.0, 2026-05-26)**: BREAKING. Output is now a single weekly CSC file per ISO week
554
- (`weekly/<YYYY-MM-DD>_onenote-csc.md`) + per-contributor `_index/entities.yml`. `snapshot/pages/`
555
- and `stream/` writes removed. WorkIQ prompts switched to CSC canonical prompts (Tier C per-page
556
- body prompt becomes OneNote CSC per-page prompt). AI Narrative Summary requirement removed (CSC
557
- bulleted sections carry the load). Per-page retry registry now lives as `low_signal` / `status`
558
- fields in `_index/entities.yml`. Playwright fallback unchanged in invocation; output rendered as
559
- CSC bullets, not verbatim file. Legacy snapshot/+stream/ folders left readable; no migration.
560
- - **v2.x.x**: prior snapshot/+stream/ + verbatim-by-default shape. See git history.
175
+
176
+
177
+ ## Issue Recovery
178
+
179
+ When this skill exposes a reusable defect (auth pattern, doctrine gap, layout mismatch), apply the [Issue Recovery Rule](../../instructions/issue-recovery.instructions.md): fix the smallest correct repo-owned artifact first, prefer durable fixes over per-run workarounds, then re-run the narrowest failed check. Do NOT use memory as a substitute for correcting the workflow surface.
180
+
181
+ ## Tools (v4.9.0 update)
182
+
183
+ WorkIQ (PRIMARY) — use **CSC canonical prompts** from `workiq-only.instructions.md` § 'CSC canonical prompts (kushi v4.9.0+)'. The pre-v4.9.0 verbatim prompts are LEGACY (kept only for pull-meetings Half A). Playwright fallback unchanged in invocation, but its output is now rendered into CSC sections rather than written as a verbatim `snapshot/pages/<title>.md` file.
184
+
185
+ ## Changelog
188
+
189
+ - **v3.1.0 (kushi v5.0.1, 2026-05-26)**: agentskills.io spec-compliance pass. Extracted Playwright
190
+ fallback (bootstrap, scrape wrapper, channel rules, diagnostics) into `references/playwright-fallback.md`;
191
+ pre-flight gates (three-way reachability classifier, browser-URL completeness gate) into
192
+ `references/preflight.md`; empirical contract + Step A/B/B'/C runner contract + legacy snapshot
193
+ file shape into `references/runtime-contract.md`. SKILL.md trimmed from 568 to ~200 lines. Behaviour
194
+ unchanged; load-on-trigger pointers added so readers pull only what each task needs. Added
195
+ `## Gotchas` block and `## Validation loop` per agentskills compliance doctrine.
196
+ - **v3.0.0 (kushi v4.9.0, 2026-05-26)**: BREAKING. Output is now a single weekly CSC file per ISO week
197
+ (`weekly/<YYYY-MM-DD>_onenote-csc.md`) + per-contributor `_index/entities.yml`. `snapshot/pages/`
198
+ and `stream/` writes removed. WorkIQ prompts switched to CSC canonical prompts (Tier C per-page
199
+ body prompt becomes OneNote CSC per-page prompt). AI Narrative Summary requirement removed (CSC
200
+ bulleted sections carry the load). Per-page retry registry now lives as `low_signal` / `status`
201
+ fields in `_index/entities.yml`. Playwright fallback unchanged in invocation; output rendered as
202
+ CSC bullets, not verbatim file. Legacy snapshot/+stream/ folders left readable; no migration.
203
+ - **v2.x.x**: prior snapshot/+stream/ + verbatim-by-default shape. See git history.
204
+
205
+ ## Validation loop
206
+
207
+ After writing outputs:
208
+
209
+ 1. Run self-check targeted at this skill: `pwsh plugin/skills/self-check/run.ps1 -Targeted pull-onenote`
210
+ 2. If failures: fix and re-run the affected step (not the whole skill).
211
+ 3. Repeat until self-check exits 0.
212
+ 4. Only then update `run-log.yml` with success status.
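
The validation loop above can be sketched as a small driver. This is a minimal illustration in Python, not part of the package: only the `pwsh ... -Targeted pull-onenote` command comes from step 1, while `validation_loop`, its callback parameters, and the `max_attempts` cap are hypothetical names and assumptions (the skill text says to repeat until exit 0 and sets no cap).

```python
import subprocess
from typing import Callable, Sequence

# Command quoted from step 1 of the skill; everything else below is hypothetical.
SELF_CHECK_CMD = ["pwsh", "plugin/skills/self-check/run.ps1", "-Targeted", "pull-onenote"]


def run_self_check(cmd: Sequence[str] = SELF_CHECK_CMD) -> int:
    """Run the targeted self-check and return its process exit code."""
    return subprocess.run(list(cmd)).returncode


def validation_loop(
    check: Callable[[], int],
    fix_and_rerun_step: Callable[[int], None],
    record_success: Callable[[], None],
    max_attempts: int = 5,  # assumption: the skill itself defines no retry cap
) -> bool:
    """Re-run the check until it exits 0; record success only after a pass."""
    for attempt in range(1, max_attempts + 1):
        if check() == 0:
            # Step 4: success is written only once the self-check passes.
            record_success()
            return True
        if attempt < max_attempts:
            # Step 2: fix and re-run only the affected step, not the whole skill.
            fix_and_rerun_step(attempt)
    return False
```

A caller would pass `run_self_check` as `check`, plus its own callbacks for repairing the failed step and for writing the success status to `run-log.yml`. Keeping those as injected callables leaves the ordering (check, then fix, then record) visible and lets the loop be exercised without PowerShell installed.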