kushi-agents 3.4.2 → 3.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.github/copilot-instructions.kushi.md +38 -0
- package/README.md +33 -0
- package/bin/cli.mjs +2 -0
- package/package.json +17 -4
- package/plugin/agents/kushi.agent.md +155 -147
- package/plugin/instructions/ado-bootstrap-discovery.instructions.md +111 -0
- package/plugin/instructions/ado-engagement-tree.instructions.md +73 -0
- package/plugin/instructions/answer-from-evidence.instructions.md +1 -1
- package/plugin/instructions/auth-and-retry.instructions.md +51 -16
- package/plugin/instructions/azure-auth-patterns.instructions.md +13 -6
- package/plugin/instructions/bootstrap-status-format.instructions.md +113 -0
- package/plugin/instructions/capture-learnings.instructions.md +95 -0
- package/plugin/instructions/cleanup-on-resolution.instructions.md +69 -0
- package/plugin/instructions/crm-bootstrap-discovery.instructions.md +79 -0
- package/plugin/instructions/crm-internal-vs-confirmed.instructions.md +79 -0
- package/plugin/instructions/evidence-confidence-ladder.instructions.md +66 -0
- package/plugin/instructions/evidence-layout-canonical.instructions.md +115 -0
- package/plugin/instructions/evidence-thoroughness.instructions.md +82 -12
- package/plugin/instructions/full-view-gate.instructions.md +91 -0
- package/plugin/instructions/m365-id-registry.instructions.md +134 -0
- package/plugin/instructions/meetings-verbatim-required.instructions.md +176 -0
- package/plugin/instructions/run-reports.instructions.md +129 -0
- package/plugin/instructions/scope-boundaries.instructions.md +218 -0
- package/plugin/instructions/snapshot-vs-stream.instructions.md +2 -0
- package/plugin/instructions/update-ledger.instructions.md +132 -0
- package/plugin/instructions/verbatim-by-default.instructions.md +73 -0
- package/plugin/instructions/workiq-first.instructions.md +15 -31
- package/plugin/instructions/workiq-only.instructions.md +193 -0
- package/plugin/learnings/README.md +50 -0
- package/plugin/learnings/ado.md +45 -0
- package/plugin/learnings/crm.md +96 -0
- package/plugin/learnings/cross-cutting.md +36 -0
- package/plugin/learnings/email.md +33 -0
- package/plugin/learnings/meetings.md +30 -0
- package/plugin/learnings/misc.md +46 -0
- package/plugin/learnings/onenote.md +215 -0
- package/plugin/learnings/sharepoint.md +5 -0
- package/plugin/learnings/teams.md +5 -0
- package/plugin/plugin.json +22 -2
- package/plugin/prompts/apply-ado.prompt.md +14 -0
- package/plugin/prompts/propose-ado.prompt.md +12 -0
- package/plugin/reference-packs/fde/crm-field-manifest.md +165 -0
- package/plugin/skills/apply-ado-update/SKILL.md +125 -0
- package/plugin/skills/ask-project/SKILL.md +2 -0
- package/plugin/skills/bootstrap-project/SKILL.md +81 -3
- package/plugin/skills/propose-ado-update/SKILL.md +108 -0
- package/plugin/skills/pull-ado/SKILL.md +173 -23
- package/plugin/skills/pull-crm/SKILL.md +168 -15
- package/plugin/skills/pull-email/SKILL.md +139 -22
- package/plugin/skills/pull-meetings/SKILL.md +109 -25
- package/plugin/skills/pull-misc/README.md +84 -0
- package/plugin/skills/pull-misc/SKILL.md +257 -0
- package/plugin/skills/pull-misc/runner.mjs +280 -0
- package/plugin/skills/pull-onenote/README.md +90 -0
- package/plugin/skills/pull-onenote/SKILL.md +400 -51
- package/plugin/skills/pull-onenote/runner.mjs +356 -0
- package/plugin/skills/pull-onenote/scripts/recapture-section-url.mjs +295 -0
- package/plugin/skills/pull-onenote/write-snapshot.mjs +271 -0
- package/plugin/skills/pull-sharepoint/SKILL.md +44 -12
- package/plugin/skills/pull-teams/SKILL.md +40 -11
- package/plugin/skills/refresh-project/SKILL.md +33 -2
- package/plugin/skills/self-check/run.ps1 +186 -4
- package/plugin/templates/ado-update/discussion-comment.template.md +26 -0
- package/plugin/templates/ado-update/integrations-ado-writes.example.yml +49 -0
- package/plugin/templates/ado-update/proposed.template.md +78 -0
- package/plugin/templates/init/external-links.template.txt +30 -0
- package/plugin/templates/init/project-integrations.template.yml +57 -2
- package/plugin/templates/snapshot/meeting-verbatim.template.md +110 -0
- package/plugin/templates/snapshot/meetings-series-index.template.md +3 -1
- package/plugin/templates/snapshot/onenote-page.template.md +92 -23
- package/plugin/templates/weekly/meetings-stream.template.md +11 -6
- package/src/copilot-instructions.mjs +80 -0
- package/src/main.mjs +18 -1
--- a/package/plugin/skills/pull-onenote/SKILL.md
+++ b/package/plugin/skills/pull-onenote/SKILL.md
@@ -1,82 +1,431 @@
 ---
 name: "pull-onenote"
-version: "2.1
-description: "Pull OneNote evidence (snapshot: full page bodies; stream: page-edit events).
+version: "2.5.1"
+description: "Pull OneNote evidence (snapshot: full page bodies; stream: page-edit events). Browser-scrape (OneNote-for-Web via Playwright with persisted profile) is the PRIMARY capture path because it is the only mechanism that returns reliable, complete, verbatim page bodies in this tenant. WorkIQ remains the fallback when the browser auth profile expires (Conditional Access / MFA). Per-page retry registry persists both ID forms (browser webPageId + WorkIQ wdpartid) and tracks last_status across runs."
 ---
 
 # Skill: pull-onenote
 
+> **v3.8.0 contracts** — This skill operates under six HARD-rule doctrines:
+> - `verbatim-by-default.instructions.md` — full bodies by default; no preview-grade pulls accepted.
+> - `m365-id-registry.instructions.md` — discover-once / consume-deterministically (section IDs, dual page-id schema).
+> - `capture-learnings.instructions.md` — every fix/discovery is logged to `plugin/learnings/<source>.md`.
+> - `cleanup-on-resolution.instructions.md` — when a value resolves, all stale unavailable-markers are upgraded in the same turn.
+> - `run-reports.instructions.md` — every refresh writes a per-user report under `Evidence/<alias>/refresh-reports/`.
+> - `thoroughness-detector.instructions.md` — runtime detector + auto-retry + paste-prompt for any page where the primary path returns nothing or only a partial body. The per-page retry registry IS this skill's thoroughness contract.
+
+> **Canonical evidence layout** (HARD, kushi v3.12.1+): all artifacts produced by this skill MUST be written under `<project>/Evidence/<alias>/<source>/{snapshot,stream,...}/` — sibling folders under `<project>/` (e.g. `<project>/<source>-context/`, `<project>/<source>/`, `<project>/_Weekly Summaries/`) are FORBIDDEN. See `evidence-layout-canonical.instructions.md`.
+
 Pulls **onenote** evidence in two shapes per `snapshot-vs-stream.instructions.md`:
 
-- **snapshot/** — full page bodies — one file per page with last-modified +
-- **stream/** — page-edit events with diff summary
+- **snapshot/** — full page bodies — one file per page with last-modified + verbatim body
+- **stream/** — page-edit events with diff summary
+
+## Tools (in order)
+
+1. **Playwright (browser-scrape, persisted profile)** — PRIMARY. Drives OneNote-for-Web to enumerate pages and read verbatim bodies via `#PageContentWrapper`. Profile lives at `~/.copilot/playwright-profile/onenote/` and is reused across runs. Implementation: `plugin/skills/pull-onenote/runner.mjs`.
+2. **WorkIQ** — FALLBACK only. Used when the Playwright profile is auth-expired (`auth-required`) and for the stream/edit-event source. Always cite `m365-id-registry` doctrine when invoking; never use as primary for body retrieval (proven non-deterministic).
+3. **Host (m365_*)** — not used (admin consent for Graph `Notes.Read.All` is denied in this tenant).
+
+## Canonical CLI invocations (do NOT re-derive — copy these exactly)
+
+These are the empirically validated invocations as of kushi v3.11.3. The runner has surprising defaults (visible browser by default, 60s timeout, preflight required) that can derail a run. Use these recipes; do not improvise flags.
+
+### Bootstrap (once per machine; repeat every ~3-5 days as cookies expire)
+
+```pwsh
+cd C:\Usha\ISERepos\kushi
+node plugin/skills/pull-onenote/runner.mjs --bootstrap
+```
+
+Expected console flow (any deviation = defect):
+
+```
+[bootstrap] Step 1/2: Sign in to OneNote-for-Web. ...
+[bootstrap] Sign-in detected (OneNote chrome rendered). ← MUST appear before Step 2/2
+[bootstrap] Step 2/2: Seeding SharePoint cookies at https://microsoft-my.sharepoint.com/
+[bootstrap] Step 2/2: Seeding SharePoint cookies at https://microsoft-my.sharepoint-df.com/
+[bootstrap] Both surfaces visited. Close the browser window when ready.
+[bootstrap] Profile saved at: ...
+```
+
+If `Sign-in detected` is missing, the v3.11.1-or-earlier `waitForURL` bug has regressed: the bootstrap is invalid, cookies were not minted, and every subsequent scrape will return `auth-required`. Re-bootstrap with v3.11.2+.
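A missing `Sign-in detected` line silently invalidates the whole bootstrap, so the ordering check above is worth automating. A minimal sketch follows. It assumes the console output has been captured as text, for example by redirecting it to a file. `validateBootstrapLog` is a hypothetical helper, not part of kushi, and it keys on the literal log strings shown in the expected console flow.

```js
// Hypothetical helper (not part of kushi): validates a captured bootstrap transcript.
function validateBootstrapLog(text) {
  const lines = text.split(/\r?\n/);
  const step1 = lines.findIndex(l => l.includes('[bootstrap] Step 1/2'));
  const signIn = lines.findIndex(l => l.includes('Sign-in detected (OneNote chrome rendered).'));
  const step2 = lines.findIndex(l => l.includes('[bootstrap] Step 2/2')); // first Step 2/2 line
  if (step1 === -1 || step2 === -1) return { valid: false, reason: 'missing-step-markers' };
  // Defect signature of v3.11.1 and earlier: Step 2/2 reached with no sign-in line before it.
  if (signIn === -1 || signIn < step1 || signIn > step2) {
    return { valid: false, reason: 'sign-in-not-detected' };
  }
  return { valid: true, reason: null };
}
```

A `sign-in-not-detected` result means the profile should be treated as never bootstrapped, and the bootstrap should be re-run with v3.11.2+.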
+
+The visible Edge window stays open until you close it. Sign in fully, wait for OneNote to render, then close the window.
+
+### Scrape a section (production) — use the wrapper, not the bare runner
+
+The bare `runner.mjs` emits JSON to stdout only. The driver (PowerShell or agent) is then responsible for writing snapshot files, upserting the registry, and emitting a run report. That hand-rolled wiring is brittle (PowerShell's default `Out-File` mangles UTF-8 — NBSP becomes `┬á`) and almost always violates the canonical layout in [`snapshot-vs-stream.instructions.md`](../../instructions/snapshot-vs-stream.instructions.md).
+
+**HARD rule (kushi v3.11.5+):** drivers MUST call `write-snapshot.mjs`, which invokes the runner internally (child_process, no shell pipe), writes the snapshot in the canonical layout, upserts `m365Mutable.knownSections.<project>.one_pages[]`, and emits a run report. Never hand-write snapshot files from runner JSON.
+
+```pwsh
+cd C:\Usha\ISERepos\kushi
+$url = '<exact-section-url-from-m365-mutable.json#one_sectionWebUrl>'
+node plugin/skills/pull-onenote/write-snapshot.mjs `
+  --section-url $url `
+  --project "<project>" `
+  --engagement-root "<engagement-root>" `
+  --alias ushak
+```
+
+Output structure (per `snapshot-vs-stream.instructions.md`):
+
+```
+<engagement-root>/<project>/Evidence/<alias>/onenote/
+snapshot/pages/<safe-title>.md ← one file per page (HARD)
+refresh-reports/<YYYYMMDD-HHMM>-onenote.md
+stream/ ← populated by WorkIQ stream pass (not by this wrapper)
+```
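The sketch below shows how a driver might compute the per-page snapshot path, the stream folder, and the refresh-report file name described above. `safeTitle` and `onenoteEvidencePaths` are hypothetical. The real sanitization inside `write-snapshot.mjs` is not shown in this file and may differ, and the report timestamp is assumed to use local time.

```js
// Hypothetical sketch; the real naming rule inside write-snapshot.mjs may differ.
function safeTitle(title) {
  const cleaned = title
    .replace(/[<>:"\/\\|?*\u0000-\u001f]/g, '_') // characters Windows forbids in file names
    .replace(/\s+/g, ' ')
    .trim()
    .replace(/[. ]+$/, ''); // Windows also rejects trailing dots and spaces
  return cleaned || 'untitled';
}

const pad2 = n => String(n).padStart(2, '0');

// Paths under <root>/<project>/Evidence/<alias>/onenote/ (forward slashes also work on Windows).
function onenoteEvidencePaths(engagementRoot, project, alias, pageTitle, when = new Date()) {
  const base = [engagementRoot, project, 'Evidence', alias, 'onenote'].join('/');
  const stamp = `${when.getFullYear()}${pad2(when.getMonth() + 1)}${pad2(when.getDate())}` +
    `-${pad2(when.getHours())}${pad2(when.getMinutes())}`; // assumed local time
  return {
    snapshot: `${base}/snapshot/pages/${safeTitle(pageTitle)}.md`,
    streamDir: `${base}/stream/`,
    reportFileName: `${stamp}-onenote.md`,
  };
}
```

Everything lands under `Evidence/<alias>/onenote/`. The helper never produces a sibling folder under `<project>/`, which is what the canonical-layout rule forbids.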
+
+The wrapper also writes back to `m365-mutable.json`:
+- `one_pages[]` — per-page retry registry (title, wdpartid, webPageId, lastModified, last_status, attempts, snapshot_path, captured_at)
+- `one_lastPullAt`, `one_lastPullRunStatus`, `one_lastPullKushiVersion`
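Below is a minimal sketch of an idempotent `one_pages[]` upsert that honors the dual-ID rule. A page observed by the browser path (`webPageId`) and by WorkIQ (`wdpartid`) collapses into one entry, and neither ID is discarded when a later observation lacks it. `upsertPage` is hypothetical. Treating `attempts` as a count of observations is an assumption.

```js
// Hypothetical sketch of a one_pages[] upsert keyed on EITHER identifier.
function upsertPage(onePages, observed) {
  const match = onePages.find(p =>
    (observed.webPageId && p.webPageId === observed.webPageId) ||
    (observed.wdpartid && p.wdpartid === observed.wdpartid));
  if (!match) {
    const entry = { ...observed, attempts: 1 };
    onePages.push(entry);
    return entry;
  }
  // Copy only the fields this observation actually has, so an ID learned
  // on one path is never erased by an observation from the other path.
  for (const [key, value] of Object.entries(observed)) {
    if (value !== undefined && value !== null) match[key] = value;
  }
  match.attempts = (match.attempts || 0) + 1; // assumption: counts observations
  return match;
}
```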
+
+#### Bare runner (advanced — only when you need raw JSON)
+
+If you genuinely need the runner output without writing files (e.g. for diagnostics), call it directly — but DO NOT then pipe through PowerShell `Out-File` (UTF-8 corruption). Either use `node -e` to parse stdout in-process, or use the wrapper's `--json` mode to feed a pre-captured file written with `[IO.File]::WriteAllText` (always UTF-8).
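If you take the in-process route, decode the stdout bytes as UTF-8 yourself rather than letting a shell pick a code page. Here is a sketch, assuming the runner prints a single JSON document. `parseRunnerStdout` and `looksMojibaked` are hypothetical helpers, and the two signatures are heuristics: `┬á` is NBSP read as code page 437, and `â€` is typical of UTF-8 punctuation read as Windows-1252.

```js
// Decode raw stdout bytes explicitly as UTF-8, then parse.
// Assumption: the runner writes exactly one JSON document to stdout.
function parseRunnerStdout(stdoutBytes) {
  // fatal: true makes invalid UTF-8 throw instead of silently becoming U+FFFD.
  const text = new TextDecoder('utf-8', { fatal: true }).decode(stdoutBytes);
  return JSON.parse(text);
}

// Heuristic signatures of UTF-8 decoded with the wrong code page somewhere upstream.
const MOJIBAKE_SIGNATURES = ['\u252c\u00e1', '\u00e2\u20ac'];

function looksMojibaked(text) {
  return MOJIBAKE_SIGNATURES.some(sig => text.includes(sig));
}
```

When no `encoding` option is given, `child_process.execFileSync` returns a `Buffer`, so its result can be passed straight to `parseRunnerStdout`. Any file written afterwards should use `fs.writeFileSync(file, text, 'utf8')`.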
+
+Flags the wrapper sets for you (all load-bearing):
+
+| Flag | Why required |
+|---|---|
+| `--skip-preflight` | Standalone preflight at `https://onenote.cloud.microsoft/` is unreliable post-bootstrap (OneNote SPA forces `prompt=select_account` even when SPO cookies are valid). The Doc.aspx URL uses different cookies and works fine. |
+| `--headless` | Without it, a visible browser opens — if the user accidentally closes it during scrape, run fails with `Target page, context or browser has been closed`. |
+| `--timeout 120000` | Default 60s is too short under tenant load. 120s is the empirical safety margin for sections up to ~25 pages. Override with `--timeout` on the wrapper. |
+
+### Diagnostics (when scrape produces `pages: []`)
+
+If the scrape returns `pages: []` with `runStatus: "partial"` and `error: "frame.waitForFunction: Timeout"`, the cause is almost always one of:
+
+1. **Single-page section regex mismatch** — fixed in v3.11.3. Verify runner is v3.11.3+.
+2. **Page-rail genuinely slow to render** — bump `--timeout` to 180000 and retry once.
+3. **Section URL stale** — re-run `recapture-section-url.mjs` (see C.1).
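The triage list above can be expressed as a small decision function. This is a sketch under stated assumptions. The result shape (`pages`, `runStatus`, `error`) follows the runner JSON described above, and `nextDiagnosticStep` and `versionLessThan` are hypothetical. Checking the three causes strictly in list order is also an assumption, not documented runner behavior.

```js
// Hypothetical triage for an empty, timed-out scrape; causes checked in list order.
function nextDiagnosticStep(result, { runnerVersion, attempt }) {
  const emptyTimeout = Array.isArray(result.pages) && result.pages.length === 0 &&
    result.runStatus === 'partial' &&
    /frame\.waitForFunction: Timeout/.test(result.error || '');
  if (!emptyTimeout) return { action: 'none' };
  if (versionLessThan(runnerVersion, '3.11.3')) return { action: 'upgrade-runner' }; // cause 1
  if (attempt === 0) return { action: 'retry', timeout: 180000 };                      // cause 2
  return { action: 'recapture-section-url' };                                          // cause 3
}

// Numeric dotted-version comparison ("3.11.10" > "3.11.3").
function versionLessThan(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d < 0;
  }
  return false;
}
```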
+
+Quick in-place diagnostic to see what aria-labels the page is actually emitting (drop this in `plugin/skills/pull-onenote/diag.mjs`, run from kushi root):
+
+```js
+// diag.mjs: dump page-related aria-labels from every frame of a section URL.
+import { chromium } from 'playwright';
+import os from 'os';
+import path from 'path';
+
+const sectionUrl = process.argv[2];
+if (!sectionUrl) {
+  console.error('usage: node plugin/skills/pull-onenote/diag.mjs "<section-url>"');
+  process.exit(2);
+}
+
+// Reuse the runner's persisted profile; Edge is required by Conditional Access (see A.1).
+const ctx = await chromium.launchPersistentContext(
+  path.join(os.homedir(), '.copilot', 'playwright-profile', 'onenote'),
+  { headless: true, channel: 'msedge', viewport: { width: 1400, height: 900 } }
+);
+try {
+  const page = ctx.pages()[0] || await ctx.newPage();
+  await page.goto(sectionUrl, { timeout: 60000 });
+  await page.waitForTimeout(60000); // generous fixed wait for the page rail to render
+  for (const f of page.frames()) {
+    try {
+      const labels = await f.evaluate(() =>
+        Array.from(document.querySelectorAll('[aria-label]'))
+          .map(n => n.getAttribute('aria-label'))
+          .filter(x => x && /page/i.test(x))
+      );
+      if (labels.length) console.log(f.url().slice(0, 80), labels.slice(0, 40));
+    } catch {
+      // Detached or cross-navigating frames can throw; skip them.
+    }
+  }
+} finally {
+  await ctx.close();
+}
+```
+
+Run: `node plugin/skills/pull-onenote/diag.mjs "<section-url>"`. If you see `, Page. Selected.` but not `, page N of M, Page.`, the section has a single page and the runner must be v3.11.3+.
+
+## Empirical contract (what is true, validated 2026-05-13/14)
+
+These three facts are HARD-rule and supersede any earlier doctrine in this skill or in older learnings:
+
+1. **Browser-scrape via OneNote-for-Web is the only reliable verbatim-body path.** Tested against HCA on 2026-05-14: 16 of 16 pages captured (~120KB), vs WorkIQ's 1 of 18 (~7KB) on the same section minutes earlier. Browser-scrape uses the same SharePoint-resident OneNote canvas the user sees, so what it returns IS the page.
+2. **WorkIQ is non-deterministic for OneNote bodies and shall not be the primary path.** Same page returned a verbatim body and `BODY-NOT-EXPOSED` 6 minutes apart with no edits. WorkIQ is retained ONLY as the auth-degraded fallback (when the browser profile cannot complete sign-in) and as the source of stream/edit-event data.
+3. **The OneNote-for-Web `pageid` and the WorkIQ `wdpartid` are different identifiers.** Both must be persisted per page (`webPageId` for browser navigation; `wdpartid` for WorkIQ correlation and stream events). Neither alone is sufficient.
+
+## Pre-flight
+
+Before any retrieval, validate the Playwright profile and (only if used as fallback) the WorkIQ EULA.
+
+### A. Playwright profile (PRIMARY)
+
+```pwsh
+$prof = "$env:USERPROFILE\.copilot\playwright-profile\onenote"
+if (-not (Test-Path $prof)) {
+  Write-Host "[pull-onenote] Playwright profile not yet seeded."
+  Write-Host "[pull-onenote] Run plugin/skills/pull-onenote/runner.mjs --bootstrap once interactively to sign in."
+  # Skill MUST NOT fall through silently. Surface as run-report 'auth-required' on every pending page.
+}
+```
+
+**A.1 Browser channel — HARD rule (kushi v3.10.1+):** the runner MUST launch via Playwright's `channel: 'msedge'` (the user's installed Edge), NOT vanilla Playwright Chromium. The Microsoft tenant's Conditional Access policy denies vanilla Chromium with **"You can't get there from here — this application contains sensitive information and can only be accessed from devices or client applications that meet Microsoft management compliance policy"**. Edge is Intune-trusted; vanilla Chromium is not. This is codified in `runner.mjs` at `chromium.launchPersistentContext(...)`. If the user does not have Edge installed, the runner fails fast with a clear message — do not fall back to Chromium.
+
+**A.2 Two-surface bootstrap — HARD rule (kushi v3.10.1+, sign-in wait fixed in v3.11.2):** `runner.mjs --bootstrap` MUST visit BOTH cookie domains in sequence within the same persisted session:
+
+1. `https://onenote.cloud.microsoft/` — sign-in surface for OneNote-for-Web.
+2. `https://microsoft-my.sharepoint.com/` AND `https://microsoft-my.sharepoint-df.com/` — the actual SPO hosts where Doc.aspx URLs live.
+
+Cookies are not shared between `*.cloud.microsoft` and `*.sharepoint(-df).com`. Signing into OneNote alone is insufficient: the runner's section URLs are SPO URLs and need their own cookies. The bootstrap script handles this automatically; do not modify the order.
+
+**HARD rule (kushi v3.11.2+ post-auth wait):** the bootstrap MUST wait for a real OneNote post-auth UI signal — NOT for the URL to match `onenote.cloud.microsoft` (which is a no-op because we just navigated TO that URL, collapsing the wait to ~0 ms). Required selector set:
+
+```
+[aria-label*="Account manager" i], [data-automationid="NotebookList"], button[aria-label*="notebook" i], iframe[src*="onenoteframe.aspx"]
+```
+
+These are the same selectors `preflightOneNoteWeb` uses. The bootstrap log MUST emit `Sign-in detected (OneNote chrome rendered).` between Step 1/2 and Step 2/2 — its absence is the defect signature of the v3.11.1-and-earlier bug (sign-in was silently skipped, and every subsequent scrape returned `auth-required` regardless of how thoroughly the user signed in). Same anti-pattern applies to any future bootstrap surface (SharePoint, Loop, M365 admin).
+
+**A.3 Profile reset on channel switch:** if migrating from a Chromium profile to Edge (or vice versa), delete `~/.copilot/playwright-profile/onenote/` first. Cookie/cache formats differ.
+
+**A.4 OneNote-for-Web reachability pre-flight — HARD rule (kushi v3.11.0+):** before navigating to any section URL, the runner MUST probe `https://onenote.cloud.microsoft/` and classify the end-state into exactly one of three buckets:
+
+| End-state | Detection | runStatus | Driver action |
+|---|---|---|---|
+| `ok` | Account chrome / notebook list / `onenoteframe.aspx` iframe rendered within `--preflight-timeout` (default 25s) | n/a (proceed) | Navigate to section URL as normal. |
+| `auth-required` | URL bounced to `login.microsoftonline.com` or `login.live.com` | `auth-required` | Surface in the run report; the next refresh re-attempts. |
+| `onenote-web-unavailable` | "Sorry, we ran into a problem" / "Something went wrong" / "We couldn't open" / "This notebook can't be opened" / "There was a problem" detected in any frame's body text, OR pre-flight timeout with no chrome rendered | `notebook-unavailable` | **Do NOT retry blindly.** Surface a clear diagnostic to the user with the recovery checklist below. Mark this run's pages `last_status: notebook-unavailable`. |
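Here is the A.4 table restated as a pure classifier. It is a sketch that assumes the probe has already collected the final URL, the body text of every frame, and whether any post-auth selector rendered before `--preflight-timeout`. `classifyPreflight` and `RUN_STATUS` are hypothetical names. The error-text check runs before the `ok` check, so a "Sorry" dialog drawn over rendered chrome still counts as unavailable. Note also that the end-state `onenote-web-unavailable` maps to `runStatus: notebook-unavailable`.

```js
// Hypothetical restatement of the A.4 table; the real runner's detection may differ.
const LOGIN_HOSTS = ['login.microsoftonline.com', 'login.live.com'];
const ERROR_PHRASES = [
  'Sorry, we ran into a problem',
  'Something went wrong',
  "We couldn't open",
  "This notebook can't be opened",
  'There was a problem',
];

// probe: { finalUrl: string, frameTexts: string[], chromeRendered: boolean }
function classifyPreflight({ finalUrl, frameTexts, chromeRendered }) {
  if (LOGIN_HOSTS.includes(new URL(finalUrl).hostname)) return 'auth-required';
  const errorDialog = frameTexts.some(t => ERROR_PHRASES.some(p => t.includes(p)));
  if (!errorDialog && chromeRendered) return 'ok';
  return 'onenote-web-unavailable'; // error text seen, or timed out with no chrome
}

// End-state -> runStatus column of the table.
const RUN_STATUS = {
  'ok': null,
  'auth-required': 'auth-required',
  'onenote-web-unavailable': 'notebook-unavailable',
};
```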
 
-
+The same three-way classification ALSO applies after navigating to the section URL — if OneNote-for-Web pops the "Sorry" dialog on the section page (rather than rendering the canvas), the runner emits `runStatus: notebook-unavailable`, NOT `auth-required`. The two are distinct failure modes and must be reported as such; conflating them sends the user down the wrong recovery path (re-bootstrap auth when the real fix is to recover the notebook).
 
-
+**Standalone pre-flight CLI** (for scripts and gate drivers):
 
-
--
-
-
-
+```pwsh
+node plugin/skills/pull-onenote/runner.mjs --preflight
+# Exit 0 = ok, 4 = auth-required, 3 = onenote-web-unavailable, 1 = unexpected
+# stdout: { "preflight": { "ok": bool, "reason"?, "detail"? } }
+```
 
-
+**Recovery checklist for `notebook-unavailable`** (surface verbatim to the user; do NOT auto-retry):
 
-1. `
-2.
-3.
+1. Hard-refresh `https://onenote.cloud.microsoft/` — if the root page shows the same error, the issue is service-side or notebook-side.
+2. Open the notebook in **OneNote desktop** and let it fully sync.
+3. If the notebook opens in desktop but not web, wait 10–15 minutes (web index lag) and retry.
+4. If the section specifically fails (root loads, section errors), re-capture `one_sectionWebUrl` from the address bar via `recapture-section-url.mjs` — the persisted URL may be stale or for a moved section.
+5. Only after the user can manually open the notebook in OneNote-for-Web, re-run the kushi pull.
 
-
+The runner detects missing or expired cookies by the redirect to `login.microsoftonline.com`. When that happens:
 
-
-
-
-
-- Have `one_sectionName` only? Query: `Get full page bodies from OneNote section <section> in notebook <notebook> modified between <start> and <end>`.
-- Neither? Query: `Find OneNote pages or sections that mention <project> or <project-aliases>. Return page titles, section names, last modified dates, and direct links.`
-2. Apply retry pattern (max 2 retries, 3s/6s backoff). Throttling → narrow scope once (sectionFileId if available), then stop.
-3. **Do NOT attempt Graph `/me/onenote/*`** — workspace policy: OneNote via Graph repeatedly fails 401 here. Record `graph-skipped-by-policy` if Graph is the only remaining option.
-4. If WorkIQ exhausts → write evidence file with `❌ all paths failed` marker and `next_step: ask user to paste page text from <section>`.
+- Mark every queued page in this run as `last_status: auth-required`.
+- Write a refresh-report entry naming the project and counting affected pages.
+- DO NOT re-attempt the body with WorkIQ: empirically, WorkIQ substitutes non-bodies (e.g. `BODY-NOT-EXPOSED`) for most pages. Instead, the next refresh re-attempts the browser path.
+- DO use WorkIQ for any stream-only items (page-edit events) — those don't depend on body retrieval.
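The sketch below shows how a gate driver could act on the `--preflight` exit codes documented above, applying the auth-required bullets to the queued pages. `applyPreflightOutcome` is hypothetical and the report-line wording is illustrative. `preflight-error` for exit code 1 is an assumed status name, because none is documented.

```js
// Hypothetical gate-driver sketch; exit codes are those documented for `runner.mjs --preflight`.
const PREFLIGHT_EXIT = { 0: 'ok', 4: 'auth-required', 3: 'onenote-web-unavailable', 1: 'unexpected' };

function applyPreflightOutcome(exitCode, queuedPages, project) {
  const outcome = PREFLIGHT_EXIT[exitCode] ?? 'unexpected';
  if (outcome === 'ok') return { proceed: true, reportLine: null };
  const lastStatus = outcome === 'auth-required' ? 'auth-required'
    : outcome === 'onenote-web-unavailable' ? 'notebook-unavailable'
    : 'preflight-error'; // assumption: no status name is documented for exit 1
  for (const page of queuedPages) page.last_status = lastStatus;
  return {
    proceed: false,
    // Bodies are NOT retried via WorkIQ; the next refresh retries the browser path.
    reportLine: `${project}: OneNote preflight ${outcome}; ${queuedPages.length} page(s) marked ${lastStatus}`,
  };
}
```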
 
-
+### B. WorkIQ EULA (FALLBACK)
 
-
+```pwsh
+workiq accept-eula # idempotent; required once per machine for fallback queries
+```
 
-
+### C. Browser-URL completeness gate (HARD, kushi v3.10.0+)
 
-
+The Playwright runner navigates to `one_sectionWebUrl`. That URL and the four sub-fields parsed from it make up FIVE registry fields, and ALL must be present and accurate:
 
-
+| Key | Source | Notes |
+|---|---|---|
+| `one_sectionName` | OneNote section file name | e.g. `AGCO.one` |
+| `one_sectionFileId` | `wd=target(<name>\|<GUID>/)` fragment | Same GUID as WorkIQ's `wdsectionfileid` |
+| `one_notebookSourceDoc` | `sourcedoc={<GUID>}` query param | Identifies the parent notebook |
+| `one_notebookSpoBaseUrl` | `<scheme>://<host>/personal/<upn>` segment | Tenant-specific |
+| `one_sectionWebUrl` | the canonical Doc.aspx URL | **MUST be user-pasted from address bar — see C.1** |
 
-
+**C.1 No-synthesis rule (HARD, kushi v3.10.2+):** `one_sectionWebUrl` MUST be the URL the user actually copied from OneNote-for-Web's address bar. There is NO reliable formula to synthesize it from the four sub-fields. Two formulas were tried and BOTH failed in production:
 
-
+- `wd=target(<name>|<fileId>/)` → silent "Sorry, we ran into a problem" dialog.
+- `wd=target(/<name>/)` → silent "Sorry, we ran into a problem" dialog (even when this exact form works for sibling sections in the same notebook).
+
+OneNote's routing depends on internal session/tenant tokens we cannot reverse-engineer. Any URL constructed by string concatenation MUST be considered invalid. The recapture script's `tryAutoHeal()` may inherit notebook-level fields (`one_notebookSourceDoc`, `one_notebookSpoBaseUrl`, `one_notebookName`) from a sibling project sharing the same notebook — but it MUST fall through to the interactive paste prompt for the section URL itself.
+
+**C.2 Auto-heal scope:** `tryAutoHeal()` returns `{ stillNeedsPaste: true }` whenever the section URL is missing, even if all notebook fields were inherited. The gate driver MUST honor this signal and prompt for paste.
+
+**Rule:** A URL synthesized by template (i.e. one the user has not actually opened in a browser) is NOT acceptable. Common failure mode: an older bootstrap wrote `sharepoint-df.com` (dogfood) when the real tenant is `sharepoint.com`, or wrote a `sourcedoc` GUID that does not match the user's actual notebook. The result is OneNote-for-Web's silent error dialog "Sorry, we ran into a problem", which runners before v3.11.0 could not distinguish from auth-required (v3.11.0+ reports it as `notebook-unavailable`; see A.4).
+
+**Gate:** Before dispatching the runner for a project, check completeness with:
+
+```pwsh
+node plugin/skills/pull-onenote/scripts/recapture-section-url.mjs `
+  --project <name> `
+  --engagement-root <engagement-root> `
+  --check
+```
+
+Exit 0 + `{"status":"ok"}` → proceed to Step A.
+Exit 1 + `{"status":"incomplete","missing":[...]}` → invoke recapture (interactive paste prompt):
+
+```pwsh
+node plugin/skills/pull-onenote/scripts/recapture-section-url.mjs `
+  --project <name> `
+  --engagement-root <engagement-root>
+```
+
+The script prompts the user to open OneNote-for-Web, click into the section, copy the address-bar URL, and paste it. It parses the URL, validates structure, and persists all five fields to `m365-mutable.json`. **This gate runs in BOTH bootstrap and refresh** — if a refresh discovers the gate is open, it MUST self-heal via this script before invoking the runner. This is NOT re-discovery (which is forbidden by `m365-id-registry.instructions.md`); it is a one-time backfill of pre-doctrine or stale registry fields, with the resolved IDs cached forever.
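The recapture script's parse step is not shown in this file. The sketch below is one hypothetical way to split a user-pasted address-bar URL into the five C-table fields and report which are missing, as `--check` does. It only decomposes a URL the user actually opened and never composes one (see C.1). It assumes `wd=` arrives as a possibly percent-encoded query parameter of the form `target(<sectionName>|<sectionFileId>/<pageTitle>|<webPageId>/)` with GUID-shaped IDs; real URLs may deviate.

```js
// Hypothetical decomposition of a USER-PASTED section URL into the five C-table fields.
const GUID = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}';

function parseSectionUrl(raw) {
  const url = new URL(raw);
  const idx = url.pathname.toLowerCase().indexOf('/_layouts/15/doc.aspx');
  // searchParams decodes %28 %7C %2F back to ( | /
  const wd = /^target\((.*)\)$/.exec(url.searchParams.get('wd') || '');
  const section = wd && new RegExp(`^([^|]+)\\|(${GUID})/`, 'i').exec(wd[1]);
  const page = section && new RegExp(`\\|(${GUID})/$`, 'i').exec(wd[1]);
  const fields = {
    one_sectionName: section ? section[1] : null,
    one_sectionFileId: section ? section[2].toLowerCase() : null,
    one_notebookSourceDoc: (url.searchParams.get('sourcedoc') || '').replace(/[{}]/g, '').toLowerCase() || null,
    one_notebookSpoBaseUrl: idx > 0 ? url.origin + url.pathname.slice(0, idx) : null,
    one_sectionWebUrl: raw,
  };
  // A trailing "|<id>/" only counts as the page ID if it sits AFTER the section part.
  const webPageId = page && page.index >= section[0].length ? page[1].toLowerCase() : null;
  const missing = Object.keys(fields).filter(k => !fields[k]);
  return { fields, webPageId, missing };
}
```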
+
+If the user declines to paste a URL (script exits non-zero with no fields persisted), mark the OneNote source as `disabled: true, reason: section-url-not-captured` in `<project>/integrations.yml#boundaries.onenote`, log a refresh-report entry, and skip the runner. Do NOT fall back to WorkIQ-only — empirical testing showed WorkIQ returning BODY-NOT-EXPOSED for most pages.
+
+## Bootstrap discovery (one-time per project)
+
+For a project we have not pulled OneNote for before, capture into `m365-mutable.json#knownSections.<projectKey>`:
+
+| Key | Source | Example (HCA) |
+|---|---|---|
+| `one_sectionName` | OneNote section file name | `HCA.one` |
+| `one_sectionFileId` | OneNote-for-Web wd= URL fragment AND WorkIQ wdsectionfileid (they match) | `3d0ad388-fd6b-45a8-8619-60dd709b7ade` |
+| `one_notebookSourceDoc` | SharePoint sourcedoc GUID for the parent notebook | `2036c7b1-db1b-47fd-a14f-d8ee94ddd9bc` |
+| `one_notebookName` | OneNote notebook display name | `ISE Work` |
+| `one_notebookSpoBaseUrl` | the SharePoint URL prefix before `/_layouts/15/Doc.aspx` | `https://microsoft-my.sharepoint-df.com/personal/<upn>` |
+
+Discover by:
+
+1. Open OneNote-for-Web at `https://onenote.cloud.microsoft/`. Sign in.
+2. Open the project's notebook → click the project's section.
+3. Read the URL — it contains both `sourcedoc={<notebookSourceDoc>}` and `wd=target(<sectionName>|<sectionFileId>/...)`.
+4. Persist all five values to `m365-mutable.json` per `m365-id-registry.instructions.md`.
+
+If the section already exists in `knownSections.<projectKey>` from a prior WorkIQ run, only the browser-specific fields (`one_notebookSourceDoc`, `one_notebookName`, `one_notebookSpoBaseUrl`) need to be added. The `one_sectionFileId` is the same for both paths.
+
+## Step A — enumerate pages (browser, primary)
+
+The runner navigates to the section's deep-link URL and waits for the OneNote canvas to render inside the nested `ffc-onenote.officeapps.live.com/onenoteframe.aspx` frame. Page enumeration uses the accessibility tree and MUST handle BOTH aria-label formats (kushi v3.11.3+):
+
+| Section state | aria-label format | Parsed as |
+|---|---|---|
+| Multi-page | `<title>, page N of M, Page.` | `{ title, pos: N, total: M }` |
+| Single-page | `<title>, Page. Selected.` | `{ title, pos: 1, total: 1 }` |
+
+```js
+// runner.mjs excerpt (v3.11.3); `wac` is the Playwright Frame for onenoteframe.aspx
+const pages = await wac.evaluate(() => {
+  const out = [];
+  const seen = new Set();
+  for (const n of document.querySelectorAll('[aria-label]')) {
+    const label = n.getAttribute('aria-label') || '';
+    // Multi-page sections: "<title>, page N of M, Page."
+    let m = label.match(/^(.*?), page (\d+) of (\d+), Page/);
+    if (m) {
+      const key = `${m[1]}|${m[2]}`;
+      if (!seen.has(key)) { seen.add(key); out.push({ title: m[1], pos: parseInt(m[2], 10), total: parseInt(m[3], 10) }); }
+      continue;
+    }
+    // Single-page sections: "<title>, Page. Selected."
+    m = label.match(/^(.*?), Page\. Selected\./);
+    if (m) {
+      const key = `${m[1]}|1`;
+      if (!seen.has(key)) { seen.add(key); out.push({ title: m[1], pos: 1, total: 1 }); }
+    }
+  }
+  return out;
+});
+```
+
+**HARD rule (kushi v3.11.3+):** any aria-label-driven enumerator MUST handle the N=1 special-case format. Single-page sections do NOT render `, page N of M, Page.`; assuming they do produces the defect signature `frame.waitForFunction: Timeout` with `pages: []` on a section that visibly has one page in the rail. The same rule applies to `waitForFunction`: its predicate must accept either format.
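The enumerator above runs inside `wac.evaluate`, which makes it hard to unit-test. A hypothetical Node-side mirror of the same two regexes makes the N=1 case testable. Note that a predicate passed to Playwright's `waitForFunction` is serialized into the page and cannot call these Node-side helpers, so the real predicate must inline both patterns.

```js
// Hypothetical Node-side mirror of the in-page enumerator, for unit tests.
const MULTI_PAGE = /^(.*?), page (\d+) of (\d+), Page/;
const SINGLE_PAGE = /^(.*?), Page\. Selected\./;

function parsePageLabels(labels) {
  const out = [];
  const seen = new Set();
  for (const label of labels) {
    const multi = MULTI_PAGE.exec(label);
    const single = multi ? null : SINGLE_PAGE.exec(label);
    let entry = null;
    if (multi) entry = { title: multi[1], pos: parseInt(multi[2], 10), total: parseInt(multi[3], 10) };
    else if (single) entry = { title: single[1], pos: 1, total: 1 }; // the N=1 format
    if (!entry) continue;
    const key = `${entry.title}|${entry.pos}`; // same de-dup key as the runner
    if (!seen.has(key)) { seen.add(key); out.push(entry); }
  }
  return out;
}

// Readiness check that accepts EITHER format (the v3.11.3 rule).
function pageRailReady(labels) {
  return labels.some(l => MULTI_PAGE.test(l) || SINGLE_PAGE.test(l));
}
```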
|
|
313
|
+
|
|
314
|
+
The enumeration excerpt returns the canonical, ordered, complete page list as OneNote itself sees it, with no SharePoint search-index residue.

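The HARD rule above can be checked without a browser by factoring both formats into one pure parser. This is a minimal sketch that assumes labels arrive as plain strings. `parsePageLabel` and `railIsReady` are hypothetical helpers, not part of runner.mjs.

```javascript
// Hypothetical helpers (not from runner.mjs): parse both OneNote-for-Web
// page-rail aria-label formats from the table above.
const MULTI_PAGE = /^(.*?), page (\d+) of (\d+), Page/;
const SINGLE_PAGE = /^(.*?), Page\. Selected\./;

// Returns { title, pos, total }, or null when the label is not a page entry.
// The multi-page format is tried first, so a selected multi-page label is
// never misread as single-page.
function parsePageLabel(label) {
  let m = MULTI_PAGE.exec(label);
  if (m) return { title: m[1], pos: Number(m[2]), total: Number(m[3]) };
  m = SINGLE_PAGE.exec(label);
  if (m) return { title: m[1], pos: 1, total: 1 };
  return null;
}

// Readiness predicate: true as soon as ANY label matches EITHER format.
// Accepting only the multi-page format is what times out on single-page sections.
function railIsReady(labels) {
  return labels.some((label) => parsePageLabel(label) !== null);
}
```

Playwright serializes a `waitForFunction` predicate into the frame, so closures do not travel with it. A real predicate would inline both regexes and test every `[aria-label]` node, mirroring `railIsReady`.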
After clicking each page, the runner reads the URL to capture the `webPageId` from the `wd=target(...|<title>|<webPageId>/)` segment. Persist both `wdpartid` (from any prior WorkIQ enumeration, if available) and `webPageId` per entry.

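Here is a minimal sketch of the URL step. It assumes the deep link may be percent-encoded and that `webPageId` is a GUID sitting immediately before the closing `/)` of the `wd=target(...)` segment. `extractWebPageId` and `pageIds` are hypothetical names. Anchoring on the last `|`-delimited GUID matters because titles such as `4/3 - HCA with Jay and Martin` can themselves contain `/`.

```javascript
// Hypothetical helper (not from runner.mjs). Assumes webPageId is a GUID that
// closes the wd=target(...|<title>|<webPageId>/) segment.
const GUID = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}';
// Greedy .* backtracks to the LAST "|<guid>/)" in the segment.
const TARGET_TAIL = new RegExp(`wd=target\\(.*\\|(${GUID})\\/\\)`, 'i');

function extractWebPageId(url) {
  let decoded = url;
  try {
    decoded = decodeURIComponent(url); // %7C -> |, %2F -> /
  } catch {
    // Malformed escape sequence: fall back to the raw URL.
  }
  const m = TARGET_TAIL.exec(decoded);
  return m ? m[1].toLowerCase() : null;
}

// Build the dual-ID pair; wdpartid stays null when no WorkIQ enumeration ran.
function pageIds(url, wdpartid = null) {
  return { wdpartid, webPageId: extractWebPageId(url) };
}
```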
## Step B — fetch verbatim body (browser, primary)

For each page, click its `aria-label` entry in the page rail, wait 2.5 s for the canvas to settle, then:

```js
const body = await wac.evaluate(() => {
  // Prefer the page-content wrapper; fall back to broader containers.
  const node = document.querySelector('#PageContentWrapper')
    || document.querySelector('.Page')
    || document.querySelector('[role="main"]');
  // innerText returns the rendered text, not markup; '' if nothing matched.
  return node ? node.innerText : '';
});
```

Acceptance check (HARD rule):

- A captured body must contain, at minimum, the page title line and one non-empty paragraph that is not toolbar text.
- A body shorter than 50 characters is acceptable only IF the page is genuinely sparse (verify by opening it in OneNote desktop); otherwise mark `last_status: short-suspect` and retry on the next refresh.
- If the page redirected to `login.microsoftonline.com` instead of rendering, mark `last_status: auth-required` and exit the run loop early. Do not continue with stale auth.

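The acceptance check can be sketched as a classifier that returns the `last_status` to record and whether to stop the run loop. The following assumptions are all hypothetical:

- `finalUrl` is the URL observed once navigation settles.
- `toolbarLines` is a caller-supplied set of chrome strings to ignore.
- A body missing the title line or a content paragraph is also flagged `short-suspect`, because the rule names no separate status for that case.

```javascript
// Sketch of the Step B acceptance check (hypothetical, not from runner.mjs).
const MIN_BODY_CHARS = 50;

function classifyCapture({ title, body, finalUrl, toolbarLines = new Set() }) {
  // Sign-in redirect means stale auth: the caller must stop the run loop.
  if (new URL(finalUrl).hostname === 'login.microsoftonline.com') {
    return { lastStatus: 'auth-required', stopRun: true };
  }
  const wanted = title.trim();
  // Keep non-empty lines that are not known toolbar chrome.
  const lines = body
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !toolbarLines.has(line));
  const hasTitle = lines.includes(wanted);
  const hasParagraph = lines.some((line) => line !== wanted);
  if (body.trim().length < MIN_BODY_CHARS || !hasTitle || !hasParagraph) {
    // May be genuinely sparse; a human confirms in OneNote desktop.
    return { lastStatus: 'short-suspect', stopRun: false };
  }
  return { lastStatus: 'captured', stopRun: false };
}
```

The classifier cannot confirm that a short page is genuinely sparse, so that verification stays manual.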
## Step B' — fallback: WorkIQ verbatim probe

Used ONLY when:

- The Playwright profile is absent on this machine, OR
- The previous run hit `auth-required` and the 24-hour retry cooldown has not yet elapsed, OR
- The user explicitly requests WorkIQ-only for diagnostic comparison.

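The three conditions can be expressed as a single gate. This sketch rests on stated assumptions: the parameter names are hypothetical, and the 24-hour cooldown is measured from the prior attempt's timestamp.

```javascript
// Sketch of the Step B' gate (hypothetical names; conditions from the list above).
const AUTH_COOLDOWN_MS = 24 * 60 * 60 * 1000;

function shouldUseWorkIqFallback({
  profilePresent,              // Playwright profile exists on this machine
  previousStatus,              // last_status from the prior run, if any
  lastAttemptAt,               // ISO-8601 timestamp of that attempt
  userWantsWorkIqOnly = false, // explicit diagnostic request
  now = new Date(),
}) {
  if (!profilePresent || userWantsWorkIqOnly) return true;
  if (previousStatus === 'auth-required' && lastAttemptAt) {
    const elapsed = now.getTime() - Date.parse(lastAttemptAt);
    return elapsed < AUTH_COOLDOWN_MS; // still cooling down: skip the browser
  }
  return false;
}
```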
Canonical WorkIQ Step B query (the Nova pattern):

> Return the FULL readable content verbatim of the page titled `<title>` in my OneNote section `<one_sectionName>` in notebook `<one_notebookName>`. Do not summarize. Do not paraphrase. If the body is not exposed, say so explicitly with the literal phrase `BODY-NOT-EXPOSED` on its own line.

Acceptance: a real body OR the literal `BODY-NOT-EXPOSED`. Anything else is rejected and counted as `last_status: workiq-degraded`.

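The following sketch fills the canonical query and screens the reply. It assumes `sectionName` and `notebookName` carry the `one_sectionName` and `one_notebookName` values. Both function names are hypothetical. Code cannot prove that a reply is verbatim, so a `'body'` result means only "not rejected". A paraphrased summary still has to be caught in review.

```javascript
// Sketch only: hypothetical helpers for the Step B' query and its acceptance rule.
const MARKER = 'BODY-NOT-EXPOSED';

function buildWorkIqQuery({ title, sectionName, notebookName }) {
  return (
    `Return the FULL readable content verbatim of the page titled \`${title}\` ` +
    `in my OneNote section \`${sectionName}\` in notebook \`${notebookName}\`. ` +
    'Do not summarize. Do not paraphrase. If the body is not exposed, say so ' +
    `explicitly with the literal phrase \`${MARKER}\` on its own line.`
  );
}

function classifyWorkIqReply(reply) {
  const lines = reply.split('\n').map((line) => line.trim());
  if (lines.includes(MARKER)) return 'BODY-NOT-EXPOSED';
  // Empty replies, or the marker buried mid-sentence, fail acceptance.
  if (reply.trim() === '' || reply.includes(MARKER)) return 'workiq-degraded';
  return 'body';
}
```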
## Per-page retry registry (dual-ID schema)

Persisted at `m365-mutable.json#knownSections.<projectKey>.one_pages[]`. Each entry:

```json
{
  "title": "4/3 - HCA with Jay and Martin",
  "wdpartid": "2233ac5a-007a-4b70-9d93-d6113f318ba3",
  "webPageId": "78aac9b5-0629-4daa-ada3-f2436cb2381c",
  "lastModified": "April 3rd",
  "last_status": "captured",
  "captured_via": "browser",
  "attempts": 2,
  "last_attempt_at": "2026-05-14T03:55:00Z",
  "snapshot_path": "Evidence/ushak/onenote/snapshot/pages/4-3---HCA-with-Jay-and-Martin.md",
  "captured_at": "2026-05-14T03:55:00Z"
}
```

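The following sketch maintains the dual-ID registry across runs. It assumes an existing entry matches when it shares `webPageId`, `wdpartid`, or its exact title; that match rule is an assumption, not from the source. `upsertPageEntry` is a hypothetical helper. It keeps IDs learned on earlier runs and fills in new ones, so an entry first seen with only `wdpartid` gains `webPageId` after a browser pass.

```javascript
// Hypothetical upsert into mutable.knownSections[projectKey].one_pages[].
function upsertPageEntry(mutable, projectKey, update, now = new Date()) {
  if (!mutable.knownSections) mutable.knownSections = {};
  if (!mutable.knownSections[projectKey]) mutable.knownSections[projectKey] = {};
  const section = mutable.knownSections[projectKey];
  if (!Array.isArray(section.one_pages)) section.one_pages = [];

  // Two IDs match only if both are present and equal.
  const same = (a, b) => a != null && a === b;
  let entry = section.one_pages.find(
    (p) =>
      same(p.webPageId, update.webPageId) ||
      same(p.wdpartid, update.wdpartid) ||
      p.title === update.title,
  );
  if (!entry) {
    entry = { title: update.title, attempts: 0 };
    section.one_pages.push(entry);
  }
  // Copy only fields supplied this run; previously learned IDs survive.
  for (const [key, value] of Object.entries(update)) {
    if (value !== undefined && value !== null) entry[key] = value;
  }
  entry.attempts = (entry.attempts || 0) + 1;
  entry.last_attempt_at = now.toISOString();
  return entry;
}
```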
`last_status` is one of:

| Status | Meaning | Next-run action |
|---|---|---|
| `captured` | Browser scrape returned full body | None unless `lastModified` advanced |
| `user-pasted` | Human pasted body into snapshot file | Treat as captured; do not overwrite |
| `auth-required` | Browser hit MFA / sign-in page | Retry browser; surface in run report |
| `notebook-unavailable` | OneNote-for-Web error dialog ("Sorry, we ran into a problem" / "We couldn't open") instead of rendering — service- or notebook-side, NOT auth | Do NOT auto-retry. Surface recovery checklist (SKILL §A.4); next run re-checks pre-flight. |
| `workiq-degraded` | Browser unavailable AND WorkIQ returned BODY-NOT-EXPOSED | Retry browser when profile valid again |
| `BODY-NOT-EXPOSED` | WorkIQ-fallback explicitly returned the literal marker | Retry browser next run |
| `short-suspect` | Body < 50 chars, may not be genuinely sparse | Verify in OneNote desktop, then retry |
| `enumeration-only` | Page enumerated but Step B not yet attempted | Attempt Step B on next run |

Retry doctrine: every refresh re-runs Step A, and re-runs Step B for any page where `last_status NOT IN ('captured', 'user-pasted')`, except `notebook-unavailable` pages, which follow the no-auto-retry rule in the table above. Pages with `last_status='captured'` are re-fetched only if `lastModified` differs from the registry's last-known value.

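The retry doctrine reduces to a per-page filter. This minimal sketch assumes Step A yields `{ title, lastModified }` records and that the registry is matched by title. `pagesNeedingStepB` is hypothetical. It also holds back `notebook-unavailable` pages, following the status table's no-auto-retry rule.

```javascript
// Sketch of the retry doctrine (hypothetical helper).
function pagesNeedingStepB(registry, enumerated) {
  const byTitle = new Map(registry.map((entry) => [entry.title, entry]));
  return enumerated.filter((page) => {
    const known = byTitle.get(page.title);
    if (!known) return true; // newly enumerated page
    switch (known.last_status) {
      case 'user-pasted':
        return false; // never overwrite a human paste
      case 'notebook-unavailable':
        return false; // no auto-retry; pre-flight decides
      case 'captured':
        return page.lastModified !== known.lastModified; // only if edited
      default:
        return true; // every other status is retried
    }
  });
}
```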
## Step C — stream events

The stream still uses WorkIQ. Page-edit events surface through search-index activity rather than through page bodies, and that surface IS deterministic. No change from v2.5.0.

## Snapshot file shape

YAML front-matter at the top of every `snapshot/pages/<safe-title>.md`:

```yaml
---
page_title: "4/3 - HCA with Jay and Martin"
section: "HCA.one"
notebook: "ISE Work"
section_id: "3d0ad388-fd6b-45a8-8619-60dd709b7ade"
wdpartid: "2233ac5a-007a-4b70-9d93-d6113f318ba3"
webPageId: "78aac9b5-0629-4daa-ada3-f2436cb2381c"
last_modified: "April 3rd"
last_status: "captured"
captured_via: "browser"
attempts: 2
last_attempt: "2026-05-14T03:55:00Z"
captured_at: "2026-05-14T03:55:00Z"
---
```

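Here is a sketch of deriving `<safe-title>` and emitting the front-matter above. The safe-title rule here is an assumption: every character outside `[A-Za-z0-9-]` becomes `-`. It is inferred from the single example in this document, where `4/3 - HCA with Jay and Martin` maps to `4-3---HCA-with-Jay-and-Martin`. The runner's real rule may differ. Both function names are hypothetical.

```javascript
// ASSUMPTION: safe-title rule inferred from one example; the real rule may
// treat '.', '_' or non-ASCII characters differently.
function safeTitle(title) {
  return title.replace(/[^A-Za-z0-9-]/g, '-');
}

// Field order copied from the front-matter example above.
const FIELD_ORDER = [
  'page_title', 'section', 'notebook', 'section_id', 'wdpartid', 'webPageId',
  'last_modified', 'last_status', 'captured_via', 'attempts', 'last_attempt',
  'captured_at',
];

// Numbers are written bare; strings go through JSON.stringify, whose output is
// a valid YAML 1.2 double-quoted scalar. Missing fields are skipped.
function renderFrontMatter(fields) {
  const lines = ['---'];
  for (const key of FIELD_ORDER) {
    if (fields[key] === undefined) continue;
    const value = fields[key];
    lines.push(`${key}: ${typeof value === 'number' ? value : JSON.stringify(value)}`);
  }
  lines.push('---');
  return lines.join('\n') + '\n';
}
```

`JSON.stringify` handles quoting for titles that contain `/`, `:` or quotation marks, so no hand-written YAML escaping is needed.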
Below the front-matter:

- IF `last_status ∈ {captured, user-pasted}`: a `## AI Narrative Summary` section (3+ paragraphs) AND a `## Body (verbatim)` section with the exact text the runner extracted (or the user pasted). NEVER paraphrase a captured body.
- IF `last_status ∈ {auth-required, workiq-degraded, BODY-NOT-EXPOSED}`: an explicit unavailable-marker block, a `### next_step` block, and a metadata table. Do NOT fabricate body content from emails or chats.
- IF `last_status == enumeration-only`: a transient marker only. Do NOT include any body content.
- IF `last_status == short-suspect`: whatever was captured, plus an `### unconfirmed` note.

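The body rules can be encoded as a lookup that a linter could check snapshots against. This is only a sketch. The heading strings come from the rules. The marker names for the non-heading parts (`unavailable-marker`, `metadata-table`, and so on) are hypothetical placeholders. Statuses the rules do not cover, such as `notebook-unavailable`, return `null`.

```javascript
// Sketch: body rules per last_status (non-heading names are hypothetical).
const BODY_RULES = {
  captured: ['## AI Narrative Summary', '## Body (verbatim)'],
  'user-pasted': ['## AI Narrative Summary', '## Body (verbatim)'],
  'auth-required': ['unavailable-marker', '### next_step', 'metadata-table'],
  'workiq-degraded': ['unavailable-marker', '### next_step', 'metadata-table'],
  'BODY-NOT-EXPOSED': ['unavailable-marker', '### next_step', 'metadata-table'],
  'enumeration-only': ['transient-marker'],
  'short-suspect': ['captured-text', '### unconfirmed'],
};

function requiredSections(lastStatus) {
  return Object.prototype.hasOwnProperty.call(BODY_RULES, lastStatus)
    ? BODY_RULES[lastStatus]
    : null;
}

// Returns the required markdown headings a snapshot body is missing.
// Only entries starting with '#' are checked; the placeholders are not.
function missingHeadings(lastStatus, markdown) {
  const required = requiredSections(lastStatus) || [];
  const present = new Set(markdown.split('\n').map((line) => line.trim()));
  return required.filter((h) => h.startsWith('#') && !present.has(h));
}
```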
## Depth bar

For any captured page, the `## AI Narrative Summary` must be self-contained: the reader must not need to consult the verbatim body to understand what the page is about. Cover the meeting or topic, who participated, what was decided, what is still open, and the dates referenced. No fabrication: nothing in the summary or the verbatim body section may originate from sources outside the page itself.

## Run report

Every refresh writes `Evidence/<alias>/refresh-reports/<YYYY-MM-DD>-<HHMM>-onenote.md` with:

- Total pages enumerated (browser-authoritative count)
- Per-status counts (captured / auth-required / workiq-degraded / etc.)
- List of pages whose `last_modified` advanced since the prior run
- Whether the runner exited early due to `auth-required`
- Any `short-suspect` entries needing user verification

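Here is a sketch of computing the report path and its fields. It assumes UTC timestamps, registry entries as input, and a `Map` of title to `lastModified` from the prior run. Pages absent from that map are treated as new rather than advanced. `reportPath` and `summarizeRun` are hypothetical names.

```javascript
// Sketch of the run-report inputs (hypothetical helpers; UTC assumed).
function reportPath(alias, when = new Date()) {
  const iso = when.toISOString(); // e.g. 2026-05-14T03:55:00.000Z
  const date = iso.slice(0, 10);
  const hhmm = iso.slice(11, 13) + iso.slice(14, 16);
  return `Evidence/${alias}/refresh-reports/${date}-${hhmm}-onenote.md`;
}

function summarizeRun(pages, previous, exitedEarly) {
  const statusCounts = {};
  for (const p of pages) {
    statusCounts[p.last_status] = (statusCounts[p.last_status] || 0) + 1;
  }
  return {
    totalEnumerated: pages.length,
    statusCounts,
    // Only pages seen last run can have "advanced".
    modifiedSinceLastRun: pages
      .filter((p) => previous.has(p.title) && previous.get(p.title) !== p.lastModified)
      .map((p) => p.title),
    exitedEarlyOnAuth: exitedEarly,
    shortSuspect: pages.filter((p) => p.last_status === 'short-suspect').map((p) => p.title),
  };
}
```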