npm - kushi-agents - Versions diffs - 3.4.2 → 3.13.0 - Mend

kushi-agents 3.4.2 → 3.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73)

package/.github/copilot-instructions.kushi.md +38 -0
package/README.md +33 -0
package/bin/cli.mjs +2 -0
package/package.json +17 -4
package/plugin/agents/kushi.agent.md +155 -147
package/plugin/instructions/ado-bootstrap-discovery.instructions.md +111 -0
package/plugin/instructions/ado-engagement-tree.instructions.md +73 -0
package/plugin/instructions/answer-from-evidence.instructions.md +1 -1
package/plugin/instructions/auth-and-retry.instructions.md +51 -16
package/plugin/instructions/azure-auth-patterns.instructions.md +13 -6
package/plugin/instructions/bootstrap-status-format.instructions.md +113 -0
package/plugin/instructions/capture-learnings.instructions.md +95 -0
package/plugin/instructions/cleanup-on-resolution.instructions.md +69 -0
package/plugin/instructions/crm-bootstrap-discovery.instructions.md +79 -0
package/plugin/instructions/crm-internal-vs-confirmed.instructions.md +79 -0
package/plugin/instructions/evidence-confidence-ladder.instructions.md +66 -0
package/plugin/instructions/evidence-layout-canonical.instructions.md +115 -0
package/plugin/instructions/evidence-thoroughness.instructions.md +82 -12
package/plugin/instructions/full-view-gate.instructions.md +91 -0
package/plugin/instructions/m365-id-registry.instructions.md +134 -0
package/plugin/instructions/meetings-verbatim-required.instructions.md +176 -0
package/plugin/instructions/run-reports.instructions.md +129 -0
package/plugin/instructions/scope-boundaries.instructions.md +218 -0
package/plugin/instructions/snapshot-vs-stream.instructions.md +2 -0
package/plugin/instructions/update-ledger.instructions.md +132 -0
package/plugin/instructions/verbatim-by-default.instructions.md +73 -0
package/plugin/instructions/workiq-first.instructions.md +15 -31
package/plugin/instructions/workiq-only.instructions.md +193 -0
package/plugin/learnings/README.md +50 -0
package/plugin/learnings/ado.md +45 -0
package/plugin/learnings/crm.md +96 -0
package/plugin/learnings/cross-cutting.md +36 -0
package/plugin/learnings/email.md +33 -0
package/plugin/learnings/meetings.md +30 -0
package/plugin/learnings/misc.md +46 -0
package/plugin/learnings/onenote.md +215 -0
package/plugin/learnings/sharepoint.md +5 -0
package/plugin/learnings/teams.md +5 -0
package/plugin/plugin.json +22 -2
package/plugin/prompts/apply-ado.prompt.md +14 -0
package/plugin/prompts/propose-ado.prompt.md +12 -0
package/plugin/reference-packs/fde/crm-field-manifest.md +165 -0
package/plugin/skills/apply-ado-update/SKILL.md +125 -0
package/plugin/skills/ask-project/SKILL.md +2 -0
package/plugin/skills/bootstrap-project/SKILL.md +81 -3
package/plugin/skills/propose-ado-update/SKILL.md +108 -0
package/plugin/skills/pull-ado/SKILL.md +173 -23
package/plugin/skills/pull-crm/SKILL.md +168 -15
package/plugin/skills/pull-email/SKILL.md +139 -22
package/plugin/skills/pull-meetings/SKILL.md +109 -25
package/plugin/skills/pull-misc/README.md +84 -0
package/plugin/skills/pull-misc/SKILL.md +257 -0
package/plugin/skills/pull-misc/runner.mjs +280 -0
package/plugin/skills/pull-onenote/README.md +90 -0
package/plugin/skills/pull-onenote/SKILL.md +400 -51
package/plugin/skills/pull-onenote/runner.mjs +356 -0
package/plugin/skills/pull-onenote/scripts/recapture-section-url.mjs +295 -0
package/plugin/skills/pull-onenote/write-snapshot.mjs +271 -0
package/plugin/skills/pull-sharepoint/SKILL.md +44 -12
package/plugin/skills/pull-teams/SKILL.md +40 -11
package/plugin/skills/refresh-project/SKILL.md +33 -2
package/plugin/skills/self-check/run.ps1 +186 -4
package/plugin/templates/ado-update/discussion-comment.template.md +26 -0
package/plugin/templates/ado-update/integrations-ado-writes.example.yml +49 -0
package/plugin/templates/ado-update/proposed.template.md +78 -0
package/plugin/templates/init/external-links.template.txt +30 -0
package/plugin/templates/init/project-integrations.template.yml +57 -2
package/plugin/templates/snapshot/meeting-verbatim.template.md +110 -0
package/plugin/templates/snapshot/meetings-series-index.template.md +3 -1
package/plugin/templates/snapshot/onenote-page.template.md +92 -23
package/plugin/templates/weekly/meetings-stream.template.md +11 -6
package/src/copilot-instructions.mjs +80 -0
package/src/main.mjs +18 -1

package/plugin/skills/pull-meetings/SKILL.md CHANGED

@@ -1,17 +1,29 @@
 ---
 name: "pull-meetings"
-version: "2.1.0"
-description: "Pull Meetings evidence (snapshot: series-index; stream: per-meeting transcripts + decisions + actions). WorkIQ-first; falls back to chat reconstruction when transcripts don't exist."
+version: "2.2.1"
+description: "Pull Meetings evidence in THREE shapes: snapshot/ series-index, stream/ per-meeting curated blocks, AND verbatim/ raw immutable folder per meeting (meetings expire — verbatim/ MUST contain the full transcript text, not just chat). v2.4.0: WorkIQ-ONLY for transcript capture per workiq-only.instructions.md (m365_get_transcript / Graph REST FORBIDDEN — they have near-100% failure rate in this workspace). Chat is parallel supporting evidence (via m365_list_chat_messages structured dump), NEVER a transcript substitute."
 ---
 # Skill: pull-meetings
-Pulls **meetings** evidence in two shapes per `snapshot-vs-stream.instructions.md`:
-- **snapshot/** — series-index.md listing all recurring meeting series for this project (subject, recurrence, organizer, current attendees)
-- **stream/** — per-meeting blocks: attendees (req/opt/actual), agenda, **chronological transcript walk-through with verbatim quotes + timestamps**, decisions, actions, open questions, artifact links
+> **v3.7.6 + v3.10.0 + v3.11.0 contracts** — This skill operates under six HARD-rule doctrines:
+> - `verbatim-by-default.instructions.md` — full bodies/notetext/fields by default; no preview-grade pulls accepted.
+> - `meetings-verbatim-required.instructions.md` (v3.10.0) — meetings are an EXPIRING evidence class; every captured meeting MUST also produce a sibling `verbatim/<YYYY-MM-DD-HHMM>_<slug>/` folder with raw chat + transcript + recording URL. Curated snapshot alone is a defect.
+> - **`workiq-only.instructions.md` (v3.11.0)** — transcripts, facilitator notes, calendar discovery, and chat-thread human rendering go through WorkIQ ONLY. `m365_get_transcript`, `m365_get_facilitator_notes`, `m365_list_meetings`, `m365_list_events`, and Graph REST URLs are FORBIDDEN as fallbacks (they fail nearly every call). The canonical WorkIQ prompts are codified in that instruction — do not re-discover them.
+> - `capture-learnings.instructions.md` — every fix/discovery is logged to `plugin/learnings/<source>.md` immediately.
+> - `cleanup-on-resolution.instructions.md` — when a value resolves, all stale `no-match` / `not yet` notes referencing the prior unresolved state must be rewritten in the same turn.
+> - `run-reports.instructions.md` — every refresh writes a per-user report under `Evidence/<alias>/refresh-reports/YYYY-MM-DD-HHMM_refresh.md`.
-Auth + retry + error logging per `auth-and-retry.instructions.md`. WorkIQ-first per `workiq-first.instructions.md`. Thoroughness per `evidence-thoroughness.instructions.md`; runtime detector + auto-retry + paste-prompt per `thoroughness-detector.instructions.md`. Citations per `citation-ledger.instructions.md`.
+> **Canonical evidence layout** (HARD, kushi v3.12.1+): all artifacts produced by this skill MUST be written under `<project>/Evidence/<alias>/<source>/{snapshot,stream,...}/` — sibling folders under `<project>/` (e.g. `<project>/<source>-context/`, `<project>/<source>/`, `<project>/_Weekly Summaries/`) are FORBIDDEN. See `evidence-layout-canonical.instructions.md`.
+Pulls **meetings** evidence in three shapes per `snapshot-vs-stream.instructions.md` + `meetings-verbatim-required.instructions.md`:
+- **snapshot/** — `series-index.md` listing all recurring meeting series for this project (subject, recurrence, organizer, current attendees)
+- **stream/** — per-meeting curated blocks: attendees (req/opt/actual), agenda, **chronological transcript walk-through with verbatim quotes + timestamps**, decisions, actions, open questions, artifact links. Each block MUST cite the matching verbatim/ folder.
+- **verbatim/** (REQUIRED, NEW v2.2.0) — per-meeting subfolder `verbatim/<YYYY-MM-DD-HHMM>_<slug>/` containing the raw immutable capture: `captured-at.txt`, `chat-messages.json`, `chat-messages.md`, `transcript.vtt` or `transcript-source.md`, `recording-url.txt`, `recap-card.md`, `attachments/`, `coverage.md`. See `templates/snapshot/meeting-verbatim.template.md` for the contract.
+Auth + retry + error logging per `auth-and-retry.instructions.md`. WorkIQ-only per `workiq-only.instructions.md`. Thoroughness per `evidence-thoroughness.instructions.md`; runtime detector + auto-retry + paste-prompt per `thoroughness-detector.instructions.md`. Citations per `citation-ledger.instructions.md`.
 ## Inputs
@@ -20,39 +32,110 @@ Auth + retry + error logging per `auth-and-retry.instructions.md`. WorkIQ-first
 - `<window>` — date range. For snapshot: ignored (always full re-fetch). For stream: `(from, to)`.
 - (read) `<engagement-root>/.project-evidence/m365/m365-mutable.json m365Mutable.knownSections.<project>` — pinned hints (`calendarContext.subjectKeywords`, `calendarContext.knownSeries`).
-## Discovery (snapshot pass)
+## Discovery (snapshot pass — WorkIQ ONLY per `workiq-only.instructions.md`)
-1. `m365_list_meetings` over the window with `subjectKeywords` filter (post-filter — Graph search is relevance-ranked, not exact).
-2. For each match: capture `id`, `subject`, `start`, `end`, `organizer`, `joinUrl`. Persist to mutable as `calendarContext.knownSeries[]`.
-3. Cross-reference with Teams chat IDs from `pull-teams` (meeting chat IDs of form `19:meeting_…@thread.v2`). Each match enriches with attendee roster + chat ID.
+1. **WorkIQ meeting discovery** (canonical prompt from `workiq-only.instructions.md` Calendar/online meetings section):
+   > `List my Teams meetings between <start> and <end> where the subject contains "<token>". Return subject, date, start time, organizer, joinUrl, and Teams chat id.`
+   - Issued ONCE per token in `calendarContext.subjectKeywords`.
+   - For each match: capture `subject`, `start`, `end`, `organizer`, `joinUrl`, `chatId`. Persist to mutable as `calendarContext.knownSeries[]`.
+2. Cross-reference returned `chatId` values with `boundaries.teams.chat_ids[]` — every meeting whose chat-id falls inside the teams boundary is treated as in-scope. This enriches each meeting with the attendee roster the `pull-teams` snapshot already captured.
+**FORBIDDEN** for discovery: `m365_list_meetings`, `m365_list_events`, Graph REST `/me/calendar/events`, Graph REST `/me/onlineMeetings`. Use WorkIQ.
 Write `snapshot/series-index.md` listing all matched series. Updated every refresh (organizers/attendees may change).
-## Per-meeting stream pass — path cascade
+## Boundaries (REQUIRED — see `scope-boundaries.instructions.md`)
+This skill REFUSES to query unless `<engagement-root>/<project>/integrations.yml#boundaries.meetings` is satisfied:
+- `boundaries.meetings.series_join_urls` — REQUIRED, non-empty (pinned join URLs; no subject-keyword fuzz).
+- `boundaries.meetings.organizer_emails` — optional additional filter.
+- `boundaries.date_window_days` — defaults to 30 if absent.
+The pre-existing `subjectKeywords` / `knownSeries` discovery loop is now a **bootstrap-time aid only** — it helps populate `boundaries.meetings.series_join_urls` once. At pull time, only meetings whose `joinUrl` is in the boundary list are processed.
+Refusal message when boundary is missing:
+```
+meetings-boundary-missing — add boundaries.meetings.series_join_urls to
+<engagement-root>/<project>/integrations.yml. See doctrine in
+plugin/instructions/scope-boundaries.instructions.md.
+```
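
Reviewer sketch (not part of the package diff): the boundary gate described above could look roughly like the following. The function names, the return shape, and the treatment of `organizer_emails` as an AND filter are assumptions for illustration; only the field names, the 30-day default, and the `meetings-boundary-missing` code come from the text.

```javascript
// Hypothetical helpers; names and return shapes are illustrative only.
const DEFAULT_DATE_WINDOW_DAYS = 30;

function checkMeetingsBoundary(integrations) {
  const boundaries = (integrations && integrations.boundaries) || {};
  const meetings = boundaries.meetings || {};
  const joinUrls = Array.isArray(meetings.series_join_urls)
    ? meetings.series_join_urls.filter((u) => typeof u === "string" && u.trim() !== "")
    : [];

  if (joinUrls.length === 0) {
    // Required list is absent or empty: refuse before issuing any query.
    return { ok: false, code: "meetings-boundary-missing" };
  }
  return {
    ok: true,
    joinUrls,
    organizerEmails: Array.isArray(meetings.organizer_emails) ? meetings.organizer_emails : [],
    dateWindowDays: boundaries.date_window_days ?? DEFAULT_DATE_WINDOW_DAYS,
  };
}

function isInScope(meeting, boundary) {
  // At pull time only pinned join URLs count; subject keywords are not consulted.
  const urlOk = boundary.joinUrls.includes(meeting.joinUrl);
  // Assumption: a non-empty organizer list narrows the match further.
  const organizerOk =
    boundary.organizerEmails.length === 0 ||
    boundary.organizerEmails.includes(meeting.organizer);
  return urlOk && organizerOk;
}
```

A caller would print the refusal message shown above when `ok` is `false`, and otherwise filter discovered meetings through `isInScope`.
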
+## Per-meeting capture cascade — verbatim/ FIRST, then curated stream
+For EACH meeting in the window, the cascade has **two halves**: (A) write raw artifacts to `verbatim/<YYYY-MM-DD-HHMM>_<slug>/`, then (B) produce the curated stream block citing those verbatim files. Half A is mandatory before Half B per `meetings-verbatim-required.instructions.md`.
+### Half A — verbatim/ capture (REQUIRED, transcript-first, WorkIQ-only per workiq-only.instructions.md)
+For every meeting, BEFORE writing any curated text:
+0. Compute slug + timestamp, create `Evidence/<alias>/meetings/verbatim/<YYYY-MM-DD-HHMM>_<slug>/`, write `captured-at.txt` (started_at, kushi_version).
-For EACH meeting in the window, attempt to capture transcript text via this cascade. Apply retry pattern per `auth-and-retry.instructions.md`. Record errors[] entry for each path attempted.
+1. **Transcript cascade — WorkIQ-only, EXHAUSTIVE, in order. Do NOT stop at first weak result; record each path's result in `coverage.md`.** Per `meetings-verbatim-required.instructions.md` "Transcript capture cascade":
+   - **a. WorkIQ full-transcript pull (REQUIRED first attempt)** with the canonical prompt from `workiq-only.instructions.md`:
+     > `Find the Teams meeting titled "<subject>" that occurred on <YYYY-MM-DD>. Return the full transcript verbatim with speaker labels and timestamps. Do not summarize.`
+     - If WorkIQ returns full speaker-turn text (≥ ~10 distinct `Name:` lines) → strip `request-id:` prefix, save body as `transcript.txt` with header (source, query, request-id, fidelity).
+     - If WorkIQ returns a summary (paragraphs only, no speaker turns) → run the **doubled-strict retry** from `workiq-only.instructions.md` once. If still summary-only, save as `transcript-source.md` with the `WARNING: NOT a verbatim transcript` header.
+   - **b. WorkIQ Copilot recap (SUPPLEMENTARY, always attempt)**:
+     > `Get the Copilot meeting recap (decisions, action items, key points) for the Teams meeting "<subject>" on <YYYY-MM-DD>. Return the FULL recap card verbatim. Do not summarize.`
+     - Save as `facilitator-notes.md`. Supplementary; does NOT replace the transcript.
+   - **c. WorkIQ recording-URL discovery**:
+     > `For the meeting "<subject>" on <YYYY-MM-DD>, return the SharePoint Stream recording URL if it exists, and the calendar event body.`
+     - Save URL(s) to `recording-url.txt`. If a `.mp4` URL is present, downloading the recording binary is governed by the `pull-meetings` recording-download carve-out (size < 200MB) — this is the ONLY allowed `m365_download_file` call in any kushi pull-* skill, because the recording is binary media and WorkIQ does not download binaries.
+   - **d. User-paste fallback (first-class, NOT a degradation)** — when (a) returns empty/`body-unavailable` after doubled-strict retry AND (b) returns nothing usable: prompt the user to paste verbatim transcript or notes. Save to `transcript.txt` with header `User-pasted; WorkIQ returned no transcript on <ISO timestamp>`.
-1. **WorkIQ**: `Get full transcript and Copilot recap for meeting "<subject>" on <date>`.
-2. **`m365_get_transcript`** with the meeting's `joinUrl`.
-3. **Reconstruction from chat** (NOT a failure — first-class fallback). When transcripts simply don't exist (transcription was off, or VTT not yet attached), reconstruct evidence from:
-   - Meeting chat thread: `m365_list_chat_messages(chatId)` — captured already by `pull-teams`.
-   - Pre/post-meeting context messages.
-   - Any Copilot recap card posted to the chat.
-   Mark the per-meeting block `Source basis: reconstructed from meeting chat (no transcript)`. This is **evidence**, not a gap.
-4. **Ask user**: paste verbatim transcript or notes if all above produced nothing usable.
+   **FORBIDDEN** (do NOT attempt; do NOT log as cascade steps in coverage.md): `m365_get_transcript`, `m365_get_facilitator_notes`, `m365_list_meetings`, `m365_list_events`, Graph REST `/me/onlineMeetings/.../transcripts/.../content`. These have a near-100% failure rate in this workspace and pollute the coverage trail. Use WorkIQ steps (a)/(b)/(c).
-A meeting block built from chat reconstruction with attendees + key decisions + actions is **valid evidence**. Do NOT mark such a meeting as `failed`. Mark it `partial` only if NO source produced any content (no transcript AND no chat messages).
+2. **Chat capture (ALWAYS, parallel with step 1 — allowed exception to workiq-only)**: `m365_list_chat_messages(chatId)` → `chat-messages.json` (raw structured-data dump) + `chat-messages.md` (rendered, one heading-block per message). Chat is **supporting evidence**; it is **NEVER** a transcript substitute. If `m365_list_chat_messages` fails, run the WorkIQ chat-thread prompt as fallback (per `workiq-only.instructions.md` Teams chat thread section).
+3. Walk chat for attachments → `m365_download_file` each → `attachments/<original-name>` (binary download carve-out, same as recording.mp4).
+4. Walk chat for Copilot recap card → `recap-card.md` verbatim.
+5. **Classify** the verbatim folder per the doctrine:
+   - `transcript-complete` — at least one of `transcript.vtt` / `transcript.txt` / `transcript-source.md` is non-empty.
+   - `transcript-missing` — only `chat-messages.*` present; all WorkIQ cascade paths (1a–1d) returned empty.
+6. Write `coverage.md` with classification + per-path attempt log (every WorkIQ path attempted, with request-id, even successful ones) + source-basis classification for stream/.
+7. Update `captured-at.txt` with `completed_at` + `final_status` (one of `transcript-complete-text` / `transcript-complete-summary-only` / `transcript-missing-chat-only` / `unrecoverable`).
+If 1a–1d ALL return empty AND step 2 chat is also empty AND user paste is declined → `unrecoverable`. Apply retry pattern per `auth-and-retry.instructions.md`.
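
Reviewer sketch (not part of the package diff): step 1a distinguishes a full speaker-turn transcript from a summary by counting roughly ten distinct `Name:` lines. One possible way to express that check is shown below. The regular expression, the optional leading timestamp, and the function name are assumptions; the doctrine files may define the heuristic differently.

```javascript
// Hypothetical heuristic; the pattern below is an assumption, not the package's rule.
// Matches lines such as "Alice: text" or "00:01:05 Alice: text".
const SPEAKER_LINE =
  /^\s*(?:\[?\d{1,2}:\d{2}(?::\d{2})?\]?\s+)?(\p{L}[\p{L} .'-]{0,59}):\s+\S/u;

function classifyWorkiqTranscript(text, minSpeakerLines = 10) {
  if (!text || text.trim() === "") {
    return { kind: "empty", distinctSpeakerLines: 0 };
  }
  const distinct = new Set();
  for (const line of text.split(/\r?\n/)) {
    // The request-id prefix is metadata and is not counted as a speaker turn.
    if (/^\s*request-id:/i.test(line)) continue;
    if (SPEAKER_LINE.test(line)) distinct.add(line.trim());
  }
  return {
    // "speaker-turns" would be saved as transcript.txt; "summary-only" would
    // trigger the doubled-strict retry and, failing that, transcript-source.md.
    kind: distinct.size >= minSpeakerLines ? "speaker-turns" : "summary-only",
    distinctSpeakerLines: distinct.size,
  };
}
```

The threshold is a parameter because the text gives it only approximately.
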
+### Half B — curated stream/ block (cites verbatim/)
+Build the per-meeting block in `stream/<week>_meetings-stream.md` using ONLY content already persisted in verbatim/. Every assertion cites a verbatim file. Preferred citation chain:
+- `[source: Evidence/<alias>/meetings/verbatim/<dir>/transcript.vtt · <date>]` when transcript-complete-vtt.
+- `[source: Evidence/<alias>/meetings/verbatim/<dir>/transcript.txt · <date>]` when transcript-complete-text.
+- `[source: Evidence/<alias>/meetings/verbatim/<dir>/transcript-source.md · <date>]` when only Copilot summary (always note the warning).
+- `[source: Evidence/<alias>/meetings/verbatim/<dir>/chat-messages.md · <date>]` only for assertions backed by chat (not transcript content).
+`Source basis` line in the curated block reflects verbatim classification:
+- `transcript (raw VTT)` — transcript.vtt present
+- `transcript (plain text)` — transcript.txt present, no VTT
+- `transcript (WorkIQ Copilot summary — NOT verbatim)` — only transcript-source.md
+- `❌ no-transcript-recovered-chat-only` — only chat-messages.*
+- `❌ unrecoverable` — only captured-at.txt + coverage.md
+A meeting with `transcript-missing-chat-only` is valid evidence in a degraded sense — actions/decisions citeable from chat are still useful — but the curated block MUST flag the gap prominently and the run-log MUST record `transcript-unrecoverable` for that meeting.
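
Reviewer sketch (not part of the package diff): the `Source basis` mapping listed above is a simple precedence over which files exist in the verbatim folder. A minimal version, assuming the caller passes the names of the non-empty files found there (the function name and input shape are hypothetical):

```javascript
// Hypothetical helper; label strings are copied from the list above.
function sourceBasis(nonEmptyFileNames) {
  const files = new Set(nonEmptyFileNames);
  if (files.has("transcript.vtt")) return "transcript (raw VTT)";
  if (files.has("transcript.txt")) return "transcript (plain text)";
  if (files.has("transcript-source.md")) {
    return "transcript (WorkIQ Copilot summary — NOT verbatim)";
  }
  if (files.has("chat-messages.json") || files.has("chat-messages.md")) {
    return "❌ no-transcript-recovered-chat-only";
  }
  // Only captured-at.txt and coverage.md remain.
  return "❌ unrecoverable";
}
```

Checking in this order means a folder that holds both a VTT file and chat messages is still labelled by its strongest artifact.
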
+### Ask-user fallback
+When the entire cascade (1a–1d + chat) produces nothing usable, ask the user to paste verbatim transcript or notes. Persist any paste to `verbatim/<dir>/transcript.txt` with header `User-pasted; automated paths returned no transcript on <ISO timestamp>`.
 ## Stream pass
-Per `evidence-thoroughness.instructions.md`: every meeting in the window gets a full per-meeting block. **A 30-line meetings file for a week with 2 meetings is a defect — expect 200+ lines.**
+Per `evidence-thoroughness.instructions.md`: every meeting in the window gets a full per-meeting block whose **FIRST sub-section is an AI Narrative Summary (REQUIRED, 5+ paragraphs)** covering the whole meeting end-to-end — context, what was discussed in what order, who took which position, the reasoning, the back-and-forth, soft signals, sentiment, what landed and what stayed open (cited to transcript timestamps). Then attendees → agenda → Detailed Discussion Summary (topic-organized) → chronological transcript walk-through with verbatim quotes → Decisions → Open Questions → Next Steps → Action Items → Risks → Customer Asks → Coverage Notes. **A 30-line meetings file for a week with 2 meetings is a defect — expect 300+ lines (the AI Narrative Summary alone is typically 30-50 lines per meeting).** A per-meeting block missing the AI Narrative Summary as the first sub-section is a defect even if every other section is present.
 Write to: `<engagement-root>/<project>/Evidence/<alias>/Meetings/stream/<YYYY-MM-DD>_meetings-stream.md` (date = Monday of the ISO week).
-Use template: `templates/weekly/meetings-summary.template.md`
+Use template: `templates/weekly/meetings-stream.template.md`
 If a week file already exists, MERGE (dedupe by event ID, append new events, keep existing).
+**Verbatim sibling required.** Every per-meeting block written here MUST have a matching `Evidence/<alias>/meetings/verbatim/<YYYY-MM-DD-HHMM>_<slug>/` folder produced by Half A. The block's `Source basis` line and `Artifacts` section must cite the verbatim folder's relative path. Self-check D13 enforces this.
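
Reviewer sketch (not part of the package diff): the stream file is named after the Monday of the meeting's ISO week. The computation could be sketched as follows. Working in UTC is an assumption, since the skill text does not say which time zone applies, and the function names are hypothetical.

```javascript
// Hypothetical helpers; UTC is assumed because the source names no time zone.
function mondayOfIsoWeek(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // getUTCDay(): 0 = Sunday ... 6 = Saturday. ISO weeks start on Monday.
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

function streamFileName(meetingStart) {
  return `${mondayOfIsoWeek(meetingStart)}_meetings-stream.md`;
}
```

For a meeting that starts on Wednesday 2024-05-15 (UTC), `streamFileName` yields `2024-05-13_meetings-stream.md`.
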
 ## Mutable hints to upsert (during the run, not at the end)
 If discovered, immediately write to `m365Mutable.knownSections.<project>.calendarContext` with `discoveredOn` + `confidence`:
@@ -73,5 +156,6 @@ After the pass:
 ## Stop conditions
 - Subject keyword resolution returns 0 candidates → ask user once for keywords, persist to mutable, continue.
-- A meeting has neither transcript NOR chat messages → write `❌ no source available` block, log error, continue.
-- All paths failed for ALL meetings in the window → mark source `failed`, write a single `❌ all paths failed` evidence file with actionable next step.
+- A meeting has neither transcript NOR chat messages → write `verbatim/<dir>/coverage.md` documenting every failed path + write `❌ source-expired-or-unrecoverable` block in stream/, log error, continue.
+- All paths failed for ALL meetings in the window → mark source `failed`, write a single `❌ all paths failed` evidence file with actionable next step. Verbatim/ folders for each meeting still get created with captured-at.txt + coverage.md (the empty-folder audit trail is itself evidence).
+- A per-meeting block lacks a sibling `verbatim/<YYYY-MM-DD-HHMM>_<slug>/` directory → **defect** per `meetings-verbatim-required.instructions.md`. Re-run Half A immediately; do NOT ship the curated block alone.

package/plugin/skills/pull-misc/README.md ADDED

@@ -0,0 +1,84 @@
+# pull-misc runner
+Pulls evidence from a user-curated link list (`<project>/external-links.txt`). Handles types that don't fit a dedicated `pull-*` skill: Loop pages, public web pages, learn.microsoft.com / docs sites, GitHub repos, local files. Delegated types (onenote, sharepoint, ado) are recorded as `delegated` and pulled by their dedicated skills.
+See `SKILL.md` for the full doctrine.
+## One-time setup
+```pwsh
+# Install deps in the kushi repo
+cd C:\Usha\ISERepos\kushi
+npm install playwright jsdom @mozilla/readability
+npx playwright install chromium
+# (Loop links only) Seed the Playwright profile by reusing the OneNote profile
+# If you've already done OneNote bootstrap, you're done — Loop reuses it.
+node plugin/skills/pull-onenote/runner.mjs --bootstrap
+```
+## Per-project run
+```pwsh
+node plugin/skills/pull-misc/runner.mjs `
+  --project "ABN AMRO" `
+  --links-file "C:\Users\ushak\OneDrive - Microsoft\ISE\Engagement Assets\ABN AMRO\external-links.txt" `
+  --engagement-root "C:\Users\ushak\OneDrive - Microsoft\ISE\Engagement Assets" `
+  --headless
+```
+## Output
+Single JSON object on stdout:
+```json
+{
+  "project": "ABN AMRO",
+  "linksFile": "...",
+  "runStatus": "ok | auth-required | partial",
+  "counts": {
+    "total": 12,
+    "captured": 8,
+    "placeholder": 2,
+    "delegated": 1,
+    "auth_required": 0,
+    "fetch_failed": 1,
+    "skipped_binary": 0
+  },
+  "links": [
+    { "type": "web", "title": "...", "url": "...", "last_status": "captured", "captured_via": "http",
+      "http_status": 200, "content_type": "text/html", "etag": "...", "last_modified_http": "...",
+      "body": "...", "char_count": 18420, "captured_at": "..." }
+    /* ... */
+  ]
+}
+```
+## Filtering
+```pwsh
+# Only loop links
+node runner.mjs --project ABN ... --types loop
+# Only specific titles (substring match)
+node runner.mjs --project ABN ... --titles "Workshop,Architecture"
+# Combine
+node runner.mjs --project ABN ... --types web,loop --titles "Architecture"
+```
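
Reviewer sketch (not part of the package diff): the `--types` and `--titles` filters above could behave roughly like this. The README only states that titles use substring matching. Case-insensitive comparison, OR across comma-separated values, and AND between the two flags are assumptions, as is the function name.

```javascript
// Hypothetical filter; matching rules beyond "substring match" are assumed.
function parseList(value) {
  return value ? value.split(",").map((s) => s.trim()).filter(Boolean) : [];
}

function filterLinks(links, { types, titles } = {}) {
  const typeSet = new Set(parseList(types));
  const needles = parseList(titles).map((t) => t.toLowerCase());
  return links.filter((link) => {
    const typeOk = typeSet.size === 0 || typeSet.has(link.type);
    const title = (link.title || "").toLowerCase();
    const titleOk = needles.length === 0 || needles.some((n) => title.includes(n));
    return typeOk && titleOk; // both filters must pass when both are given
  });
}
```
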
+## Scheduled / unattended runs
+- Browser branch (loop) reuses `~/.copilot/playwright-profile/onenote/`. When MFA fires, the runner marks loop links `auth-required` and exits cleanly. After the user completes one interactive bootstrap, the next scheduled run proceeds silently.
+- HTTP branch needs no auth for public links. For sites behind SSO (auth'd Confluence, internal dashboards) it returns `auth-required` and is currently NOT supported — paste the content into a `file` link as a workaround, OR add a per-site auth handler in a future version.
+- `placeholder` links surface in the run report every refresh until the user fills them in.
+## Troubleshooting
+| Symptom | Cause | Fix |
+|---|---|---|
+| `Cannot find module 'jsdom'` | deps not installed | `npm install jsdom @mozilla/readability` in repo root |
+| All loop links `auth-required` | profile expired | `node plugin/skills/pull-onenote/runner.mjs --bootstrap` |
+| `web` link returns near-empty body | site is SPA / requires JS render | Add it as a `loop` type to use browser path; or paste content as `file` |
+| `confluence` link returns auth-required | private Confluence, no anon access | not supported in v0.1; paste content as `file` |
+| `pdf` link returns "text extraction not yet implemented" | pdfjs not bundled | text extract is a v0.2 item; manually extract for now |

package/plugin/skills/pull-misc/SKILL.md ADDED

@@ -0,0 +1,257 @@
+---
+name: "pull-misc"
+version: "2.0.1"
+description: "Pull miscellaneous evidence from a user-curated link list (<project>/external-links.txt). Handles links that don't fit the per-source pull-* skills: Loop pages, public web pages, learn.microsoft.com / docs sites, GitHub repos, local files, anything else linkable. Routes by type: delegated types (onenote, sharepoint, ado) are skipped because dedicated pull-* skills handle them. Per-link retry registry tracks last_status across runs."
+---
+# Skill: pull-misc
+> **v3.9.0 contracts** — This skill operates under six HARD-rule doctrines:
+> - `verbatim-by-default.instructions.md` — full bodies by default; no preview-grade pulls accepted.
+> - `m365-id-registry.instructions.md` — discover-once / consume-deterministically (per-link retry registry under `misc_links[]`).
+> - `capture-learnings.instructions.md` — every fix/discovery is logged to `plugin/learnings/misc.md`.
+> - `cleanup-on-resolution.instructions.md` — when a placeholder URL resolves, all stale unavailable-markers are upgraded in the same turn.
+> - `run-reports.instructions.md` — every refresh writes a per-user report under `Evidence/<alias>/refresh-reports/`.
+> - `thoroughness-detector.instructions.md` — the per-link retry registry IS this skill's thoroughness contract.
+> **Canonical evidence layout** (HARD, kushi v3.12.1+): all artifacts produced by this skill MUST be written under `<project>/Evidence/<alias>/<source>/{snapshot,stream,...}/` — sibling folders under `<project>/` (e.g. `<project>/<source>-context/`, `<project>/<source>/`, `<project>/_Weekly Summaries/`) are FORBIDDEN. See `evidence-layout-canonical.instructions.md`.
+Pulls **misc** evidence in two shapes per `snapshot-vs-stream.instructions.md`:
+- **snapshot/** — full content per link — one file per link with last-fetched + verbatim body
+- **stream/** — change events (HTTP `last-modified` header drift, git commit deltas where applicable, Loop edit signals via WorkIQ when available)
+## Tools (in order)
+1. **Playwright (browser-scrape, persisted profile)** — for `loop` links. Reuses `~/.copilot/playwright-profile/onenote/` because it's the same M365 cookie scope. Implementation: `plugin/skills/pull-misc/runner.mjs` branch `loop`.
+2. **HTTP fetch + Readability** — for `web` / `confluence` (anonymous) / `learn` / `docs` / `pdf` links. Implementation: `plugin/skills/pull-misc/runner.mjs` branch `web`.
+3. **Local file read** — for `file` links pointing at project-relative paths. Implementation: `plugin/skills/pull-misc/runner.mjs` branch `file`.
+4. **WorkIQ** — used only for stream/edit-event signals on `loop` links (Loop edit events surface in the M365 search index, same as OneNote stream events). NOT used for body retrieval — Loop bodies require the browser path.
+5. **Delegation** — `onenote`, `sharepoint`, `ado` are NOT pulled here; the runner records `captured_via: delegated` so the registry stays complete and the user can see which links flow through which skill. The actual capture is done by `pull-onenote`, `pull-sharepoint`, `pull-ado` respectively.
+## The link list (`<project>/external-links.txt`)
+This file already exists in most projects (per memory: the External Links Context doctrine). pull-misc preserves the existing format **without breaking changes** and adds `loop` as a recognized type.
+Format:
+```text
+# Comments start with #
+# One link per line:
+<type>|<owner>|<title>|<url-or-path>|<notes>
+```
+**Recognized types (v3.9.0):**
+| Type | Routed to | Captured via |
+|---|---|---|
+| `onenote` | pull-onenote | delegated |
+| `sharepoint` | pull-sharepoint | delegated |
+| `ado` | pull-ado | delegated |
+| `loop` **(new)** | pull-misc | browser (Playwright, OneNote profile) |
+| `web` | pull-misc | http (fetch + readability) |
+| `confluence` | pull-misc | http (anonymous) — auth'd Confluence not yet supported |
+| `learn` | pull-misc | http |
+| `docs` | pull-misc | http |
+| `pdf` | pull-misc | http (binary download + text extract via pdfjs) |
+| `github` | pull-misc | http (raw README + repo metadata) |
+| `file` | pull-misc | file (local read; path is project-relative) |
+| anything else | pull-misc | http (best-effort) |
+**Placeholder URLs** matching `<PASTE_*_URL>` or `<TODO*>` are SKIPPED with `last_status: placeholder`. They surface in the run report so the user can fill them in.
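
Reviewer sketch (not part of the package diff): parsing one line of `external-links.txt` and applying the routing table above might look like the following. The function name, the returned object shape, lower-casing of the type, and the handling of extra `|` characters in the notes field are assumptions. The pipe-delimited format, the delegated types, and the placeholder patterns come from the text.

```javascript
// Hypothetical parser; field handling beyond the documented format is assumed.
const DELEGATED = new Map([
  ["onenote", "pull-onenote"],
  ["sharepoint", "pull-sharepoint"],
  ["ado", "pull-ado"],
]);
// <PASTE_*_URL> and <TODO*> expressed as regular expressions.
const PLACEHOLDER = /^<(?:PASTE_.*_URL|TODO.*)>$/;

function parseLinkLine(line) {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null; // blank or comment
  const [type = "", owner = "", title = "", target = "", ...rest] = trimmed
    .split("|")
    .map((part) => part.trim());
  const normalizedType = type.toLowerCase();
  const entry = {
    type: normalizedType,
    owner,
    title,
    url: target,
    notes: rest.join("|"), // assumption: extra pipes belong to the notes field
    routedTo: DELEGATED.get(normalizedType) ?? "pull-misc",
    last_status: PLACEHOLDER.test(target) ? "placeholder" : "not-yet-attempted",
  };
  if (DELEGATED.has(normalizedType)) entry.captured_via = "delegated";
  return entry;
}
```

Unknown types fall through to `pull-misc`, which matches the "anything else" row of the table.
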
+## Empirical contract
+These three facts are HARD-rule:
+1. **The user's external-links.txt IS the source of truth for what counts as "misc evidence" for that project.** No fuzzy auto-discovery. If a link isn't in the file, it isn't pulled. Discovery is a separate (manual) step where the user pastes links in.
+2. **Web fetches use HTTP first, browser only if HTTP fails or returns an interactive shell.** SPAs that don't server-render (Loop, some Confluence, dashboards) drop to browser. Don't run a browser when curl works.
+3. **Loop pages MUST go through browser.** Same gap as OneNote — Loop has no Graph API for component content. The /loop skill already proved this; pull-misc reuses the same Playwright profile.
+## Pre-flight
+```pwsh
+# A. Playwright profile (only required if external-links.txt contains loop links)
+$prof = "$env:USERPROFILE\.copilot\playwright-profile\onenote"
+if (-not (Test-Path $prof)) {
+  Write-Host "[pull-misc] Playwright profile not yet seeded — loop links will be skipped with last_status: auth-required."
+  Write-Host "[pull-misc] Run plugin/skills/pull-onenote/runner.mjs --bootstrap once to seed it."
+}
+# B. Node deps
+$nodeDeps = @('playwright', 'jsdom', '@mozilla/readability')
+# Runner checks at startup; if missing, exits with "missing-deps" status and lists them.
+```
+## Bootstrap discovery
+For each project, the bootstrap step:
+1. **Locates `<project>/external-links.txt`.** If absent, write a starter template (same format as ABN AMRO has today) and mark `boundaries.misc.externalLinksPath` in `integrations.yml`.
+2. **Parses the file.** Skip comments, skip blank lines, skip placeholder URLs (`<PASTE_*_URL>`, `<TODO*>`).
+3. **Initializes `misc_links[]`** in `m365-mutable.json#knownSections.<projectKey>` with one entry per non-placeholder link, all `last_status: not-yet-attempted`.
+4. **Records** `boundaries.misc.linkCount` and `placeholderCount` to `bootstrap-status.md`.
+## Step A — enumerate (every refresh)
+```text
+1. Read external-links.txt; rebuild link inventory (in case the user added/removed links since last run).
+2. For each existing entry in misc_links[], match by (type, url) tuple — preserve attempts, last_attempt_at, snapshot_path.
+3. For new entries, create with last_status: not-yet-attempted.
+4. For deleted entries (in registry but no longer in file), mark last_status: removed and DO NOT delete the snapshot file (audit trail).
+```
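The four enumerate steps could be implemented roughly as follows. This is a sketch under stated assumptions: `reconcile` and `linkKey` are hypothetical names, `fileLinks` is assumed to be the already-parsed content of `external-links.txt`, and starting new entries at `attempts: 0` is an assumption rather than something the contract spells out.

```javascript
// Registry key is the (type, url) tuple, never the URL alone.
const linkKey = (link) => `${link.type}\u0000${link.url}`;

function reconcile(fileLinks, registry) {
  const previous = new Map(registry.map((entry) => [linkKey(entry), entry]));
  const seen = new Set();
  const next = [];

  for (const link of fileLinks) {
    const key = linkKey(link);
    seen.add(key);
    const old = previous.get(key);
    if (old) {
      // Existing entry: keep attempts, last_attempt_at, snapshot_path, status;
      // refresh descriptive fields (owner, title, notes) from the file.
      next.push({ ...old, ...link });
    } else {
      // New entry (attempts: 0 is an assumption).
      next.push({ ...link, last_status: 'not-yet-attempted', attempts: 0 });
    }
  }

  // Entries no longer in the file are marked, not dropped; snapshots stay on disk.
  for (const entry of registry) {
    if (!seen.has(linkKey(entry))) next.push({ ...entry, last_status: 'removed' });
  }
  return next;
}
```

The function returns a new array instead of mutating the registry, which keeps the previous state available for the "new since last run" and "removed since last run" lists.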
+## Step B — fetch (per link)
+The runner branches per `type`:
+### B.1 `loop` — browser
+```js
+// page.goto(loopUrl); detect login redirect → auth-required.
+// Wait for primary canvas selector; for Loop, that's the Fluid container.
+// Read text via document.body.innerText after settle.
+```
+If login redirect detected: `last_status: auth-required`, `captured_via: browser`, exit early for this link.
+### B.2 `web` / `confluence` / `learn` / `docs` / `pdf` / `github` / unknown — HTTP
+```js
+// fetch(url, { headers: { 'User-Agent': 'kushi-pull-misc/0.1' } });
+// If 200 + HTML → run @mozilla/readability for main content extract.
+// If 200 + text/* → use raw body.
+// If 200 + application/pdf → buffer + pdfjs text extract.
+// If 401/403 → last_status: auth-required (cannot resolve unattended for non-MS hosts).
+// If 404 / DNS failure → last_status: fetch-failed (record the HTTP code when there is one).
+// If 5xx → last_status: fetch-failed (transient — retry next refresh).
+```
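The branching above might be expressed as two pure functions, which are easy to test without touching the network. This is only a sketch: `pickExtractor`, `classifyStatus` and the `retry` flag are hypothetical, returning `null` for content types the rules don't cover is an assumption, and a 200 only becomes `captured` once extraction actually succeeds.

```javascript
// Choose an extractor from the Content-Type header (parameters such as charset are ignored).
function pickExtractor(contentType = '') {
  const mime = contentType.split(';')[0].trim().toLowerCase();
  if (mime === 'text/html') return 'readability';  // main-content extract
  if (mime === 'application/pdf') return 'pdfjs';  // buffer + text extract
  if (mime.startsWith('text/')) return 'raw';      // use the body as-is
  return null; // assumption: not covered by the rules above, caller decides
}

// Map an HTTP status code to the registry's last_status vocabulary.
function classifyStatus(httpStatus) {
  if (httpStatus === 200) return { last_status: 'captured', retry: false };
  if (httpStatus === 401 || httpStatus === 403) {
    return { last_status: 'auth-required', retry: false };
  }
  // 5xx is treated as transient and retried on the next refresh.
  const transient = httpStatus >= 500 && httpStatus <= 599;
  return { last_status: 'fetch-failed', retry: transient };
}
```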
+### B.3 `file` — local read
+```js
+// path is project-relative; resolve under <engagement-root>/<project>/
+// Read text; if binary, mark last_status: skipped-binary (use a different pipeline if needed).
+```
+### B.4 `onenote` / `sharepoint` / `ado` — delegate
+Skip fetch. Write registry entry with `captured_via: delegated`, `delegated_to: pull-onenote|pull-sharepoint|pull-ado`. Per-skill captures already happen in their own pipelines.
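Taken together, B.1 through B.4 amount to a routing decision per `type`. A minimal sketch of that decision, assuming a hypothetical `routeLink` helper that returns the registry fields to set before any fetch happens:

```javascript
// Delegated types are never fetched by pull-misc; their own skills capture them.
const DELEGATES = new Map([
  ['onenote', 'pull-onenote'],
  ['sharepoint', 'pull-sharepoint'],
  ['ado', 'pull-ado'],
]);

function routeLink(type) {
  if (DELEGATES.has(type)) {
    return { captured_via: 'delegated', delegated_to: DELEGATES.get(type) };
  }
  if (type === 'loop') return { captured_via: 'browser' }; // Loop must go through the browser
  if (type === 'file') return { captured_via: 'file' };    // project-relative local read
  return { captured_via: 'http' };                         // known HTTP types and anything else
}
```

For example, `routeLink('ado')` yields `{ captured_via: 'delegated', delegated_to: 'pull-ado' }`, while an unrecognised type falls through to best-effort HTTP.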
+## Per-link retry registry
+Stored at `m365-mutable.json#knownSections.<projectKey>.misc_links[]`. Schema:
+```jsonc
+{
+  "type":             "loop | web | learn | pdf | github | file | onenote | sharepoint | ado | ...",
+  "owner":            "<as in external-links.txt>",
+  "title":            "<as in external-links.txt>",
+  "url":              "<as in external-links.txt>",                  // or path for type=file
+  "notes":            "<as in external-links.txt>",
+  "last_status":      "captured | placeholder | auth-required | fetch-failed | skipped-binary | removed | not-yet-attempted | delegated",
+  "captured_via":     "browser | http | file | delegated",
+  "delegated_to":     "pull-onenote | pull-sharepoint | pull-ado",   // only when captured_via=delegated
+  "http_status":      200,                                            // only for HTTP
+  "content_type":     "text/html",                                    // only for HTTP
+  "char_count":       12345,
+  "attempts":         3,
+  "last_attempt_at":  "<ISO-8601>",
+  "captured_at":      "<ISO-8601, only when last_status=captured>",
+  "snapshot_path":    "Evidence/<alias>/misc/snapshot/<safe-type>__<safe-title>.md",
+  "etag":             "<HTTP ETag if returned>",                      // for change detection
+  "last_modified_http":"<HTTP Last-Modified if returned>"
+}
+```
+## Snapshot files
+One file per non-delegated link at:
+```
+<project>/Evidence/<alias>/misc/snapshot/<safe-type>__<safe-title>.md
+```
+Schema:
+````yaml
+---
+type: web
+owner: "Microsoft"
+title: "Microsoft Fabric Documentation Hub"
+url: "https://learn.microsoft.com/fabric/"
+last_status: captured
+captured_via: http
+http_status: 200
+content_type: text/html
+char_count: 18420
+captured_at: 2026-05-14T04:35:00Z
+schema: v0.1.0-misc
+---
+# {title}
+> Source: {url}
+> Captured via {captured_via} ({captured_at})
+> Owner: {owner}
+> Type: {type}
+## AI Narrative Summary
+{1-2 paragraph summary derived ONLY from the body text below — never fabricate, never extrapolate beyond the captured content}
+## Body (verbatim)
+```
+{full extracted text}
+```
+````
+## Step C — stream
+For supported types, emit edit events:
+- `web` / `learn` / `docs` / `pdf`: compare current `etag` / `last_modified_http` to registry; if changed since last refresh, emit `{ts, type, title, url, change: 'updated'}`.
+- `loop`: WorkIQ has Loop edit-event signals via search index — query when available. Otherwise infer from body diff.
+- `file`: compare file mtime; emit if changed.
+- `github`: query repo commits since last `captured_at`; emit per-commit events.
+- `onenote` / `sharepoint` / `ado`: NOT emitted here (their respective pull-* skills handle stream events for their content).
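For the HTTP-backed types, the validator comparison could look like the sketch below. `detectChange` is a hypothetical name; it assumes response headers arrive as a plain object with lower-cased keys, and that a link with no previously stored validator emits nothing (there is no earlier state to compare against).

```javascript
// Returns an edit event when ETag or Last-Modified differs from the registry, else null.
function detectChange(entry, headers, now = new Date()) {
  const etag = headers['etag'] ?? null;
  const lastModified = headers['last-modified'] ?? null;

  const etagChanged =
    etag !== null && entry.etag != null && etag !== entry.etag;
  const modifiedChanged =
    lastModified !== null &&
    entry.last_modified_http != null &&
    lastModified !== entry.last_modified_http;

  if (!etagChanged && !modifiedChanged) return null;
  return {
    ts: now.toISOString(),
    type: entry.type,
    title: entry.title,
    url: entry.url,
    change: 'updated',
  };
}
```

After emitting, the runner would presumably store the new `etag` and `last_modified_http` on the registry entry so the next refresh compares against the latest values.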
+## Run report
+Every refresh writes one block to `Evidence/<alias>/refresh-reports/<date>.md`:
+```markdown
+### pull-misc — <project>
+- external-links.txt: <N> total links (<P> placeholders, <D> delegated, <M> handled here)
+- captured: <C> / <M>
+- auth-required: <A>
+- fetch-failed: <F>
+- placeholders to fill: <list of titles where url contains <PASTE_*_URL>>
+- removed since last run: <list>
+- new since last run: <list>
+```
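The count lines of the report block can be derived from the registry alone. The sketch below is illustrative: `renderCounts` is a hypothetical helper, it prints `none` when no placeholders remain (an assumption), and it omits the heading and the new/removed lists, which need the previous run's state.

```javascript
// Builds the count lines of the run-report block from misc_links[].
function renderCounts(registry) {
  // Removed entries are no longer in external-links.txt, so they don't count toward N.
  const active = registry.filter((l) => l.last_status !== 'removed');
  const count = (status) => active.filter((l) => l.last_status === status).length;

  const placeholders = active.filter((l) => l.last_status === 'placeholder');
  const delegated = count('delegated');
  const handled = active.length - placeholders.length - delegated;
  const toFill = placeholders.map((l) => l.title).join(', ') || 'none';

  return [
    `- external-links.txt: ${active.length} total links (${placeholders.length} placeholders, ${delegated} delegated, ${handled} handled here)`,
    `- captured: ${count('captured')} / ${handled}`,
    `- auth-required: ${count('auth-required')}`,
    `- fetch-failed: ${count('fetch-failed')}`,
    `- placeholders to fill: ${toFill}`,
  ];
}
```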
+## Anti-patterns (HARD)
+1. ❌ Auto-discovering links by crawling project SharePoint or Teams chats. The user's `external-links.txt` IS the boundary.
+2. ❌ Storing only `url` (not the `(type, url)` tuple) as the registry key. Two types may share a URL (e.g. an onenote URL flagged as `web` for an ad-hoc fetch test).
+3. ❌ Silently dropping a link from the registry when it disappears from `external-links.txt`. Mark `removed` and keep the snapshot for audit.
+4. ❌ Pulling delegated types (onenote / sharepoint / ado). Those have dedicated skills. Double-pulling creates conflicting evidence.
+5. ❌ Persisting placeholder URLs as `fetch-failed`. They're `placeholder`, surfaced in the run report so the user knows to fill them in.
+6. ❌ Silently writing an empty snapshot for a Loop link when the Playwright profile is missing. Mark it `auth-required` instead.
+## Self-check enforcement
+`plugin/skills/self-check/run.ps1` D-token contract for `pull-misc`: the SKILL.md must contain `external-links`, `misc_links`, `placeholder`, `delegated`. These prove the skill carries the v0.1.0 link-list pipeline + delegation contract.
+## Citation
+Citations from misc evidence MUST follow:
+```
+[source: misc/<type>/<safe-title> · YYYY-MM-DD]
+```
+Example: `[source: misc/loop/ABN-Core-Team-Sync-2026-03-27 · 2026-05-14]`
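A formatter for this citation shape might look like the following. `formatCitation` is a hypothetical helper; it takes the already-sanitised `<safe-title>` as input (the sanitisation rule itself is not specified here) and assumes the date is rendered in UTC.

```javascript
// Formats a misc-evidence citation; safeTitle must already be the snapshot's <safe-title>.
function formatCitation(type, safeTitle, date) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, UTC (assumption)
  return `[source: misc/${type}/${safeTitle} · ${day}]`;
}
```

Called with `'loop'`, `'ABN-Core-Team-Sync-2026-03-27'` and a `Date` on 14 May 2026 (UTC), it reproduces the example citation.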