@codyswann/lisa 2.9.1 → 2.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/package.json +1 -1
  2. package/plugins/lisa/.claude-plugin/plugin.json +1 -1
  3. package/plugins/lisa/agents/learnings-synthesizer.md +135 -0
  4. package/plugins/lisa/agents/pr-mining-specialist.md +85 -0
  5. package/plugins/lisa/agents/tracker-mining-specialist.md +85 -0
  6. package/plugins/lisa/commands/debrief/apply.md +6 -0
  7. package/plugins/lisa/commands/debrief.md +6 -0
  8. package/plugins/lisa/hooks/enforce-team-first.sh +9 -3
  9. package/plugins/lisa/rules/intent-routing.md +97 -17
  10. package/plugins/lisa/skills/confluence-to-tracker/SKILL.md +14 -0
  11. package/plugins/lisa/skills/debrief/SKILL.md +79 -0
  12. package/plugins/lisa/skills/debrief-apply/SKILL.md +63 -0
  13. package/plugins/lisa/skills/github-to-tracker/SKILL.md +14 -0
  14. package/plugins/lisa/skills/linear-to-tracker/SKILL.md +14 -0
  15. package/plugins/lisa/skills/notion-to-tracker/SKILL.md +14 -0
  16. package/plugins/lisa/skills/prd-backlink/SKILL.md +89 -0
  17. package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
  18. package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
  19. package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
  20. package/plugins/lisa-rails/.claude-plugin/plugin.json +1 -1
  21. package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
  22. package/plugins/src/base/agents/learnings-synthesizer.md +135 -0
  23. package/plugins/src/base/agents/pr-mining-specialist.md +85 -0
  24. package/plugins/src/base/agents/tracker-mining-specialist.md +85 -0
  25. package/plugins/src/base/commands/debrief/apply.md +6 -0
  26. package/plugins/src/base/commands/debrief.md +6 -0
  27. package/plugins/src/base/hooks/enforce-team-first.sh +9 -3
  28. package/plugins/src/base/rules/intent-routing.md +97 -17
  29. package/plugins/src/base/skills/confluence-to-tracker/SKILL.md +14 -0
  30. package/plugins/src/base/skills/debrief/SKILL.md +79 -0
  31. package/plugins/src/base/skills/debrief-apply/SKILL.md +63 -0
  32. package/plugins/src/base/skills/github-to-tracker/SKILL.md +14 -0
  33. package/plugins/src/base/skills/linear-to-tracker/SKILL.md +14 -0
  34. package/plugins/src/base/skills/notion-to-tracker/SKILL.md +14 -0
  35. package/plugins/src/base/skills/prd-backlink/SKILL.md +89 -0
@@ -0,0 +1,85 @@
+ ---
+ name: tracker-mining-specialist
+ description: "Tracker mining specialist for the Debrief flow. Walks every work item in a shipped initiative — description, comments, status transitions, child sub-tasks added during implementation, and bugs filed afterward referencing the item — and produces a structured findings list. Pairs with pr-mining-specialist (parallel) and feeds learnings-synthesizer."
+ skills:
+ - jira-read-ticket
+ - github-read-issue
+ - tracker-read
+ ---
+
+ # Tracker Mining Specialist Agent
+
+ You are a tracker mining specialist. Your job is to walk a closed initiative's tickets exhaustively and surface every signal that could become a learning, from the tracker side only. PR mining is owned by `pr-mining-specialist` running in parallel — do not duplicate that work.
+
+ ## Scope
+
+ You answer one question per work item: **What did the tracker record about this work that wasn't in the original spec?**
+
+ Adjacent questions other agents own:
+
+ | Question | Owner |
+ |----------|-------|
+ | What did PR review threads, late commits, and added tests reveal? | `pr-mining-specialist` |
+ | Across all tracker + PR findings, what is a candidate learning vs. noise? | `learnings-synthesizer` |
+ | Does the shipped work match the spec? | `spec-conformance-specialist` |
+
+ You are exhaustive, not selective. Surface the candidate; let the synthesizer judge.
+
+ ## Inputs
+
+ The team lead provides a list of `(work_item_key_or_id, tracker_type)` tuples. For each one, you walk the full ticket graph:
+
+ - The ticket itself: description, all fields, current status
+ - Every comment in chronological order, including agent-posted evidence comments and CodeRabbit summaries that landed on the ticket
+ - Status transitions and the duration spent in each status (long stalls are signals)
+ - Child sub-tasks — especially ones added *after* the original Plan run (those represent scope discovered during implementation)
+ - Issue links — `blocks`, `is blocked by`, `relates to`, `duplicates`, `clones` — and any new bug tickets filed *after* this one closed that reference it (regression signals)
+
+ Use the matching read skills (`jira-read-ticket` / `github-read-issue`) via `tracker-read`. Do not call MCP write tools.
+
+ ## Mining checklist (per work item)
+
+ Walk every item against this list. A finding is not "interesting" or "boring" — that judgment is the synthesizer's. You log every signal that matches a checklist row.
+
+ 1. **Description vs. final state divergence** — did the description list acceptance criteria that the comments reveal were silently changed? Note the original AC and what actually shipped.
+ 2. **Comments hinting at edge cases discovered during implementation** — phrases like "found that", "turns out", "edge case where", "we'll also need to handle", "broke when". Capture the comment author, timestamp, and quoted text.
+ 3. **Engineering decisions made in comments rather than the description** — these are convention drift candidates; the next agent reading a similar ticket has no way to find this decision.
+ 4. **Status stalls** — any status where the item sat longer than the median for its type (use a simple heuristic: > 3x the median duration of other items in this initiative for the same status). Long stalls usually indicate friction or an external dependency.
+ 5. **Sub-tasks added after the parent's Plan run** — every late-added sub-task is a scope-creep or missed-edge-case signal. Capture the sub-task summary and the parent's original AC.
+ 6. **Reopen / re-close cycles** — items that were closed and reopened indicate the original "done" was wrong. Capture each transition.
+ 7. **Bugs filed referencing this item after close** — search for issues that link back to this key with `relates to` / `duplicates` / `caused by`, or that cite it in their description. Each one is a candidate edge case the original spec missed.
+ 8. **CodeRabbit / bot summary content posted to the ticket** — bots often summarize PR review themes in a single comment. Pull those out verbatim.
+ 9. **Manual product / QA notes** — any comment that reports a manual test outcome ("tested in dev — works for case A, broke for case B") is gold; capture both cases.
+ 10. **Empty or thin acceptance criteria** that nonetheless shipped — itself a learning (process gap or rubber-stamping).
+
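The stall heuristic in checklist row 4 can be made concrete. A minimal sketch, assuming status durations have already been extracted from the tracker's transition history; the `find_stalls` name and the hours-based input shape are illustrative, not part of the plugin:

```python
from statistics import median

def find_stalls(durations_by_item, threshold=3.0):
    """Flag (item, status) pairs whose time-in-status exceeds
    `threshold` x the median for that status across the initiative.

    durations_by_item: {item_key: {status: hours_in_status}}
    Returns (item_key, status, hours, median_hours) stall signals.
    """
    # Pool per-status durations across every item in the initiative,
    # so each item is compared against its peers.
    by_status = {}
    for statuses in durations_by_item.values():
        for status, hours in statuses.items():
            by_status.setdefault(status, []).append(hours)

    stalls = []
    for item, statuses in durations_by_item.items():
        for status, hours in statuses.items():
            med = median(by_status[status])
            if med > 0 and hours > threshold * med:
                stalls.append((item, status, hours, med))
    return stalls
```

An item whose "In Progress" time is more than three times the initiative's median for that status would be logged as a row-4 finding.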
+ ## Output
+
+ Produce a single structured markdown report per work item, then aggregate across all items into a final report at the path the team lead provides. Per-item structure:
+
+ ```markdown
+ ## <work_item_key>: <summary>
+
+ - Status path: <status1> (<duration>) → <status2> (<duration>) → ...
+ - Linked PRs: <list>
+ - Sub-tasks added post-Plan: <list with original-vs-late timestamps>
+ - Reopen cycles: <count, with dates>
+ - Bugs filed afterward referencing this: <list of keys>
+
+ ### Findings
+
+ 1. <category from checklist row>: <one-line summary>
+    Evidence: <link to comment / transition / sub-task>
+    Quote (if applicable): "<verbatim>"
+ 2. ...
+ ```
+
+ If there are no findings under a checklist row, write `(none)` — silence is itself information for the synthesizer.
+
+ ## Rules
+
+ - **Never judge.** "Probably not interesting" is not a category. Every signal that matches a checklist row goes in.
+ - **Quote verbatim.** Paraphrasing comments loses author voice and the specifics that make a finding actionable.
+ - **Link, don't summarize.** Every finding has at least one evidence link to the source artifact (comment URL, ticket URL fragment, PR URL).
+ - **Run within the team.** Do not call `TeamCreate`. The Debrief skill created the team; you are a teammate.
+ - **Read-only.** Never call write MCP tools. You report; you do not mutate.
+ - **Parallel-safe.** You run alongside `pr-mining-specialist`; do not coordinate with them. The synthesizer reconciles.
@@ -0,0 +1,6 @@
+ ---
+ description: "Apply human-marked dispositions from a Debrief triage document — route accepted learnings to their persistence destinations (Edge Case Brainstorm checklist, project rules, memory, tracker tickets). Reads the triage doc produced by /lisa:debrief; deterministic and idempotent."
+ argument-hint: "<path to triage doc | URL>"
+ ---
+
+ Use the /lisa:debrief:apply command (which invokes the `lisa:debrief-apply` skill) to read the triage document at $ARGUMENTS, parse human dispositions, and persist accepted learnings to their categorized destinations.
@@ -0,0 +1,6 @@
+ ---
+ description: "Debrief a shipped initiative — mine tickets, PRs, and review threads to surface candidate learnings (edge cases, gotchas, friction, tooling gaps, convention drift) for human triage. Stops after producing the triage doc; persistence happens in /lisa:debrief:apply."
+ argument-hint: "<PRD URL | epic key | epic URL>"
+ ---
+
+ Use the /lisa:debrief skill to walk the original Plan, mine completed work units and their PRs, and produce a triage-ready learnings document for $ARGUMENTS.
@@ -2,7 +2,7 @@
  # Enforces team-first orchestration for lifecycle skills.
  #
  # Triggered on four hook events:
- # - UserPromptSubmit : detects /lisa:research|plan|implement|intake in the
+ # - UserPromptSubmit : detects /lisa:research|plan|implement|intake|debrief in the
  # raw prompt and arms enforcement for the session
  # - PreToolUse : detects the same skills via a `Skill` tool call,
  # arms enforcement, and blocks bypass tool calls
@@ -46,7 +46,7 @@ find "$STATE_DIR" -maxdepth 1 -type f -mmin +1440 -delete 2>/dev/null || true
 
  is_lifecycle_skill() {
  case "$1" in
- lisa:research|lisa:plan|lisa:implement|lisa:intake) return 0 ;;
+ lisa:research|lisa:plan|lisa:implement|lisa:intake|lisa:debrief) return 0 ;;
  *) return 1 ;;
  esac
  }
@@ -63,7 +63,13 @@ case "$HOOK_EVENT" in
  # Match a slash command at the start of the prompt (allow optional whitespace).
  LEADING=$(printf '%s' "$PROMPT" | sed -n '1p' | sed -E 's/^[[:space:]]*//')
  case "$LEADING" in
- /lisa:research*|/lisa:plan*|/lisa:implement*|/lisa:intake*)
+ # /lisa:debrief:apply is single-agent — it is excluded by listing it
+ # first with a no-op action; case matching is first-match-wins, so the
+ # broader /lisa:debrief* pattern below never sees it.
+ /lisa:debrief:apply*)
+ : # single-agent, no team enforcement
+ ;;
+ /lisa:research*|/lisa:plan*|/lisa:implement*|/lisa:intake*|/lisa:debrief*)
  # Strip leading slash and any args after the first whitespace.
  SKILL_NAME=$(printf '%s' "$LEADING" | sed -E 's|^/||; s/[[:space:]].*$//')
  printf '%s\n' "$SKILL_NAME" >"$SKILL_FLAG" 2>/dev/null || true
@@ -11,7 +11,7 @@ This protocol runs **once per session**, on the first user message. After that,
  1. If the user invoked a slash command (`/lisa:research`, `/lisa:plan`, `/lisa:implement`, `/lisa:verify`, `/lisa:monitor`, `/lisa:intake`, etc.), the flow is already determined -- skip classification.
  2. Read the user's request and match it against the flow definitions below.
  3. If you cannot confidently classify the request:
- - **Interactive session** (user is present): present a multiple choice using AskUserQuestion with options: Research, Plan, Implement, Verify, No flow.
+ - **Interactive session** (user is present): present a multiple choice using AskUserQuestion with options: Research, Plan, Implement, Verify, Debrief, No flow.
  - **Headless/non-interactive session** (running with `-p` flag, in a CI pipeline, or as a scheduled agent): do NOT ask the user. Classify to the best of your ability from available context (ticket content, prompt text, current branch state). If you truly cannot classify, default to "No flow" and proceed with the request as-is.
  4. Once a flow is selected, **echo it back explicitly** before doing anything else. State the flow, the work type (if applicable), and a one-sentence justification for why this flow was chosen. Example:
 
@@ -34,7 +34,7 @@ What this rule still enforces:
 
  2. **Cascade rule (load-bearing)**: Before calling `TeamCreate`, check whether you are already operating inside an agent team. Signs you are inside a team: a prior `TeamCreate` exists in this session; you were spawned via `Agent` with `team_name`; your context references a team lead. If any of these are true, **do NOT call `TeamCreate`** — the harness rejects double-creates and the work stalls. Continue within the existing team. Invoke flows via the Skill tool; the team lead inherits responsibility for orchestration.
 
- 3. **Default mode**: `Research`, `Plan`, `Implement`, and `Intake` run as agent teams. The `Implement` flow — including every work type (`Build`, `Fix`, `Improve`, `Investigate-Only`) — is **always** a team flow. Bug fixes that "look simple" are not an exception: the Reproduce sub-flow, debug-specialist, bug-fixer, parallel reviewers, and verification-specialist all need to compose. `Verify` (standalone) and `Monitor` (standalone) use the One-shot Sub-agents pattern (see `## Orchestration` below) — these flows are linear with no parallelism and the team overhead is not warranted. Single-agent mode is otherwise reserved for: `product-walkthrough` invoked standalone (not as part of Research/Plan), and one-off diagnostic Bash/Read sessions that don't invoke any lifecycle skill. When in doubt, use a team.
+ 3. **Default mode**: `Research`, `Plan`, `Implement`, `Intake`, and `Debrief` run as agent teams. The `Implement` flow — including every work type (`Build`, `Fix`, `Improve`, `Investigate-Only`) — is **always** a team flow. Bug fixes that "look simple" are not an exception: the Reproduce sub-flow, debug-specialist, bug-fixer, parallel reviewers, and verification-specialist all need to compose. `Debrief` runs as a team because tracker-mining and pr-mining parallelize cleanly and synthesis gates on both completing. `Verify` (standalone) and `Monitor` (standalone) use the One-shot Sub-agents pattern (see `## Orchestration` below) — these flows are linear with no parallelism and the team overhead is not warranted. Single-agent mode is otherwise reserved for: `product-walkthrough` invoked standalone (not as part of Research/Plan), `debrief-apply` (deterministic routing of human-marked dispositions), and one-off diagnostic Bash/Read sessions that don't invoke any lifecycle skill. When in doubt, use a team.
 
  The mechanical TeamCreate bootstrap directive lives inside each lifecycle skill — see those skills' orchestration preambles for the exact wording: first `ToolSearch{select:TeamCreate}` (load deferred schema), then `TeamCreate`.
 
@@ -65,10 +65,11 @@ Gate:
  Sequence:
  1. **Investigate sub-flow** -- gather context from codebase, git history, existing behavior, and external sources
  2. `product-specialist` -- define user goals, user flows (Gherkin), acceptance criteria, error states, UX concerns, and out-of-scope items
- 3. `architecture-specialist` -- assess technical feasibility, identify constraints, map existing system boundaries
- 4. Synthesize findings into a PRD document containing: problem statement, user stories, acceptance criteria, technical constraints, open questions, and proposed scope
- 5. **Plan Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Plan phase will need. For each recommended skill or agent, state why it is needed. If no skills or agents beyond the defaults are identified, explicitly justify why the standard set is sufficient. Include this as a "Recommended Tooling for Plan Phase" section in the PRD.
- 6. `learner` -- capture discoveries for future sessions
+ 3. **Edge Case Brainstorm sub-flow** -- run the PRD candidate through the edge-case checklist; fold accepted cases into acceptance criteria, out-of-scope, or open questions
+ 4. `architecture-specialist` -- assess technical feasibility, identify constraints, map existing system boundaries
+ 5. Synthesize findings into a PRD document containing: problem statement, user stories, acceptance criteria, technical constraints, open questions, and proposed scope
+ 6. **Plan Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Plan phase will need. For each recommended skill or agent, state why it is needed. If no skills or agents beyond the defaults are identified, explicitly justify why the standard set is sufficient. Include this as a "Recommended Tooling for Plan Phase" section in the PRD.
+ 7. `learner` -- capture discoveries for future sessions
 
  Output: A PRD document that includes a "Recommended Tooling for Plan Phase" section listing the skills and agents the Plan phase should use. If there is not enough context to produce a complete PRD, stop and report what is missing rather than producing an incomplete one.
 
@@ -84,19 +85,21 @@ Gate:
 
  Sequence:
  1. **Investigate sub-flow** -- explore codebase for architecture, patterns, dependencies relevant to the spec
- 2. `product-specialist` -- validate and refine acceptance criteria for the whole scope
- 3. `architecture-specialist` -- map dependencies, identify cross-cutting concerns, determine execution order
- 4. **Implement/Verify Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Implement and Verify phases will need for each work item. For each recommended skill or agent, state why it is needed and which work items it applies to. If no skills or agents beyond the defaults are identified for a work item, explicitly justify why the standard set is sufficient.
- 5. Decompose into ordered work items (epics, stories, tasks, spikes, bugs), each with:
+ 2. `product-specialist` -- validate and refine acceptance criteria for the whole scope, including error states and UX concerns
+ 3. **Edge Case Brainstorm sub-flow** -- run the PRD as a whole through the checklist to catch scope-shaped gaps before decomposition
+ 4. `architecture-specialist` -- map dependencies, identify cross-cutting concerns, determine execution order
+ 5. **Implement/Verify Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Implement and Verify phases will need for each work item. For each recommended skill or agent, state why it is needed and which work items it applies to. If no skills or agents beyond the defaults are identified for a work item, explicitly justify why the standard set is sufficient.
+ 6. Decompose into ordered work items (epics, stories, tasks, spikes, bugs). For each item, run the **Edge Case Brainstorm sub-flow** scoped to that item — accepted cases become additional acceptance criteria or sub-tasks; rejected ones are noted with a one-line reason. Each item carries:
  - Type (epic, story, task, spike, bug)
- - Acceptance criteria
+ - Acceptance criteria (including any added by the per-item brainstorm)
  - Verification method
  - Dependencies
- - Skills and agents required (from step 4)
- 6. Create work items in the tracker (JIRA, Linear, GitHub) with acceptance criteria, dependencies, and recommended skills/agents
- 7. `learner` -- capture discoveries for future sessions
+ - Skills and agents required (from step 5)
+ 7. Create work items in the tracker (JIRA, Linear, GitHub) with acceptance criteria, dependencies, and recommended skills/agents
+ 8. **PRD back-link** -- update the source PRD with a `## Tickets` section listing every created work item (key, title, type, link), so the PRD becomes the canonical anchor for downstream flows (notably **Debrief**). Invoke `lisa:prd-backlink` with the PRD source and the created ticket list. The section is regenerated on each run, not appended, so re-planning never produces stale links.
+ 9. `learner` -- capture discoveries for future sessions
 
- Output: Work items in a tracker with acceptance criteria and recommended skills/agents, ordered by dependency. If the specification cannot be decomposed without further clarification, stop and report what is missing.
+ Output: Work items in a tracker with acceptance criteria and recommended skills/agents, ordered by dependency. The source PRD carries a `## Tickets` section linking back to every created item. If the specification cannot be decomposed without further clarification, stop and report what is missing.
 
  ### Implement
 
@@ -189,6 +192,32 @@ Sequence:
 
  Output: Merged PR, successful deploy, remote verification passing.
 
+ ### Debrief
+
+ When: An initiative is fully shipped — every work item from the original Plan is in a terminal state and its PR is merged. The user wants to surface candidate learnings (edge cases, gotchas, friction, tooling gaps, convention drift) for human triage so future agents inherit what this initiative taught.
+
+ Gate:
+ - A PRD or epic must be provided as input — the PRD URL (Notion / Confluence / Linear / GitHub Issue / file), the epic key (JIRA), or the epic issue URL (GitHub). The PRD's `## Tickets` section (written by Plan step 8) is the canonical anchor for the work-item set; an epic's children are the equivalent.
+ - Every work item linked from the input must be in a terminal state (Done / Closed / Cancelled). If any item is still open, stop and list the unfinished items.
+ - Every Done item that was implementable must have at least one merged PR linked. If a Done item has no PR, surface it as a debrief anomaly rather than silently excluding it.
+ - Headless / non-interactive sessions: do not block on missing input — if the input is ambiguous (e.g., only a vague initiative name), fail with a clear error listing what was needed.
+
+ Sequence:
+ 1. **Resolve the work-item set** — read the input. If it's a PRD, follow its `## Tickets` section. If it's an epic, list its children. Build the canonical list of `(work_item, linked_PRs[])` tuples. If a work item has no `linked_PRs` and is not a spike, mark it as an anomaly to surface in step 4.
+ 2. **Mine in parallel** (run as concurrent tasks within the team):
+    - `tracker-mining-specialist` — for every work item, walk the description, every comment (human, agent evidence, CodeRabbit summary), status transitions and their durations, late-arriving bugs that reference the item, and child sub-tasks added during implementation. Output a structured per-ticket findings list.
+    - `pr-mining-specialist` — for every linked PR, walk the description, every review comment (general + inline; CodeRabbit + human), every commit on the branch (especially late `fix:` / `revert:` / follow-up commits), and every test file added. Output a structured per-PR findings list.
+ 3. `learnings-synthesizer` — consume both findings lists, deduplicate, and categorize each candidate learning into one of:
+    - **Edge case** — a failure mode that should have been caught at PRD/Plan time; candidate addition to the Edge Case Brainstorm checklist
+    - **Recurring gotcha** — a stack- or codebase-specific trap (e.g., "this ORM silently truncates X")
+    - **Process friction** — a step in the lifecycle that consistently slowed the work
+    - **Tooling gap** — missing skill, wrong agent assignment, broken hook, missing automation
+    - **Convention drift** — an unwritten rule revealed by review comments that should be codified
+ 4. **Produce the human-triage document** — a markdown file with one row per candidate learning showing: category, summary, evidence (links to the source ticket comment / PR comment / commit), recommended persistence destination, and a checkbox-style disposition field the human will mark (Accept / Reject / Defer). Surface step-1 anomalies (work items missing PRs, etc.) in a separate section. The document is exhaustive — it lists every candidate, even ones the synthesizer rates low confidence — because the human, not the agent, decides what is worth keeping.
+ 5. **Stop and hand the document to the human.** Debrief does NOT persist accepted learnings itself. The human triages, marks dispositions, and runs the **`/lisa:debrief:apply`** command (skill: `debrief-apply`) to route the accepted items to their destinations.
+
+ Output: A triage-ready learnings document covering every work item and PR in the initiative, with structured evidence and disposition fields. Persistence is deferred to `debrief-apply`, which the human invokes after triage.
+
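The Gate above reduces to a small amount of set logic. A minimal sketch, assuming the work items have already been fetched into plain dicts; the `check_debrief_gate` helper and its input shape are illustrative, not part of the plugin:

```python
TERMINAL = {"Done", "Closed", "Cancelled"}

def check_debrief_gate(items):
    """items: list of dicts like
    {"key": "A-1", "status": "Done", "type": "story", "linked_prs": ["#12"]}

    Returns (open_items, anomalies):
    - open_items: keys still outside a terminal state (Debrief must stop
      and list these)
    - anomalies: terminal, non-spike items with no merged PR linked
      (surfaced in the triage doc, never silently excluded)
    """
    open_items = [i["key"] for i in items if i["status"] not in TERMINAL]
    anomalies = [
        i["key"]
        for i in items
        if i["status"] in TERMINAL
        and i["type"] != "spike"
        and not i.get("linked_prs")
    ]
    return open_items, anomalies
```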
 
  ## Sub-flows
 
  Sub-flows are reusable sequences invoked by main flows. When a flow says "Investigate sub-flow", execute the full Investigate sequence.
@@ -203,6 +232,54 @@ Sequence:
  3. `ops-specialist` -- check logs, errors, health (if runtime issue)
  4. Report findings with evidence
 
+ ### Edge Case Brainstorm
+
+ Purpose: Force explicit consideration of edge cases at PRD time and at work-item time, so failure modes that change scope or add acceptance criteria are caught before implementation rather than after a bug is filed in production.
+
+ Invoked by: Research (against the PRD as a whole), Plan (once against the PRD before decomposition, then once per work item during decomposition), and Build / Fix sub-flows when a `product-specialist` or `test-specialist` step would otherwise rubber-stamp acceptance criteria.
+
+ Sequence:
+ 1. Walk through the checklist below and propose every candidate edge case that plausibly applies to the scope under review. Aim for breadth, not pre-filtered relevance — propose first, judge second.
+ 2. For each candidate, take an explicit action and record it:
+    - **Accept** — fold into acceptance criteria (PRD-level or work-item level), or open a new work item / sub-task if the case is large enough to warrant one
+    - **Defer** — capture as an open question or `Out of Scope` line with a one-sentence reason
+    - **Reject** — note the case and a one-sentence reason it does not apply (e.g., "single-tenant, no concurrent edits possible")
+ 3. A silent skip is not allowed — every candidate from the checklist must end up Accepted, Deferred, or Rejected with a reason. "Considered edge cases" without a per-item disposition does not satisfy this sub-flow.
+ 4. If three or more candidates are Accepted at PRD time, treat that as a signal that the PRD scope is wider than originally framed and call it out in the synthesis step.
+
+ Checklist (pattern + question form — ask each question literally of the scope under review):
+
+ **Navigation & URL state**
+ - *Reload persistence*: if the user reloads mid-task, do they land where they were — same tab, same filters, same scroll, same selection — or get bounced to a default?
+ - *Deep linking*: can the URL alone reconstruct the screen, or does it require state from a previous click?
+ - *Back / forward*: does browser history match what the user expects, or does it skip steps or re-trigger side effects?
+ - *Parameter change then reload*: after the user changes filters / sort / tab / pagination, does a reload preserve those choices?
+
+ **Data lifecycle**
+ - *Empty state*: what does this look like the very first time, with zero data?
+ - *Single vs. many*: does the UI degrade with 1 item, 10k items, or at pagination boundaries?
+ - *Stale data*: if the user leaves the tab open for an hour, what is wrong when they come back?
+ - *Concurrent edits*: two users (or two tabs) editing the same record — last-write-wins, conflict, or merge?
+ - *Deletion mid-flow*: the resource the user is viewing gets deleted by someone else while they have it open.
+
+ **Failure modes**
+ - *Network*: offline, slow, intermittent, request mid-flight when the user navigates away.
+ - *Partial success*: bulk action where 8 of 10 succeed — what does the user see and what state is the system in?
+ - *Permission denied mid-flow*: token expires, role changes, resource becomes inaccessible.
+ - *Idempotency*: double-click submit, retry after timeout — does the action happen twice?
+
+ **Input boundaries**
+ - *Text*: empty, max-length, unicode, whitespace-only, leading / trailing whitespace, emoji, RTL.
+ - *Numeric*: zero, negative, very large, non-integer, floating-point precision.
+ - *Date / time*: timezone, DST transition, leap day, "now" vs. server time skew.
+
+ **Auth & session**
+ - *Session expiry mid-action*: what happens to in-flight work?
+ - *Role downgrade*: the user loses access to the screen they are currently on.
+ - *Multi-tab session*: logout in one tab while another tab is mid-action.
+
+ This list is non-exhaustive — agents should propose additional edge cases relevant to the domain (e.g., real-time / streaming, money / financial rounding, regulated data, multi-tenant isolation) and run them through the same Accept / Defer / Reject discipline.
+
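The no-silent-skip rule in step 3 is mechanically checkable. A hedged sketch of such a validator; the `undispositioned` helper and the candidate dict shape are hypothetical, not part of the plugin:

```python
ALLOWED = {"Accept", "Defer", "Reject"}

def undispositioned(candidates):
    """candidates: list of {"case": str, "disposition": str|None, "reason": str}

    Returns the cases that violate the sub-flow: no disposition at all,
    an unknown disposition, or a Defer/Reject without the required
    one-sentence reason. (Accept needs no reason; it lands in the AC.)
    """
    bad = []
    for c in candidates:
        d = c.get("disposition")
        if d not in ALLOWED:
            bad.append(c["case"])
        elif d in {"Defer", "Reject"} and not c.get("reason"):
            bad.append(c["case"])
    return bad
```

An empty return value is what "every candidate ends up Accepted, Deferred, or Rejected with a reason" looks like in data.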
  ### Reproduce
 
  Purpose: Create a reliable reproduction that demonstrates a bug before fixing it.
@@ -267,11 +344,13 @@ Vendor-neutral callers (e.g., `implement`, `verify`) should invoke the `tracker-
 
  Flows can chain naturally:
  - Research produces a PRD -- hand it to Plan
- - Plan produces work items -- hand each to Implement
+ - Plan produces work items (and writes a `## Tickets` back-link section into the PRD) -- hand each item to Implement
  - Implement produces verified code -- hand to Verify
+ - Verify ships and confirms the deploy -- once every work item in the PRD is shipped, hand the PRD (or the epic) to Debrief
+ - Debrief produces a triage-ready learnings document -- hand to the human, who marks dispositions and runs `debrief-apply` to persist accepted learnings
  - If any flow discovers it lacks what it needs, it stops and suggests the preceding flow
 
- The full lifecycle for a large initiative: Research -> Plan -> Implement (per item) -> Verify (per item).
+ The full lifecycle for a large initiative: Research -> Plan -> Implement (per item) -> Verify (per item) -> Debrief (once across the whole initiative) -> Debrief Apply (human-triggered, after triage).
 
  ## Sub-flow Usage
 
@@ -290,6 +369,7 @@ Use an **agent team** (TeamCreate + TaskCreate per step) for:
  - **Implement** (Build, Fix, Improve) — long sequences with parallel review and a real risk of compaction
  - **Plan** — multiple specialists feeding a shared decomposition
  - **Research** — multiple specialists feeding a shared PRD
+ - **Debrief** — tracker-mining and pr-mining run in parallel and gate the synthesizer; the work-item set can be large, so durable task state matters
  - Any flow that invokes the **Review sub-flow** (the four review specialists run in parallel and gate a single follow-up task)
 
  Why: these flows have enough steps that context compaction is likely; the Review sub-flow is parallel-by-design and `blockedBy` expresses that cleanly; durable task state lets the team lead recover assignments after compaction.
@@ -259,6 +259,20 @@ After all tickets are created, present a summary table to the user:
  - Blockers list with recommendations and alternatives
  - Cross-PRD dependencies

+ ### Phase 7: PRD Back-link
+
+ > **Mode guard**: In `dry_run: true` mode, skip this phase entirely — no tickets exist to link.
+
+ After Phase 6, invoke the `lisa:prd-backlink` skill to write a `## Tickets` section back into the source Confluence PRD page. The section becomes the canonical anchor for the **Debrief** flow once the initiative ships.
+
+ Invoke `lisa:prd-backlink` with:
+
+ - `source_type: "confluence"`
+ - `source_ref`: the original Confluence page URL
+ - `tickets`: the full list created in Phases 3–5, each entry as `{ key, title, type, url, parent_key }`
+
+ If `lisa:prd-backlink` fails (page permission denied, Confluence unreachable), surface the error in the Phase 6 report rather than aborting — the tickets are already created. Recommend the user re-run `lisa:prd-backlink` standalone once the source is reachable.
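For concreteness, the invocation input might look like this. A sketch only: the JSON wrapper shape is an assumption about how your harness passes skill inputs, and every URL and ticket value below is invented.

```json
{
  "skill": "lisa:prd-backlink",
  "input": {
    "source_type": "confluence",
    "source_ref": "https://example.atlassian.net/wiki/spaces/ENG/pages/12345/Checkout-PRD",
    "tickets": [
      { "key": "SE-101", "title": "Checkout API endpoint", "type": "Story", "url": "https://example.atlassian.net/browse/SE-101", "parent_key": "SE-100" },
      { "key": "SE-102", "title": "Checkout UI", "type": "Story", "url": "https://example.atlassian.net/browse/SE-102", "parent_key": "SE-100" }
    ]
  }
}
```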
+

  ## Handling Ambiguities and Blockers

  When you encounter something the PRD + comments + codebase can't resolve:
@@ -0,0 +1,79 @@
+ ---
+ name: debrief
+ description: "Run the Debrief flow over a shipped initiative. Input: a PRD URL (Notion / Confluence / Linear / GitHub Issue / file), a JIRA epic key, or a GitHub epic issue URL. Output: a triage-ready learnings document covering every work item in the initiative — edge cases, gotchas, process friction, tooling gaps, convention drift — each with structured evidence and a human-disposition field. Persistence is deferred to lisa:debrief-apply."
+ allowed-tools: ["Skill", "ToolSearch", "TeamCreate", "Bash", "Read", "Glob", "Grep"]
+ ---
+
+ # Debrief: $ARGUMENTS
+
+ Walk the original Plan for `$ARGUMENTS`, mine the completed work items and their PRs, and produce a triage-ready learnings document for human review.
+
+ ## Orchestration: agent team
+
+ If you are NOT already operating inside an agent team (no prior `TeamCreate` in this session, not spawned via `Agent` with `team_name`), the very first thing you do is create the team. Two tool calls only, in this exact order:
+
+ 1. `ToolSearch` with `query: "select:TeamCreate"` — `TeamCreate` is a deferred tool whose schema must be loaded before it can be invoked. A cold call returns `InputValidationError` and tempts a fallback to direct `Agent` calls, which bypasses the team.
+ 2. `TeamCreate` — actually create the team.
+
+ Until `TeamCreate` returns successfully, do NOT call any of: `Agent`, `TaskCreate`, `Skill`, MCP tools (Atlassian / Linear / GitHub / Notion), `Read`, `Write`, `Edit`, `Bash`, `Grep`, `Glob`. Resolving the work-item set, fetching tickets, walking PRs — all of those are tasks for the team you are about to create, not for the lead session before the team exists.
+
+ If you ARE already inside an agent team (e.g., a teammate invoked this skill via the Skill tool), do NOT call `TeamCreate` — the harness rejects double-creates. Continue within the existing team.
+
+ ## Input
+
+ `$ARGUMENTS` is one of:
+
+ | Input shape | Resolution |
+ |-------------|------------|
+ | Notion / Confluence / Linear / GitHub Issue PRD URL | Fetch the PRD; read its `## Tickets` (or equivalent) back-link section written by the Plan flow |
+ | File path to a PRD markdown file | Read the file; parse its `## Tickets` section |
+ | JIRA epic key (e.g. `SE-1234`) or epic URL | Fetch the epic; list its child issues (Stories, Tasks, Bugs) |
+ | GitHub epic issue URL or `<org>/<repo>#<n>` | Fetch the epic issue; list its sub-issues / linked items |
+
+ If the PRD has no `## Tickets` section AND the input is not an epic, stop and report — the Plan flow's PRD back-link step (`lisa:prd-backlink`) was likely skipped. Suggest re-running Plan to populate the section, or pass the epic key directly.
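The resolution table above can be sketched as a small classifier. A minimal sketch: the patterns are illustrative, not the skill's actual parsing logic.

```python
import re

def classify_debrief_input(arg: str) -> str:
    """Best-guess resolution bucket for $ARGUMENTS; patterns are illustrative."""
    if re.fullmatch(r"[A-Z][A-Z0-9]*-\d+", arg):
        return "epic-key"         # e.g. SE-1234: fetch the epic, list child issues
    if re.fullmatch(r"[\w.-]+/[\w.-]+#\d+", arg):
        return "github-epic"      # <org>/<repo>#<n>: fetch the issue, list sub-issues
    if arg.startswith(("http://", "https://")):
        return "prd-or-epic-url"  # Notion / Confluence / Linear / GitHub / JIRA URL
    return "prd-file"             # local markdown: parse its "## Tickets" section
```

Note the order matters: a bare epic key is checked before the GitHub token so that neither falls through to the file-path default.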
+
+ ## Gate
+
+ Run before mining begins:
+
+ 1. **All work items terminal.** Every linked work item must be in a terminal state (Done / Closed / Cancelled equivalent for the tracker). If any item is still open, stop and list the unfinished items — Debrief is post-shipping by definition.
+ 2. **PR coverage.** Every Done item that was implementable (Story / Task / Bug; not Spike) must have at least one merged PR linked. Items missing a PR are recorded as **anomalies** to surface in the report rather than silently excluded — a Done item with no PR is itself a learning ("how did this ship?").
+ 3. **Headless safety.** In headless / `-p` / scheduled mode, do not block on missing input — fail fast with a clear error listing what was needed.
+
+ ## Flow
+
+ Execute the **Debrief** flow as defined in the `intent-routing` rule (loaded via the lisa plugin). The rule contains the canonical step sequence (gate, mining, synthesis, output, hand-off). This skill does NOT restate flow steps — change them in the rule, propagate everywhere.
+
+ The flow's mining step runs `tracker-mining-specialist` and `pr-mining-specialist` in parallel as separate tasks within the team. Both must complete before `learnings-synthesizer` runs. Express this with `blockedBy` so the synthesizer task is automatically gated on the two mining tasks.
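The gating can be sketched as three tasks. Task names are illustrative; use your harness's actual `TaskCreate` fields.

```text
TaskCreate  mine-tracker  ->  tracker-mining-specialist
TaskCreate  mine-prs      ->  pr-mining-specialist
TaskCreate  synthesize    ->  learnings-synthesizer   blockedBy: [mine-tracker, mine-prs]
```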
+
+ ## Exhaustiveness expectation
+
+ Debrief is deliberately exhaustive — the human, not the agent, decides what is worth keeping. Specialists should err toward surfacing more candidates, not fewer. A candidate that the synthesizer rates low confidence is still a row in the triage doc; only outright duplicates are dropped.
+
+ ## Output
+
+ A markdown triage document at `./debrief/<initiative-slug>-<YYYY-MM-DD>.md` (or wherever the project's debrief output directory is configured) containing:
+
+ 1. **Header** — initiative name, source PRD/epic link, work-item count, PR count, generation date, gate results.
+ 2. **Anomalies** — work items missing PRs, items with abnormal status-transition timing, PRs with no review comments at all (signal-of-absence is a learning), etc.
+ 3. **Candidate learnings** — one row per candidate, grouped by category (Edge case / Recurring gotcha / Process friction / Tooling gap / Convention drift). Each row has:
+    - `Summary` — one sentence
+    - `Category`
+    - `Evidence` — links to the source ticket comment / PR comment / commit / test file (multiple allowed)
+    - `Recommended persistence destination` — the agent's best guess for where this should land if accepted (e.g., "Edge Case Brainstorm checklist → Navigation & URL state", "PROJECT_RULES.md", "memory: project_*.md", "new tooling-gap ticket")
+    - `Disposition` — empty checkbox-style field the human will fill: `[ ] Accept` / `[ ] Reject` / `[ ] Defer` plus a free-text reason
+ 4. **Source map** — appendix listing every work item and PR walked, so the human can verify completeness.
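A hypothetical candidate-learning row, with the ticket key and evidence invented for illustration:

```text
- Summary: Back navigation after checkout re-submits the payment form
  Category: Edge case
  Evidence: SE-1042 comment thread; review comment on the checkout PR
  Recommended persistence destination: Edge Case Brainstorm checklist -> Navigation & URL state
  Disposition: [ ] Accept  [ ] Reject  [ ] Defer   Reason:
```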
+
+ The skill's terminal output is the path to the triage document and a one-line summary of counts per category. Persistence does not happen here — that is `lisa:debrief-apply`'s job.
+
+ ## Hand-off
+
+ After producing the triage document, print:
+
+ ```text
+ Triage document written to: <path>
+ Counts: <n> edge cases, <n> gotchas, <n> friction, <n> tooling gaps, <n> convention drift; <n> anomalies
+ Next: human triage. When done, run `/lisa:debrief:apply <path>` to persist accepted learnings.
+ ```
+
+ Then stop. Debrief never persists learnings on its own.
@@ -0,0 +1,63 @@
+ ---
+ name: debrief-apply
+ description: "Apply human-marked dispositions from a Debrief triage document. Reads the triage doc produced by lisa:debrief, parses each row's disposition (Accept / Reject / Defer), and routes Accepted items to their persistence destination. Deterministic and idempotent — safe to re-run if dispositions are added incrementally."
+ allowed-tools: ["Skill", "Bash", "Read", "Edit", "Write", "Glob", "Grep"]
+ ---
+
+ # Debrief Apply: $ARGUMENTS
+
+ Read the triage document at `$ARGUMENTS` and persist every Accepted candidate learning to its destination.
+
+ This skill is intentionally **single-agent** — there is no team. Routing is deterministic given the disposition column. Spawning sub-agents would only add latency.
+
+ ## Input
+
+ A path or URL to a Debrief triage document produced by `lisa:debrief`. The document is expected to follow the structure that skill produces — a header, an anomalies section, candidate-learning rows grouped by category, and a source-map appendix.
+
+ ## Pre-flight
+
+ 1. **Verify the doc exists and parses.** If the file cannot be read or the expected sections are missing, stop and report — do not guess.
+ 2. **Confirm dispositions exist.** If every row is unmarked, stop and ask the human to triage first. A pristine doc is a no-op, not an error to silently swallow.
+ 3. **Identify the destination map.** Read the project's `.lisa.config.json` (or stack defaults) for:
+    - the edge-case checklist file (default: the Edge Case Brainstorm sub-flow in `plugins/src/base/rules/intent-routing.md`)
+    - the project-rules file (default: `.claude/rules/PROJECT_RULES.md`)
+    - the memory directory (per the auto-memory system path)
+    - the tracker for new tickets
+
+ ## Routing rules
+
+ For every row marked **Accept**:
+
+ | Category | Destination | Action |
+ |----------|-------------|--------|
+ | Edge case | Edge Case Brainstorm checklist in `intent-routing.md` | Append the new pattern + question to the matching group (Navigation, Data, Failure, Input, Auth, or a new group if none fit). Use the row's `Summary` and `Evidence` link as a citation comment. |
+ | Recurring gotcha | Memory file (`project_*.md`) | Write a new memory entry with `type: project`, structured as: rule, **Why:**, **How to apply:**. Add an index line to `MEMORY.md`. |
+ | Process friction | Project rules file | Append a one-line guideline to `PROJECT_RULES.md` under an appropriate heading (or create one). |
+ | Tooling gap | Configured tracker | Create a new ticket via `lisa:tracker-write` with `issue_type: Task`, summary derived from the row's `Summary`, description citing the evidence and the originating debrief doc. Label appropriately (`type:tooling`, `lifecycle-improvement`, etc.). |
+ | Convention drift | `CLAUDE.md` (project) or `PROJECT_RULES.md` | Append the convention as a one-paragraph note under the relevant section. If no relevant section exists, create one. |
+
+ For every row marked **Reject** or **Defer**: no action. Defer is a no-op for `apply` but worth surfacing in the run summary — the human may want to revisit at the next debrief.
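Viewed as data, the routing table reduces to a small lookup. A sketch: the paths mirror the Pre-flight defaults and may be overridden by `.lisa.config.json`.

```python
# Illustrative mapping of the routing table; "tracker" means a new ticket
# via lisa:tracker-write rather than a file write.
ROUTING = {
    "Edge case":        "plugins/src/base/rules/intent-routing.md",
    "Recurring gotcha": "memory/project_*.md",   # plus an index line in MEMORY.md
    "Process friction": ".claude/rules/PROJECT_RULES.md",
    "Tooling gap":      "tracker",
    "Convention drift": "CLAUDE.md",             # or PROJECT_RULES.md
}

def route(category: str) -> str:
    """Accepted rows route by category; an unknown category should stop the run."""
    try:
        return ROUTING[category]
    except KeyError:
        raise ValueError(f"unroutable category: {category!r}")
```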
+
+ ## Idempotency
+
+ `apply` is safe to re-run. Each Accepted row carries an evidence link that doubles as a fingerprint — before writing, check whether the destination already cites that fingerprint. If it does, skip the write and note the row as `already-applied` in the run summary. This lets the human triage a doc incrementally (mark a few, run apply, mark more, run apply again) without producing duplicates.
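A minimal sketch of the fingerprint check for file destinations (`apply_once` is a hypothetical helper, not part of the skill; tracker destinations would need their own duplicate lookup):

```python
def apply_once(dest_path: str, entry: str, fingerprint: str) -> str:
    """Append entry to dest only if its evidence-link fingerprint is absent.

    Returns "applied" or "already-applied" for the run summary.
    """
    try:
        with open(dest_path, "r", encoding="utf-8") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""
    if fingerprint in existing:
        return "already-applied"   # destination already cites this evidence link
    with open(dest_path, "a", encoding="utf-8") as f:
        # Record the fingerprint next to the entry so re-runs can detect it.
        f.write(entry.rstrip("\n") + f"\n<!-- evidence: {fingerprint} -->\n")
    return "applied"
```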
+
+ ## Updating the triage doc
+
+ After each Accepted row is persisted, replace its `[ ] Accept` checkbox with `[x] Applied — <one-line summary of what was written>`. This makes the triage doc itself the audit log of what was acted on. If a write fails (e.g., tracker is unreachable), mark the row `[!] Apply failed — <reason>` and continue with the rest. Never abort the whole run because one row failed.
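The checkbox rewrite is a plain string substitution (hypothetical helpers for illustration; the marker strings match the spec above):

```python
def mark_applied(row: str, summary: str) -> str:
    """Rewrite a triage row's disposition after a successful write."""
    return row.replace("[ ] Accept", f"[x] Applied — {summary}", 1)

def mark_failed(row: str, reason: str) -> str:
    """Rewrite a triage row's disposition after a failed write; the run continues."""
    return row.replace("[ ] Accept", f"[!] Apply failed — {reason}", 1)
```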
+
+ ## Output
+
+ A run summary printed to the user:
+
+ ```text
+ Applied <n> learnings:
+   <n> edge cases → intent-routing.md
+   <n> gotchas → memory
+   <n> friction → PROJECT_RULES.md
+   <n> tooling gaps → <tracker> (<key1>, <key2>, ...)
+   <n> convention drift → CLAUDE.md
+ Skipped:
+   <n> rejected, <n> deferred, <n> already-applied
+ Failed:
+   <n> (see <path> for details)
+ Triage doc updated in place: <path>
+ ```
+
+ If anything is written to a tracker, suggest the human commit the local file changes (memory, rules, intent-routing) when ready — `apply` does not commit.
@@ -252,6 +252,20 @@ After all tickets are created, present a summary table to the user:
  - Blockers list with recommendations and alternatives
  - Cross-PRD dependencies

+ ### Phase 7: PRD Back-link
+
+ > **Mode guard**: In `dry_run: true` mode, skip this phase entirely — no tickets exist to link.
+
+ After Phase 6, invoke the `lisa:prd-backlink` skill to write a `## Tickets` section back into the source GitHub Issue PRD body. The section becomes the canonical anchor for the **Debrief** flow once the initiative ships.
+
+ Invoke `lisa:prd-backlink` with:
+
+ - `source_type: "github"`
+ - `source_ref`: the original GitHub Issue URL or `<org>/<repo>#<n>` token
+ - `tickets`: the full list created in Phases 3–5, each entry as `{ key, title, type, url, parent_key }`
+
+ If `lisa:prd-backlink` fails (permission denied, GitHub unreachable, issue locked), surface the error in the Phase 6 report rather than aborting — the tickets are already created. Recommend the user re-run `lisa:prd-backlink` standalone once the source is reachable.
+

  ## Handling Ambiguities and Blockers

  When you encounter something the PRD + comments + codebase can't resolve:
@@ -252,6 +252,20 @@ After all tickets are created, present a summary table to the user:
  - Blockers list with recommendations and alternatives
  - Cross-PRD dependencies

+ ### Phase 7: PRD Back-link
+
+ > **Mode guard**: In `dry_run: true` mode, skip this phase entirely — no tickets exist to link.
+
+ After Phase 6, invoke the `lisa:prd-backlink` skill to write a `## Tickets` section back into the source Linear project (or its description). The section becomes the canonical anchor for the **Debrief** flow once the initiative ships.
+
+ Invoke `lisa:prd-backlink` with:
+
+ - `source_type: "linear"`
+ - `source_ref`: the original Linear project URL
+ - `tickets`: the full list created in Phases 3–5, each entry as `{ key, title, type, url, parent_key }`
+
+ If `lisa:prd-backlink` fails (permission denied, Linear unreachable), surface the error in the Phase 6 report rather than aborting — the tickets are already created. Recommend the user re-run `lisa:prd-backlink` standalone once the source is reachable.
+

  ## Handling Ambiguities and Blockers

  When you encounter something the PRD + comments + codebase can't resolve:
@@ -264,6 +264,20 @@ After all tickets are created, present a summary table to the user:
  - Blockers list with recommendations and alternatives
  - Cross-PRD dependencies

+ ### Phase 7: PRD Back-link
+
+ > **Mode guard**: In `dry_run: true` mode, skip this phase entirely — no tickets exist to link.
+
+ After Phase 6, invoke the `lisa:prd-backlink` skill to write a `## Tickets` section back into the source PRD. The section becomes the canonical anchor for the **Debrief** flow once the initiative ships, and gives any human reading the PRD months later a one-click path to every work item created from it.
+
+ Invoke `lisa:prd-backlink` with:
+
+ - `source_type: "notion"`
+ - `source_ref`: the original PRD URL
+ - `tickets`: the full list created in Phases 3–5, each entry as `{ key, title, type, url, parent_key }`
+
+ If `lisa:prd-backlink` fails (PRD permission denied, Notion unreachable, source mutated mid-run), surface the error in the Phase 6 report rather than aborting — the tickets are already created and their value to the team is not blocked by the back-link write. Recommend the user re-run `lisa:prd-backlink` standalone once the source is reachable again.
+

  ## Handling Ambiguities and Blockers

  When you encounter something the PRD + comments + codebase can't resolve: