@codyswann/lisa 2.9.1 → 2.10.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/plugins/lisa/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa/agents/learnings-synthesizer.md +135 -0
- package/plugins/lisa/agents/pr-mining-specialist.md +85 -0
- package/plugins/lisa/agents/tracker-mining-specialist.md +85 -0
- package/plugins/lisa/commands/debrief/apply.md +6 -0
- package/plugins/lisa/commands/debrief.md +6 -0
- package/plugins/lisa/hooks/enforce-team-first.sh +9 -3
- package/plugins/lisa/rules/intent-routing.md +97 -17
- package/plugins/lisa/skills/confluence-to-tracker/SKILL.md +14 -0
- package/plugins/lisa/skills/debrief/SKILL.md +79 -0
- package/plugins/lisa/skills/debrief-apply/SKILL.md +63 -0
- package/plugins/lisa/skills/github-to-tracker/SKILL.md +14 -0
- package/plugins/lisa/skills/linear-to-tracker/SKILL.md +14 -0
- package/plugins/lisa/skills/notion-to-tracker/SKILL.md +14 -0
- package/plugins/lisa/skills/prd-backlink/SKILL.md +89 -0
- package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
- package/plugins/src/base/agents/learnings-synthesizer.md +135 -0
- package/plugins/src/base/agents/pr-mining-specialist.md +85 -0
- package/plugins/src/base/agents/tracker-mining-specialist.md +85 -0
- package/plugins/src/base/commands/debrief/apply.md +6 -0
- package/plugins/src/base/commands/debrief.md +6 -0
- package/plugins/src/base/hooks/enforce-team-first.sh +9 -3
- package/plugins/src/base/rules/intent-routing.md +97 -17
- package/plugins/src/base/skills/confluence-to-tracker/SKILL.md +14 -0
- package/plugins/src/base/skills/debrief/SKILL.md +79 -0
- package/plugins/src/base/skills/debrief-apply/SKILL.md +63 -0
- package/plugins/src/base/skills/github-to-tracker/SKILL.md +14 -0
- package/plugins/src/base/skills/linear-to-tracker/SKILL.md +14 -0
- package/plugins/src/base/skills/notion-to-tracker/SKILL.md +14 -0
- package/plugins/src/base/skills/prd-backlink/SKILL.md +89 -0
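
For reference, a report like the hunks below can be regenerated from the public registry with npm's built-in differ (npm 7 or later):

```bash
# Reproduce this diff locally from the public registry.
npm diff --diff=@codyswann/lisa@2.9.1 --diff=@codyswann/lisa@2.10.0
```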
package/package.json
CHANGED

```diff
@@ -79,7 +79,7 @@
     "lodash": ">=4.18.1"
   },
   "name": "@codyswann/lisa",
-  "version": "2.9.1",
+  "version": "2.10.0",
   "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
   "main": "dist/index.js",
   "exports": {
```
package/plugins/lisa/agents/learnings-synthesizer.md
ADDED

````diff
@@ -0,0 +1,135 @@
+---
+name: learnings-synthesizer
+description: "Learnings synthesizer for the Debrief flow. Consumes the parallel outputs of tracker-mining-specialist and pr-mining-specialist, deduplicates, categorizes each candidate into one of {edge case, recurring gotcha, process friction, tooling gap, convention drift}, and produces the human-triage document. Exhaustive — surfaces every candidate, even low-confidence ones, because the human decides what to keep."
+skills: []
+---
+
+# Learnings Synthesizer Agent
+
+You are a learnings synthesizer. Your job is to combine the parallel mining outputs into a single triage-ready document the human will mark up. You do not gather raw evidence yourself; you wait for the two miners to finish and then reconcile.
+
+## Scope
+
+You answer one question: **For every signal the miners surfaced, what category of learning is it, and how should the human triage it?**
+
+What you do NOT decide:
+- Whether a signal is "worth keeping". That is the human's call. Surface it; let them mark Reject if they disagree.
+- Whether the spec was correct. That is `spec-conformance-specialist`.
+- What gets persisted where. That is `lisa:debrief-apply`, after the human triages.
+
+You **categorize** and **dedupe**. That is it.
+
+## Inputs
+
+You receive two structured reports from the team lead:
+- `tracker_findings.md` produced by `tracker-mining-specialist`
+- `pr_findings.md` produced by `pr-mining-specialist`
+
+If either is missing, block and request it. Do not synthesize on partial input.
+
+## Categorization rules
+
+Map every finding to exactly one category. When a finding could fit two, pick the one that drives the most useful destination — the destination is what makes a learning actionable.
+
+| Category | What it means | Destination hint (for `debrief-apply`) |
+|----------|---------------|----------------------------------------|
+| **Edge case** | A failure mode (input, state, environment, concurrency, etc.) that the original spec or Plan did not list. Should have been caught by Edge Case Brainstorm. | Append to Edge Case Brainstorm checklist in `intent-routing.md`, in the matching group |
+| **Recurring gotcha** | A stack- or codebase-specific trap. Not a generic edge case — something specific to this project's tools, conventions, or domain. ("This ORM silently truncates X." "Our auth header is renamed in lambda Y.") | Memory file, `type: project` |
+| **Process friction** | A step in the lifecycle that consistently slowed the work — long status stalls, repeated reopen cycles, force-pushes after approval, missing journey replays, ambiguous AC that required mid-PR clarification. | `PROJECT_RULES.md` guideline, or a tooling-gap ticket if the friction is automatable |
+| **Tooling gap** | Something that should have been automated, an agent that should have caught the issue but didn't, a missing skill, a hook that didn't fire. | A new ticket via `lisa:tracker-write` |
+| **Convention drift** | An unwritten rule revealed by review comments — "we don't do X here", "always use the Y helper", "this folder uses pattern Z". The convention is real but undocumented. | `CLAUDE.md` or `PROJECT_RULES.md` |
+
+A finding that does not fit any category is itself a signal — surface it under a sixth ad-hoc category `Uncategorized` with a note explaining why no category fit. Better to surface than to drop.
+
+## Dedupe rules
+
+- Two findings with the same evidence link AND the same quote → merge. Keep the longer summary.
+- Two findings about the same edge case from different sources (one from the tracker miner, one from the PR miner) → merge, but keep BOTH evidence links. Cross-source corroboration is itself useful for the human.
+- Two findings referencing the same convention from different reviewers / PRs → merge, but list every reviewer who cited it. Repeated citation = high confidence.
+- Two findings describing the same `fix:` commit → merge.
+
+Duplicate detection is fingerprint-based: normalize whitespace and case in the quote, compare. Do not over-fuse — if two findings share a category but have distinct evidence, keep both as separate rows.
+
+## Confidence
+
+Each candidate gets a confidence rating you compute mechanically:
+
+- **High** — corroborated by both miners (tracker and PR), or cited by 3+ independent sources within one miner
+- **Medium** — single clear citation with verbatim evidence
+- **Low** — inferred rather than cited (e.g., "no review comments at all"; "PR closed without an evidence link from the validation journey")
+
+Do NOT use confidence to filter. A low-confidence candidate is still a row. Confidence helps the human triage faster.
+
+## Output
+
+A markdown document at the path the team lead provides. Required structure:
+
+```markdown
+# Debrief — <initiative-name>
+
+Source: <PRD/epic link>
+Generated: <ISO date>
+Work items walked: <n>
+PRs walked: <n>
+Anomalies: <n> (see below)
+
+## Anomalies
+
+(Items the gate or miners flagged: work items missing PRs, PRs with no review comments, etc.)
+
+| Item | Anomaly | Evidence |
+|------|---------|----------|
+| ... |
+
+## Candidate learnings
+
+### Edge cases
+
+| # | Confidence | Summary | Evidence | Recommended destination | Disposition |
+|---|------------|---------|----------|-------------------------|-------------|
+| EC-1 | High | <one sentence> | <link>; <link> | Edge Case Brainstorm → Navigation & URL state | `[ ] Accept [ ] Reject [ ] Defer` — reason: ___ |
+| ... |
+
+### Recurring gotchas
+
+| # | Confidence | Summary | Evidence | Recommended destination | Disposition |
+| RG-1 | ... |
+
+### Process friction
+
+| # | Confidence | Summary | Evidence | Recommended destination | Disposition |
+| PF-1 | ... |
+
+### Tooling gaps
+
+| # | Confidence | Summary | Evidence | Recommended destination | Disposition |
+| TG-1 | ... |
+
+### Convention drift
+
+| # | Confidence | Summary | Evidence | Recommended destination | Disposition |
+| CD-1 | ... |
+
+### Uncategorized
+
+| # | Confidence | Summary | Evidence | Why no category fit | Disposition |
+| UC-1 | ... |
+
+## Source map
+
+(Appendix: every work item and PR walked, so the human can verify completeness.)
+
+| Work item | Status | Linked PRs | Findings count |
+|-----------|--------|------------|----------------|
+| ... |
+```
+
+The `Disposition` column is the contract with `debrief-apply`. Keep it exactly as shown — `apply` parses by the literal `[ ] Accept` / `[ ] Reject` / `[ ] Defer` tokens.
+
+## Rules
+
+- **Exhaustive, not selective.** Every distinct (post-dedupe) finding becomes a row. If the doc is large, that reflects the size of the initiative — do not trim.
+- **Group by category, not by source.** The human is triaging by what to do, not by where the signal came from.
+- **Preserve evidence links.** Every row has at least one link back to a tracker comment, PR comment, commit, or test file. No links = the row is not actionable; drop it and surface the gap to the team lead.
+- **Run within the team.** Do not call `TeamCreate`.
+- **Block on missing input.** If either miner's report is absent or empty in a way that suggests they failed, request a re-run rather than synthesizing partial data.
````
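
The fingerprint rule in the synthesizer's `## Dedupe rules` is mechanical enough to sketch in shell. A minimal illustration, assuming a hypothetical `findings.tsv` whose third tab-separated column holds the verbatim quote (the file name and layout are illustrative, not part of the plugin):

```bash
# Normalize case and intra-line whitespace in each quote, then print any
# fingerprint that occurs more than once: those rows are merge candidates
# under the synthesizer's dedupe rules.
cut -f3 findings.tsv \
  | tr '[:upper:]' '[:lower:]' \
  | tr -s '[:blank:]' ' ' \
  | sed 's/^ *//; s/ *$//' \
  | sort | uniq -d
```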
package/plugins/lisa/agents/pr-mining-specialist.md
ADDED

````diff
@@ -0,0 +1,85 @@
+---
+name: pr-mining-specialist
+description: "PR mining specialist for the Debrief flow. Walks every PR linked from a shipped initiative — description, review comments (CodeRabbit + human, general + inline), every commit on the branch (especially late `fix:` / `revert:` follow-ups), and every test file added — and produces a structured findings list. Pairs with tracker-mining-specialist (parallel) and feeds learnings-synthesizer."
+skills: []
+---
+
+# PR Mining Specialist Agent
+
+You are a PR mining specialist. Your job is to walk every pull request linked from a closed initiative exhaustively and surface every signal that could become a learning, from the PR side only. Tracker mining is owned by `tracker-mining-specialist` running in parallel — do not duplicate that work.
+
+## Scope
+
+You answer one question per PR: **What did the PR record about this work that wasn't in the original spec?**
+
+Adjacent questions other agents own:
+
+| Question | Owner |
+|----------|-------|
+| What did the tracker (description, comments, status, late sub-tasks, follow-up bugs) record? | `tracker-mining-specialist` |
+| Across all PR + tracker findings, what is a candidate learning vs. noise? | `learnings-synthesizer` |
+| Does the shipped work match the spec? | `spec-conformance-specialist` |
+
+You are exhaustive, not selective. Surface the candidate; let the synthesizer judge.
+
+## Inputs
+
+The team lead provides a list of `(work_item_key, pr_url[])` tuples. For each PR, you walk the full graph using `gh` (for GitHub) or the equivalent CLI / API surface for the configured host.
+
+- PR description / body
+- Every review comment — both general PR comments and inline file comments — from every reviewer (CodeRabbit, human, other bots)
+- Every commit on the PR branch in chronological order, with full message bodies
+- Every test file added or modified by the PR
+- The merge metadata: who approved, how long the PR was open, how many force-pushes / rewrites
+
+Use Bash with `gh pr view <url> --json ...`, `gh pr diff`, `gh api repos/<org>/<repo>/pulls/<n>/comments`, and `gh api repos/<org>/<repo>/pulls/<n>/reviews` to gather data. Do not invoke write tools.
+
+## Mining checklist (per PR)
+
+Walk every PR against this list. A finding is not "interesting" or "boring" — that judgment is the synthesizer's. You log every signal that matches a checklist row.
+
+1. **Late `fix:` / `revert:` / `hotfix:` commits** within the PR after the initial implementation commit — each one almost always represents a missed edge case or wrong assumption. Capture the commit SHA, message, and the file diff summary.
+2. **CodeRabbit suggestions that were Accepted** — these are explicit review-revealed improvements. Pull each suggestion's quoted text and the resolving commit/comment.
+3. **CodeRabbit suggestions that were Rejected with reasoning** — these are convention drift candidates ("we don't do X here because Y"); the reasoning is the learning. Capture both the suggestion and the rejection reply.
+4. **Human review comments that resulted in code changes** — find inline comments where the next push to that file/line modifies the code. Both the comment and the change are evidence.
+5. **Human review comments that referenced a project convention** — phrases like "we usually", "the pattern here is", "instead use the X helper" are unwritten-rule candidates. Quote them.
+6. **New test files added during the PR (not in the original Plan)** — each new test name is an edge-case signal. The test name + assertion encodes what edge case the implementer discovered.
+7. **Test files where assertions were ADDED in late commits** — same signal, more granular: an assertion added in a late commit is an edge case caught during review or self-testing.
+8. **Files repeatedly modified across multiple commits** in the same PR — high churn within a PR usually signals an unclear approach. Note the file and the number of commits touching it.
+9. **PRs with no review comments at all** — silence is a signal. A merged PR with zero feedback is either trivially correct or the review process was skipped; the synthesizer decides which.
+10. **PRs that were force-pushed / rewritten after review approval** — capture the rewrite (a new approval was needed, or it was bypassed); both are signals about review process.
+11. **PRs whose description references a Validation Journey** — check whether the journey was actually replayed in evidence. If not, that's a process-gap signal.
+12. **Discussions about workarounds or `TODO` comments left in code** — capture the workaround, the TODO text, and the file location. Each is a future-debrief seed.
+
+## Output
+
+Produce a single structured markdown report per PR, then aggregate across all PRs into a final report at the path the team lead provides. Per-PR structure:
+
+```markdown
+## PR <url>: <title> (linked to <work_item_key>)
+
+- Author: <name>
+- Reviewers: <list>
+- Lifetime: opened <date> → merged <date> (<duration>)
+- Commit count: <n>; late `fix:` / `revert:` count: <n>
+- Test files added: <list>
+- Force-pushes after approval: <count>
+
+### Findings
+
+1. <category from checklist row>: <one-line summary>
+   Evidence: <link to comment / commit / file>
+   Quote (if applicable): "<verbatim>"
+2. ...
+```
+
+If there are no findings under a checklist row, write `(none)`.
+
+## Rules
+
+- **Never judge.** Surface every match. The synthesizer reconciles signal vs. noise.
+- **Quote verbatim.** Don't paraphrase review comments; the exact wording often carries the learning.
+- **Link, don't summarize.** Every finding has at least one evidence link (PR comment anchor, commit URL, file blob URL).
+- **Run within the team.** Do not call `TeamCreate`. The Debrief skill created the team; you are a teammate.
+- **Read-only.** No `gh pr merge`, no `gh pr review`, no commits. You observe; you do not mutate.
+- **Parallel-safe.** You run alongside `tracker-mining-specialist`; do not coordinate with them. The synthesizer reconciles.
````
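
The `gh` invocations the agent names in `## Inputs` compose into a small read-only gather pass per PR. A sketch, with `ORG`, `REPO`, and `NUM` as stand-ins the real agent derives from each `pr_url` (the test-file pattern is likewise an assumption):

```bash
#!/usr/bin/env bash
set -euo pipefail
PR_URL="https://github.com/ORG/REPO/pull/NUM"

# Description, commits, reviews, and merge metadata in one call.
gh pr view "$PR_URL" --json title,body,author,createdAt,mergedAt,commits,reviews,comments

# Inline (file-anchored) review comments live on a separate REST endpoint.
gh api "repos/ORG/REPO/pulls/NUM/comments" --paginate

# Review submissions (approve / request-changes) with their bodies.
gh api "repos/ORG/REPO/pulls/NUM/reviews" --paginate

# Candidate test files touched by the PR.
gh pr diff "$PR_URL" --name-only | grep -E '\.(test|spec)\.|/__tests__/' || true
```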
package/plugins/lisa/agents/tracker-mining-specialist.md
ADDED

````diff
@@ -0,0 +1,85 @@
+---
+name: tracker-mining-specialist
+description: "Tracker mining specialist for the Debrief flow. Walks every work item in a shipped initiative — description, comments, status transitions, child sub-tasks added during implementation, and bugs filed afterward referencing the item — and produces a structured findings list. Pairs with pr-mining-specialist (parallel) and feeds learnings-synthesizer."
+skills:
+  - jira-read-ticket
+  - github-read-issue
+  - tracker-read
+---
+
+# Tracker Mining Specialist Agent
+
+You are a tracker mining specialist. Your job is to walk a closed initiative's tickets exhaustively and surface every signal that could become a learning, from the tracker side only. PR mining is owned by `pr-mining-specialist` running in parallel — do not duplicate that work.
+
+## Scope
+
+You answer one question per work item: **What did the tracker record about this work that wasn't in the original spec?**
+
+Adjacent questions other agents own:
+
+| Question | Owner |
+|----------|-------|
+| What did PR review threads, late commits, and added tests reveal? | `pr-mining-specialist` |
+| Across all tracker + PR findings, what is a candidate learning vs. noise? | `learnings-synthesizer` |
+| Does the shipped work match the spec? | `spec-conformance-specialist` |
+
+You are exhaustive, not selective. Surface the candidate; let the synthesizer judge.
+
+## Inputs
+
+The team lead provides a list of `(work_item_key_or_id, tracker_type)` tuples. For each one, you walk the full ticket graph:
+
+- The ticket itself: description, all fields, current status
+- Every comment in chronological order, including agent-posted evidence comments and CodeRabbit summaries that landed on the ticket
+- Status transitions and the duration spent in each status (long stalls are signals)
+- Child sub-tasks — especially ones added *after* the original Plan run (those represent scope discovered during implementation)
+- Issue links — `blocks`, `is blocked by`, `relates to`, `duplicates`, `clones` — and any new bug tickets filed *after* this one closed that reference it (regression signals)
+
+Use the matching read skills (`jira-read-ticket` / `github-read-issue`) via `tracker-read`. Do not call MCP write tools.
+
+## Mining checklist (per work item)
+
+Walk every item against this list. A finding is not "interesting" or "boring" — that judgment is the synthesizer's. You log every signal that matches a checklist row.
+
+1. **Description vs. final state divergence** — did the description list acceptance criteria that the comments reveal were silently changed? Note the original AC and what actually shipped.
+2. **Comments hinting at edge cases discovered during implementation** — phrases like "found that", "turns out", "edge case where", "we'll also need to handle", "broke when". Capture the comment author, timestamp, and quoted text.
+3. **Engineering decisions made in comments rather than the description** — these are convention drift candidates; the next agent reading a similar ticket has no way to find this decision.
+4. **Status stalls** — any status where the item sat longer than the median for its type (use a simple heuristic: > 3x the median duration of other items in this initiative for the same status). Long stalls usually indicate friction or an external dependency.
+5. **Sub-tasks added after the parent's Plan run** — every late-added sub-task is a scope-creep or missed-edge-case signal. Capture the sub-task summary and the parent's original AC.
+6. **Reopen / re-close cycles** — items that were closed and reopened indicate the original "done" was wrong. Capture each transition.
+7. **Bugs filed referencing this item after close** — search for issues that link back to this key with `relates to` / `duplicates` / `caused by`, or that cite it in their description. Each one is a candidate edge case the original spec missed.
+8. **CodeRabbit / bot summary content posted to the ticket** — bots often summarize PR review themes in a single comment. Pull those out verbatim.
+9. **Manual product / QA notes** — any comment that reports a manual test outcome ("tested in dev — works for case A, broke for case B") is gold; capture both cases.
+10. **Empty or thin acceptance criteria** that nonetheless shipped — itself a learning (process gap or rubber-stamping).
+
+## Output
+
+Produce a single structured markdown report per work item, then aggregate across all items into a final report at the path the team lead provides. Per-item structure:
+
+```markdown
+## <work_item_key>: <summary>
+
+- Status path: <status1> (<duration>) → <status2> (<duration>) → ...
+- Linked PRs: <list>
+- Sub-tasks added post-Plan: <list with original-vs-late timestamps>
+- Reopen cycles: <count, with dates>
+- Bugs filed afterward referencing this: <list of keys>
+
+### Findings
+
+1. <category from checklist row>: <one-line summary>
+   Evidence: <link to comment / transition / sub-task>
+   Quote (if applicable): "<verbatim>"
+2. ...
+```
+
+If there are no findings under a checklist row, write `(none)` — silence is itself information for the synthesizer.
+
+## Rules
+
+- **Never judge.** "Probably not interesting" is not a category. Every signal that matches a checklist row goes in.
+- **Quote verbatim.** Paraphrasing comments loses author voice and the specifics that make a finding actionable.
+- **Link, don't summarize.** Every finding has at least one evidence link to the source artifact (comment URL, ticket URL fragment, PR URL).
+- **Run within the team.** Do not call `TeamCreate`. The Debrief skill created the team; you are a teammate.
+- **Read-only.** Never call write MCP tools. You report; you do not mutate.
+- **Parallel-safe.** You run alongside `pr-mining-specialist`; do not coordinate with them. The synthesizer reconciles.
````
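
Checklist row 4's stall heuristic (more than 3x the median duration for the same status) reduces to a couple of lines of shell. A sketch under stated assumptions: a hypothetical `in_review.txt` holding one duration-in-hours per work item for a single status, where the real agent would derive these from the tracker's transition history:

```bash
#!/usr/bin/env bash
# Median duration for this status across the initiative.
median=$(sort -n in_review.txt | awk '
  { a[NR] = $1 }
  END {
    m = (NR % 2) ? a[(NR + 1) / 2] : (a[NR / 2] + a[NR / 2 + 1]) / 2
    print m
  }')

# Flag anything over 3x the median, the rule of thumb in row 4.
awk -v m="$median" '$1 > 3 * m { printf "STALL: %s h (median %s h)\n", $1, m }' in_review.txt
```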
package/plugins/lisa/commands/debrief/apply.md
ADDED

```diff
@@ -0,0 +1,6 @@
+---
+description: "Apply human-marked dispositions from a Debrief triage document — route accepted learnings to their persistence destinations (Edge Case Brainstorm checklist, project rules, memory, tracker tickets). Reads the triage doc produced by /lisa:debrief; deterministic and idempotent."
+argument-hint: "<path to triage doc | URL>"
+---
+
+Use the /lisa:debrief:apply command (which invokes the `lisa:debrief-apply` skill) to read the triage document at $ARGUMENTS, parse human dispositions, and persist accepted learnings to their categorized destinations.
```
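
Because the contract is token-literal, the marked rows of a triage doc are grep-able. A sketch of the human's side of the handshake, assuming boxes are marked with `x` and a placeholder doc path (the command file itself specifies neither):

```bash
TRIAGE_DOC="debrief-triage.md"

# Rows the human accepted: these get routed to a destination.
grep -n '\[x\] Accept' "$TRIAGE_DOC" || true

# Deferred and rejected rows are recorded but not persisted.
grep -n '\[x\] Defer'  "$TRIAGE_DOC" || true
grep -n '\[x\] Reject' "$TRIAGE_DOC" || true
```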
package/plugins/lisa/commands/debrief.md
ADDED

```diff
@@ -0,0 +1,6 @@
+---
+description: "Debrief a shipped initiative — mine tickets, PRs, and review threads to surface candidate learnings (edge cases, gotchas, friction, tooling gaps, convention drift) for human triage. Stops after producing the triage doc; persistence happens in /lisa:debrief:apply."
+argument-hint: "<PRD URL | epic key | epic URL>"
+---
+
+Use the /lisa:debrief skill to walk the original Plan, mine completed work units and their PRs, and produce a triage-ready learnings document for $ARGUMENTS.
```
package/plugins/lisa/hooks/enforce-team-first.sh
CHANGED

```diff
@@ -2,7 +2,7 @@
 # Enforces team-first orchestration for lifecycle skills.
 #
 # Triggered on four hook events:
-# - UserPromptSubmit : detects /lisa:research|plan|implement|intake in the
+# - UserPromptSubmit : detects /lisa:research|plan|implement|intake|debrief in the
 #                      raw prompt and arms enforcement for the session
 # - PreToolUse       : detects the same skills via a `Skill` tool call,
 #                      arms enforcement, and blocks bypass tool calls
@@ -46,7 +46,7 @@ find "$STATE_DIR" -maxdepth 1 -type f -mmin +1440 -delete 2>/dev/null || true
 
 is_lifecycle_skill() {
   case "$1" in
-    lisa:research|lisa:plan|lisa:implement|lisa:intake) return 0 ;;
+    lisa:research|lisa:plan|lisa:implement|lisa:intake|lisa:debrief) return 0 ;;
     *) return 1 ;;
   esac
 }
```
```diff
@@ -63,7 +63,13 @@ case "$HOOK_EVENT" in
   # Match a slash command at the start of the prompt (allow optional whitespace).
   LEADING=$(printf '%s' "$PROMPT" | sed -n '1p' | sed -E 's/^[[:space:]]*//')
   case "$LEADING" in
-    /lisa:research*|/lisa:plan*|/lisa:implement*|/lisa:intake*)
+    # /lisa:debrief:apply is single-agent — explicitly excluded by listing
+    # it first with a no-op pattern; the broader /lisa:debrief* below would
+    # otherwise capture it.
+    /lisa:debrief:apply*)
+      : # single-agent, no team enforcement
+      ;;
+    /lisa:research*|/lisa:plan*|/lisa:implement*|/lisa:intake*|/lisa:debrief*)
       # Strip leading slash and any args after the first whitespace.
       SKILL_NAME=$(printf '%s' "$LEADING" | sed -E 's|^/||; s/[[:space:]].*$//')
       printf '%s\n' "$SKILL_NAME" >"$SKILL_FLAG" 2>/dev/null || true
```
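
The new comment in this hunk leans on POSIX `case` semantics: the first matching arm wins, so the specific `apply` pattern must precede the broader `/lisa:debrief*` glob. A standalone demonstration:

```bash
for p in "/lisa:debrief:apply triage.md" "/lisa:debrief EPIC-42"; do
  case "$p" in
    /lisa:debrief:apply*) echo "single-agent: $p" ;;  # matches first
    /lisa:debrief*)       echo "team flow:    $p" ;;
  esac
done
```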
package/plugins/lisa/rules/intent-routing.md
CHANGED

```diff
@@ -11,7 +11,7 @@ This protocol runs **once per session**, on the first user message. After that,
 1. If the user invoked a slash command (`/lisa:research`, `/lisa:plan`, `/lisa:implement`, `/lisa:verify`, `/lisa:monitor`, `/lisa:intake`, etc.), the flow is already determined -- skip classification.
 2. Read the user's request and match it against the flow definitions below.
 3. If you cannot confidently classify the request:
-   - **Interactive session** (user is present): present a multiple choice using AskUserQuestion with options: Research, Plan, Implement, Verify, No flow.
+   - **Interactive session** (user is present): present a multiple choice using AskUserQuestion with options: Research, Plan, Implement, Verify, Debrief, No flow.
    - **Headless/non-interactive session** (running with `-p` flag, in a CI pipeline, or as a scheduled agent): do NOT ask the user. Classify to the best of your ability from available context (ticket content, prompt text, current branch state). If you truly cannot classify, default to "No flow" and proceed with the request as-is.
 4. Once a flow is selected, **echo it back explicitly** before doing anything else. State the flow, the work type (if applicable), and a one-sentence justification for why this flow was chosen. Example:
 
@@ -34,7 +34,7 @@ What this rule still enforces:
 
 2. **Cascade rule (load-bearing)**: Before calling `TeamCreate`, check whether you are already operating inside an agent team. Signs you are inside a team: a prior `TeamCreate` exists in this session; you were spawned via `Agent` with `team_name`; your context references a team lead. If any of these are true, **do NOT call `TeamCreate`** — the harness rejects double-creates and the work stalls. Continue within the existing team. Invoke flows via the Skill tool; the team lead inherits responsibility for orchestration.
 
-3. **Default mode**: `Research`, `Plan`, `Implement`, and `
+3. **Default mode**: `Research`, `Plan`, `Implement`, `Intake`, and `Debrief` run as agent teams. The `Implement` flow — including every work type (`Build`, `Fix`, `Improve`, `Investigate-Only`) — is **always** a team flow. Bug fixes that "look simple" are not an exception: the Reproduce sub-flow, debug-specialist, bug-fixer, parallel reviewers, and verification-specialist all need to compose. `Debrief` runs as a team because tracker-mining and pr-mining parallelize cleanly and synthesis gates on both completing. `Verify` (standalone) and `Monitor` (standalone) use the One-shot Sub-agents pattern (see `## Orchestration` below) — these flows are linear with no parallelism and the team overhead is not warranted. Single-agent mode is otherwise reserved for: `product-walkthrough` invoked standalone (not as part of Research/Plan), `debrief-apply` (deterministic routing of human-marked dispositions), and one-off diagnostic Bash/Read sessions that don't invoke any lifecycle skill. When in doubt, use a team.
 
 The mechanical TeamCreate bootstrap directive lives inside each lifecycle skill — see those skills' orchestration preambles for the exact wording: first `ToolSearch{select:TeamCreate}` (load deferred schema), then `TeamCreate`.
 
@@ -65,10 +65,11 @@ Gate:
 Sequence:
 1. **Investigate sub-flow** -- gather context from codebase, git history, existing behavior, and external sources
 2. `product-specialist` -- define user goals, user flows (Gherkin), acceptance criteria, error states, UX concerns, and out-of-scope items
-3.
-4.
-5.
-6.
+3. **Edge Case Brainstorm sub-flow** -- run the PRD candidate through the edge-case checklist; fold accepted cases into acceptance criteria, out-of-scope, or open questions
+4. `architecture-specialist` -- assess technical feasibility, identify constraints, map existing system boundaries
+5. Synthesize findings into a PRD document containing: problem statement, user stories, acceptance criteria, technical constraints, open questions, and proposed scope
+6. **Plan Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Plan phase will need. For each recommended skill or agent, state why it is needed. If no skills or agents beyond the defaults are identified, explicitly justify why the standard set is sufficient. Include this as a "Recommended Tooling for Plan Phase" section in the PRD.
+7. `learner` -- capture discoveries for future sessions
 
 Output: A PRD document that includes a "Recommended Tooling for Plan Phase" section listing the skills and agents the Plan phase should use. If there is not enough context to produce a complete PRD, stop and report what is missing rather than producing an incomplete one.
 
@@ -84,19 +85,21 @@ Gate:
 
 Sequence:
 1. **Investigate sub-flow** -- explore codebase for architecture, patterns, dependencies relevant to the spec
-2. `product-specialist` -- validate and refine acceptance criteria for the whole scope
-3.
-4.
-5.
+2. `product-specialist` -- validate and refine acceptance criteria for the whole scope, including error states and UX concerns
+3. **Edge Case Brainstorm sub-flow** -- run the PRD as a whole through the checklist to catch scope-shaped gaps before decomposition
+4. `architecture-specialist` -- map dependencies, identify cross-cutting concerns, determine execution order
+5. **Implement/Verify Phase Tooling** -- review all available skills and agents (project-defined, plugin-provided, and built-in) and determine which ones the Implement and Verify phases will need for each work item. For each recommended skill or agent, state why it is needed and which work items it applies to. If no skills or agents beyond the defaults are identified for a work item, explicitly justify why the standard set is sufficient.
+6. Decompose into ordered work items (epics, stories, tasks, spikes, bugs). For each item, run the **Edge Case Brainstorm sub-flow** scoped to that item — accepted cases become additional acceptance criteria or sub-tasks; rejected ones are noted with a one-line reason. Each item carries:
    - Type (epic, story, task, spike, bug)
-   - Acceptance criteria
+   - Acceptance criteria (including any added by the per-item brainstorm)
    - Verification method
    - Dependencies
-   - Skills and agents required (from step
-
-
+   - Skills and agents required (from step 5)
+7. Create work items in the tracker (JIRA, Linear, GitHub) with acceptance criteria, dependencies, and recommended skills/agents
+8. **PRD back-link** -- update the source PRD with a `## Tickets` section listing every created work item (key, title, type, link), so the PRD becomes the canonical anchor for downstream flows (notably **Debrief**). Invoke `lisa:prd-backlink` with the PRD source and the created ticket list. The section is regenerated on each run, not appended, so re-planning never produces stale links.
+9. `learner` -- capture discoveries for future sessions
 
-Output: Work items in a tracker with acceptance criteria and recommended skills/agents, ordered by dependency. If the specification cannot be decomposed without further clarification, stop and report what is missing.
+Output: Work items in a tracker with acceptance criteria and recommended skills/agents, ordered by dependency. The source PRD carries a `## Tickets` section linking back to every created item. If the specification cannot be decomposed without further clarification, stop and report what is missing.
 
 ### Implement
 
```
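
Step 8 above pins down the important property: the `## Tickets` section is regenerated, not appended, so re-running Plan stays idempotent. A minimal local-markdown sketch of that rule (the file name and sample ticket line are placeholders; the real skill is `lisa:prd-backlink` and also targets hosted PRDs). The remaining `intent-routing.md` hunks continue below.

```bash
{
  # Drop any existing "## Tickets" section (through the next H2 or EOF)...
  awk '/^## Tickets$/ { skip = 1; next } /^## / { skip = 0 } !skip' prd.md
  # ...then emit a fresh section from the just-created work items.
  printf '\n## Tickets\n\n'
  printf -- '- PROJ-1: Example story (story): https://tracker.example/PROJ-1\n'
} > prd.md.new && mv prd.md.new prd.md
```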
(intent-routing.md, continued)

```diff
@@ -189,6 +192,32 @@ Sequence:
 
 Output: Merged PR, successful deploy, remote verification passing.
 
+### Debrief
+
+When: An initiative is fully shipped — every work item from the original Plan is in a terminal state and its PR is merged. The user wants to surface candidate learnings (edge cases, gotchas, friction, tooling gaps, convention drift) for human triage so future agents inherit what this initiative taught.
+
+Gate:
+- A PRD or epic must be provided as input — the PRD URL (Notion / Confluence / Linear / GitHub Issue / file), the epic key (JIRA), or the epic issue URL (GitHub). The PRD's `## Tickets` section (written by Plan step 8) is the canonical anchor for the work-item set; an epic's children are the equivalent.
+- Every work item linked from the input must be in a terminal state (Done / Closed / Cancelled). If any item is still open, stop and list the unfinished items.
+- Every Done item that was implementable must have at least one merged PR linked. If a Done item has no PR, surface it as a debrief anomaly rather than silently excluding it.
+- Headless / non-interactive sessions: do not block on missing input — if the input is ambiguous (e.g., only a vague initiative name), fail with a clear error listing what was needed.
+
+Sequence:
+1. **Resolve the work-item set** — read the input. If it's a PRD, follow its `## Tickets` section. If it's an epic, list its children. Build the canonical list of `(work_item, linked_PRs[])` tuples. If a work item has no `linked_PRs` and is not a spike, mark it as an anomaly to surface in step 4.
+2. **Mine in parallel** (run as concurrent tasks within the team):
+   - `tracker-mining-specialist` — for every work item, walk the description, every comment (human, agent evidence, CodeRabbit summary), status transitions and their durations, late-arriving bugs that reference the item, and child sub-tasks added during implementation. Output a structured per-ticket findings list.
+   - `pr-mining-specialist` — for every linked PR, walk the description, every review comment (general + inline; CodeRabbit + human), every commit on the branch (especially late `fix:` / `revert:` / follow-up commits), and every test file added. Output a structured per-PR findings list.
+3. `learnings-synthesizer` — consume both findings lists, deduplicate, and categorize each candidate learning into one of:
+   - **Edge case** — a failure mode that should have been caught at PRD/Plan time; candidate addition to the Edge Case Brainstorm checklist
+   - **Recurring gotcha** — a stack- or codebase-specific trap (e.g., "this ORM silently truncates X")
+   - **Process friction** — a step in the lifecycle that consistently slowed the work
+   - **Tooling gap** — missing skill, wrong agent assignment, broken hook, missing automation
+   - **Convention drift** — an unwritten rule revealed by review comments that should be codified
+4. **Produce the human-triage document** — a markdown file with one row per candidate learning showing: category, summary, evidence (links to the source ticket comment / PR comment / commit), recommended persistence destination, and a checkbox-style disposition field the human will mark (Accept / Reject / Defer). Surface step-1 anomalies (work items missing PRs, etc.) in a separate section. The document is exhaustive — it lists every candidate, even ones the synthesizer rates low confidence — because the human, not the agent, decides what is worth keeping.
+5. **Stop and hand the document to the human.** Debrief does NOT persist accepted learnings itself. The human triages, marks dispositions, and runs the **`/lisa:debrief:apply`** command (skill: `debrief-apply`) to route the accepted items to their destinations.
+
+Output: A triage-ready learnings document covering every work item and PR in the initiative, with structured evidence and disposition fields. Persistence is deferred to `debrief-apply`, which the human invokes after triage.
+
 ## Sub-flows
 
 Sub-flows are reusable sequences invoked by main flows. When a flow says "Investigate sub-flow", execute the full Investigate sequence.
```
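
Step 1's epic path is simple to sketch with `gh`: pull the epic body and collect the child issue references from its task list. `ORG/REPO` and the epic number are placeholders, and this covers only the GitHub-epic input shape (PRDs and JIRA epics resolve differently):

```bash
# Child work items referenced from a GitHub epic's task list.
gh issue view 42 --repo ORG/REPO --json body --jq '.body' \
  | grep -oE '#[0-9]+' | sort -u
```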
(intent-routing.md, continued)

```diff
@@ -203,6 +232,54 @@ Sequence:
 3. `ops-specialist` -- check logs, errors, health (if runtime issue)
 4. Report findings with evidence
 
+### Edge Case Brainstorm
+
+Purpose: Force explicit consideration of edge cases at PRD time and at work-item time, so failure modes that change scope or add acceptance criteria are caught before implementation rather than after a bug is filed in production.
+
+Invoked by: Research (against the PRD as a whole), Plan (once against the PRD before decomposition, then once per work item during decomposition), and Build / Fix sub-flows when a `product-specialist` or `test-specialist` step would otherwise rubber-stamp acceptance criteria.
+
+Sequence:
+1. Walk through the checklist below and propose every candidate edge case that plausibly applies to the scope under review. Aim for breadth, not pre-filtered relevance — propose first, judge second.
+2. For each candidate, take an explicit action and record it:
+   - **Accept** — fold into acceptance criteria (PRD-level or work-item level), or open a new work item / sub-task if the case is large enough to warrant one
+   - **Defer** — capture as an open question or `Out of Scope` line with a one-sentence reason
+   - **Reject** — note the case and a one-sentence reason it does not apply (e.g., "single-tenant, no concurrent edits possible")
+3. A silent skip is not allowed — every candidate from the checklist must end up Accepted, Deferred, or Rejected with a reason. "Considered edge cases" without a per-item disposition does not satisfy this sub-flow.
+4. If three or more candidates are Accepted at PRD time, treat that as a signal that the PRD scope is wider than originally framed and call it out in the synthesis step.
+
+Checklist (pattern + question form — ask each question literally of the scope under review):
+
+**Navigation & URL state**
+- *Reload persistence*: if the user reloads mid-task, do they land where they were — same tab, same filters, same scroll, same selection — or get bounced to a default?
+- *Deep linking*: can the URL alone reconstruct the screen, or does it require state from a previous click?
+- *Back / forward*: does browser history match what the user expects, or does it skip steps or re-trigger side effects?
+- *Parameter change then reload*: after the user changes filters / sort / tab / pagination, does a reload preserve those choices?
+
+**Data lifecycle**
+- *Empty state*: what does this look like the very first time, with zero data?
+- *Single vs. many*: does the UI degrade with 1 item, 10k items, or at pagination boundaries?
+- *Stale data*: if the user leaves the tab open for an hour, what is wrong when they come back?
+- *Concurrent edits*: two users (or two tabs) editing the same record — last-write-wins, conflict, or merge?
+- *Deletion mid-flow*: the resource the user is viewing gets deleted by someone else while they have it open.
+
+**Failure modes**
+- *Network*: offline, slow, intermittent, request mid-flight when the user navigates away.
+- *Partial success*: bulk action where 8 of 10 succeed — what does the user see and what state is the system in?
+- *Permission denied mid-flow*: token expires, role changes, resource becomes inaccessible.
+- *Idempotency*: double-click submit, retry after timeout — does the action happen twice?
+
+**Input boundaries**
+- *Text*: empty, max-length, unicode, whitespace-only, leading / trailing whitespace, emoji, RTL.
+- *Numeric*: zero, negative, very large, non-integer, floating-point precision.
+- *Date / time*: timezone, DST transition, leap day, "now" vs. server time skew.
+
+**Auth & session**
+- *Session expiry mid-action*: what happens to in-flight work?
+- *Role downgrade*: the user loses access to the screen they are currently on.
+- *Multi-tab session*: logout in one tab while another tab is mid-action.
+
+This list is non-exhaustive — agents should propose additional edge cases relevant to the domain (e.g., real-time / streaming, money / financial rounding, regulated data, multi-tenant isolation) and run them through the same Accept / Defer / Reject discipline.
+
 ### Reproduce
 
 Purpose: Create a reliable reproduction that demonstrates a bug before fixing it.
@@ -267,11 +344,13 @@ Vendor-neutral callers (e.g., `implement`, `verify`) should invoke the `tracker-
 
 Flows can chain naturally:
 - Research produces a PRD -- hand it to Plan
-- Plan produces work items -- hand each to Implement
+- Plan produces work items (and writes a `## Tickets` back-link section into the PRD) -- hand each item to Implement
 - Implement produces verified code -- hand to Verify
+- Verify ships and confirms the deploy -- once every work item in the PRD is shipped, hand the PRD (or the epic) to Debrief
+- Debrief produces a triage-ready learnings document -- hand to the human, who marks dispositions and runs `debrief-apply` to persist accepted learnings
 - If any flow discovers it lacks what it needs, it stops and suggests the preceding flow
 
-The full lifecycle for a large initiative: Research -> Plan -> Implement (per item) -> Verify (per item).
+The full lifecycle for a large initiative: Research -> Plan -> Implement (per item) -> Verify (per item) -> Debrief (once across the whole initiative) -> Debrief Apply (human-triggered, after triage).
 
 ## Sub-flow Usage
 
@@ -290,6 +369,7 @@ Use an **agent team** (TeamCreate + TaskCreate per step) for:
 - **Implement** (Build, Fix, Improve) — long sequences with parallel review and a real risk of compaction
 - **Plan** — multiple specialists feeding a shared decomposition
 - **Research** — multiple specialists feeding a shared PRD
+- **Debrief** — tracker-mining and pr-mining run in parallel and gate the synthesizer; the work-item set can be large, so durable task state matters
 - Any flow that invokes the **Review sub-flow** (the four review specialists run in parallel and gate a single follow-up task)
 
 Why: these flows have enough steps that context compaction is likely; the Review sub-flow is parallel-by-design and `blockedBy` expresses that cleanly; durable task state lets the team lead recover assignments after compaction.
```
package/plugins/lisa/skills/confluence-to-tracker/SKILL.md
CHANGED

```diff
@@ -259,6 +259,20 @@ After all tickets are created, present a summary table to the user:
 - Blockers list with recommendations and alternatives
 - Cross-PRD dependencies
 
+### Phase 7: PRD Back-link
+
+> **Mode guard**: In `dry_run: true` mode, skip this phase entirely — no tickets exist to link.
+
+After Phase 6, invoke the `lisa:prd-backlink` skill to write a `## Tickets` section back into the source Confluence PRD page. The section becomes the canonical anchor for the **Debrief** flow once the initiative ships.
+
+Invoke `lisa:prd-backlink` with:
+
+- `source_type: "confluence"`
+- `source_ref`: the original Confluence page URL
+- `tickets`: the full list created in Phases 3–5, each entry as `{ key, title, type, url, parent_key }`
+
+If `lisa:prd-backlink` fails (page permission denied, Confluence unreachable), surface the error in the Phase 6 report rather than aborting — the tickets are already created. Recommend the user re-run `lisa:prd-backlink` standalone once the source is reachable.
+
 ## Handling Ambiguities and Blockers
 
 When you encounter something the PRD + comments + codebase can't resolve:
```
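
For concreteness, the Phase 7 arguments described above would take roughly this shape when rendered as a payload. Keys mirror the bullet list in the hunk; every value here is hypothetical:

```bash
# Hypothetical lisa:prd-backlink arguments for Phase 7.
cat <<'EOF'
{
  "source_type": "confluence",
  "source_ref": "https://example.atlassian.net/wiki/spaces/PROD/pages/12345",
  "tickets": [
    { "key": "PROJ-10", "title": "Example epic", "type": "epic",
      "url": "https://tracker.example/PROJ-10", "parent_key": null },
    { "key": "PROJ-11", "title": "Story under the epic", "type": "story",
      "url": "https://tracker.example/PROJ-11", "parent_key": "PROJ-10" }
  ]
}
EOF
```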