convoke-agents 2.3.1 → 2.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +19 -0
- package/README.md +90 -15
- package/_bmad/bme/_enhance/config.yaml +8 -0
- package/_bmad/bme/_enhance/extensions/bmm-pm.yaml +9 -0
- package/_bmad/bme/_enhance/guides/.gitkeep +0 -0
- package/_bmad/bme/_enhance/guides/ENHANCE-GUIDE.md +252 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/SKILL.md +6 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-c/.gitkeep +0 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-c/step-c-01-init.md +106 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-c/step-c-02-gather.md +136 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-c/step-c-03-score.md +146 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-c/step-c-04-prioritize.md +181 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-r/.gitkeep +0 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-r/step-r-01-load.md +120 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-r/step-r-02-rescore.md +141 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-r/step-r-03-update.md +154 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/.gitkeep +0 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-01-ingest.md +86 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-02-extract.md +169 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-03-score.md +147 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-04-update.md +155 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/templates/backlog-format-spec.md +219 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/templates/rice-scoring-guide.md +154 -0
- package/_bmad/bme/_enhance/workflows/initiatives-backlog/workflow.md +88 -0
- package/package.json +2 -1
- package/scripts/update/lib/refresh-installation.js +139 -0
- package/scripts/update/lib/validator.js +122 -1
package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-03-score.md
@@ -0,0 +1,147 @@
---
name: 'step-t-03-score'
description: 'Propose RICE scores for confirmed findings, validate at Gate 2, calculate composite scores'
nextStepFile: '{project-root}/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-04-update.md'
outputFile: '{planning_artifacts}/initiatives-backlog.md'
templateFile: '{project-root}/_bmad/bme/_enhance/workflows/initiatives-backlog/templates/rice-scoring-guide.md'
advancedElicitationTask: '{project-root}/_bmad/core/workflows/advanced-elicitation/workflow.md'
partyModeWorkflow: '{project-root}/_bmad/core/workflows/bmad-party-mode/workflow.md'
---

# Step 3: RICE Scoring & Gate 2 Validation

## STEP GOAL:

Propose RICE scores for each confirmed finding from Gate 1, present the scored batch for user validation at Gate 2, and calculate composite scores with proper sorting.

## MANDATORY EXECUTION RULES (READ FIRST):

### Universal Rules:
- 🛑 NEVER generate content without user input at Gate 2
- 📖 CRITICAL: Read this complete step file before taking action
- 🔄 CRITICAL: When loading next step with 'C', read the entire file
- 📋 YOU ARE A SCORING ANALYST proposing calibrated RICE scores

### Role Reinforcement:
- ✅ You are a **RICE scoring analyst** — systematic, calibrated, evidence-based
- ✅ Propose scores grounded in the scoring guide's definitions and calibration examples
- ✅ The user validates and adjusts your proposals at Gate 2 — you propose, they decide
- ✅ Compare proposed scores against existing backlog items for calibration consistency

### Step-Specific Rules:
- 🎯 Focus on scoring, rationale, and composite calculation
- 🚫 FORBIDDEN to write to the backlog file (that is step-t-04's job)
- 🚫 FORBIDDEN to re-extract or re-classify findings (that was step-t-02's job)
- 🚫 FORBIDDEN to add new findings at Gate 2 (that was Gate 1's job — only drops allowed here)
- 💬 Approach: propose entire batch at once so user sees relative positioning, then collaborative refinement

## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 📖 Load {templateFile} for RICE factor definitions, scales, and calibration examples
- 💾 Recalculate and re-sort after every Gate 2 adjustment

## CONTEXT BOUNDARIES:
- Available context: Confirmed findings from Gate 1, existing backlog (if loaded), RICE scoring guide template
- Focus: Scoring and Gate 2 validation only
- Limits: Do NOT write to backlog or modify extraction results
- Dependencies: step-t-02-extract.md (confirmed findings from Gate 1)

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.

### 1. Load RICE Scoring Guide

Load `{templateFile}` (rice-scoring-guide.md) and internalize:
- **Factor definitions:** Reach (1-10), Impact (0.25-3), Confidence (20-100%), Effort (1-10)
- **Guided questions** for each factor
- **Calibration examples** from the existing backlog (study the reasoning, not just the numbers)
- **Composite formula:** Score = (R x I x C) / E, where C is decimal (e.g., 70% = 0.7)
- **Score rounding:** One decimal place for display

### 2. Propose RICE Scores for All Findings

For each confirmed finding from Gate 1, propose RICE scores using the guided questions:

- **Reach (1-10):** "How many users per quarter will this affect?"
- **Impact (0.25-3):** "What's the per-user impact?"
- **Confidence (20-100%):** "How confident are we in these estimates?" Default to 50% when no direct evidence exists.
- **Effort (1-10):** "Relative effort in story points?"

For each score, write a **one-line rationale** explaining the scoring basis (FR12). Example:

> **#1: Add output examples for Noah agent** — R:5 I:1 C:70% E:2 = 1.8
> *Reach 5: affects users checking agent outputs. Impact 1: helpful but workarounds exist. Confidence 70%: pattern validated with other agents. Effort 2: single file addition.*

**Calibration check:** Mentally compare each proposed score against 2-3 existing backlog items at similar scale. If the score would rank the item significantly above or below where it "feels" relative to those items, revisit the component scores.

### 3. Calculate Composite Scores and Sort

For each finding:
1. Calculate composite: Score = (Reach x Impact x Confidence) / Effort
2. Round to one decimal place
3. Verify score falls within expected range (~0.0 to ~30.0; existing backlog ranges ~0.2 to ~10.0)

Sort the batch:
1. **Primary:** Descending by composite score
2. **Tiebreak 1:** Higher Confidence first
3. **Tiebreak 2:** Newer insertion order first
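The composite calculation and sort rules above can be sketched as follows (an illustrative sketch, not code shipped in this package; the dict keys `r`, `i`, `c`, `e` are assumptions):

```python
# Sketch of the composite formula and batch sort described above.
# Confidence is stored as a percentage and converted to a decimal.
def composite(reach, impact, confidence_pct, effort):
    # Score = (R x I x C) / E, rounded to one decimal place for display
    return round(reach * impact * (confidence_pct / 100) / effort, 1)

def sort_batch(findings):
    # Descending score; ties: higher Confidence, then newer insertion first
    return sorted(
        enumerate(findings),
        key=lambda p: (
            -composite(p[1]["r"], p[1]["i"], p[1]["c"], p[1]["e"]),
            -p[1]["c"],
            -p[0],  # larger index = inserted later = wins remaining ties
        ),
    )

findings = [
    {"r": 5, "i": 2, "c": 80, "e": 3},   # composite 2.7
    {"r": 3, "i": 1, "c": 60, "e": 2},   # composite 0.9
    {"r": 7, "i": 0.5, "c": 50, "e": 1}, # composite 1.8
]
ranked = [idx for idx, _ in sort_batch(findings)]  # [0, 2, 1]
```

Note that rounding happens only for display and tiebreaks are applied inside the sort key, so one pass produces the presentation order.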
### 4. Present Scoring Batch (Gate 2)

Display the scored results:

> **Gate 2 — Review Proposed RICE Scores**
>
> **Scored findings: [N]**
>
> | # | Finding | R | I | C | E | Score | Rationale |
> |---|---------|---|---|---|---|-------|-----------|
> | 1 | [title] | 5 | 2 | 80% | 3 | 2.7 | [one-line rationale] |
> | 3 | [title] | 7 | 0.5 | 50% | 1 | 1.8 | [one-line rationale] |
> | 2 | [title] | 3 | 1 | 60% | 2 | 0.9 | [one-line rationale] |
>
> *Sorted by composite score (descending). Formula: (R x I x C) / E*

### 5. Present GATE 2 MENU OPTIONS

Display:

> **Gate 2 — Adjust scores or finalize:**
>
> **Score adjustments** (by item number):
> - `#N R [value]` — Change Reach (1-10)
> - `#N I [value]` — Change Impact (0.25, 0.5, 1, 2, or 3)
> - `#N CF [value]` — Change Confidence (20-100%)
> - `#N E [value]` — Change Effort (1-10)
>
> **Batch editing:**
> - `D #N` — Drop item #N from the batch (will not be added to backlog)
>
> **[A] Advanced Elicitation** — Deeper analysis of scoring rationale
> **[P] Party Mode** — Multi-perspective scoring discussion
> **[C] Continue** — Finalize scores and proceed to backlog update

#### Menu Handling Logic:
- IF `#N R [value]`: Update Reach for item #N. Recalculate composite. Re-sort batch. Redisplay table and menu.
- IF `#N I [value]`: Update Impact for item #N. Recalculate composite. Re-sort batch. Redisplay table and menu.
- IF `#N CF [value]`: Update Confidence for item #N. Recalculate composite. Re-sort batch. Redisplay table and menu.
- IF `#N E [value]`: Update Effort for item #N. Recalculate composite. Re-sort batch. Redisplay table and menu.
- IF `D #N`: Remove item #N from the scoring batch. Redisplay table and menu.
- IF A: Execute {advancedElicitationTask} for deeper scoring analysis, and when finished redisplay the menu.
- IF P: Execute {partyModeWorkflow} for multi-perspective scoring discussion, and when finished redisplay the menu.
- IF C: Finalize the scored batch. Load, read the entire file, and execute `{project-root}/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-04-update.md`
- IF any other input: Display "Unknown command. Use `#N R/I/CF/E [value]`, `D #N`, **A**, **P**, or **C** to continue." then redisplay menu.

#### EXECUTION RULES:
- ALWAYS halt and wait for user input after presenting the menu
- After EVERY score adjustment, recalculate composite, re-sort, and redisplay the full table AND the menu
- The user may make multiple adjustments before pressing C
- ONLY proceed to step-t-04 when user selects 'C'
- After A or P execution, return to this menu
- Do NOT auto-continue — the user must explicitly approve the scores

## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS: All findings scored with calibrated RICE components and rationale, composites calculated correctly, batch sorted by score, user validated at Gate 2, finalized scores passed to step-t-04
### ❌ SYSTEM FAILURE: Scores proposed without rationale, composite formula wrong, scores outside valid ranges, user not given Gate 2 validation, findings written to backlog prematurely
**Master Rule:** Skipping steps is FORBIDDEN.
package/_bmad/bme/_enhance/workflows/initiatives-backlog/steps-t/step-t-04-update.md
@@ -0,0 +1,155 @@
---
name: 'step-t-04-update'
description: 'Validate backlog structure, append scored items safely, regenerate prioritized view, and present completion summary'
outputFile: '{planning_artifacts}/initiatives-backlog.md'
templateFile: '{project-root}/_bmad/bme/_enhance/workflows/initiatives-backlog/templates/backlog-format-spec.md'
workflowFile: '{project-root}/_bmad/bme/_enhance/workflows/initiatives-backlog/workflow.md'
---

# Step 4: Backlog Update, Safety & Completion

## STEP GOAL:

Validate backlog structure, safely append scored items from Gate 2, regenerate the prioritized view, and present a completion summary before returning to the T/R/C menu.

## MANDATORY EXECUTION RULES (READ FIRST):

### Universal Rules:
- 🛑 NEVER generate content without user input at validation mismatch prompt
- 📖 CRITICAL: Read this complete step file before taking action
- 🔄 CRITICAL: When returning to menu, read the entire workflow file
- 📋 YOU ARE A BACKLOG OPERATIONS SPECIALIST performing safe, structured writes

### Role Reinforcement:
- ✅ You are a **backlog operations specialist** — precise, non-destructive, append-only
- ✅ Preserve all existing content — never delete, overwrite, or reorder existing rows
- ✅ The Prioritized View is the ONLY section regenerated from scratch
- ✅ All output must be standard markdown — no HTML, no proprietary syntax

### Step-Specific Rules:
- 🎯 Focus on validation, safe writes, and completion reporting
- 🚫 FORBIDDEN to delete or reorder existing backlog items (FR18, NFR1)
- 🚫 FORBIDDEN to re-score items (scoring was finalized at Gate 2)
- 🚫 FORBIDDEN to modify step-t-01, step-t-02, or step-t-03
- 💬 Approach: validate first, write safely, summarize clearly

## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 📖 Load `{templateFile}` (backlog-format-spec.md) for structural validation rules and table formats
- 💾 Write to `{outputFile}` only after validation passes (or user overrides)

## CONTEXT BOUNDARIES:
- Available context: Scored findings from Gate 2, existing backlog file, backlog format spec template
- Focus: Structural validation, safe append, prioritized view regeneration, completion summary
- Limits: Do NOT re-score, re-extract, or re-classify items
- Dependencies: step-t-03-score.md (scored findings from Gate 2)

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.

### 1. Pre-Write Validation

Load `{outputFile}` (existing backlog) and validate structural integrity:

1. **Section heading anchors** — All 7 required H2 sections exist in correct order:
   - `## RICE Scoring Guide`
   - `## Backlog`
   - `## Exploration Candidates`
   - `## Epic Groupings`
   - `## Prioritized View (by RICE Score)`
   - `## Completed`
   - `## Change Log`
2. **Prioritized view table** — Has exactly 6 columns (Rank, #, Initiative, Score, Track, Category)
3. **Category tables** — Each table under `## Backlog` has exactly 10 columns (#, Initiative, Source, R, I, C, E, Score, Track, Status)
4. **Change Log section** — The `## Change Log` H2 section exists with a table

If ALL checks pass, proceed directly to step 3 (Append Items).
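The first check (all required H2 headings present, in order) can be sketched like this (an illustrative sketch, not code from the package; the function name is hypothetical):

```python
# The 7 H2 section anchors the backlog file must contain, in this order.
REQUIRED_H2 = [
    "## RICE Scoring Guide",
    "## Backlog",
    "## Exploration Candidates",
    "## Epic Groupings",
    "## Prioritized View (by RICE Score)",
    "## Completed",
    "## Change Log",
]

def h2_sections_in_order(markdown_text):
    # Collect every H2 heading, then confirm each required one appears
    # and that their positions are monotonically increasing.
    found = [line.strip() for line in markdown_text.splitlines()
             if line.startswith("## ")]
    positions = []
    for heading in REQUIRED_H2:
        if heading not in found:
            return False
        positions.append(found.index(heading))
    return positions == sorted(positions)
```

A mismatch here does not hard-fail the run — it triggers the Y/X prompt in the next step.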
### 2. Mismatch Handling

If ANY validation check fails, present the specific mismatch(es):

> **Pre-Write Validation — Structural Mismatch Detected**
>
> [List each failed check with details]
>
> **[Y] Yes, proceed anyway**
> **[X] Abort and return to menu**

**ALWAYS halt and wait for user input.**

- IF Y: Continue to step 3 (Append Items)
- IF X: Display "Aborting backlog update." then load, read the entire file, and execute `{workflowFile}` to return to mode selection
- IF any other input: Display "Please select **Y** or **X**." then redisplay the prompt

### 3. Append Items

For each scored item from Gate 2:

1. **Find target category** — Locate the H3 section under `## Backlog` matching the item's category
2. **Create category if needed** — If the category doesn't exist, add a new H3 heading with a 10-column table at the end of `## Backlog` (before `## Exploration Candidates`)
3. **Generate item ID** — Use category prefix letter (D/U/T/I/A/P) + next number (increment from highest existing in that category)
4. **Append row** — Add new row to end of category table. NEVER delete, overwrite, or reorder existing rows
5. **Add provenance** — Include in the Initiative description: `Added from [source], [date]` where source is the input origin from step-t-01 and date is the current session date
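The ID rule in step 3 can be sketched as follows (a hypothetical helper, not code from the package):

```python
import re

def next_item_id(prefix, existing_ids):
    # Category prefix letter + (highest existing number in that category + 1).
    # Starts at 1 when the category has no items yet.
    numbers = [
        int(m.group(1))
        for item_id in existing_ids
        if (m := re.fullmatch(re.escape(prefix) + r"(\d+)", item_id))
    ]
    return f"{prefix}{max(numbers, default=0) + 1}"
```

Because the highest existing number is scanned per category, IDs of deleted or completed items are never reused downward.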
**Column format** (10 columns per backlog-format-spec.md):
```
| [ID] | **[Title]** — [description]. Added from [source], [date] | [source ref] | [R] | [I] | [C]% | [E] | [score] | [track] | Backlog |
```

**Important:** Triage Gate 2 adjustments are the initial score — no rescore provenance is generated.

### 4. Regenerate Prioritized View

Rebuild the `## Prioritized View (by RICE Score)` table from scratch:

1. Collect ALL active items from all category tables (existing + newly appended)
2. Exclude items with Status "Done" or items in the `## Completed` section
3. Sort by composite RICE score descending
4. Tiebreak: (1) Higher Confidence first, (2) Newer insertion order first
5. Generate sequential rank numbers starting at 1

Table format (6 columns):
```
| Rank | # | Initiative | Score | Track | Category |
|------|---|-----------|-------|-------|----------|
```
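The rebuild above can be sketched like this (an illustrative sketch, not package code; field names are assumptions, list position stands in for insertion order, and ties are broken on the stored rounded score):

```python
def prioritized_view(items):
    # Keep only active items: Done/Completed rows are excluded.
    active = [(i, it) for i, it in enumerate(items) if it["status"] != "Done"]
    # Descending score, then higher Confidence, then newer insertion first.
    active.sort(key=lambda p: (-p[1]["score"], -p[1]["confidence"], -p[0]))
    # Emit 6-column rows with sequential ranks starting at 1.
    return [
        f"| {rank} | {it['id']} | {it['title']} | {it['score']} | "
        f"{it['track']} | {it['category']} |"
        for rank, (_, it) in enumerate(active, start=1)
    ]
```

Regenerating from scratch, rather than patching the existing table, is what keeps ranks contiguous after drops and completions.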
### 5. Add Changelog Entry

Prepend a new row to the `## Change Log` table (newest first):

```
| YYYY-MM-DD | Triage: Added [N] items ([categories affected]). [Any merge notes if applicable] |
```

### 6. Update Last Updated Date

Set the metadata header `Last Updated` field to the current date (YYYY-MM-DD format).

### 7. Completion Summary & Return to Menu

After successful write, display:

> **Triage Complete**
>
> **Items added:** [N]
> **Items merged:** [N] (absorbed into existing items at Gate 1)
> **Categories affected:** [list]
>
> **New Top 3 Positions:**
> 1. [#ID] [title] — Score: [X.X]
> 2. [#ID] [title] — Score: [X.X]
> 3. [#ID] [title] — Score: [X.X]

Then return to the T/R/C menu:

> Loading `{workflowFile}` to return to mode selection...

Load, read the entire file, and execute `{workflowFile}`.

## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS: Pre-write validation performed, existing content preserved, items appended with correct IDs and provenance, prioritized view regenerated with all items sorted correctly, changelog updated, completion summary displayed with top 3, T/R/C menu re-presented
### ❌ SYSTEM FAILURE: Existing backlog content deleted/overwritten/reordered, items written without validation, wrong IDs or missing provenance, prioritized view not regenerated, no completion summary, no return to menu
**Master Rule:** Skipping steps is FORBIDDEN.
package/_bmad/bme/_enhance/workflows/initiatives-backlog/templates/backlog-format-spec.md
@@ -0,0 +1,219 @@
# Backlog Format Specification

Reference document for consistent backlog file formatting across all initiatives backlog operations. Loaded by the workflow during file write operations to ensure output matches the canonical format.

All output must be standard markdown — no proprietary extensions, HTML embeds, or tool-specific syntax (NFR6).

---

## Metadata Header

Every backlog file begins with:

```markdown
# Convoke Initiatives Backlog

**Created:** YYYY-MM-DD
**Method:** RICE (Reach, Impact, Confidence, Effort)
**Last Updated:** YYYY-MM-DD
```

The `Last Updated` date is refreshed on every write operation.

---

## Section Hierarchy

The backlog file uses this exact heading structure. Sections must appear in this order.

```
# Convoke Initiatives Backlog (H1 — title)

## RICE Scoring Guide (H2 — inline scoring reference)

## Backlog (H2 — active items container)
### [Category Name] (H3 — one per category, repeating)

## Exploration Candidates (H2 — unscored items needing discovery)

## Epic Groupings (H2 — bundled delivery suggestions)
### Epic: "[Name]" ([item IDs]) (H3 — one per grouping)

## Prioritized View (by RICE Score) (H2 — auto-generated ranked table)

## Completed (H2 — finished items, grouped by date)
### YYYY-MM-DD (H3 — date grouping for milestones)

## Change Log (H2 — operational history)
```

### Category Names

Categories are user-defined H3 headings under `## Backlog`. The existing backlog uses:

- Documentation & Onboarding
- Update & Migration System
- Testing & CI
- Infrastructure
- Agent Quality & Consistency
- Platform & Product Vision

New categories may be added. Category names must be unique within the backlog.

---

## Table Formats

### Category Table (under each H3 category)

```markdown
| # | Initiative | Source | R | I | C | E | Score | Track | Status |
|---|-----------|--------|---|---|---|---|-------|-------|--------|
```

**Column rules:**
- `#`: Short alphanumeric ID (e.g., D2, P4, T3). Unique within the backlog.
- `Initiative`: `**[Bold title]** — [description]`. May include markdown links.
- `Source`: Origin of the initiative (e.g., "Vortex review (Liam, Wade)", "Product owner")
- `R`: Reach score (integer 1-10)
- `I`: Impact score (0.25, 0.5, 1, 2, or 3)
- `C`: Confidence as percentage (e.g., 70%, 90%)
- `E`: Effort score (integer 1-10)
- `Score`: Composite RICE score, one decimal place (e.g., 2.8)
- `Track`: "Keep the lights on" or "Move the needle"
- `Status`: One of: Backlog, In Planning, In Progress, Done, Blocked

### Prioritized View Table (under `## Prioritized View`)

```markdown
| Rank | # | Initiative | Score | Track | Category |
|------|---|-----------|-------|-------|----------|
```

**Rules:**
- Sorted by composite RICE score, descending
- Tiebreak: Confidence (higher first), then insertion order (newer first)
- Only includes active items (not Done or in Completed section)
- Regenerated from scratch on every write operation — not manually maintained
- Rank is a sequential integer starting at 1

### Exploration Candidates Table (under `## Exploration Candidates`)

```markdown
| # | Initiative | Source | Next Step |
|---|-----------|--------|-----------|
```

These items are unscored and not included in the prioritized view.

### Completed Section Tables (under `## Completed`)

Grouped by date using H3 headers:

```markdown
### YYYY-MM-DD

| # | Initiative | Score | Category |
|---|-----------|-------|----------|
```

**Note:** Legacy completed entries (pre-backlog era) may use non-standard table formats (e.g., `| Item | Fix Applied |`). These should be preserved as-is during write operations — do not attempt to reformat them.

---

## Change Log Format

The Change Log section uses a table:

```markdown
## Change Log

| Date | Change |
|------|--------|
| YYYY-MM-DD | [Description of what changed] |
```

Entries are prepended (newest first). Each workflow session adds one entry summarizing items added, removed, rescored, or moved.

---

## RICE Composite Formula

**Formula:** Score = (Reach x Impact x Confidence) / Effort

Where Confidence is expressed as a decimal (e.g., 70% = 0.7).

**Example:** R:8, I:3, C:70%, E:6 = (8 x 3 x 0.7) / 6 = 2.8

**Sort order:** Descending by composite score. Ties broken by:
1. Confidence — higher first
2. Insertion order — newer first
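The worked example can be checked mechanically (throwaway arithmetic, not part of the spec):

```python
# R:8, I:3, C:70% (0.70 as a decimal), E:6 — rounded to one decimal place
score = round((8 * 3 * 0.70) / 6, 1)  # 2.8
```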
---

## Insertion Rules

### Adding New Items (Triage mode, Create mode)

1. Identify the target category H3 section under `## Backlog`
2. Append the new row to the end of that category's table
3. If the category doesn't exist, create a new H3 heading at the end of the `## Backlog` section (before `## Exploration Candidates`)
4. Regenerate the `## Prioritized View` table with all active items sorted by composite score
5. Add a Change Log entry

### Moving Items to Completed

1. Remove the item row from its category table
2. Add it to the appropriate `### YYYY-MM-DD` group under `## Completed`
3. If no group exists for today's date, create one
4. Regenerate the `## Prioritized View` table
5. Add a Change Log entry

---

## Provenance Tags

Provenance is recorded in the Initiative cell description or as a separate annotation.

### Triage Mode — New Items

Format: `"Added from [source], [date]"`

Example: `Added from party-mode review transcript, 2026-03-15`

The score recorded is the **final** score after any Gate 2 user adjustments. The agent's initial proposal is not recorded separately. Triage Gate 2 adjustments are NOT rescores — they are the initial score. No rescore provenance is generated.

### Review Mode — Rescored Items

Format: `"Rescored [old]->[new], Review, [date]"`

Example: `Rescored 1.8->2.4, Review, 2026-03-15`

Only recorded when the composite score actually changes. Confirming an existing score or skipping an item generates no provenance entry.

### Create Mode — New Items

Format: `"Added from Create mode, [date]"`

---

## Pre-Write Validation

Before writing to the backlog file, the workflow must validate:

1. **Section heading anchors** — All required H2 sections exist in the correct order
2. **Prioritized view table column count** — Table has exactly 6 columns
3. **Category table column count** — Each category table has exactly 10 columns
4. **Change Log section existence** — The Change Log H2 section exists
5. **No data loss** — Existing category section content is preserved (no deletions, overwrites, or reordering of existing rows). The Prioritized View is excluded from this check since it is regenerated.

If validation detects a structural mismatch, the user can proceed or abort.

---

## Format Consistency

The backlog output must match the exact current format of `initiatives-backlog.md`. When in doubt, load the existing file and match its patterns precisely. This ensures:
- Round-trip parseability (the workflow can reload its own output)
- Manual editability (users can edit the file in any text editor between sessions)
- `git diff` readability (consistent formatting minimizes noise)
@@ -0,0 +1,154 @@
|
|
|
1
|
+
# RICE Scoring Guide
|
|
2
|
+
|
|
3
|
+
Reference document for consistent RICE scoring across all initiatives backlog operations. Loaded by the workflow during Triage (Gate 2 scoring), Review (rescoring), and Create (initial scoring) modes.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## RICE Factor Definitions
|
|
8
|
+
|
|
9
|
+
| Factor | Scale | Description |
|
|
10
|
+
|--------|-------|-------------|
|
|
11
|
+
| **Reach** | 1-10 | How many users/quarter will this affect? (10 = all users, 1 = edge case) |
|
|
12
|
+
| **Impact** | 0.25 - 3 | Per-user impact (3 = massive, 2 = high, 1 = medium, 0.5 = low, 0.25 = minimal) |
|
|
13
|
+
| **Confidence** | 20-100% | How sure are we about reach and impact estimates? |
|
|
14
|
+
| **Effort** | 1-10 | Relative effort in story points (1 = trivial, 10 = multi-epic) |
|
|
15
|
+
| **Score** | calculated | (Reach x Impact x Confidence) / Effort |
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Guided Scoring Questions
|
|
20
|
+
|
|
21
|
+
Use these questions to guide scoring for each factor. The goal is genuine strategic reflection, not mechanical calculation.
|
|
22
|
+
|
|
23
|
+
### Reach (1-10)
|
|
24
|
+
|
|
25
|
+
"How many users per quarter will this affect?"
|
|
26
|
+
|
|
27
|
+
| Score | Meaning |
|
|
28
|
+
|-------|---------|
|
|
29
|
+
| 10 | All users — every project that installs Convoke encounters this |
|
|
30
|
+
| 7-9 | Most users — affects a common workflow or visible surface |
|
|
31
|
+
| 4-6 | Some users — affects a specific use case or user segment |
|
|
32
|
+
| 2-3 | Few users — niche scenario or advanced feature |
|
|
33
|
+
| 1 | Edge case — rare configuration or exceptional circumstance |
|
|
34
|
+
|
|
35
|
+
### Impact (0.25-3)
|
|
36
|
+
|
|
37
|
+
"What's the per-user impact when they encounter this?"
|
|
38
|
+
|
|
39
|
+
| Score | Meaning | Signal |
|
|
40
|
+
|-------|---------|--------|
|
|
41
|
+
| 3 | Massive | Unblocks a capability that didn't exist before; users would pay for this |
|
|
42
|
+
| 2 | High | Significant improvement to an existing workflow; saves meaningful time |
|
|
43
|
+
| 1 | Medium | Noticeable improvement; users appreciate it but can work around it |
|
|
44
|
+
| 0.5 | Low | Minor quality-of-life improvement; polish |
|
|
45
|
+
| 0.25 | Minimal | Cosmetic or hygienic; almost invisible to users |
|
|
46
|
+
|
|
47
|
+
### Confidence (20-100%)

"How confident are we in the Reach and Impact estimates?"

| Score | Meaning | Basis |
|-------|---------|-------|
| 100% | Measured data | Direct observation, usage metrics, user reports |
| 80% | Strong evidence | Multiple corroborating signals, team consensus |
| 60% | Reasonable estimate | Single data point or strong analogy to similar work |
| 50% | Educated guess | Logical reasoning without direct evidence |
| 40% | Informed speculation | Based on domain knowledge, no project-specific data |
| 20% | Pure speculation | Gut feeling, novel territory, no precedent |

### Effort (1-10)

"How much relative effort, in story points?"

| Score | Meaning |
|-------|---------|
| 1 | Trivial — single-file change, under 30 minutes |
| 2-3 | Small — a few files, a focused session |
| 4-5 | Medium — multi-file, requires design thought, 1-2 stories |
| 6-7 | Large — multi-story, cross-cutting concerns |
| 8-9 | Very large — full epic, significant architecture work |
| 10 | Multi-epic — major initiative spanning multiple sprints |

---

## Composite Formula & Sort Order

**Formula:** Score = (Reach x Impact x Confidence) / Effort

**Sort order:** Descending by composite score.

**Tiebreak rules:**

1. Higher Confidence first (more certain items surface above speculative ones)
2. Newer insertion order first (recently added items break remaining ties)

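The sort order and both tiebreak rules can be expressed as a single sort key. The `Item` shape below, in particular the `inserted` counter standing in for insertion order, is a hypothetical illustration rather than the actual backlog data model.

```python
from dataclasses import dataclass

@dataclass
class Item:
    id: str
    score: float       # composite RICE score
    confidence: float  # 0.2-1.0 (i.e., 20-100%)
    inserted: int      # hypothetical insertion counter; higher = newer

def prioritize(items: list[Item]) -> list[Item]:
    # Descending composite score; ties broken by higher confidence,
    # then by newer insertion order (higher counter first).
    return sorted(items, key=lambda it: (-it.score, -it.confidence, -it.inserted))
```

Negating each key component lets one ascending sort express all three descending criteria.
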
---

## Calibration Examples

These examples are drawn from the existing Convoke backlog to anchor scoring consistency. Study the reasoning, not just the numbers — the goal is to understand *why* items scored as they did.

### Low Tier (~0.2-0.5)

**A4: "Fix temp dir prefix inconsistency"** — R:1 I:0.25 C:100% E:1 = **0.3**

- Reach 1: Only affects internal tooling, no user visibility
- Impact 0.25: Cosmetic inconsistency with zero functional effect
- Confidence 100%: Known, observable, deterministic
- Effort 1: Single string change
- *Lesson: High confidence and low effort don't rescue low reach and minimal impact*

**A2: "Create .agent.yaml source files"** — R:2 I:0.5 C:60% E:4 = **0.2**

- Reach 2: Only affects module authors using the BMAD authoring pipeline
- Impact 0.5: Enables standard tooling but workarounds exist
- Confidence 60%: Unclear how many authors will use the pipeline
- Effort 4: Multiple files across multiple agents
- *Lesson: Moderate effort with uncertain reach pushes the score very low*

### Medium Tier (~1.0-2.0)

**U4: "Test upgrade-path step file cleanup"** — R:3 I:1 C:90% E:2 = **1.4**

- Reach 3: Only users upgrading from specific older versions
- Impact 1: Prevents a confusing stale-file scenario
- Confidence 90%: Known issue from observed upgrade path
- Effort 2: Focused integration test
- *Lesson: High confidence on a real (but narrow) problem scores solidly mid-range*

**I1: "NPM_TOKEN secret for CI publish"** — R:8 I:2 C:90% E:8 = **1.8**

- Reach 8: Every release depends on this automation
- Impact 2: Eliminates manual publish step, significant time savings
- Confidence 90%: Well-understood CI pattern
- Effort 8: Full CI pipeline setup, secrets management, testing
- *Lesson: High reach and impact can be offset by high effort — the formula balances ambition against cost*

### High Tier (~2.5+)

**P4: "Enhance module"** — R:8 I:3 C:70% E:6 = **2.8**

- Reach 8: New capability for every BMAD user with Convoke
- Impact 3: Creates an entirely new value layer (multiplicative, not additive)
- Confidence 70%: Architecture validated but user adoption uncertain
- Effort 6: Multi-epic initiative with installer integration
- *Lesson: Massive impact with broad reach justifies investment even at moderate confidence*

**S4: "Skills migration & module compliance"** — R:10 I:2 C:90% E:5 = **3.6**

- Reach 10: Affects every user — skills activation was broken
- Impact 2: Restores core functionality and modernizes the format
- Confidence 90%: Known breakage with clear fix path
- Effort 5: Multi-file migration with schema changes
- *Lesson: Universal reach with a clear fix and high confidence produces the highest scores*

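All six calibration anchors above can be recomputed mechanically, which is a useful sanity check when scoring new items. A minimal sketch, assuming the half-up one-decimal display rounding described in the Scoring Consistency Notes:

```python
from decimal import Decimal, ROUND_HALF_UP

def displayed(reach, impact, confidence, effort):
    """Displayed composite score: raw RICE score rounded half-up to one decimal."""
    raw = (Decimal(str(reach)) * Decimal(str(impact))
           * Decimal(str(confidence)) / Decimal(str(effort)))
    return float(raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

# (reach, impact, confidence, effort) -> displayed score, from this document
CALIBRATION = {
    "A4": ((1, 0.25, 1.00, 1), 0.3),
    "A2": ((2, 0.50, 0.60, 4), 0.2),
    "U4": ((3, 1.00, 0.90, 2), 1.4),
    "I1": ((8, 2.00, 0.90, 8), 1.8),
    "P4": ((8, 3.00, 0.70, 6), 2.8),
    "S4": ((10, 2.00, 0.90, 5), 3.6),
}

for name, (factors, expected) in CALIBRATION.items():
    assert displayed(*factors) == expected, name
```

Note that A4's raw score is 0.25, which rounds half-up to the displayed 0.3; all other anchors land on their displayed values exactly.
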
---

## Score Distribution Health Check

A healthy backlog has differentiated scores. If more than 3 items share the same composite score within the top 10 of the prioritized view, refine the distinguishing RICE components — Confidence and Impact typically have the most room for differentiation.

This is a quality signal, not a hard rule. Identical scores indicate either genuine parity (acceptable if rare) or insufficient scoring granularity (fix by re-examining the items with fresh eyes).

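The threshold above can be checked mechanically. This helper is a hypothetical sketch, not part of the workflow; it takes the prioritized view's scores in descending order.

```python
from collections import Counter

def distribution_healthy(scores_desc: list[float], top_n: int = 10,
                         max_shared: int = 3) -> bool:
    """True when no composite score repeats more than `max_shared`
    times among the top `top_n` prioritized items."""
    counts = Counter(scores_desc[:top_n])
    return all(n <= max_shared for n in counts.values())
```

For example, a top slice of `[3.6, 2.8, 1.8, 1.8, 1.8, 1.8, 1.4]` fails the check because 1.8 appears four times.
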
---

## Scoring Consistency Notes

- Scores in this backlog range from approximately 0.2 to 10.0. New scores should land within this range.
- Composite scores are rounded to one decimal place for display (e.g., 1.35 rounds to 1.4). This matches existing backlog convention and keeps the prioritized view scannable.
- When scoring a new item, mentally compare it to 2-3 existing items at similar scale. If your proposed score would rank it significantly above or below where it "feels" relative to those items, revisit the component scores.
- The Confidence factor is the most commonly under-scrutinized. Default to 50% (educated guess) when no direct evidence exists, not 80%.

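The compare-to-existing-items gut check above can be supported by a small lookup. This helper and its argument names are hypothetical; it simply surfaces the nearest-scoring anchors to compare a proposed score against.

```python
def nearest_anchors(proposed: float, existing: dict[str, float],
                    k: int = 3) -> list[tuple[str, float]]:
    """Return the k existing items whose composite scores sit closest
    to a proposed score, for use as comparison anchors."""
    return sorted(existing.items(), key=lambda kv: abs(kv[1] - proposed))[:k]

# e.g. nearest_anchors(1.5, {"A4": 0.3, "U4": 1.4, "I1": 1.8, "S4": 3.6}, k=2)
```

If the proposed item doesn't "feel" comparable to the anchors this returns, revisit its component scores before committing.
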