@kudusov.takhir/ba-toolkit 3.6.0 → 3.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -11,6 +11,61 @@ Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
  ---
 
+ ## [3.7.0] — 2026-04-10
+
+ ### Highlights
+
+ - **Group B skill audit pass on `/trace`, `/analyze`, `/clarify`, `/estimate`, `/glossary`, `/risk`, `/sprint`** — 7 cross-cutting utilities brought to senior-BA rigour. ~33 Critical + High findings applied. Highlights: full ISO 31000 + PMBOK 7 alignment for `/risk` (Velocity axis, treatment-strategy classification), Cone of Uncertainty + confidence bands for `/estimate`, IEEE 830 §4.3 quality attributes mapped to `/analyze` finding categories, definition-quality discipline (Aristotelian form) for `/glossary`, structured table output for `/clarify`, bidirectional traceability + `/implement-plan` integration for `/trace`, focus-factor + ceremonies-aware net velocity for `/sprint`. Two utility templates (`risk-template.md`, `sprint-template.md`) rewritten from concrete Nova Analytics worked examples back to placeholder convention to match the other 19 templates.
+
+ ### Changed
+
+ - **`skills/trace/SKILL.md` and `skills/references/templates/trace-template.md`** — bidirectional traceability + `/implement-plan` integration:
+   - Inline template removed; the standalone template is the single source of truth (drift fix — the inline copy had 8 columns, the standalone 11).
+   - **Bidirectional traceability is now the default.** Forward (FR → downstream) catches "what does this requirement become?"; reverse (US/UC/AC/NFR/Entity/API/WF/SC/Task → FR) catches "why does this artifact exist?". A senior BA needs both directions to validate provenance.
+   - **`/implement-plan` integration.** The matrix now includes a `Task` column populated from `12_implplan_*.md` (introduced in v3.4.0). The chain is now `FR → US → UC → AC → NFR → Entity → ADR → API → WF → SC → Task`.
+   - **Calibration interview added** (was generation-only): direction (forward / reverse / bidirectional, default bidirectional), scope, severity-thresholds source.
+   - **Coverage Gaps now grouped by severity** (🔴 Critical / 🟠 High / 🟡 Medium / 🟢 Low) read from `00_principles_*.md` §3 instead of just listing orphans flat.
+ - **`skills/analyze/SKILL.md` and `skills/references/templates/analyze-template.md`** — IEEE 830 alignment + Owner column + delta mode:
+   - **Finding categories expanded from 5 to 8** and explicitly mapped to **IEEE 830 §4.3 SRS quality attributes**: Duplication (consistency), Ambiguity (unambiguous, including modal-verb confusion), Coverage Gap (complete + traceable), Terminology Drift (consistent), Invalid Reference (traceable), **Inconsistency** (same fact stated differently across artifacts), **Underspecified Source** (post-v3.5.0 provenance discipline — FR/UC/AC/Entity/Endpoint missing the `Source` field), **Stakeholder Validation Gap** (post-v3.5.0 assumption discipline — assumptions without Owner or past Validate by date).
+   - **Owner column added** to every finding. Senior BAs always assign accountability — a finding with no owner doesn't get fixed.
+   - **Calibration interview added**: severity threshold source, categories to include, delta mode (only new findings since the last `/analyze` run).
+ - **`skills/clarify/SKILL.md`** — extended ambiguity catalogue + structured table output + ripple-effect handling:
+   - **Ambiguity categories expanded from 6 to 11.** Added: Quantification gaps (numbers without units, frequencies without windows), Time-zone gaps (timestamps without TZ), Currency gaps (amounts without currency), Scale gaps ("users" without specifying CCU/registered/MAU/DAU), Modal verb confusion (must / shall / should / may inconsistency per IEEE 830).
+   - **Output format changed from numbered list to structured table** with columns `# | Location | Category | Question | Answer`. The user can answer in-place by filling the rightmost column or defer with `(deferred)`.
+   - **Ripple-effect check added.** Before saving, the skill identifies any answer that affects other artifacts (e.g., a new role propagates from `/srs` to `/stories` personas, `/usecases` actors, `/scenarios` personas) and offers to update each affected file via `/revise`. Was previously a single sentence buried in step 2.
+ - **`skills/estimate/SKILL.md`** — Cone of Uncertainty + confidence bands + planning poker honesty:
+   - **Cone of Uncertainty discipline.** Calibration question 5 asks for the estimation phase (Discovery / Brief / SRS / AC / Wireframes / Implementation Plan) and emits the confidence band that matches the standard variance for that phase: ±400% / ±100% / ±50% / ±25% / ±10%. Every estimate now carries a confidence label (Order of magnitude / ROM / Budget / Definitive / Control). Single-point estimates without a confidence band are misread as commitments — the Cone discipline prevents this.
+   - **Risk multiplier and tech-debt allocation added** as calibration questions. Stories linked to 🔴 Critical risks default to +20%; tech-debt reservation defaults to 20% of velocity.
+   - **Planning poker honesty.** New section "Estimation discipline — what this skill is and is not" makes explicit that this is an analytical single-estimator pass, not a team commitment, and recommends running planning poker with the dev team using these numbers as the starting anchor. Senior BAs never confuse an analytical estimate with a team commitment.
+   - **Estimation Summary table** now carries the confidence band per story, the risk-adjusted total, the tech-debt reservation, and the net feature capacity — not just a flat SP total that hides the assumptions.
+ - **`skills/glossary/SKILL.md`** — Aristotelian definition discipline + ISO 1087-1 alignment + extended sources:
+   - **Standard alignment with ISO 1087-1** (Terminology Work — Vocabulary). Definitions follow the **Aristotelian form**: genus + differentia + purpose. The skill rejects circular definitions ("a User is a user of the system"), self-referential definitions, definitions that name features without identifying the genus, and definitions that name the genus without distinguishing from siblings.
+   - **New Step 2b — Definition quality check.** Walks every term in the glossary, classifies definitions as Circular / Self-referential / Missing genus / Missing differentia, and lists weak definitions in a sub-section of the drift report with a recommended rewrite. Senior BAs catch this on every glossary review.
+   - **Sources expanded** to include `00_discovery_*.md` (concept terms), `00_principles_*.md` (convention names), and `12_implplan_*.md` (phase names, technology choices). 12 source files → 14.
+   - **Drift report now requires a justification** for the canonical name choice — not just "Customer" but "Customer because it appears earliest, most frequently, and matches the domain reference". Without justification, drift recommendations are arbitrary.
+ - **`skills/risk/SKILL.md` and `skills/references/templates/risk-template.md`** — full ISO 31000 + PMBOK 7 alignment, Velocity axis, Treatment Strategy classification, calibration interview, document control, placeholder template:
+   - **Standard alignment with ISO 31000** (Risk Management — Guidelines) and **PMI PMBOK 7**. Each risk now carries the canonical risk-management fields, not just probability × impact.
+   - **Velocity axis added** (Years / Months / Weeks / Days / Immediate). Velocity records how fast the risk materialises once triggered — it determines reaction time and whether mitigation has to be in place before the trigger or whether a reactive response is enough. P × I alone is insufficient for a senior risk register.
+   - **Treatment strategy field added** with the four canonical strategies from PMBOK 7 / ISO 31000: **Avoid** (eliminate the cause), **Reduce / Mitigate** (lower probability or impact), **Transfer** (insurance / vendor / contract), **Accept** (acknowledge and budget). The previous version only documented mitigation + contingency without classifying the strategy.
+   - **Review cadence per risk** (Monthly / Quarterly / Ad hoc). Without a cadence, risks become "set and forget" — the canonical failure mode.
+   - **Calibration interview added** (was generation-only): risk tolerance (Low / Medium / High), domain-specific frameworks (FAIR for cyber, ISO 14971 for medical, COSO ERM for financial, NIST RMF for US federal), review cadence, treatment-strategy preference.
+   - **`risk-template.md` rewritten from a concrete Nova Analytics worked example to placeholder convention** with `[PROJECT_NAME]`, `[Owner]`, `[Velocity]`, etc. Aligns with the other 19 templates that use placeholder syntax. The Nova Analytics example was misread as the canonical content by agents that didn't realise it was a sample.
+   - Document-control metadata (`Version`, `Status`) added to the template.
+ - **`skills/sprint/SKILL.md` and `skills/references/templates/sprint-template.md`** — focus factor + ceremonies-aware net velocity, persona column, dependency awareness, placeholder template:
+   - **Net velocity replaces theoretical velocity as the assignment basis.** The skill now distinguishes theoretical velocity (100% capacity) from net velocity, applying focus factor (default 65%), buffer / slack (default 15%), and ceremonies cost (default 1 day per developer per 2-week sprint). Formula: `Net = Theoretical × Focus × (1 − Buffer) − Ceremonies`. Senior scrum masters never schedule against the theoretical number — that's how teams overcommit and miss sprints. Was assigning against raw theoretical velocity.
+   - **Calibration interview extended from 6 to 10 questions**: focus factor, buffer / slack, ceremonies overhead, holidays / PTO added.
+   - **Persona column added** to every sprint detail table (per the v3.5.0+ stories template). Sprint plans now reflect which personas each sprint serves so the team builds empathy.
+   - **Business Value Score and Depends on fields now read explicitly from the v3.5.0+ stories template** in the priority-ordering algorithm. Was relying on implicit prerequisites only.
+   - **Context loading extended** to read `00_principles_*.md` for team capacity defaults and `00_discovery_*.md` for early MVP signals.
+   - **`sprint-template.md` rewritten from a concrete Nova Analytics worked example to placeholder convention** with `[PROJECT_NAME]`, `[Persona]`, etc. The new template carries the net-velocity header, the persona column, the explicit capacity model section, and explicit assumption documentation. Aligns with the other 19 templates.
+   - Document-control metadata (`Version`, `Status`) added to the template.
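The `/sprint` capacity model above can be sketched in a few lines of Python. This is an editor's sketch, not part of the toolkit: the defaults (65% focus, 15% buffer, 1 ceremony day per developer per 2-week sprint) are the ones quoted in the entry, while the function name and the SP-per-dev-day conversion are illustrative assumptions.

```python
# Sketch of the /sprint net-velocity model described in the entry above.
# Net = Theoretical × Focus × (1 − Buffer) − Ceremonies.
# sp_per_dev_day converts ceremony days into story points and is an
# illustrative assumption, not a toolkit parameter.

def net_velocity(theoretical_sp: float, devs: int,
                 focus: float = 0.65, buffer: float = 0.15,
                 ceremony_days_per_dev: float = 1.0,
                 sp_per_dev_day: float = 1.0) -> float:
    ceremonies_sp = devs * ceremony_days_per_dev * sp_per_dev_day
    return theoretical_sp * focus * (1 - buffer) - ceremonies_sp

# A 5-dev team with a theoretical 40 SP sprint:
# 40 × 0.65 × 0.85 − 5 ≈ 17.1 SP of real feature capacity.
print(round(net_velocity(40, devs=5), 1))
```

This makes the overcommitment point concrete: less than half of the theoretical 40 SP is actually available for feature work.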
62
+
63
+ ### Cross-pattern impact
64
+
65
+ After the pilot, Group A, and Group B, **15 of the 24 shipped skills** carry the senior-BA improvements: standards conformance to canonical frameworks (BABOK v3, IEEE 830, ISO 25010, ISO 31000, ISO 1087-1, OpenAPI 3.x, Cockburn use cases, INVEST, MoSCoW, PMBOK 7, Cone of Uncertainty, IEEE 830 §4.3 quality attributes), explicit "why" / provenance fields, cross-artifact forward and reverse traceability, document control, single-source-of-truth templates without inline drift, and BA-grade required-topics coverage. Group C (bookend skills: `/discovery`, `/principles`, `/research`, `/handoff`, `/implement-plan`, `/export`, `/publish`) remains at current rigour and may receive a lighter sweep in a future session.
66
+
67
+ ---
68
+
14
69
  ## [3.6.0] — 2026-04-10
15
70
 
16
71
  ### Highlights
@@ -595,7 +650,8 @@ CI scripts that relied on the old behaviour (`init` creates files only, `install
 
  ---
 
- [Unreleased]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.6.0...HEAD
+ [Unreleased]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.7.0...HEAD
+ [3.7.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.6.0...v3.7.0
  [3.6.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.5.0...v3.6.0
  [3.5.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.4.1...v3.5.0
  [3.4.1]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.4.0...v3.4.1
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kudusov.takhir/ba-toolkit",
- "version": "3.6.0",
+ "version": "3.7.0",
  "description": "AI-powered Business Analyst pipeline — 24 skills from concept discovery to a sequenced implementation plan an AI coding agent can execute, with one-command Notion + Confluence publish. Works with Claude Code, Codex CLI, Gemini CLI, Cursor, and Windsurf.",
  "keywords": [
  "business-analyst",
@@ -19,19 +19,34 @@ Cross-cutting command. Performs a read-only analysis across all existing pipelin
 
  Read `references/environment.md` from the `ba-toolkit` directory to determine the output directory. If unavailable, apply the default rule.
 
+ ## Calibration interview
+
+ > **Follow the [Interview Protocol](../references/interview-protocol.md):** ask one question at a time, present a 2-column `| ID | Variant |` markdown table of up to 4 options plus a free-text "Other" row last (5 rows max), mark exactly one row **Recommended**, render variants in the user's language (rule 11), and wait for an answer.
+ >
+ > **Inline context (protocol rule 9):** if the user wrote text after `/analyze` (e.g., `/analyze focus on security and performance`), parse it as a category-focus hint and run only those categories.
+
+ If the user invokes `/analyze` with no inline hint, run a short calibration interview (skip any question already answered by `00_principles_*.md`):
+
+ 1. **Severity threshold** — which severity blocks `/done`? Read from `00_principles_*.md` §6 if present, otherwise ask. **Recommended:** CRITICAL only.
+ 2. **Categories to include** — all 8 canonical categories, or focus on a subset (e.g., only Coverage Gap and Invalid Reference for a fast pre-handoff sanity check)?
+ 3. **Delta mode** — produce a full report, or a delta against the prior `00_analyze_*.md` (only new findings since the last run)?
+
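Delta mode (question 3) amounts to a set difference on finding keys between the prior report and the current run. A minimal sketch, assuming findings are keyed by category + location; keying on the sequential A-number would not work, since A1, A2, … are reassigned on every run:

```python
# Sketch of delta mode: report only findings absent from the prior
# 00_analyze_*.md run. The (category, location) key is an assumption;
# sequential IDs (A1, A2, ...) are not stable across runs.

def delta_findings(previous: list[dict], current: list[dict]) -> list[dict]:
    seen = {(f["category"], f["location"]) for f in previous}
    return [f for f in current if (f["category"], f["location"]) not in seen]

prev = [{"category": "GAP", "location": "srs:FR-008"}]
curr = [{"category": "GAP", "location": "srs:FR-008"},
        {"category": "SRC", "location": "srs:FR-012"}]
print(delta_findings(prev, curr))  # only the new SRC finding survives
```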
  ## Analysis categories
 
- ### 1. Duplication (DUP)
+ The 8 canonical categories cover the IEEE 830 §4.3 SRS quality attributes (correct, unambiguous, complete, consistent, verifiable, modifiable, traceable, ranked) plus modern BA concerns (terminology drift, source provenance, validation gaps).
+
+ ### 1. Duplication (DUP) — *IEEE 830: consistency*
  - Functionally equivalent or near-duplicate FRs within SRS.
  - User stories that describe the same action for the same role.
  - Repeated business rules across multiple artifacts.
 
- ### 2. Ambiguity (AMB)
+ ### 2. Ambiguity (AMB) — *IEEE 830: unambiguous*
  - Requirements containing metrics-free adjectives: "fast", "reliable", "scalable", "secure", "simple", "efficient" without a measurable criterion.
  - Underspecified scope: "the system" or "the user" without a concrete actor.
  - Conditional requirements without defined conditions.
+ - Modal verb confusion: inconsistent use of "must / shall / should / may" — IEEE 830 expects strict modal discipline.
 
- ### 3. Coverage Gap (GAP)
+ ### 3. Coverage Gap (GAP) — *IEEE 830: complete + traceable*
  - FR without at least one linked US.
  - US without linked UC (if Use Cases artifact exists).
  - US without AC (if AC artifact exists).
@@ -39,20 +54,42 @@ Read `references/environment.md` from the `ba-toolkit` directory to determine th
  - Data entities not linked to any FR or US.
  - API endpoints not linked to any FR or US.
  - Wireframes not linked to any US.
+ - Tasks (from `12_implplan_*.md`) not linked to any FR/US/AC.
 
- ### 4. Terminology Drift (TERM)
+ ### 4. Terminology Drift (TERM) — *IEEE 830: consistent*
  - The same concept referred to by different names across artifacts (e.g., "Wallet" in SRS vs "Balance" in Stories vs "Account" in Data Dictionary).
  - Abbreviations expanded differently in different artifacts.
 
- ### 5. Invalid Reference (REF)
+ ### 5. Invalid Reference (REF) — *IEEE 830: traceable*
  - Links to IDs that do not exist (FR-NNN, US-NNN, UC-NNN, etc.).
  - References to roles not defined in the SRS roles section.
  - API endpoint links in wireframes pointing to non-existent endpoints.
+ - Brief Goal references (G-N) in the SRS §7 traceability table that do not exist in `01_brief_*.md` §2.
+
+ ### 6. Inconsistency (INC) — *IEEE 830: consistent*
+ - Same fact stated differently across artifacts (e.g., `01_brief` says €1.5M GMV target, `02_srs` §1.2 says €2M).
+ - A constraint in `01_brief` §6 contradicted by an FR in `02_srs` §3.
+ - Priority disagreement: same item Must in one artifact and Should in another.
+
+ ### 7. Underspecified Source (SRC) — *post-v3.5.0 provenance discipline*
+ - FR without `Source` field (introduced in v3.5.0).
+ - Use Case without `Source` field.
+ - AC without `Source` field.
+ - Entity without `Source` or `Owner` field.
+ - Endpoint without `Source` field.
+ - Stakeholder name referenced as a source but missing from `01_brief` §5 stakeholders table.
+
+ ### 8. Stakeholder Validation Gap (VAL) — *post-v3.5.0 assumption discipline*
+ - Assumption in `01_brief` §7 with no `Owner` or `Validate by` date.
+ - Assumption past its `Validate by` date and still not converted to a fact or risk.
+ - Brief goal in §2 with no measurable success metric.
 
  ## Generation
 
  **File:** `00_analyze_{slug}.md`
 
+ The full report layout lives at `references/templates/analyze-template.md` and is the single source of truth. Each finding carries: ID, Severity, Category (one of the 8 above), Location (artifact + element ID), Description, Recommendation, and **Owner** (which role is accountable for fixing the finding — assigned by `00_principles_*.md` §4 Definition of Ready ownership where applicable, otherwise by domain default).
+
  ```markdown
  # Cross-Artifact Quality Analysis: {Name}
 
@@ -61,13 +98,15 @@ Read `references/environment.md` from the `ba-toolkit` directory to determine th
 
  ## Finding Report
 
- | ID | Category | Severity | Location | Summary | Recommendation |
- |----|----------|----------|----------|---------|----------------|
- | A1 | Duplication | HIGH | srs:FR-003, srs:FR-017 | Both describe user authentication | Merge into a single FR |
- | A2 | Ambiguity | MEDIUM | srs:NFR-002 | "Low latency" has no metric | Add target value in ms |
- | A3 | Coverage Gap | CRITICAL | srs:FR-008 | No linked user story | Create US or remove FR |
- | A4 | Terminology | HIGH | srs, stories | "Wallet" vs "Balance" used interchangeably | Standardize in glossary |
- | A5 | Invalid Ref | CRITICAL | ac:AC-003-01 | Links US-099 which does not exist | Fix reference |
+ | ID | Category | Severity | Location | Summary | Recommendation | Owner |
+ |----|----------|----------|----------|---------|----------------|-------|
+ | A1 | Duplication | HIGH | srs:FR-003, srs:FR-017 | Both describe user authentication | Merge into a single FR | BA |
+ | A2 | Ambiguity | MEDIUM | srs:NFR-002 | "Low latency" has no metric | Add target value in ms | BA |
+ | A3 | Coverage Gap | CRITICAL | srs:FR-008 | No linked user story | Create US or remove FR | BA |
+ | A4 | Terminology | HIGH | srs, stories | "Wallet" vs "Balance" used interchangeably | Standardize in glossary | BA |
+ | A5 | Invalid Ref | CRITICAL | ac:AC-003-01 | Links US-099 which does not exist | Fix reference | BA |
+ | A6 | Underspecified Source | HIGH | srs:FR-012 | No `Source` field per v3.5.0+ template | Add Source field | BA |
+ | A7 | Stakeholder Validation Gap | HIGH | brief:Assumption-3 | No Owner; past Validate by date | Convert to risk or validate | PM |
 
  ## Coverage Summary
 
@@ -89,17 +128,18 @@ Top 5 highest-severity items to address first, with recommended commands:
  1. {Finding ID} — {summary} → `/clarify FR-NNN` or `/revise [section]`
  ```
 
- **Severity scale:**
- - **CRITICAL** — blocks pipeline advancement or handoff (missing mandatory link, non-existent ID).
- - **HIGH** — significant quality risk (duplication, key term drift, missing metric for Must-priority item).
- - **MEDIUM** — quality concern, does not block (missing metric for Should-priority, minor overlap).
+ **Severity scale** *(read from `00_principles_*.md` §6 if present, otherwise defaults below):*
+ - **CRITICAL** — blocks pipeline advancement or handoff (missing mandatory link, non-existent ID, IEEE 830 traceability violation).
+ - **HIGH** — significant quality risk (duplication, key term drift, missing metric for Must-priority item, Underspecified Source on a Must-priority artifact).
+ - **MEDIUM** — quality concern, does not block (missing metric for Should-priority, minor overlap, IEEE 830 modifiability concern).
  - **LOW** — style or completeness suggestion.
 
  **Rules:**
  - Finding IDs are sequential: A1, A2, ...
  - Each finding references the exact artifact file and element ID.
+ - Each finding has an **Owner** assigned (BA / PM / Tech Lead / Designer / Stakeholder name).
  - Coverage rows are generated only for artifact pairs where both files exist.
- - The report is read-only — no artifacts are modified.
+ - The report is read-only — no artifacts are modified by `/analyze`. Use `/clarify` or `/revise` to act on findings.
 
  ## Iterative refinement
 
@@ -35,7 +35,7 @@ Read `references/environment.md` from the `ba-toolkit` directory to determine th
 
  ## Analysis pass
 
- Scan the target artifact (or the specified section) for the following ambiguity categories:
+ Scan the target artifact (or the specified section) for the following 11 ambiguity categories. Categories A–F are the original set; G–K are post-v3.5.0 additions that catch the gaps a senior BA would notice on a regulated or international project.
 
  ### A. Metrics-free adjectives
  Requirements containing words like "fast", "reliable", "scalable", "secure", "user-friendly", "simple", "efficient", "high performance", "low latency" without a measurable criterion.
@@ -47,7 +47,7 @@ Terms, abbreviations, or roles used in the artifact but not defined in the gloss
  Business rules or constraints that contradict each other or contradict rules in prior artifacts.
 
  ### D. Missing mandatory fields
- Elements that lack required fields per the artifact's own template (e.g., a FR without Priority, a US without Linked FR, an AC without a Then clause).
+ Elements that lack required fields per the artifact's own template (e.g., an FR without Priority, a US without Linked FR, an AC without a Then clause, an FR without `Source` per the v3.5.0+ template).
 
  ### E. Ambiguous actors or scope
  Actions assigned to "the system", "the user", or "admin" where the specific role or external system is unclear.
@@ -55,27 +55,47 @@ Actions assigned to "the system", "the user", or "admin" where the specific role
  ### F. Duplicate or overlapping requirements
  Functionally equivalent or near-duplicate elements within the artifact or across artifacts.
 
+ ### G. Quantification gaps
+ Numbers without units, frequency claims without a window ("every 5 minutes" vs "5 minutes p95"), thresholds without a measurement method, percentages without a base.
+
+ ### H. Time-zone gaps
+ Timestamps without a stated time zone, "midnight" without specifying which time zone's midnight, business-hour rules without a region.
+
+ ### I. Currency gaps
+ Monetary amounts without a currency code, multi-region pricing without a per-region table, conversion rules without a rate source.
+
+ ### J. Scale gaps
+ "Users" without specifying which population (concurrent / registered / monthly active / daily active), "data" without volume, "requests" without rate.
+
+ ### K. Modal verb confusion
+ Inconsistent use of must / shall / should / may. IEEE 830 expects strict modal discipline: "shall" = mandatory contractual obligation, "should" = recommended, "may" = permitted, "must" = required by external constraint. Mixed modals across the same artifact undermine acceptance.
+
  ## Output to the user
 
- Present findings as a numbered list of targeted questions. Each item references the exact location (section, element ID, line summary):
+ Present findings as a structured table the user can answer in-place rather than as a numbered list of free-text questions. Each row references the exact location (section + element ID), the category, the question, and an empty answer cell:
 
  ```
- Ambiguities found in {artifact_file}:
-
- 1. [FR-003] "The system must respond quickly" — what is the acceptable response time in milliseconds under normal load?
- 2. [US-007] Role "Manager" is used here but not defined in the SRS Roles section — is this equivalent to "Admin", or a separate role?
- 3. [NFR-002 vs FR-015] NFR-002 requires TLS 1.3 only, but FR-015 mentions "standard encryption" — should FR-015 explicitly reference NFR-002?
- 4. [AC-005-02] The "Then" clause states "the system handles the error correctly" — what is the specific expected behavior (message shown, state reset, redirect)?
+ Ambiguities found in {artifact_file} ({N} questions):
+
+ | # | Location | Category | Question | Answer |
+ |---|----------|----------|----------|--------|
+ | 1 | FR-003 | A — Metrics-free | "The system must respond quickly" — what is the acceptable response time in ms under normal load? | _(awaiting answer)_ |
+ | 2 | US-007 | B — Undefined term | Role "Manager" used here but not defined in `02_srs_*.md` Roles. Equivalent to "Admin"? Separate role? | _(awaiting answer)_ |
+ | 3 | FR-015 vs NFR-002 | C — Conflicting | NFR-002 requires TLS 1.3 only; FR-015 mentions "standard encryption". Reference NFR-002 explicitly? | _(awaiting answer)_ |
+ | 4 | AC-005-02 | A — Metrics-free | "Then" clause says "system handles the error correctly" — specific expected behaviour? | _(awaiting answer)_ |
+ | 5 | FR-022 | I — Currency gap | "User charged a fee" — currency? per region? | _(awaiting answer)_ |
  ```
 
+ The table format lets the user fill in answers in the rightmost column and copy-paste the table back. It also lets the user defer specific questions explicitly by writing `(deferred)` instead of an answer.
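Reading the filled-in table back splits rows into answered, deferred, and still-open questions. A minimal sketch, assuming the user returns the same 5-column markdown table; the marker strings `(deferred)` and `_(awaiting answer)_` are the ones shown in the example above, and `split_answers` is an illustrative name:

```python
# Sketch: classify rows of the returned clarification table.
# Skips the header row and the |---| separator row; everything else
# is bucketed by the content of the rightmost (Answer) cell.

def split_answers(table_md: str):
    answered, deferred, open_ = [], [], []
    for line in table_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or cells[0] in ("#", "") or set(cells[1]) <= {"-"}:
            continue  # header, separator, or non-row line
        num, answer = cells[0], cells[4]
        if answer == "(deferred)":
            deferred.append(num)
        elif answer in ("", "_(awaiting answer)_"):
            open_.append(num)
        else:
            answered.append(num)
    return answered, deferred, open_

demo = """\
| # | Location | Category | Question | Answer |
|---|----------|----------|----------|--------|
| 1 | FR-003 | A — Metrics-free | Response time? | 200 ms p95 |
| 2 | US-007 | B — Undefined term | Same as Admin? | (deferred) |
| 3 | FR-022 | I — Currency gap | Which currency? | _(awaiting answer)_ |"""
print(split_answers(demo))  # → (['1'], ['2'], ['3'])
```

Questions whose text itself contains a `|` would need escaping; the sketch ignores that edge case.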
+
  If no ambiguities are found, state that clearly and suggest `/analyze` for a broader cross-artifact check.
 
  ## Resolution
 
- After presenting the list, wait for the user to answer. Accept answers inline (user can reply to all questions in one message or answer one by one). After all answers are received:
+ After presenting the table, wait for the user to answer. Accept answers inline (the user can reply with the table filled in, or answer free-form by row number). After all answers are received:
 
  1. Apply the answers to update the artifact (rewrite affected elements).
- 2. If an answer affects a prior artifact (e.g., a role definition belongs in the SRS), flag it and offer to update that artifact as well.
+ 2. **Ripple-effect check.** Before saving, identify any answer that affects a prior or downstream artifact (a role definition affects `/srs` Roles section AND `/stories` personas AND `/usecases` actors AND `/scenarios` personas; a new business rule affects `/srs` business rules AND `/ac` Given/When/Then conditions AND `/datadict` constraints AND `/apicontract` validation rules). For each ripple, list the affected files and offer to update them with `/revise`. Do not auto-modify unrelated artifacts without confirmation.
  3. Save the updated artifact.
 
  ## Closing message
@@ -35,12 +35,24 @@ Read `references/environment.md` from the `ba-toolkit` directory to determine th
 
  ## Calibration interview
 
+ > **Follow the [Interview Protocol](../references/interview-protocol.md):** ask one question at a time, present a 2-column `| ID | Variant |` markdown table of up to 4 options plus a free-text "Other" row last (5 rows max), mark exactly one row **Recommended**, render variants in the user's language (rule 11), and wait for an answer.
+ >
+ > **Inline context (protocol rule 9):** if the user wrote text after `/estimate` (e.g., `/estimate t-shirt, post-srs phase, +20% buffer`), parse it and skip the matching questions.
+
  Before estimating, ask the following (skip questions where the answer is already clear from context):
 
- 1. **Scale:** Story Points (Fibonacci: 1 / 2 / 3 / 5 / 8 / 13 / 21), T-shirt sizes (XS / S / M / L / XL), or ideal person-days? _(default: Story Points Fibonacci)_
+ 1. **Scale:** Story Points (Fibonacci: 1 / 2 / 3 / 5 / 8 / 13 / 21), T-shirt sizes (XS / S / M / L / XL), or ideal person-days? **Recommended:** Fibonacci Story Points.
  2. **Reference stories:** Do you have known reference stories to anchor the scale? (e.g. "US-003 is a 3" or "login is an S") If yes, calibrate against those.
- 3. **Team assumptions:** Full-stack developer pair? Or separate frontend/backend? _(affects day estimates only)_
+ 3. **Team assumptions:** Full-stack developer pair? Or separate frontend/backend? *(affects day estimates only)*
  4. **Out of scope for this estimate:** Any stories to skip (e.g., already estimated, on ice)?
+ 5. **Estimation phase (Cone of Uncertainty band)** — what's the maturity of the underlying artifacts? **Recommended:** select the band that matches the latest completed pipeline step.
+    - **Discovery / Brief only** — variance ±400% (4× spread); confidence label "Order of magnitude".
+    - **Post-SRS** — variance ±100%; confidence label "Rough order of magnitude (ROM)".
+    - **Post-AC + Use Cases** — variance ±50%; confidence label "Budget estimate".
+    - **Post-Wireframes + Datadict + APIcontract** — variance ±25%; confidence label "Definitive estimate".
+    - **Post-Implementation Plan** — variance ±10%; confidence label "Control estimate".
+ 6. **Risk multiplier** — apply a buffer for high-risk stories (those linked to 🔴 Critical or 🟠 High risks in `00_risks_*.md`)? **Recommended:** +20% on Critical-linked stories, +10% on High-linked.
+ 7. **Technical debt allocation** — % of velocity reserved for tech debt, bug-fixes, support? **Recommended:** 20% (industry baseline).
 
  If the user types `/estimate` without additional input and prior context is sufficient, proceed with defaults and state assumptions clearly.
 
@@ -62,6 +74,17 @@ Apply the reference stories as anchors. If no references are given, calibrate in
 
  Do not pad estimates — model what a reasonably skilled team would take, not a worst-case scenario.
 
+ ## Estimation discipline — what this skill is and is not
+
+ This skill produces a **single-estimator analytical pass**. A senior BA running estimation in production would augment this with:
+
+ - **Planning poker** — team-based estimation where each developer commits independently and outliers discuss before re-voting. The canonical agile-team estimation technique. `/estimate` cannot do this — it must be run with the dev team after the analytical pass.
+ - **Wideband Delphi** — multi-round expert estimation with anonymous re-voting. Used when the team is distributed or when politics distort consensus. Same caveat — requires real humans.
+
+ **Cone of Uncertainty.** Boehm's classic principle: estimates made early in the lifecycle (post-Brief, post-Discovery) carry up to 4× variance from the eventual reality, narrowing to 1.25× by the time implementation is detailed. This skill **always emits a confidence label** based on the user-selected estimation phase (calibration question 5). A 5 SP estimate at the post-Brief phase is "5 SP ± 400%"; the same estimate at the post-Implementation-Plan phase is "5 SP ± 10%".
+
+ **Single-point estimates without a confidence band are dangerous** — they are misread as commitments. Every estimate produced by this skill carries an explicit confidence band tied to the Cone of Uncertainty.
+
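The phase bands above map mechanically from point estimate to banded estimate. A minimal sketch of that mapping — the function name, phase keys, and the floor on the low side are illustrative assumptions, not part of the skill:

```python
# Sketch: turning a point estimate into a banded estimate using the
# phase -> variance table from calibration question 5. Band values mirror
# the skill's table; names and the low-side floor are illustrative.

BANDS = {
    "discovery":  (4.00, "Order of magnitude"),
    "srs":        (1.00, "Rough order of magnitude (ROM)"),
    "ac":         (0.50, "Budget estimate"),
    "wireframes": (0.25, "Definitive estimate"),
    "implplan":   (0.10, "Control estimate"),
}

def banded(points: float, phase: str) -> str:
    """Render e.g. '5 SP ± 50% (Budget estimate, 2.5–7.5 SP)'."""
    variance, label = BANDS[phase]
    low = points * (1 - min(variance, 0.75))  # floor the low side; a literal -400% bound is meaningless
    high = points * (1 + variance)
    return f"{points:g} SP ± {variance:.0%} ({label}, {low:g}–{high:g} SP)"

print(banded(5, "ac"))        # 5 SP ± 50% (Budget estimate, 2.5–7.5 SP)
print(banded(5, "implplan"))  # 5 SP ± 10% (Control estimate, 4.5–5.5 SP)
```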
  ## Output
 
  ### If scope ≤ 20 stories — update `03_stories_{slug}.md` in-place
@@ -79,23 +102,32 @@ Standalone estimation report with the full table.
  ```
  ## Estimation Summary — [PROJECT_NAME]
  Scale: [Story Points / T-shirt / Days]
-
- | US | Title | Epic | Estimate | Key drivers |
- |----|-------|------|---------|-------------|
- | US-001 | [Title] | E-01 | 3 SP | 2 AC scenarios, simple UI |
- | US-002 | [Title] | E-01 | 8 SP | external payment API, 4 AC scenarios |
- | US-003 | [Title] | E-02 | 13 SP | new domain area, complex state machine |
+ Estimation phase: [Discovery / Brief / SRS / AC / Wireframes / Implementation Plan]
+ Confidence band: [±400% / ±100% / ±50% / ±25% / ±10%]
+ Confidence label: [Order of magnitude / ROM / Budget / Definitive / Control]
+
+ | US | Title | Epic | Estimate | Confidence | Key drivers |
+ |----|-------|------|----------|------------|-------------|
+ | US-001 | [Title] | E-01 | 3 SP ± 25% | Definitive | 2 AC scenarios, simple UI |
+ | US-002 | [Title] | E-01 | 8 SP ± 25% | Definitive | external payment API, 4 AC scenarios |
+ | US-003 | [Title] | E-02 | 13 SP ± 50% | Budget | new domain area, complex state machine, RISK-02 +20% |
  ...
 
  | Metric | Value |
  |--------|-------|
  | Total stories estimated | [N] |
- | Total Story Points | [N] SP |
+ | Total Story Points (point estimate) | [N] SP |
+ | Total with confidence band | [N – N] SP ([low] – [high]) |
  | Largest story | US-[NNN] ([N] SP) — [reason] |
  | Stories ≥ 13 SP (consider splitting) | [N]: US-[NNN], … |
  | Average per story | [N] SP |
+ | Risk-adjusted total (high-risk +20%) | [N] SP |
+ | Tech debt reservation ([N]%) | [N] SP |
+ | **Net feature capacity** | [N] SP |
  ```
 
+ > ⚠️ **Single-estimator caveat.** This is an analytical estimation pass by a single estimator. For a real team commitment, run **planning poker** with the dev team using these numbers as the starting anchor and re-vote on any item where the team's estimate differs by more than one Fibonacci step.
+
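The summary-table arithmetic (risk buffer from calibration question 6, tech debt reservation from question 7) can be sketched as follows. The story data, field names, and the choice to apply the reservation to the risk-adjusted total are illustrative assumptions, not the skill's implementation:

```python
# Sketch: deriving the summary-table totals (risk-adjusted total, tech debt
# reservation, net feature capacity) from per-story estimates.

stories = [
    {"id": "US-001", "sp": 3,  "risk": None},        # no linked risk
    {"id": "US-002", "sp": 8,  "risk": "high"},      # +10% buffer (question 6)
    {"id": "US-003", "sp": 13, "risk": "critical"},  # +20% buffer (question 6)
]
RISK_BUFFER = {"critical": 0.20, "high": 0.10, None: 0.0}
TECH_DEBT_PCT = 0.20  # 20% industry baseline from question 7

total = sum(s["sp"] for s in stories)
risk_adjusted = sum(s["sp"] * (1 + RISK_BUFFER[s["risk"]]) for s in stories)
tech_debt = risk_adjusted * TECH_DEBT_PCT          # assumption: reserve against the adjusted total
net_capacity = risk_adjusted - tech_debt

print(f"Total (point estimate): {total} SP")           # 24 SP
print(f"Risk-adjusted total:    {risk_adjusted:.1f}")  # 27.4 SP
print(f"Net feature capacity:   {net_capacity:.2f}")   # 21.92 SP
```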
  ### Splitting recommendations
 
  For any story estimated at 13 SP or higher (or XL), suggest a concrete split:
@@ -19,22 +19,27 @@ Examples:
  - `/glossary drift` — only show terminology drift findings, do not regenerate
  - `/glossary add [Term]: [Definition]` — manually add a term to the glossary
 
+ **Standard alignment:** definition style follows the **Aristotelian definition** (genus + differentia + purpose) used by ISO 1087-1 (Terminology Work — Vocabulary). A well-formed definition names the **kind** the term belongs to (genus), then **what makes it specific** (differentia), and optionally its **purpose**. Example: "A **Workspace** is a tenant boundary [genus: tenant boundary] for billing and access control [differentia + purpose]." Bad definitions are circular ("a User is someone who uses the system") or self-referential ("a Subscription is a user's subscription") — `/glossary` rejects these and asks for a rewrite.
+
  ## Context loading
 
- 0. If `00_principles_*.md` exists, load it — apply language convention (section 1) and ID naming convention (section 2).
+ 0. If `00_principles_*.md` exists, load it — apply language convention (section 1), ID naming convention (section 2), and any glossary-specific rules in section 8 (Project-Specific Notes).
  1. Scan the output directory for all existing artifacts. Load each one found:
+ - `00_discovery_{slug}.md` — concept terms, candidate audience names, reference product names
+ - `00_principles_{slug}.md` — convention names, mandatory category names
  - `01_brief_{slug}.md` — Brief Glossary section
  - `02_srs_{slug}.md` — Definitions and Abbreviations section, User Roles
- - `03_stories_{slug}.md` — actor names used in "As a..." statements
- - `04_usecases_{slug}.md` — actor names, system names
+ - `03_stories_{slug}.md` — persona names, role names used in "As a..." statements
+ - `04_usecases_{slug}.md` — actor names, system names, stakeholder names
  - `05_ac_{slug}.md` — state names, condition terms
- - `06_nfr_{slug}.md` — category names, compliance standard names
- - `07_datadict_{slug}.md` — entity names, field names, enum values
+ - `06_nfr_{slug}.md` — category names, compliance standard names, ISO 25010 sub-characteristics
+ - `07_datadict_{slug}.md` — entity names, field names, enum values, state machine state names
  - `07a_research_{slug}.md` — technology names, ADR decisions
- - `08_apicontract_{slug}.md` — error codes, resource names
+ - `08_apicontract_{slug}.md` — error codes, resource names, scope names
  - `09_wireframes_{slug}.md` — screen names, UI component names
  - `10_scenarios_{slug}.md` — persona names, scenario types
  - `11_handoff_{slug}.md` — any additional terms
+ - `12_implplan_{slug}.md` — phase names, task IDs, technology choices from the Tech Stack header
  2. Load `skills/references/domains/{domain}.md` — Domain Glossary section.
  3. If `00_glossary_{slug}.md` already exists, load it to merge rather than replace.
 
@@ -61,11 +66,12 @@ For each term, record:
 
  ### Step 2 — Terminology drift detection
 
- Identify cases where the same concept appears under different names in different artifacts:
+ Identify cases where the same concept appears under different names in different artifacts. The recommendation must include a **justification** for the canonical name choice — not just "Customer" but "Customer, because it appears earliest (in the Brief), most frequently (8 occurrences), and matches the domain reference":
 
  ```
  ⚠️ Drift: "Customer" (Brief), "User" (SRS), "Account" (Data Dictionary) — all refer to the registered buyer entity.
  → Recommend canonical name: "Customer"
+ → Justification: appears in the Brief (the source-of-truth glossary per project convention), used 8 times across artifacts vs 3 for "User" and 2 for "Account", and matches the e-commerce domain reference glossary.
  ```
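The justification criteria (earliest artifact, frequency, domain-glossary match) lend themselves to a mechanical first pass. A minimal sketch — the artifact ordering, occurrence counts, and names are invented for illustration, not parsed from real files:

```python
# Sketch: scoring canonical-name candidates for a drift finding, mirroring
# the justification criteria: domain match, then earliest artifact, then frequency.

ARTIFACT_ORDER = ["brief", "srs", "stories", "datadict"]  # pipeline order = precedence
DOMAIN_GLOSSARY = {"Customer"}  # terms present in the domain reference glossary

# (term, artifact) occurrences collected in Step 1 (illustrative data)
occurrences = [("Customer", "brief")] * 8 + [("User", "srs")] * 3 + [("Account", "datadict")] * 2

def score(term: str) -> tuple:
    arts = [a for t, a in occurrences if t == term]
    freq = len(arts)
    earliest = min(ARTIFACT_ORDER.index(a) for a in arts)
    in_domain = term in DOMAIN_GLOSSARY
    # Tuple comparison: domain match first, then earlier artifact, then higher frequency
    return (in_domain, -earliest, freq)

canonical = max(["Customer", "User", "Account"], key=score)
print(canonical)  # Customer
```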
 
  Flag drift as:
@@ -73,6 +79,17 @@ Flag drift as:
  - 🟡 **Medium** — synonym used in 1 artifact (easy to harmonise).
  - 🟢 **Low** — informal variation in a description field only.
 
+ ### Step 2b — Definition quality check
+
+ For every term that has a definition, verify the definition follows the Aristotelian form (genus + differentia + purpose). Flag definitions that are:
+
+ - 🔴 **Circular** — define the term using the term itself ("a User is a user of the system", "a Subscription is the user's subscription").
+ - 🔴 **Self-referential** — define the term by quoting the artifact that uses it ("a Workspace is the workspace described in section 2").
+ - 🟡 **Missing genus** — definition lists features but never names the kind ("Workspace: has billing, members, settings" — what *kind of thing* is it?).
+ - 🟡 **Missing differentia** — names the genus but doesn't distinguish it from siblings ("a Workspace is a container").
+
+ The drift report includes a "Definition quality" sub-section that lists all weak definitions with a recommended rewrite. Senior BAs catch this on every glossary review.
+
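The mechanical subset of these checks can be sketched. A real review still needs human judgement; the regex below is an illustrative assumption, not the skill's implementation, and only catches definitions that literally reuse the term:

```python
# Sketch: first-pass circularity check for glossary definitions. Flags the
# mechanical cases where the defined term (or a trivial plural/singular
# variant) reappears inside its own definition.

import re

def is_circular(term: str, definition: str) -> bool:
    """True if the definition reuses the term itself ('a User is a user...')."""
    stem = re.escape(term.lower().rstrip("s"))        # crude singular form
    return re.search(rf"\b{stem}s?\b", definition.lower()) is not None

print(is_circular("Subscription", "the user's subscription"))                        # True  -> flag 🔴
print(is_circular("Workspace", "a tenant boundary for billing and access control"))  # False -> passes
```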
  ### Step 3 — Undefined term detection
 
  Identify terms used in requirements but not defined anywhere in the glossary or domain reference:
@@ -1,16 +1,19 @@
  # Quality Analysis Report: [PROJECT_NAME]
 
+ **Version:** 0.1
+ **Status:** Draft
  **Domain:** [DOMAIN]
  **Date:** [DATE]
  **Slug:** [SLUG]
  **Artifacts Analysed:** [List of artifact files included in this analysis]
+ **Severity threshold (blocks /done):** [CRITICAL / HIGH / per `00_principles_[SLUG].md` §6]
 
  ---
 
  ## Finding Summary
 
  | Severity | Count |
- |---------|-------|
+ |----------|-------|
  | 🔴 Critical | [N] |
  | 🟠 High | [N] |
  | 🟡 Medium | [N] |
@@ -21,20 +24,25 @@
 
  ## Findings
 
- | # | Severity | Category | Location | Description | Recommendation |
- |---|---------|---------|---------|-------------|---------------|
- | 1 | 🔴 Critical | [Duplication \| Ambiguity \| Coverage Gap \| Terminology Drift \| Invalid Reference] | [Artifact + section] | [What the issue is] | [How to fix it] |
- | 2 | 🟠 High | [Category] | [Location] | [Description] | [Recommendation] |
- | 3 | 🟡 Medium | [Category] | [Location] | [Description] | [Recommendation] |
- | 4 | 🟢 Low | [Category] | [Location] | [Description] | [Recommendation] |
-
- ### Finding Categories
-
- - **Duplication** same requirement, entity, or AC scenario defined in more than one artifact.
- - **Ambiguity** — undefined terms, metrics-free adjectives ("fast", "user-friendly"), or unclear actors.
- - **Coverage Gap** FR with no US, US with no AC, or endpoint with no FR reference.
- - **Terminology Drift** same concept called different names across artifacts (e.g. "user" vs "customer" vs "account").
- - **Invalid Reference** — cross-reference points to an ID that does not exist in the linked artifact.
+ | # | Severity | Category | Location | Description | Recommendation | Owner |
+ |---|----------|----------|----------|-------------|----------------|-------|
+ | 1 | 🔴 Critical | Coverage Gap | [Artifact + section] | [What the issue is] | [How to fix it] | BA |
+ | 2 | 🟠 High | Duplication | [Location] | [Description] | [Recommendation] | BA |
+ | 3 | 🟡 Medium | Ambiguity | [Location] | [Description] | [Recommendation] | BA |
+ | 4 | 🟠 High | Underspecified Source | [Location] | [Description] | [Recommendation] | BA |
+ | 5 | 🟠 High | Stakeholder Validation Gap | [Location] | [Description] | [Recommendation] | PM |
+ | 6 | 🟢 Low | Terminology Drift | [Location] | [Description] | [Recommendation] | BA |
+
+ ### Finding Categories *(8 canonical, mapped to IEEE 830 §4.3 SRS quality attributes)*
+
+ - **Duplication** *(IEEE 830: consistent)* — same requirement, entity, or AC scenario defined in more than one artifact.
+ - **Ambiguity** *(IEEE 830: unambiguous)* — undefined terms, metrics-free adjectives ("fast", "user-friendly"), unclear actors, modal verb confusion (must / shall / should / may).
+ - **Coverage Gap** *(IEEE 830: complete + traceable)* — FR with no US, US with no AC, endpoint with no FR reference, Task with no upstream reference.
+ - **Terminology Drift** *(IEEE 830: consistent)* — same concept called different names across artifacts (e.g. "user" vs "customer" vs "account").
+ - **Invalid Reference** *(IEEE 830: traceable)* — cross-reference points to an ID that does not exist in the linked artifact.
+ - **Inconsistency** *(IEEE 830: consistent)* — same fact stated differently across artifacts (e.g. the Brief says €1.5M GMV, the SRS says €2M).
+ - **Underspecified Source** *(post-v3.5.0 provenance discipline)* — FR / UC / AC / Entity / Endpoint missing the `Source` field that the v3.5.0+ template requires.
+ - **Stakeholder Validation Gap** *(post-v3.5.0 assumption discipline)* — assumption with no Owner, or past its Validate-by date and not converted to fact or risk.
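The "Severity threshold (blocks /done)" header drives a simple gate over the Finding Summary. A minimal sketch, assuming a parsed summary in dict form — the data shape and function name are illustrative, not part of the template:

```python
# Sketch: gating /done on the configured severity threshold. Any open finding
# at or above the threshold blocks sign-off.

SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def blocks_done(finding_counts: dict, threshold: str) -> bool:
    """True if any finding at or above `threshold` severity remains open."""
    floor = SEVERITY_RANK[threshold]
    return any(count > 0
               for sev, count in finding_counts.items()
               if SEVERITY_RANK[sev] >= floor)

summary = {"critical": 0, "high": 2, "medium": 5, "low": 1}
print(blocks_done(summary, "high"))      # True  — two 🟠 High findings remain open
print(blocks_done(summary, "critical"))  # False — no 🔴 Critical findings
```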
 
  ---