npm - product-playbook - Versions diffs - 1.2.2 → 1.2.4 - Mend

product-playbook 1.2.2 → 1.2.4

Files changed (44)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/README.es.md +31 -0
package/README.ja.md +31 -0
package/README.ko.md +31 -0
package/README.md +31 -0
package/README.zh-CN.md +31 -0
package/README.zh-TW.md +31 -0
package/SKILL.md +61 -0
package/commands/product-feature.md +2 -0
package/i18n/en/SKILL.md +61 -0
package/i18n/en/commands/product-feature.md +2 -0
package/i18n/en/references/02b-jtbd.md +19 -7
package/i18n/en/references/rules-context.md +10 -1
package/i18n/en/references/rules-quality-review.md +13 -7
package/i18n/es/SKILL.md +61 -0
package/i18n/es/commands/product-feature.md +2 -0
package/i18n/es/references/02b-jtbd.md +24 -7
package/i18n/es/references/rules-context.md +10 -1
package/i18n/es/references/rules-quality-review.md +12 -7
package/i18n/ja/SKILL.md +61 -0
package/i18n/ja/commands/product-feature.md +2 -0
package/i18n/ja/references/02b-jtbd.md +24 -7
package/i18n/ja/references/rules-context.md +10 -1
package/i18n/ja/references/rules-quality-review.md +12 -7
package/i18n/ko/SKILL.md +61 -0
package/i18n/ko/commands/product-feature.md +2 -0
package/i18n/ko/references/02b-jtbd.md +24 -7
package/i18n/ko/references/rules-context.md +10 -1
package/i18n/ko/references/rules-quality-review.md +12 -7
package/i18n/zh-CN/SKILL.md +61 -0
package/i18n/zh-CN/commands/product-feature.md +2 -0
package/i18n/zh-CN/references/02b-jtbd.md +24 -7
package/i18n/zh-CN/references/rules-context.md +10 -1
package/i18n/zh-CN/references/rules-quality-review.md +12 -7
package/i18n/zh-TW/SKILL.md +61 -0
package/i18n/zh-TW/commands/product-feature.md +10 -8
package/i18n/zh-TW/references/02b-jtbd.md +24 -7
package/i18n/zh-TW/references/rules-context.md +10 -1
package/i18n/zh-TW/references/rules-quality-review.md +12 -7
package/package.json +1 -1
package/references/02b-jtbd.md +24 -7
package/references/rules-context.md +10 -1
package/references/rules-quality-review.md +12 -7

package/.claude-plugin/marketplace.json CHANGED

@@ -7,7 +7,7 @@
     {
       "name": "product-playbook",
       "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
-      "version": "1.2.2",
+      "version": "1.2.4",
       "source": "./."
     }
   ]

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "product-playbook",
   "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
-  "version": "1.2.2",
+  "version": "1.2.4",
   "author": {
     "name": "Charles Chen"
   },

package/README.es.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbook es un **Skill de Claude AI** que te guía sistemáticamente
 - 🧭 **6 modos de ejecución** — desde validación rápida en 30 minutos hasta planes de producto completos (incluyendo una ruta rápida de expansión de funcionalidades)
 - 📐 **22 frameworks de producto** — cubriendo toda la cadena Discovery → Define → Develop → Deliver
+- 🤝 **3 sub-agentes especialistas** — Discovery, Crítica de Estrategia y Pre-mortem operan como ventanas de contexto aisladas con experiencia específica de framework
 - 🔄 **Motor de propagación de cambios** — modifica cualquier paso y todos los outputs downstream se actualizan automáticamente
 - 📎 **Integración inteligente de archivos** — sube datos, capturas de pantalla o documentos; la IA los integra automáticamente en el paso relevante
 - 🔗 **Handoff de desarrollo** — genera CLAUDE.md + TASKS.md + TICKETS.md para un handoff fluido al desarrollo en Claude Code
@@ -155,6 +156,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — Generar PRD
 │   ├── product-report.md             # /product-report — Generar reporte HTML
 │   └── product-dev.md                # /product-dev — Generar paquete de handoff de desarrollo
+├── agents/                           # Sub-agentes especialistas (cargados automáticamente por el plugin de Claude Code)
+│   ├── discovery-specialist.md       # Especialista en Persona / JTBD / OST / Journey Map
+│   ├── strategy-critic.md            # Crítico de estrategia con la lente de Rumelt
+│   └── pre-mortem-runner.md          # 15+ escenarios de fallo + indicadores adelantados
 └── references/
     ├── 00-opportunity-check.md       # Evaluación de oportunidad + Modelo DHM
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Comparando la calidad de respuesta entre "con guía del Skill" y "sin guía del
 > Ver [`evals/`](./evals/) para metodología detallada y datos.
+### Iteración 5: Comparación A/B de Sub-agent (3 evaluaciones relevantes a despacho × 22 expectativas)
+Una corrida A/B enfocada que mide la contribución marginal de calidad de los 3 sub-agents especialistas (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) introducidos en v1.2.0+. Misma versión del skill (v1.2.3), mismos prompts, dos brazos:
+- **CON sub-agent**: el executor lee el archivo `agents/*.md` correspondiente y sigue el esquema de salida declarado por el especialista + autoverificaciones; el despacho se marca en la respuesta.
+- **SIN sub-agent**: el executor tiene prohibido leer cualquier `agents/*.md` o mencionar la delegación; debe manejar el paso inline como orchestrator usando sólo `SKILL.md` + `commands/` + `references/`.
+| Evaluación | Con Sub-agent | Sin Sub-agent | Delta |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+| Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+| **Pre-mortem (evaluación de riesgo en Build Mode)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+| **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+El consumo de tokens es prácticamente idéntico en ambos brazos (151K vs 154K) — mantener un especialista no cuesta más que manejar el paso inline.
+**Hallazgos Clave**
+- **Pre-mortem-runner es load-bearing** (+77.8%): sin él, el orchestrator produce una lista de riesgos delgada y en tiempo futuro, perdiendo el conteo de escenarios (≥15), cobertura de 5 categorías, disciplina de leading indicator, experimentos pre-launch de bajo costo, y el marco narrativo de "lanzó y falló" en pasado. El esquema estructurado del especialista hace trabajo real que `references/` por sí solo no reproduce.
+- **Discovery-specialist y strategy-critic son contribuidores moderados** (+14–17%): el orchestrator puede producir análisis razonables de Persona+JTBD y críticas de estrategia inline. El único assertion divergente entre brazos es el contrato de despacho mismo, no la calidad estructural.
+- **Implicación**: de los 3 especialistas, pre-mortem-runner ofrece el mayor lift de calidad solo y es el más justificado por estos resultados. Los otros dos podrían en principio plegarse de vuelta al orchestrator con páginas de referencia más fuertes, aunque no hay incentivo de costo para hacerlo (los tokens son iguales).
+**Advertencia del harness**: el executor `general-purpose` usado en este harness de eval no expone despacho `Task` anidado, por lo que el brazo CON aproxima el despacho real leyendo el `agents/*.md` del especialista y siguiendo su esquema inline (con un marcador de despacho explícito). El contraste estructural vs SIN es real, pero se necesitaría una corrida top-session para verificar de extremo a extremo la calidad del despacho via Task tool.
+> Artefactos crudos y divergencia por assertion en [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
 ---
 ## 💬 Comandos Disponibles

package/README.ja.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbookは、ゼロから一まで体系的にプロダクト企画
 - 🧭 **6つの実行モード** — 30分の迅速な検証からフルスケールのプロダクト企画まで（機能拡張ファストトラックを含む）
 - 📐 **22のプロダクトフレームワーク** — Discovery → Define → Develop → Deliverの全パイプラインをカバー
+- 🤝 **3つの専門サブエージェント** — Discovery、戦略批評、Pre-mortem が独立した context window で動作し、フレームワーク固有の専門性を持つ
 - 🔄 **変更伝播エンジン** — 任意のステップを修正すると下流の全出力が自動更新
 - 📎 **スマートファイル統合** — データ、スクリーンショット、ドキュメントをアップロードするとAIが関連ステップに自動統合
 - 🔗 **開発ハンドオフ** — CLAUDE.md + TASKS.md + TICKETS.mdを生成してClaude Code開発にシームレスに接続
@@ -156,6 +157,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — PRD生成
 │   ├── product-report.md             # /product-report — HTMLレポート生成
 │   └── product-dev.md                # /product-dev — 開発ハンドオフパッケージ生成
+├── agents/                           # 専門サブエージェント（Claude Code プラグインが自動読み込み）
+│   ├── discovery-specialist.md       # Persona / JTBD / OST / Journey Map スペシャリスト
+│   ├── strategy-critic.md            # Rumelt 視点の戦略批評者
+│   └── pre-mortem-runner.md          # 15+ の失敗シナリオ + リーディングインジケーター
 └── references/
     ├── 00-opportunity-check.md       # 機会評価 + DHMモデル
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -419,6 +424,32 @@ Claude Codeは自動的に：
 > 詳細な方法論とデータは[`evals/`](./evals/)を参照。
+### イテレーション5：Sub-agent A/B 比較（ディスパッチ関連3評価 × 22期待値）
+v1.2.0+ で導入された3つの専門 sub-agent（`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`）の品質への限界貢献を測定する集中 A/B 評価。同じスキル版（v1.2.3）、同じプロンプト、2つの arm：
+- **Sub-agent あり**：executor は該当する `agents/*.md` を読み、専門エージェントが宣言する出力スキーマと自己チェックに従う。レスポンス内に dispatch マーカーを記録。
+- **Sub-agent なし**：executor は `agents/*.md` を一切読まず、delegation に言及しない。`SKILL.md` + `commands/` + `references/` のみを使い、orchestrator が inline で処理する。
+| 評価項目 | Sub-agent あり | Sub-agent なし | 差分 |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery（Persona + JTBD） | 100%（7/7） | 85.7%（6/7） | +14.3% |
+| Strategy Critic | 100%（6/6） | 83.3%（5/6） | +16.7% |
+| **Pre-mortem（Build Mode リスク評価）** | **100%（9/9）** | **22.2%（2/9）** | **+77.8% ✅** |
+| **合計** | **100%（22/22）** | **59.1%（13/22）** | **+40.9%** |
+両 arm の token 消費はほぼ同じ（151K vs 154K）— 専門エージェントを保持することは inline 処理より高くはならない。
+**主要な発見**
+- **Pre-mortem-runner は load-bearing**（+77.8%）：これがないと、orchestrator は薄く未来形のリスクリストしか生成できず、シナリオ数（≥15）、5カテゴリーのカバレッジ、leading indicator の規律、低コスト pre-launch 実験、過去形「出荷して失敗した」のナラティブ枠組みを失う。構造化された専門エージェントのスキーマが本当の仕事をしており、`references/` だけでは再構築できない。
+- **Discovery-specialist と strategy-critic は中程度の貢献**（+14–17%）：orchestrator 単独でも Persona+JTBD と戦略批評を妥当なレベルで処理できる。両 arm で分岐する assertion は dispatch コントラクト自体であり、構造的品質ではない。
+- **含意**：3つの専門のうち、pre-mortem-runner が単独での品質向上が最大で、最も保持を正当化される。他の2つは原理的には強化された reference ページで orchestrator に折り返せるが、token コストが同じなので削減誘因はない。
+**ハーネスの注意**：この評価環境の `general-purpose` executor は nested `Task` を公開しないため、「Sub-agent あり」 arm は「専門エージェントの `agents/*.md` を読む + dispatch マーカー + スキーマを inline で遵守」で実際の dispatch を近似する。構造的対比は実物だが、エンドツーエンドの Task ツール dispatch を完全に検証するには top-session 実行が必要。
+> 生の成果物と assertion ごとの分岐は [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) を参照。
 ---
 ## 💬 利用可能なコマンド

package/README.ko.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbook은 제로부터 원까지 제품 기획 전 과정을 체
 - 🧭 **6가지 실행 모드** — 30분 빠른 검증부터 전체 제품 기획까지 (기능 확장 빠른 트랙 포함)
 - 📐 **22개 제품 프레임워크** — Discovery → Define → Develop → Deliver 전체 파이프라인 커버
+- 🤝 **3개 전문 서브에이전트** — Discovery, 전략 비평, Pre-mortem이 격리된 context window에서 작동하며 프레임워크별 전문성을 보유
 - 🔄 **변경 전파 엔진** — 어떤 단계든 수정하면 모든 하위 산출물이 자동 업데이트
 - 📎 **스마트 파일 통합** — 데이터, 스크린샷, 문서를 업로드하면 AI가 해당 단계에 자동 통합
 - 🔗 **개발 핸드오프** — CLAUDE.md + TASKS.md + TICKETS.md를 생성하여 Claude Code 개발로 원활하게 연결
@@ -155,6 +156,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — PRD 생성
 │   ├── product-report.md             # /product-report — HTML 보고서 생성
 │   └── product-dev.md                # /product-dev — 개발 핸드오프 패키지 생성
+├── agents/                           # 전문 서브에이전트 (Claude Code 플러그인이 자동 로드)
+│   ├── discovery-specialist.md       # Persona / JTBD / OST / Journey Map 스페셜리스트
+│   ├── strategy-critic.md            # Rumelt 관점의 전략 비평가
+│   └── pre-mortem-runner.md          # 15+ 실패 시나리오 + 선행 지표
 └── references/
     ├── 00-opportunity-check.md       # 기회 평가 + DHM Model
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code가 자동으로:
 > 상세한 방법론과 데이터는 [`evals/`](./evals/)를 참조하세요.
+### 반복 5: Sub-agent A/B 비교 (디스패치 관련 3개 평가 × 22개 기대값)
+v1.2.0+ 에서 도입된 3개의 전문 sub-agent (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) 의 품질에 대한 한계 기여를 측정하는 집중 A/B 평가. 동일 스킬 버전(v1.2.3), 동일 프롬프트, 2개 arm:
+- **Sub-agent 있음**: executor 가 해당 `agents/*.md` 파일을 읽고, 전문가가 선언한 출력 스키마와 자체 점검을 따름. 응답에 dispatch 마커 기록.
+- **Sub-agent 없음**: executor 는 어떤 `agents/*.md` 도 읽지 못하며, delegation 을 언급하지 못함. `SKILL.md` + `commands/` + `references/` 만 사용하여 orchestrator 가 inline 으로 처리.
+| 평가 항목 | Sub-agent 있음 | Sub-agent 없음 | 차이 |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+| Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+| **Pre-mortem (Build Mode 위험 평가)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+| **합계** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+두 arm 의 token 소비는 거의 동일함 (151K vs 154K) — 전문가를 유지하는 것이 inline 처리보다 더 비싸지 않음.
+**핵심 발견**
+- **Pre-mortem-runner 가 load-bearing** (+77.8%): 이것이 없으면 orchestrator 는 얇고 미래형인 위험 리스트만 생성하며, 시나리오 수 (≥15), 5개 카테고리 커버리지, leading indicator 규율, 저비용 pre-launch 실험, 과거형 "출시 후 실패" 내러티브 프레임을 놓침. 구조화된 전문가 스키마가 실제로 일을 하고 있으며, `references/` 만으로는 재구성할 수 없음.
+- **Discovery-specialist 와 strategy-critic 은 중간 기여자** (+14–17%): orchestrator 자체만으로도 Persona+JTBD 와 전략 비평을 합리적 수준에서 처리할 수 있음. 두 arm 에서 분기하는 유일한 assertion 은 dispatch 계약 자체이며, 구조적 품질이 아님.
+- **함의**: 3개 전문가 중 pre-mortem-runner 가 단독 품질 향상이 가장 크고 보존이 가장 정당화됨. 다른 2개는 원칙적으로 강화된 reference 페이지로 orchestrator 에 통합할 수 있지만, token 비용이 동일하므로 축소 동기는 없음.
+**Harness 주의사항**: 이 평가 환경의 `general-purpose` executor 는 nested `Task` 를 노출하지 않으므로, "Sub-agent 있음" arm 은 "전문가 `agents/*.md` 읽기 + dispatch 마커 + 스키마를 inline 으로 준수" 로 실제 dispatch 를 근사함. 구조적 대조는 실제이지만, 엔드투엔드 Task 도구 dispatch 를 완전히 검증하려면 top-session 실행이 필요함.
+> 원시 artifacts 와 assertion 별 분기는 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) 참조.
 ---
 ## 💬 사용 가능한 명령

package/README.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you thr
 - 🧭 **6 execution modes** — from 30-minute rapid validation to full-blown product plans (including a feature expansion fast track)
 - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver pipeline
+- 🤝 **3 specialist sub-agents** — Discovery, Strategy Critique, and Pre-mortem run as isolated context windows with framework-specific expertise
 - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
 - 📎 **Smart file integration** — upload data, screenshots, or documents; the AI automatically integrates them into the relevant step
 - 🔗 **Dev handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md for seamless handoff to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — Generate PRD
 │   ├── product-report.md             # /product-report — Generate HTML report
 │   └── product-dev.md                # /product-dev — Generate dev handoff package
+├── agents/                           # Specialist sub-agents (auto-loaded by Claude Code plugin)
+│   ├── discovery-specialist.md       # Persona / JTBD / OST / Journey Map specialist
+│   ├── strategy-critic.md            # Rumelt-lens strategy critic
+│   └── pre-mortem-runner.md          # 15+ failure scenarios + leading indicators
 └── references/
     ├── 00-opportunity-check.md       # Opportunity assessment + DHM Model
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ By comparing response quality between "with Skill guidance" and "without Skill g
 > See [`evals/`](./evals/) for detailed methodology and data.
+### Iteration 5: Sub-agent A/B Comparison (3 dispatch-relevant evals × 22 expectations)
+A focused A/B run measuring the marginal quality contribution of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) shipped in v1.2.0+. Same skill version (v1.2.3), same prompts, two arms:
+- **WITH sub-agent**: executor reads the specialist's `agents/*.md` file and follows its declared output schema + self-checks; dispatch is marked in the response.
+- **WITHOUT sub-agent**: executor is forbidden from reading any `agents/*.md` or mentioning delegation; must handle the step inline as the orchestrator using only `SKILL.md` + `commands/` + `references/`.
+| Eval | With Sub-agent | Without Sub-agent | Delta |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+| Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+| **Pre-mortem (Build Mode risk)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+| **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+Token cost is essentially identical across arms (151K vs 154K) — keeping a specialist costs no more than handling the step inline.
+**Key Findings**
+- **Pre-mortem-runner is load-bearing** (+77.8%): without it the orchestrator produces a thin, future-tense risk list and misses scenario count (≥15), 5-category coverage, leading-indicator discipline, cheap pre-launch experiments, and past-tense "shipped-and-failed" framing. The structured specialist schema is doing real work that `references/` alone does not reproduce.
+- **Discovery-specialist and strategy-critic are modest contributors** (+14–17%): the orchestrator can produce reasonable Persona+JTBD analyses and strategy critiques inline. The diverging assertion in each case is the dispatch contract itself, not the structural quality.
+- **Implication**: of the 3 specialists, the pre-mortem-runner gives the largest standalone quality lift and is the most justified by these results. The other two could in principle be folded back into the orchestrator with stronger reference pages, though there is no cost incentive to do so (tokens are a wash).
+**Harness caveat**: the `general-purpose` executor used in this eval harness does not expose nested `Task` dispatch, so the WITH arm approximates real dispatch by reading the specialist's `agents/*.md` and following its schema inline (with an explicit dispatch marker). The structural contrast vs WITHOUT is real, but a true top-session run would be needed to verify end-to-end Task-tool dispatch quality.
+> Raw artifacts and per-assertion divergence in [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
 ---
 ## 💬 Available Commands
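
The pass rates and deltas in the Iteration 5 table can be rechecked from the raw pass counts. The sketch below is not part of the package; the names `RESULTS`, `pct`, and `summarize` are hypothetical, and only the counts are copied from the table. Note that the "Delta" column is a difference in percentage points, not a relative change.

```python
from fractions import Fraction

# (eval name, passed WITH sub-agent, passed WITHOUT sub-agent, expectations)
# Counts are copied from the Iteration 5 table; everything else is illustrative.
RESULTS = [
    ("Discovery (Persona + JTBD)", 7, 6, 7),
    ("Strategy Critic", 6, 5, 6),
    ("Pre-mortem (Build Mode risk)", 9, 2, 9),
]


def pct(passed, total):
    """Pass rate as a percentage rounded to one decimal place."""
    return round(float(Fraction(passed * 100, total)), 1)


def summarize(results):
    """Return (name, with %, without %, delta in percentage points) rows plus a TOTAL row."""
    rows = []
    sum_with = sum_without = sum_total = 0
    for name, with_pass, without_pass, total in results:
        rows.append((name, pct(with_pass, total), pct(without_pass, total),
                     pct(with_pass - without_pass, total)))
        sum_with += with_pass
        sum_without += without_pass
        sum_total += total
    rows.append(("TOTAL", pct(sum_with, sum_total), pct(sum_without, sum_total),
                 pct(sum_with - sum_without, sum_total)))
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print("%-30s with=%5.1f%%  without=%5.1f%%  delta=+%.1f pts" % row)
```

Running it reproduces the four rows of the table, which confirms the TOTAL row is pooled over all 22 expectations rather than an average of the three per-eval rates.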

package/README.zh-CN.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbook 是一个 **Claude AI Skill**，能够系统性地引导你
 - 🧭 **6 种执行模式** — 从 30 分钟快速验证到完整企划（含功能扩充快速路径）
 - 📐 **22 个产品框架** — 涵盖 Discovery → Define → Develop → Deliver 全流程
+- 🤝 **3 个专家 sub-agent** — Discovery、策略批判、Pre-mortem 在独立 context window 中运作，各自携带专属框架专业
 - 🔄 **变更传播引擎** — 修改任何步骤，自动更新所有下游产出
 - 📎 **文件智慧整合** — 上传数据、截图、文件，AI 自动整合到对应步骤
 - 🔗 **开发衔接** — 产出 CLAUDE.md + TASKS.md + TICKETS.md，无缝衔接 Claude Code 开发
@@ -155,6 +156,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — 产出 PRD
 │   ├── product-report.md             # /product-report — 产出 HTML 报告
 │   └── product-dev.md                # /product-dev — 产出开发交接包
+├── agents/                           # 专家 sub-agent（Claude Code plugin 自动加载）
+│   ├── discovery-specialist.md       # Persona / JTBD / OST / Journey Map 专家
+│   ├── strategy-critic.md            # Rumelt 视角的策略批判者
+│   └── pre-mortem-runner.md          # 15+ failure scenarios + leading indicators
 └── references/
     ├── 00-opportunity-check.md       # 机会评估 + DHM Model
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code 会自动：
 > 详细评测方法与数据见 [`evals/`](./evals/) 目录。
+### Iteration 5：Sub-agent A/B 对照（3 个专家相关评测 × 22 个期望值）
+针对 v1.2.0+ 推出的 3 个专家 sub-agent（`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`）所做的聚焦 A/B 测试，量化它们在品质上的边际贡献。相同 skill 版本（v1.2.3）、相同 prompt、两个 arm：
+- **有 Sub-agent**：executor 可读取对应的 `agents/*.md`，并遵循该专家声明的输出 schema 与自检；回应中标记 dispatch。
+- **无 Sub-agent**：executor 不得读取任何 `agents/*.md`，不得提及 delegation；只能用 `SKILL.md` + `commands/` + `references/` 由 orchestrator 自行 inline 处理。
+| 评测项目 | 有 Sub-agent | 无 Sub-agent | 差异 |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery（Persona + JTBD） | 100%（7/7） | 85.7%（6/7） | +14.3% |
+| Strategy Critic | 100%（6/6） | 83.3%（5/6） | +16.7% |
+| **Pre-mortem（Build Mode 风险评估）** | **100%（9/9）** | **22.2%（2/9）** | **+77.8% ✅** |
+| **总计** | **100%（22/22）** | **59.1%（13/22）** | **+40.9%** |
+两个 arm 的 token 消耗几乎相同（151K vs 154K）——保留专家不会比 inline 处理更贵。
+**关键发现**
+- **Pre-mortem-runner 是 load-bearing**（+77.8%）：少了它，orchestrator 只能产出单薄、未来式的风险清单，缺失 scenario 数量（≥15）、五类别覆盖、leading-indicator 纪律、低成本上线前实验、以及过去式「已上线且失败」叙事框架。结构化的专家 schema 在做真正的工作，光看 `references/` 无法重建。
+- **Discovery-specialist 与 strategy-critic 属于中度贡献**（+14–17%）：orchestrator 自己处理 Persona+JTBD 与策略批判已可达合理水准。两个 arm 唯一分歧的 assertion 是 dispatch 契约本身，而非结构性品质。
+- **意涵**：3 个专家中，pre-mortem-runner 对品质提升的贡献最大、最值得保留；另外两个原则上可以靠加强 reference 文件 fold 回 orchestrator，但因为 token 成本相同，没有减量诱因。
+**Harness 警语**：此评测环境的 `general-purpose` executor 并未暴露 nested `Task`，因此「有 Sub-agent」arm 是以「读取专家 `agents/*.md` + 标记 dispatch + 遵循 schema inline」近似真实 dispatch。结构性对比是真的，但要完全验证端到端 Task 工具 dispatch 还需要 top-session 测试。
+> 原始 artifacts 与每项 assertion 分歧详见 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/)。
 ---
 ## 💬 可用指令一览

package/README.zh-TW.md CHANGED

@@ -21,6 +21,7 @@ The Product Playbook 是一個 **Claude AI Skill**，能夠系統性地引導你
 - 🧭 **6 種執行模式** — 從 30 分鐘快速驗證到完整企劃（含功能擴充快速路徑）
 - 📐 **22 個產品框架** — 涵蓋 Discovery → Define → Develop → Deliver 全流程
+- 🤝 **3 個專家 sub-agent** — Discovery、策略批判、Pre-mortem 在獨立 context window 中運作，各自攜帶專屬框架專業
 - 🔄 **變更傳播引擎** — 修改任何步驟，自動更新所有下游產出
 - 📎 **檔案智慧整合** — 上傳數據、截圖、文件，AI 自動整合到對應步驟
 - 🔗 **開發銜接** — 產出 CLAUDE.md + TASKS.md + TICKETS.md，無縫銜接 Claude Code 開發
@@ -155,6 +156,10 @@ product-playbook/
 │   ├── product-prd.md                # /product-prd — 產出 PRD
 │   ├── product-report.md             # /product-report — 產出 HTML 報告
 │   └── product-dev.md                # /product-dev — 產出開發交接包
+├── agents/                           # 專家 sub-agent（Claude Code plugin 自動載入）
+│   ├── discovery-specialist.md       # Persona / JTBD / OST / Journey Map 專家
+│   ├── strategy-critic.md            # Rumelt 視角的策略批判者
+│   └── pre-mortem-runner.md          # 15+ failure scenarios + leading indicators
 └── references/
     ├── 00-opportunity-check.md       # 機會評估 + DHM Model
     ├── 01-strategy.md                # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code 會自動：
 > 詳細評測方法與數據見 [`evals/`](./evals/) 目錄。
+### Iteration 5：Sub-agent A/B 對照（3 個專家相關評測 × 22 個期望值）
+針對 v1.2.0+ 推出的 3 個專家 sub-agent（`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`）所做的聚焦 A/B 測試，量化它們在品質上的邊際貢獻。相同 skill 版本（v1.2.3）、相同 prompt、兩個 arm：
+- **有 Sub-agent**：executor 可讀取對應的 `agents/*.md`，並遵循該專家宣告的輸出 schema 與自檢；回應中標記 dispatch。
+- **無 Sub-agent**：executor 不得讀取任何 `agents/*.md`，不得提及 delegation；只能用 `SKILL.md` + `commands/` + `references/` 由 orchestrator 自行 inline 處理。
+| 評測項目 | 有 Sub-agent | 無 Sub-agent | 差異 |
+|-----------|:--------:|:------------:|:-----:|
+| Discovery（Persona + JTBD） | 100%（7/7） | 85.7%（6/7） | +14.3% |
+| Strategy Critic | 100%（6/6） | 83.3%（5/6） | +16.7% |
+| **Pre-mortem（Build Mode 風險評估）** | **100%（9/9）** | **22.2%（2/9）** | **+77.8% ✅** |
+| **總計** | **100%（22/22）** | **59.1%（13/22）** | **+40.9%** |
+兩個 arm 的 token 消耗幾乎相同（151K vs 154K）——保留專家不會比 inline 處理更貴。
+**關鍵發現**
+- **Pre-mortem-runner 是 load-bearing**（+77.8%）：少了它，orchestrator 只能產出單薄、未來式的風險清單，缺失 scenario 數量（≥15）、五類別覆蓋、leading-indicator 紀律、低成本上線前實驗、以及過去式「已上線且失敗」敘事框架。結構化的專家 schema 在做真正的工作，光看 `references/` 無法重建。
+- **Discovery-specialist 與 strategy-critic 屬於中度貢獻**（+14–17%）：orchestrator 自己處理 Persona+JTBD 與策略批判已可達合理水準。兩個 arm 唯一分歧的 assertion 是 dispatch 契約本身，而非結構性品質。
+- **意涵**：3 個專家中，pre-mortem-runner 對品質提升的貢獻最大、最值得保留；另外兩個原則上可以靠加強 reference 文件 fold 回 orchestrator，但因為 token 成本相同，沒有減量誘因。
+**Harness 警語**：此評測環境的 `general-purpose` executor 並未暴露 nested `Task`，因此「有 Sub-agent」arm 是以「讀取專家 `agents/*.md` + 標記 dispatch + 遵循 schema inline」近似真實 dispatch。結構性對比是真的，但要完全驗證端到端 Task 工具 dispatch 還需要 top-session 測試。
+> 原始 artifacts 與每項 assertion 分歧詳見 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/)。
 ---
 ## 💬 可用指令一覽

package/SKILL.md CHANGED

@@ -134,6 +134,67 @@ When the user asks to list frameworks or uses supplementary commands, read `refe
 ---
+## 🤝 Sub-Agent Delegation Rules
+The Product Playbook ships with three specialist subagents that operate in isolated context windows. Delegate to them at the right step rather than handling everything in this main agent's context — specialists produce sharper output because they carry only the framework knowledge they need.
+### When to delegate to `discovery-specialist`
+Delegate at these steps:
+- **Full Mode**: S2 (Persona) → S3 (JTBD) → S4 (OST) → S5 (Journey Map) → S6 (Continuous Discovery hypotheses)
+- **Revision Mode**: S2 (current user analysis) → S3 (pain point synthesis) → S4 (opportunity identification)
+- **Build Mode**: S2 (problem clarification with JTBD lens)
+- **Custom Mode**: any step that selects Persona / JTBD / OST / Journey Map / Continuous Discovery
+How to invoke:
+> Use the `discovery-specialist` subagent to produce [Persona | JTBD | OST | Journey Map] for [product description]. Target audience: [B2C / B2B / B2B2C]. Available research data: [list uploaded files, or "none — flag low confidence"]. Reply in [language].
+Integrate the returned YAML into the current step's output. Surface the specialist's `open_questions` to the user as part of the step's confirmation prompt.
+### When to delegate to `strategy-critic`
+Delegate **immediately after** the user finalises any strategy artifact:
+- After Strategy Blocks completion (Full Mode S7)
+- After Rumelt Good Strategy Kernel completion (Full Mode S8)
+- After DHM Model completion (Full Mode S9)
+- After Empowered Teams charter (any mode that includes it)
+- Any time the user writes "this is our strategy" in plain prose without a named framework
+How to invoke:
+> Use the `strategy-critic` subagent to critique the following strategy artifact: [paste verbatim]. The artifact is [framework name or "generic strategy doc"]. Reply in [language].
+The critic returns critiques, not rewrites. Present the critic's `three_questions_to_ask_the_writer` to the user verbatim. Do not soften them. If the user revises in response, re-invoke the critic on the revised version.
+### When to delegate to `pre-mortem-runner`
+Delegate at these steps:
+- **Full Mode**: S10 (after MVP scoping is complete)
+- **Build Mode**: S4 (architecture-grounded pre-mortem)
+- **Revision Mode**: S8
+- **Feature Extension Mode**: S3 (risk assessment)
+- Any time the user explicitly requests pre-mortem / risk analysis / "what could go wrong"
+How to invoke:
+> Use the `pre-mortem-runner` subagent to pre-mortem the following [product | feature | strategy]: [paste verbatim]. Mode: [build_mode_architecture_grounded | standard | feature_extension]. If build mode, available architecture context: [paste relevant file contents or summary]. Reply in [language].
+The runner returns 15+ scenarios. In the user-facing output, lead with the `priority_three` and the `pre_launch_experiments`. Surface the full scenario list in a collapsible section or as an attached file.
+### Delegation hygiene
+1. **One sub-agent per step**. Do not chain sub-agents in a single turn — let the user confirm intermediate output before invoking the next specialist.
+2. **Pass language explicitly**. Sub-agents detect language from your prompt; if your prompt is in English but the user is working in 繁體中文, the sub-agent will reply in English. Always specify the user's working language.
+3. **Respect `status: out_of_scope`**. If a sub-agent refuses a request, take the routing recommendation seriously — the sub-agent's scope refusal is a feature, not a failure.
+4. **Hard Gate inheritance**. Sub-agents inherit the no-code-during-planning rule. They will refuse to write files or run bash even if you ask them to. This is intentional.
+5. **Quality self-check still applies**. After integrating sub-agent output into a step, run the existing quality self-check from `references/rules-quality-review.md` — the sub-agent did its own self-check, but the main agent owns the user-facing step output.
+---
 ## 🔗 Global Rule: Persona-Journey Bundling
 **Whenever a mode includes a Persona step, Journey Map is included by DEFAULT in the very next step.** Persona defines Who; Journey Map describes the journey Who experiences. This applies equally to 0-to-1 and existing products — the relevant variable is whether the user's Job spans multiple stages, not whether the product already exists. (Teresa Torres, Indi Young, and Amazon Working Backwards all treat Journey Map as essential during 0-to-1.)

package/commands/product-feature.md CHANGED

@@ -11,3 +11,5 @@ Execution mode: 🔧 Feature Extension Mode
 Feature description: $ARGUMENTS
 Follow the Feature Extension step sequence (S1 → S4). Load product context first per rules-context.md. Display a progress indicator at each step.
+**S0 → S1 sequencing (important)**: If Context Bootstrap (S0) is triggered because `.product-context.md` is missing, you MUST complete Bootstrap and S1 in the **same turn**, then pause **after S1 completion** awaiting user confirmation before S2. Do NOT pause between S0 and S1 — even if some Bootstrap fields are still missing, write a baseline `.product-context.md` with placeholders, enter S1, and ask for the missing fields as part of the S1 confirmation question. See `references/rules-context.md` "Bootstrap 與 S1 的順序" for details.
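
The S0 → S1 sequencing rule can be read as a small planning function: Bootstrap and S1 always share a turn, and the only pause comes after S1. The sketch below is an illustration under that reading; `plan_feature_turn` and the action strings are hypothetical and do not appear in the package.

```python
from typing import List


def plan_feature_turn(context_file_exists: bool, missing_fields: List[str]) -> List[str]:
    """List the actions for the first turn of Feature Extension Mode, in order."""
    actions = []
    # Missing Bootstrap fields only matter when S0 actually runs.
    pending = [] if context_file_exists else list(missing_fields)

    if not context_file_exists:
        # S0 never blocks: unknown fields become placeholders in the baseline file.
        step = "S0: write baseline .product-context.md"
        if pending:
            step += " with placeholders for: " + ", ".join(pending)
        actions.append(step)

    actions.append("S1: run step")

    # Missing fields are folded into the single S1 confirmation question.
    question = "S1: ask user to confirm output"
    if pending:
        question += " and supply: " + ", ".join(pending)
    actions.append(question)

    # The only pause in the turn comes after S1, before S2.
    actions.append("PAUSE before S2")
    return actions
```

For example, `plan_feature_turn(False, ["target user"])` yields an S0 entry, then the S1 entries, then a single pause, with no pause between S0 and S1.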

package/i18n/en/SKILL.md CHANGED

@@ -132,6 +132,67 @@ When product context read/write is triggered, read `references/rules-context.md`
 ---
+## 🤝 Sub-Agent Delegation Rules
+The Product Playbook ships with three specialist subagents that operate in isolated context windows. Delegate to them at the right step rather than handling everything in this main agent's context — specialists produce sharper output because they carry only the framework knowledge they need.
+### When to delegate to `discovery-specialist`
+Delegate at these steps:
+- **Full Mode**: S2 (Persona) → S3 (JTBD) → S4 (OST) → S5 (Journey Map) → S6 (Continuous Discovery hypotheses)
+- **Revision Mode**: S2 (current user analysis) → S3 (pain point synthesis) → S4 (opportunity identification)
+- **Build Mode**: S2 (problem clarification with JTBD lens)
+- **Custom Mode**: any step that selects Persona / JTBD / OST / Journey Map / Continuous Discovery
+How to invoke:
+> Use the `discovery-specialist` subagent to produce [Persona | JTBD | OST | Journey Map] for [product description]. Target audience: [B2C / B2B / B2B2C]. Available research data: [list uploaded files, or "none — flag low confidence"]. Reply in [language].
+Integrate the returned YAML into the current step's output. Surface the specialist's `open_questions` to the user as part of the step's confirmation prompt.
+### When to delegate to `strategy-critic`
+Delegate **immediately after** the user finalises any strategy artifact:
+- After Strategy Blocks completion (Full Mode S7)
+- After Rumelt Good Strategy Kernel completion (Full Mode S8)
+- After DHM Model completion (Full Mode S9)
+- After Empowered Teams charter (any mode that includes it)
+- Any time the user writes "this is our strategy" in plain prose without a named framework
+How to invoke:
+> Use the `strategy-critic` subagent to critique the following strategy artifact: [paste verbatim]. The artifact is [framework name or "generic strategy doc"]. Reply in [language].
+The critic returns critiques, not rewrites. Present the critic's `three_questions_to_ask_the_writer` to the user verbatim. Do not soften them. If the user revises in response, re-invoke the critic on the revised version.
+### When to delegate to `pre-mortem-runner`
+Delegate at these steps:
+- **Full Mode**: S10 (after MVP scoping is complete)
+- **Build Mode**: S4 (architecture-grounded pre-mortem)
+- **Revision Mode**: S8
+- **Feature Extension Mode**: S3 (risk assessment)
+- Any time the user explicitly requests pre-mortem / risk analysis / "what could go wrong"
+How to invoke:
+> Use the `pre-mortem-runner` subagent to pre-mortem the following [product | feature | strategy]: [paste verbatim]. Mode: [build_mode_architecture_grounded | standard | feature_extension]. If build mode, available architecture context: [paste relevant file contents or summary]. Reply in [language].
+The runner returns 15+ scenarios. In the user-facing output, lead with the `priority_three` and the `pre_launch_experiments`. Surface the full scenario list in a collapsible section or as an attached file.
+### Delegation hygiene
+1. **One sub-agent per step**. Do not chain sub-agents in a single turn — let the user confirm intermediate output before invoking the next specialist.
+2. **Pass language explicitly**. Sub-agents detect language from your prompt; if your prompt is in English but the user is working in 繁體中文, the sub-agent will reply in English. Always specify the user's working language.
+3. **Respect `status: out_of_scope`**. If a sub-agent refuses a request, take the routing recommendation seriously — the sub-agent's scope refusal is a feature, not a failure.
+4. **Hard Gate inheritance**. Sub-agents inherit the no-code-during-planning rule. They will refuse to write files or run bash even if you ask them to. This is intentional.
+5. **Quality self-check still applies**. After integrating sub-agent output into a step, run the existing quality self-check from `references/rules-quality-review.md` — the sub-agent did its own self-check, but the main agent owns the user-facing step output.
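The delegation rules above amount to a lookup from (mode, step) to a specialist. The following is a minimal sketch of that lookup, not part of the package: the mode keys, the `DELEGATION` table, and the `route` function are hypothetical names chosen for illustration, and the event-driven triggers (for example, the user asking "what could go wrong") are deliberately left out.

```python
from typing import Optional

# Hypothetical routing table built from the step lists in the rules above.
# Mode keys ("full", "revision", ...) are illustrative, not package identifiers.
DELEGATION = {
    **{("full", f"S{n}"): "discovery-specialist" for n in range(2, 7)},
    **{("revision", f"S{n}"): "discovery-specialist" for n in range(2, 5)},
    ("build", "S2"): "discovery-specialist",
    **{("full", f"S{n}"): "strategy-critic" for n in range(7, 10)},
    ("full", "S10"): "pre-mortem-runner",
    ("build", "S4"): "pre-mortem-runner",
    ("revision", "S8"): "pre-mortem-runner",
    ("feature_extension", "S3"): "pre-mortem-runner",
}


def route(mode: str, step: str) -> Optional[str]:
    """Return the sub-agent for this step, or None if the main agent handles it."""
    return DELEGATION.get((mode, step))
```

Because each (mode, step) key maps to exactly one value, the table also reflects the "one sub-agent per step" hygiene rule.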
+---
 ## 🔗 Global Rule: Persona-Journey Bundling
 **Whenever a mode includes a Persona step, Journey Map is included by DEFAULT in the very next step.** Persona defines Who; Journey Map describes the journey Who experiences. This applies equally to 0-to-1 and existing products — the relevant variable is whether the user's Job spans multiple stages, not whether the product already exists. (Teresa Torres, Indi Young, and Amazon Working Backwards all treat Journey Map as essential during 0-to-1.)

package/i18n/en/commands/product-feature.md CHANGED

@@ -11,3 +11,5 @@ Execution mode: 🔧 Feature Extension Mode
 Feature description: $ARGUMENTS
 Follow the Feature Extension step sequence (S1 → S4). Load product context first per rules-context.md. Display a progress indicator at each step.
+**S0 → S1 sequencing (important)**: If Context Bootstrap (S0) is triggered because `.product-context.md` is missing, you MUST complete Bootstrap and S1 in the **same turn**, then pause **after S1 completion** awaiting user confirmation before S2. Do NOT pause between S0 and S1 — even if some Bootstrap fields are still missing, write a baseline `.product-context.md` with placeholders, enter S1, and ask for the missing fields as part of the S1 confirmation question. See `references/rules-context.md` "Bootstrap → S1 Sequencing" for details.

package/i18n/en/references/02b-jtbd.md CHANGED

@@ -4,17 +4,29 @@
 > "The unit of analysis is not the consumer, but the job the consumer is trying to get done." — Clayton Christensen
-**JTBD Statement Formula:**
+**JTBD Canonical Form (Hard Gate — three-clause structure required):**
+Every JTBD statement (Primary, Functional, Emotional, Social — every layer) MUST be written as a complete three-clause sentence in the canonical form. All three clauses are required:
 ```
-[Target customer] + wants to, in [what job context] + get [what job] done
+When [situation], I want to [motivation], so [outcome].
 ```
-Example: A first-time homebuyer comparing mortgage options wants to quickly estimate monthly payments late at night when they can't reach a bank, so they can walk their partner through their financial plan.
+**Failing examples** (fragments inside a table cell, missing clauses):
+- ❌ "Quickly capture key takeaways" (missing When; missing so)
+- ❌ "Jot down ideas during commute" (missing I want to; missing so outcome)
+**Passing example** (all three clauses present):
+- ✅ "**When** I've just finished reading an article and the key insight is still fresh, **I want to** capture one takeaway in 5 seconds, **so** weeks later I can still find it and connect it to a new idea."
+Example: **When** a first-time homebuyer is comparing mortgage options late at night and can't reach a bank, they **want to** quickly estimate monthly payments, **so** they can walk their partner through their financial plan.
 **JTBD Four-Type Analysis Table:**
+Every cell (Persona 1 / Persona 2) MUST contain a complete three-clause JTBD sentence. Descriptive phrases without "When / I want to / so" structure are not acceptable.
 ```
-| JTBD Type | Definition | Persona 1 | Persona 2 |
+| JTBD Type | Definition | Persona 1 (must use "When … I want to … so …" full form) | Persona 2 (same) |
 |-----------|------------|-----------|-----------|
 | Functional Job | Completing a specific task or achieving a functional goal | | |
 | Emotional Job | How they feel or want to feel | | |
@@ -37,17 +49,17 @@ Example: A first-time homebuyer comparing mortgage options wants to quickly esti
 ### 📝 JTBD Quality Checklist
 Claude must self-check after producing JTBD output (each item must be marked ✅ or ❌; ❌ items must include how to improve):
+- [ ] Are **all three layers** (Functional / Emotional / Social) written in the full "When … I want to … so …" canonical form? (Any layer missing a clause → mark ❌)
 - [ ] Does it include a specific context? (Not "anytime, anywhere" — but "late at night when they can't reach a bank")
 - [ ] Does it focus on a single core job? (Not three jobs crammed into one sentence)
-- [ ] Are functional, emotional, and social jobs all identified?
 - [ ] Can it be used to evaluate "Does this solution actually address this job?"
 - [ ] Does it include "current workarounds" and "gap"? (Gap = opportunity)
 - [ ] Does Q5 of the Deep-Dive reach emotional motivation / professional identity / psychological fear? (Not just functional descriptions)
 **Execution Rules (Hard Gate):**
 - Must mark each item ✅ or ❌ — blank [ ] or unexplained ✅ lists are not allowed
-- If all items are ✅, must additionally state "What is the weakest part of this analysis and how to strengthen it"
-- ❌ Common issues: too abstract, too many jobs merged, missing context, substituting product features for job descriptions, Q5 staying at the functional level
+- **The checklist MUST contain at least one ❌** (see `references/rules-quality-review.md` "Mandatory Critique" rule). A ⚠️ warning marker cannot replace ❌; a "weakest aspect" note appended outside the checklist cannot replace a ❌ inside it. If, after honest review, every item still feels like a pass, lower the threshold for a ❌, find the item most worth marking ❌, and specify how to strengthen it.
+- ❌ Common issues: incomplete three-clause form (missing When / I want to / so), too abstract, too many jobs merged, missing context, substituting product features for job descriptions, Q5 staying at the functional level
 ---
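The three-clause Hard Gate in this file can be approximated mechanically. The sketch below is an assumption-laden illustration rather than anything shipped in the package: `is_canonical_jtbd` is a hypothetical helper that only checks that "When", "I want to", and "so" appear in that order, so it cannot judge whether the clauses are meaningful.

```python
import re

# Literal, order-sensitive check for "When ..., I want to ..., so ...".
# This is a heuristic sketch: it looks for the three markers only.
_CANONICAL = re.compile(
    r"^\s*when\b.+?\bI want to\b.+?\bso\b\s+\S",
    re.IGNORECASE | re.DOTALL,
)


def is_canonical_jtbd(statement: str) -> bool:
    """Return True if the statement contains all three clauses in order."""
    # Drop Markdown bold markers and surrounding quotes before matching.
    text = statement.replace("*", "").strip().strip('"')
    return _CANONICAL.search(text) is not None
```

Run against the examples in this file, the two fragments fail and the article-takeaway sentence passes.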

package/i18n/en/references/rules-context.md CHANGED

@@ -134,9 +134,18 @@ Is this correct? Anything to add or correct?
 Only write to `.product-context.md` after the user confirms.
+### Bootstrap → S1 Sequencing (Hard Gate — Bootstrap does NOT block the flow)
+Bootstrap is Step 0; its purpose is to collect baseline context. **Bootstrap itself is not a pause point**:
+- **Default behavior**: Bootstrap and S1 MUST execute in the **same turn** as S0 → S1. The pause point is fixed at **after S1 completion**, awaiting user confirmation before S2 — not between S0 and S1.
+- **If the user message already provides the required fields** (per Section 7 mode requirements — e.g., Feature Extension requires Identity + Architecture & Tech Stack) → confirm the known fields in a table without re-asking, and proceed directly into S1.
+- **If some fields are missing** → Bootstrap surfaces a "known / pending" table in the same turn, then **immediately enters S1** with placeholders for unconfirmed fields, and folds the pending fields into the **S1 confirmation question** (so the user fills them when confirming S1).
+- **Forbidden**: pausing between S0 and S1 to wait for Round 1 answers. If your response shows S1 still as `⬜ pending` while the flow stops to wait for user input, you have failed this rule.
 ### After Bootstrap Completion
-Write the collected information to `.product-context.md`, leave uncollected sections empty (using placeholders), then proceed to the mode's formal S1.
+Write the collected information to `.product-context.md` (write a baseline even when some fields are placeholders — they will be overwritten after the user confirms during S1), then enter that mode's S1 in the same turn and produce the S1 content.
 ---
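The sequencing rule in this hunk can be read as a small planning function: S0 and S1 always run together, missing fields become placeholders, and the only pause is after S1. The sketch below illustrates that reading under stated assumptions; `plan_first_turn`, `PLACEHOLDER`, and the dictionary keys are hypothetical names, not part of the package.

```python
PLACEHOLDER = "<pending>"  # hypothetical marker for unconfirmed fields


def plan_first_turn(known: dict, required: list) -> dict:
    """Plan the first turn when `.product-context.md` is missing.

    `known` holds fields the user already supplied; `required` lists the
    fields the current mode needs (per Section 7 of rules-context.md).
    """
    pending = [field for field in required if field not in known]
    baseline = {field: known.get(field, PLACEHOLDER) for field in required}
    return {
        "steps_this_turn": ["S0", "S1"],   # never stop between S0 and S1
        "baseline_context": baseline,      # written even with placeholders
        "s1_confirmation_asks": pending,   # folded into the S1 question
        "pause_after": "S1",
    }
```

For Feature Extension Mode, a user message that supplies only Identity would yield a plan whose S1 confirmation question asks for "Architecture & Tech Stack".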

package/i18n/en/references/rules-quality-review.md CHANGED

@@ -10,13 +10,16 @@ After producing the output for each step, Claude must execute the following revi
 1. Find the quality checklist corresponding to the current step (see the "Review Criteria" section below)
 2. Mark each item as ✅ or ❌
-3. Items marked ❌ must include a specific explanation of "how to improve"
+3. Items marked ❌ must include a specific explanation of "how to improve", and must state how the gap will block a downstream step or artifact (e.g., "this blocks PR-FAQ writing because without a concrete scenario the Lead paragraph can't be written")
-### Step 2: Mandatory Critique
+### Step 2: Mandatory Critique (Hard Gate)
-- **Not all items may be ✅**: If all items pass, you must proactively identify "the weakest aspect of this output" and explain how to strengthen it
-- This isn't nitpicking — it ensures that the self-review doesn't become a rubber stamp
-- Following the Amazon PR-FAQ spirit: quality comes from finding problems, not from confirming there are none
+- **At least one item MUST be marked ❌**: a self-check that is all ✅ does not pass. Claude must honestly identify at least one item that does not fully meet the bar and mark it ❌
+- ⚠️ **A warning marker (⚠️) does NOT substitute for ❌**: ⚠️ is an auxiliary marker (e.g., on a "weakest aspect" supplementary note) and cannot replace a ❌ inside the checklist itself
+- **No bypass via appendix**: writing the "weakest aspect" as a separate note outside the checklist while keeping every checklist item ✅ counts as failing the self-check
+- ❌ must point at a **substantive content gap**, not surface-level concerns like "formatting could be prettier" or "wording could be tighter"
+- If, after honest review, every item still feels like a pass, lower the threshold for a ❌ and re-review: any planning artifact has a weakest dimension with room to improve; mark that one ❌ and specify the concrete fix
+- Design intent: following the Amazon PR-FAQ critique culture — quality comes from proactively finding problems, not from passively confirming there are none
 ### Step 3: Presentation Format
@@ -24,10 +27,13 @@ After producing the output for each step, Claude must execute the following revi
 📝 Quality Self-Check:
 - ✅ [Check item]
 - ✅ [Check item]
-- ❌ [Check item] → Improvement direction: [specific explanation]
-⚠️ Weakest aspect: [description] → Strengthening suggestion: [specific action]
+- ✅ [Check item]
+- ❌ [Check item] → Content gap: [specific] → Downstream impact: [which step/artifact this blocks] → Improvement direction: [specific action]
+⚠️ Supplementary note (optional): [additional context — does NOT replace the ❌ above]
 ```
+**Self-check on the self-check**: if your output has no ❌, go back to Step 2 and redo.
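The Mandatory Critique gate lends itself to a mechanical check. The following is a minimal sketch, assuming the self-check is rendered in the presentation format shown above; `self_check_passes` is a hypothetical helper, and testing for a "→" is only a rough proxy for "includes an improvement direction".

```python
def self_check_passes(report: str) -> bool:
    """Apply the Hard Gate: every item is marked, and at least one is ❌."""
    # Only "- " bullet lines count as checklist items; a ⚠️ supplementary
    # note sits outside the checklist and is ignored on purpose.
    items = [
        line.strip()[2:].lstrip()
        for line in report.splitlines()
        if line.strip().startswith("- ")
    ]
    if not items:
        return False
    if any(not item.startswith(("✅", "❌")) for item in items):
        return False  # blank or otherwise unmarked items are not allowed
    failures = [item for item in items if item.startswith("❌")]
    # At least one ❌, and each ❌ must carry some explanation after "→".
    return bool(failures) and all("→" in item for item in failures)
```

An all-✅ checklist followed by a ⚠️ note returns False, which matches the "no bypass via appendix" rule.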
 ---
 ## Review Criteria (By Step)