product-playbook 1.2.2 → 1.2.4

Files changed (44)
  1. package/.claude-plugin/marketplace.json +1 -1
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/README.es.md +31 -0
  4. package/README.ja.md +31 -0
  5. package/README.ko.md +31 -0
  6. package/README.md +31 -0
  7. package/README.zh-CN.md +31 -0
  8. package/README.zh-TW.md +31 -0
  9. package/SKILL.md +61 -0
  10. package/commands/product-feature.md +2 -0
  11. package/i18n/en/SKILL.md +61 -0
  12. package/i18n/en/commands/product-feature.md +2 -0
  13. package/i18n/en/references/02b-jtbd.md +19 -7
  14. package/i18n/en/references/rules-context.md +10 -1
  15. package/i18n/en/references/rules-quality-review.md +13 -7
  16. package/i18n/es/SKILL.md +61 -0
  17. package/i18n/es/commands/product-feature.md +2 -0
  18. package/i18n/es/references/02b-jtbd.md +24 -7
  19. package/i18n/es/references/rules-context.md +10 -1
  20. package/i18n/es/references/rules-quality-review.md +12 -7
  21. package/i18n/ja/SKILL.md +61 -0
  22. package/i18n/ja/commands/product-feature.md +2 -0
  23. package/i18n/ja/references/02b-jtbd.md +24 -7
  24. package/i18n/ja/references/rules-context.md +10 -1
  25. package/i18n/ja/references/rules-quality-review.md +12 -7
  26. package/i18n/ko/SKILL.md +61 -0
  27. package/i18n/ko/commands/product-feature.md +2 -0
  28. package/i18n/ko/references/02b-jtbd.md +24 -7
  29. package/i18n/ko/references/rules-context.md +10 -1
  30. package/i18n/ko/references/rules-quality-review.md +12 -7
  31. package/i18n/zh-CN/SKILL.md +61 -0
  32. package/i18n/zh-CN/commands/product-feature.md +2 -0
  33. package/i18n/zh-CN/references/02b-jtbd.md +24 -7
  34. package/i18n/zh-CN/references/rules-context.md +10 -1
  35. package/i18n/zh-CN/references/rules-quality-review.md +12 -7
  36. package/i18n/zh-TW/SKILL.md +61 -0
  37. package/i18n/zh-TW/commands/product-feature.md +10 -8
  38. package/i18n/zh-TW/references/02b-jtbd.md +24 -7
  39. package/i18n/zh-TW/references/rules-context.md +10 -1
  40. package/i18n/zh-TW/references/rules-quality-review.md +12 -7
  41. package/package.json +1 -1
  42. package/references/02b-jtbd.md +24 -7
  43. package/references/rules-context.md +10 -1
  44. package/references/rules-quality-review.md +12 -7
package/.claude-plugin/marketplace.json CHANGED
@@ -7,7 +7,7 @@
7
7
  {
8
8
  "name": "product-playbook",
9
9
  "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
10
- "version": "1.2.2",
10
+ "version": "1.2.4",
11
11
  "source": "./."
12
12
  }
13
13
  ]
package/.claude-plugin/plugin.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "product-playbook",
3
3
  "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
4
- "version": "1.2.2",
4
+ "version": "1.2.4",
5
5
  "author": {
6
6
  "name": "Charles Chen"
7
7
  },
package/README.es.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook es un **Skill de Claude AI** que te guía sistemáticamente
21
21
 
22
22
  - 🧭 **6 modos de ejecución** — desde validación rápida en 30 minutos hasta planes de producto completos (incluyendo una ruta rápida de expansión de funcionalidades)
23
23
  - 📐 **22 frameworks de producto** — cubriendo toda la cadena Discovery → Define → Develop → Deliver
24
+ - 🤝 **3 sub-agentes especialistas** — Discovery, Crítica de Estrategia y Pre-mortem operan como ventanas de contexto aisladas con experiencia específica de framework
24
25
  - 🔄 **Motor de propagación de cambios** — modifica cualquier paso y todos los outputs downstream se actualizan automáticamente
25
26
  - 📎 **Integración inteligente de archivos** — sube datos, capturas de pantalla o documentos; la IA los integra automáticamente en el paso relevante
26
27
  - 🔗 **Handoff de desarrollo** — genera CLAUDE.md + TASKS.md + TICKETS.md para un handoff fluido al desarrollo en Claude Code
@@ -155,6 +156,10 @@ product-playbook/
155
156
  │ ├── product-prd.md # /product-prd — Generar PRD
156
157
  │ ├── product-report.md # /product-report — Generar reporte HTML
157
158
  │ └── product-dev.md # /product-dev — Generar paquete de handoff de desarrollo
159
+ ├── agents/ # Sub-agentes especialistas (cargados automáticamente por el plugin de Claude Code)
160
+ │ ├── discovery-specialist.md # Especialista en Persona / JTBD / OST / Journey Map
161
+ │ ├── strategy-critic.md # Crítico de estrategia con la lente de Rumelt
162
+ │ └── pre-mortem-runner.md # 15+ escenarios de fallo + indicadores adelantados
158
163
  └── references/
159
164
  ├── 00-opportunity-check.md # Evaluación de oportunidad + Modelo DHM
160
165
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Comparando la calidad de respuesta entre "con guía del Skill" y "sin guía del
418
423
 
419
424
  > Ver [`evals/`](./evals/) para metodología detallada y datos.
420
425
 
426
+ ### Iteración 5: Comparación A/B de Sub-agent (3 evaluaciones relevantes a despacho × 22 expectativas)
427
+
428
+ Una corrida A/B enfocada que mide la contribución marginal de calidad de los 3 sub-agents especialistas (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) introducidos en v1.2.0+. Misma versión del skill (v1.2.3), mismos prompts, dos brazos:
429
+
430
+ - **CON sub-agent**: el executor lee el archivo `agents/*.md` correspondiente y sigue el esquema de salida declarado por el especialista + autoverificaciones; el despacho se marca en la respuesta.
431
+ - **SIN sub-agent**: el executor tiene prohibido leer cualquier `agents/*.md` o mencionar la delegación; debe manejar el paso inline como orchestrator usando sólo `SKILL.md` + `commands/` + `references/`.
432
+
433
+ | Evaluación | Con Sub-agent | Sin Sub-agent | Delta |
434
+ |-----------|:--------:|:------------:|:-----:|
435
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
436
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
437
+ | **Pre-mortem (evaluación de riesgo en Build Mode)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
438
+ | **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
439
+
440
+ El consumo de tokens es prácticamente idéntico en ambos brazos (151K vs 154K) — mantener un especialista no cuesta más que manejar el paso inline.
441
+
442
+ **Hallazgos Clave**
443
+
444
+ - **Pre-mortem-runner es load-bearing** (+77.8%): sin él, el orchestrator produce una lista de riesgos delgada y en tiempo futuro, perdiendo el conteo de escenarios (≥15), cobertura de 5 categorías, disciplina de leading indicator, experimentos pre-launch de bajo costo, y el marco narrativo de "lanzó y falló" en pasado. El esquema estructurado del especialista hace trabajo real que `references/` por sí solo no reproduce.
445
+ - **Discovery-specialist y strategy-critic son contribuidores moderados** (+14–17%): el orchestrator puede producir análisis razonables de Persona+JTBD y críticas de estrategia inline. El único assertion divergente entre brazos es el contrato de despacho mismo, no la calidad estructural.
446
+ - **Implicación**: de los 3 especialistas, pre-mortem-runner ofrece el mayor lift de calidad solo y es el más justificado por estos resultados. Los otros dos podrían en principio plegarse de vuelta al orchestrator con páginas de referencia más fuertes, aunque no hay incentivo de costo para hacerlo (los tokens son iguales).
447
+
448
+ **Advertencia del harness**: el executor `general-purpose` usado en este harness de eval no expone despacho `Task` anidado, por lo que el brazo CON aproxima el despacho real leyendo el `agents/*.md` del especialista y siguiendo su esquema inline (con un marcador de despacho explícito). El contraste estructural vs SIN es real, pero se necesitaría una corrida top-session para verificar de extremo a extremo la calidad del despacho via Task tool.
449
+
450
+ > Artefactos crudos y divergencia por assertion en [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
451
+
421
452
  ---
422
453
 
423
454
  ## 💬 Comandos Disponibles
package/README.ja.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbookは、ゼロから一まで体系的にプロダクト企画
21
21
 
22
22
  - 🧭 **6つの実行モード** — 30分の迅速な検証からフルスケールのプロダクト企画まで(機能拡張ファストトラックを含む)
23
23
  - 📐 **22のプロダクトフレームワーク** — Discovery → Define → Develop → Deliverの全パイプラインをカバー
24
+ - 🤝 **3つの専門サブエージェント** — Discovery、戦略批評、Pre-mortem が独立した context window で動作し、フレームワーク固有の専門性を持つ
24
25
  - 🔄 **変更伝播エンジン** — 任意のステップを修正すると下流の全出力が自動更新
25
26
  - 📎 **スマートファイル統合** — データ、スクリーンショット、ドキュメントをアップロードするとAIが関連ステップに自動統合
26
27
  - 🔗 **開発ハンドオフ** — CLAUDE.md + TASKS.md + TICKETS.mdを生成してClaude Code開発にシームレスに接続
@@ -156,6 +157,10 @@ product-playbook/
156
157
  │ ├── product-prd.md # /product-prd — PRD生成
157
158
  │ ├── product-report.md # /product-report — HTMLレポート生成
158
159
  │ └── product-dev.md # /product-dev — 開発ハンドオフパッケージ生成
160
+ ├── agents/ # 専門サブエージェント(Claude Code プラグインが自動読み込み)
161
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map スペシャリスト
162
+ │ ├── strategy-critic.md # Rumelt 視点の戦略批評者
163
+ │ └── pre-mortem-runner.md # 15+ の失敗シナリオ + リーディングインジケーター
159
164
  └── references/
160
165
  ├── 00-opportunity-check.md # 機会評価 + DHMモデル
161
166
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -419,6 +424,32 @@ Claude Codeは自動的に:
419
424
 
420
425
  > 詳細な方法論とデータは[`evals/`](./evals/)を参照。
421
426
 
427
+ ### イテレーション5:Sub-agent A/B 比較(ディスパッチ関連3評価 × 22期待値)
428
+
429
+ v1.2.0+ で導入された3つの専門 sub-agent(`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`)の品質への限界貢献を測定する集中 A/B 評価。同じスキル版(v1.2.3)、同じプロンプト、2つの arm:
430
+
431
+ - **Sub-agent あり**:executor は該当する `agents/*.md` を読み、専門エージェントが宣言する出力スキーマと自己チェックに従う。レスポンス内に dispatch マーカーを記録。
432
+ - **Sub-agent なし**:executor は `agents/*.md` を一切読まず、delegation に言及しない。`SKILL.md` + `commands/` + `references/` のみを使い、orchestrator が inline で処理する。
433
+
434
+ | 評価項目 | Sub-agent あり | Sub-agent なし | 差分 |
435
+ |-----------|:--------:|:------------:|:-----:|
436
+ | Discovery(Persona + JTBD) | 100%(7/7) | 85.7%(6/7) | +14.3% |
437
+ | Strategy Critic | 100%(6/6) | 83.3%(5/6) | +16.7% |
438
+ | **Pre-mortem(Build Mode リスク評価)** | **100%(9/9)** | **22.2%(2/9)** | **+77.8% ✅** |
439
+ | **合計** | **100%(22/22)** | **59.1%(13/22)** | **+40.9%** |
440
+
441
+ 両 arm の token 消費はほぼ同じ(151K vs 154K)— 専門エージェントを保持することは inline 処理より高くはならない。
442
+
443
+ **主要な発見**
444
+
445
+ - **Pre-mortem-runner は load-bearing**(+77.8%):これがないと、orchestrator は薄く未来形のリスクリストしか生成できず、シナリオ数(≥15)、5カテゴリーのカバレッジ、leading indicator の規律、低コスト pre-launch 実験、過去形「出荷して失敗した」のナラティブ枠組みを失う。構造化された専門エージェントのスキーマが本当の仕事をしており、`references/` だけでは再構築できない。
446
+ - **Discovery-specialist と strategy-critic は中程度の貢献**(+14–17%):orchestrator 単独でも Persona+JTBD と戦略批評を妥当なレベルで処理できる。両 arm で分岐する assertion は dispatch コントラクト自体であり、構造的品質ではない。
447
+ - **含意**:3つの専門のうち、pre-mortem-runner が単独での品質向上が最大で、最も保持を正当化される。他の2つは原理的には強化された reference ページで orchestrator に折り返せるが、token コストが同じなので削減誘因はない。
448
+
449
+ **ハーネスの注意**:この評価環境の `general-purpose` executor は nested `Task` を公開しないため、「Sub-agent あり」 arm は「専門エージェントの `agents/*.md` を読む + dispatch マーカー + スキーマを inline で遵守」で実際の dispatch を近似する。構造的対比は実物だが、エンドツーエンドの Task ツール dispatch を完全に検証するには top-session 実行が必要。
450
+
451
+ > 生の成果物と assertion ごとの分岐は [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) を参照。
452
+
422
453
  ---
423
454
 
424
455
  ## 💬 利用可能なコマンド
package/README.ko.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook은 제로부터 원까지 제품 기획 전 과정을 체
21
21
 
22
22
  - 🧭 **6가지 실행 모드** — 30분 빠른 검증부터 전체 제품 기획까지 (기능 확장 빠른 트랙 포함)
23
23
  - 📐 **22개 제품 프레임워크** — Discovery → Define → Develop → Deliver 전체 파이프라인 커버
24
+ - 🤝 **3개 전문 서브에이전트** — Discovery, 전략 비평, Pre-mortem이 격리된 context window에서 작동하며 프레임워크별 전문성을 보유
24
25
  - 🔄 **변경 전파 엔진** — 어떤 단계든 수정하면 모든 하위 산출물이 자동 업데이트
25
26
  - 📎 **스마트 파일 통합** — 데이터, 스크린샷, 문서를 업로드하면 AI가 해당 단계에 자동 통합
26
27
  - 🔗 **개발 핸드오프** — CLAUDE.md + TASKS.md + TICKETS.md를 생성하여 Claude Code 개발로 원활하게 연결
@@ -155,6 +156,10 @@ product-playbook/
155
156
  │ ├── product-prd.md # /product-prd — PRD 생성
156
157
  │ ├── product-report.md # /product-report — HTML 보고서 생성
157
158
  │ └── product-dev.md # /product-dev — 개발 핸드오프 패키지 생성
159
+ ├── agents/ # 전문 서브에이전트 (Claude Code 플러그인이 자동 로드)
160
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map 스페셜리스트
161
+ │ ├── strategy-critic.md # Rumelt 관점의 전략 비평가
162
+ │ └── pre-mortem-runner.md # 15+ 실패 시나리오 + 선행 지표
158
163
  └── references/
159
164
  ├── 00-opportunity-check.md # 기회 평가 + DHM Model
160
165
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code가 자동으로:
418
423
 
419
424
  > 상세한 방법론과 데이터는 [`evals/`](./evals/)를 참조하세요.
420
425
 
426
+ ### 반복 5: Sub-agent A/B 비교 (디스패치 관련 3개 평가 × 22개 기대값)
427
+
428
+ v1.2.0+ 에서 도입된 3개의 전문 sub-agent (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) 의 품질에 대한 한계 기여를 측정하는 집중 A/B 평가. 동일 스킬 버전(v1.2.3), 동일 프롬프트, 2개 arm:
429
+
430
+ - **Sub-agent 있음**: executor 가 해당 `agents/*.md` 파일을 읽고, 전문가가 선언한 출력 스키마와 자체 점검을 따름. 응답에 dispatch 마커 기록.
431
+ - **Sub-agent 없음**: executor 는 어떤 `agents/*.md` 도 읽지 못하며, delegation 을 언급하지 못함. `SKILL.md` + `commands/` + `references/` 만 사용하여 orchestrator 가 inline 으로 처리.
432
+
433
+ | 평가 항목 | Sub-agent 있음 | Sub-agent 없음 | 차이 |
434
+ |-----------|:--------:|:------------:|:-----:|
435
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
436
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
437
+ | **Pre-mortem (Build Mode 위험 평가)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
438
+ | **합계** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
439
+
440
+ 두 arm 의 token 소비는 거의 동일함 (151K vs 154K) — 전문가를 유지하는 것이 inline 처리보다 더 비싸지 않음.
441
+
442
+ **핵심 발견**
443
+
444
+ - **Pre-mortem-runner 가 load-bearing** (+77.8%): 이것이 없으면 orchestrator 는 얇고 미래형인 위험 리스트만 생성하며, 시나리오 수 (≥15), 5개 카테고리 커버리지, leading indicator 규율, 저비용 pre-launch 실험, 과거형 "출시 후 실패" 내러티브 프레임을 놓침. 구조화된 전문가 스키마가 실제로 일을 하고 있으며, `references/` 만으로는 재구성할 수 없음.
445
+ - **Discovery-specialist 와 strategy-critic 은 중간 기여자** (+14–17%): orchestrator 자체만으로도 Persona+JTBD 와 전략 비평을 합리적 수준에서 처리할 수 있음. 두 arm 에서 분기하는 유일한 assertion 은 dispatch 계약 자체이며, 구조적 품질이 아님.
446
+ - **함의**: 3개 전문가 중 pre-mortem-runner 가 단독 품질 향상이 가장 크고 보존이 가장 정당화됨. 다른 2개는 원칙적으로 강화된 reference 페이지로 orchestrator 에 통합할 수 있지만, token 비용이 동일하므로 축소 동기는 없음.
447
+
448
+ **Harness 주의사항**: 이 평가 환경의 `general-purpose` executor 는 nested `Task` 를 노출하지 않으므로, "Sub-agent 있음" arm 은 "전문가 `agents/*.md` 읽기 + dispatch 마커 + 스키마를 inline 으로 준수" 로 실제 dispatch 를 근사함. 구조적 대조는 실제이지만, 엔드투엔드 Task 도구 dispatch 를 완전히 검증하려면 top-session 실행이 필요함.
449
+
450
+ > 원시 artifacts 와 assertion 별 분기는 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) 참조.
451
+
421
452
  ---
422
453
 
423
454
  ## 💬 사용 가능한 명령
package/README.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you thr
21
21
 
22
22
  - 🧭 **6 execution modes** — from 30-minute rapid validation to full-blown product plans (including a feature expansion fast track)
23
23
  - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver pipeline
24
+ - 🤝 **3 specialist sub-agents** — Discovery, Strategy Critique, and Pre-mortem run as isolated context windows with framework-specific expertise
24
25
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
25
26
  - 📎 **Smart file integration** — upload data, screenshots, or documents; the AI automatically integrates them into the relevant step
26
27
  - 🔗 **Dev handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md for seamless handoff to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
155
156
  │ ├── product-prd.md # /product-prd — Generate PRD
156
157
  │ ├── product-report.md # /product-report — Generate HTML report
157
158
  │ └── product-dev.md # /product-dev — Generate dev handoff package
159
+ ├── agents/ # Specialist sub-agents (auto-loaded by Claude Code plugin)
160
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
161
+ │ ├── strategy-critic.md # Rumelt-lens strategy critic
162
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
158
163
  └── references/
159
164
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
160
165
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ By comparing response quality between "with Skill guidance" and "without Skill g
418
423
 
419
424
  > See [`evals/`](./evals/) for detailed methodology and data.
420
425
 
426
+ ### Iteration 5: Sub-agent A/B Comparison (3 dispatch-relevant evals × 22 expectations)
427
+
428
+ A focused A/B run measuring the marginal quality contribution of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) shipped in v1.2.0+. Same skill version (v1.2.3), same prompts, two arms:
429
+
430
+ - **WITH sub-agent**: executor reads the specialist's `agents/*.md` file and follows its declared output schema + self-checks; dispatch is marked in the response.
431
+ - **WITHOUT sub-agent**: executor is forbidden from reading any `agents/*.md` or mentioning delegation; must handle the step inline as the orchestrator using only `SKILL.md` + `commands/` + `references/`.
432
+
433
+ | Eval | With Sub-agent | Without Sub-agent | Delta |
434
+ |-----------|:--------:|:------------:|:-----:|
435
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
436
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
437
+ | **Pre-mortem (Build Mode risk)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
438
+ | **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
439
+
440
+ Token cost is essentially identical across arms (151K vs 154K) — keeping a specialist costs no more than handling the step inline.
441
+
442
+ **Key Findings**
443
+
444
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it the orchestrator produces a thin, future-tense risk list and misses scenario count (≥15), 5-category coverage, leading-indicator discipline, cheap pre-launch experiments, and past-tense "shipped-and-failed" framing. The structured specialist schema is doing real work that `references/` alone does not reproduce.
445
+ - **Discovery-specialist and strategy-critic are modest contributors** (+14–17%): the orchestrator can produce reasonable Persona+JTBD analyses and strategy critiques inline. The diverging assertion in each case is the dispatch contract itself, not the structural quality.
446
+ - **Implication**: of the 3 specialists, the pre-mortem-runner gives the largest standalone quality lift and is the most justified by these results. The other two could in principle be folded back into the orchestrator with stronger reference pages, though there is no cost incentive to do so (tokens are a wash).
447
+
448
+ **Harness caveat**: the `general-purpose` executor used in this eval harness does not expose nested `Task` dispatch, so the WITH arm approximates real dispatch by reading the specialist's `agents/*.md` and following its schema inline (with an explicit dispatch marker). The structural contrast vs WITHOUT is real, but a true top-session run would be needed to verify end-to-end Task-tool dispatch quality.
449
+
450
+ > Raw artifacts and per-assertion divergence in [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
451
+
421
452
  ---
422
453
 
423
454
  ## 💬 Available Commands
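The pass rates and deltas in the Iteration 5 table of the README.md hunk above can be recomputed from the raw pass counts. The following is a minimal Python sketch that assumes only the counts shown in that table; the names `RESULTS`, `pass_rate`, and `summarize` are illustrative and do not come from the package.

```python
# Pass counts copied from the Iteration 5 table:
# eval name -> (passed WITH sub-agent, passed WITHOUT sub-agent, total expectations)
RESULTS = {
    "Discovery (Persona + JTBD)": (7, 6, 7),
    "Strategy Critic": (6, 5, 6),
    "Pre-mortem (Build Mode risk)": (9, 2, 9),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


def summarize(results):
    """Return {name: (with %, without %, delta)} including a TOTAL row."""
    rows = {}
    sum_with = sum_without = sum_total = 0
    for name, (with_sa, without_sa, total) in results.items():
        w = pass_rate(with_sa, total)
        wo = pass_rate(without_sa, total)
        rows[name] = (w, wo, round(w - wo, 1))
        sum_with += with_sa
        sum_without += without_sa
        sum_total += total
    w = pass_rate(sum_with, sum_total)
    wo = pass_rate(sum_without, sum_total)
    rows["TOTAL"] = (w, wo, round(w - wo, 1))
    return rows


if __name__ == "__main__":
    for name, (w, wo, delta) in summarize(RESULTS).items():
        print(f"{name}: {w}% vs {wo}% ({delta:+.1f} points)")
```

Note that the Delta column is a difference between two percentages, so it is best read as percentage points rather than as a relative improvement.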
package/README.zh-CN.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook 是一个 **Claude AI Skill**,能够系统性地引导你
21
21
 
22
22
  - 🧭 **6 种执行模式** — 从 30 分钟快速验证到完整企划(含功能扩充快速路径)
23
23
  - 📐 **22 个产品框架** — 涵盖 Discovery → Define → Develop → Deliver 全流程
24
+ - 🤝 **3 个专家 sub-agent** — Discovery、策略批判、Pre-mortem 在独立 context window 中运作,各自携带专属框架专业
24
25
  - 🔄 **变更传播引擎** — 修改任何步骤,自动更新所有下游产出
25
26
  - 📎 **文件智慧整合** — 上传数据、截图、文件,AI 自动整合到对应步骤
26
27
  - 🔗 **开发衔接** — 产出 CLAUDE.md + TASKS.md + TICKETS.md,无缝衔接 Claude Code 开发
@@ -155,6 +156,10 @@ product-playbook/
155
156
  │ ├── product-prd.md # /product-prd — 产出 PRD
156
157
  │ ├── product-report.md # /product-report — 产出 HTML 报告
157
158
  │ └── product-dev.md # /product-dev — 产出开发交接包
159
+ ├── agents/ # 专家 sub-agent(Claude Code plugin 自动加载)
160
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map 专家
161
+ │ ├── strategy-critic.md # Rumelt 视角的策略批判者
162
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
158
163
  └── references/
159
164
  ├── 00-opportunity-check.md # 机会评估 + DHM Model
160
165
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code 会自动:
418
423
 
419
424
  > 详细评测方法与数据见 [`evals/`](./evals/) 目录。
420
425
 
426
+ ### Iteration 5:Sub-agent A/B 对照(3 个专家相关评测 × 22 个期望值)
427
+
428
+ 针对 v1.2.0+ 推出的 3 个专家 sub-agent(`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`)所做的聚焦 A/B 测试,量化它们在品质上的边际贡献。相同 skill 版本(v1.2.3)、相同 prompt、两个 arm:
429
+
430
+ - **有 Sub-agent**:executor 可读取对应的 `agents/*.md`,并遵循该专家声明的输出 schema 与自检;回应中标记 dispatch。
431
+ - **无 Sub-agent**:executor 不得读取任何 `agents/*.md`,不得提及 delegation;只能用 `SKILL.md` + `commands/` + `references/` 由 orchestrator 自行 inline 处理。
432
+
433
+ | 评测项目 | 有 Sub-agent | 无 Sub-agent | 差异 |
434
+ |-----------|:--------:|:------------:|:-----:|
435
+ | Discovery(Persona + JTBD) | 100%(7/7) | 85.7%(6/7) | +14.3% |
436
+ | Strategy Critic | 100%(6/6) | 83.3%(5/6) | +16.7% |
437
+ | **Pre-mortem(Build Mode 风险评估)** | **100%(9/9)** | **22.2%(2/9)** | **+77.8% ✅** |
438
+ | **总计** | **100%(22/22)** | **59.1%(13/22)** | **+40.9%** |
439
+
440
+ 两个 arm 的 token 消耗几乎相同(151K vs 154K)——保留专家不会比 inline 处理更贵。
441
+
442
+ **关键发现**
443
+
444
+ - **Pre-mortem-runner 是 load-bearing**(+77.8%):少了它,orchestrator 只能产出单薄、未来式的风险清单,缺失 scenario 数量(≥15)、五类别覆盖、leading-indicator 纪律、低成本上线前实验、以及过去式「已上线且失败」叙事框架。结构化的专家 schema 在做真正的工作,光看 `references/` 无法重建。
445
+ - **Discovery-specialist 与 strategy-critic 属于中度贡献**(+14–17%):orchestrator 自己处理 Persona+JTBD 与策略批判已可达合理水准。两个 arm 唯一分歧的 assertion 是 dispatch 契约本身,而非结构性品质。
446
+ - **意涵**:3 个专家中,pre-mortem-runner 对品质提升的贡献最大、最值得保留;另外两个原则上可以靠加强 reference 文件 fold 回 orchestrator,但因为 token 成本相同,没有减量诱因。
447
+
448
+ **Harness 警语**:此评测环境的 `general-purpose` executor 并未暴露 nested `Task`,因此「有 Sub-agent」arm 是以「读取专家 `agents/*.md` + 标记 dispatch + 遵循 schema inline」近似真实 dispatch。结构性对比是真的,但要完全验证端到端 Task 工具 dispatch 还需要 top-session 测试。
449
+
450
+ > 原始 artifacts 与每项 assertion 分歧详见 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/)。
451
+
421
452
  ---
422
453
 
423
454
  ## 💬 可用指令一览
package/README.zh-TW.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook 是一個 **Claude AI Skill**,能夠系統性地引導你
21
21
 
22
22
  - 🧭 **6 種執行模式** — 從 30 分鐘快速驗證到完整企劃(含功能擴充快速路徑)
23
23
  - 📐 **22 個產品框架** — 涵蓋 Discovery → Define → Develop → Deliver 全流程
24
+ - 🤝 **3 個專家 sub-agent** — Discovery、策略批判、Pre-mortem 在獨立 context window 中運作,各自攜帶專屬框架專業
24
25
  - 🔄 **變更傳播引擎** — 修改任何步驟,自動更新所有下游產出
25
26
  - 📎 **檔案智慧整合** — 上傳數據、截圖、文件,AI 自動整合到對應步驟
26
27
  - 🔗 **開發銜接** — 產出 CLAUDE.md + TASKS.md + TICKETS.md,無縫銜接 Claude Code 開發
@@ -155,6 +156,10 @@ product-playbook/
155
156
  │ ├── product-prd.md # /product-prd — 產出 PRD
156
157
  │ ├── product-report.md # /product-report — 產出 HTML 報告
157
158
  │ └── product-dev.md # /product-dev — 產出開發交接包
159
+ ├── agents/ # 專家 sub-agent(Claude Code plugin 自動載入)
160
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map 專家
161
+ │ ├── strategy-critic.md # Rumelt 視角的策略批判者
162
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
158
163
  └── references/
159
164
  ├── 00-opportunity-check.md # 機會評估 + DHM Model
160
165
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code 會自動:
418
423
 
419
424
  > 詳細評測方法與數據見 [`evals/`](./evals/) 目錄。
420
425
 
426
+ ### Iteration 5:Sub-agent A/B 對照(3 個專家相關評測 × 22 個期望值)
427
+
428
+ 針對 v1.2.0+ 推出的 3 個專家 sub-agent(`discovery-specialist`、`strategy-critic`、`pre-mortem-runner`)所做的聚焦 A/B 測試,量化它們在品質上的邊際貢獻。相同 skill 版本(v1.2.3)、相同 prompt、兩個 arm:
429
+
430
+ - **有 Sub-agent**:executor 可讀取對應的 `agents/*.md`,並遵循該專家宣告的輸出 schema 與自檢;回應中標記 dispatch。
431
+ - **無 Sub-agent**:executor 不得讀取任何 `agents/*.md`,不得提及 delegation;只能用 `SKILL.md` + `commands/` + `references/` 由 orchestrator 自行 inline 處理。
432
+
433
+ | 評測項目 | 有 Sub-agent | 無 Sub-agent | 差異 |
434
+ |-----------|:--------:|:------------:|:-----:|
435
+ | Discovery(Persona + JTBD) | 100%(7/7) | 85.7%(6/7) | +14.3% |
436
+ | Strategy Critic | 100%(6/6) | 83.3%(5/6) | +16.7% |
437
+ | **Pre-mortem(Build Mode 風險評估)** | **100%(9/9)** | **22.2%(2/9)** | **+77.8% ✅** |
438
+ | **總計** | **100%(22/22)** | **59.1%(13/22)** | **+40.9%** |
439
+
440
+ 兩個 arm 的 token 消耗幾乎相同(151K vs 154K)——保留專家不會比 inline 處理更貴。
441
+
442
+ **關鍵發現**
443
+
444
+ - **Pre-mortem-runner 是 load-bearing**(+77.8%):少了它,orchestrator 只能產出單薄、未來式的風險清單,缺失 scenario 數量(≥15)、五類別覆蓋、leading-indicator 紀律、低成本上線前實驗、以及過去式「已上線且失敗」敘事框架。結構化的專家 schema 在做真正的工作,光看 `references/` 無法重建。
445
+ - **Discovery-specialist 與 strategy-critic 屬於中度貢獻**(+14–17%):orchestrator 自己處理 Persona+JTBD 與策略批判已可達合理水準。兩個 arm 唯一分歧的 assertion 是 dispatch 契約本身,而非結構性品質。
446
+ - **意涵**:3 個專家中,pre-mortem-runner 對品質提升的貢獻最大、最值得保留;另外兩個原則上可以靠加強 reference 文件 fold 回 orchestrator,但因為 token 成本相同,沒有減量誘因。
447
+
448
+ **Harness 警語**:此評測環境的 `general-purpose` executor 並未暴露 nested `Task`,因此「有 Sub-agent」arm 是以「讀取專家 `agents/*.md` + 標記 dispatch + 遵循 schema inline」近似真實 dispatch。結構性對比是真的,但要完全驗證端到端 Task 工具 dispatch 還需要 top-session 測試。
449
+
450
+ > 原始 artifacts 與每項 assertion 分歧詳見 [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/)。
451
+
421
452
  ---
422
453
 
423
454
  ## 💬 可用指令一覽
package/SKILL.md CHANGED
@@ -134,6 +134,67 @@ When the user asks to list frameworks or uses supplementary commands, read `refe
134
134
 
135
135
  ---
136
136
 
137
+ ## 🤝 Sub-Agent Delegation Rules
138
+
139
+ The Product Playbook ships with three specialist subagents that operate in isolated context windows. Delegate to them at the right step rather than handling everything in this main agent's context — specialists produce sharper output because they carry only the framework knowledge they need.
140
+
141
+ ### When to delegate to `discovery-specialist`
142
+
143
+ Delegate at these steps:
144
+
145
+ - **Full Mode**: S2 (Persona) → S3 (JTBD) → S4 (OST) → S5 (Journey Map) → S6 (Continuous Discovery hypotheses)
146
+ - **Revision Mode**: S2 (current user analysis) → S3 (pain point synthesis) → S4 (opportunity identification)
147
+ - **Build Mode**: S2 (problem clarification with JTBD lens)
148
+ - **Custom Mode**: any step that selects Persona / JTBD / OST / Journey Map / Continuous Discovery
149
+
150
+ How to invoke:
151
+
152
+ > Use the `discovery-specialist` subagent to produce [Persona | JTBD | OST | Journey Map] for [product description]. Target audience: [B2C / B2B / B2B2C]. Available research data: [list uploaded files, or "none — flag low confidence"]. Reply in [language].
153
+
154
+ Integrate the returned YAML into the current step's output. Surface the specialist's `open_questions` to the user as part of the step's confirmation prompt.
155
+
156
+ ### When to delegate to `strategy-critic`
157
+
158
+ Delegate **immediately after** the user finalises any strategy artifact:
159
+
160
+ - After Strategy Blocks completion (Full Mode S7)
161
+ - After Rumelt Good Strategy Kernel completion (Full Mode S8)
162
+ - After DHM Model completion (Full Mode S9)
163
+ - After Empowered Teams charter (any mode that includes it)
164
+ - Any time the user writes "this is our strategy" in plain prose without a named framework
165
+
166
+ How to invoke:
167
+
168
+ > Use the `strategy-critic` subagent to critique the following strategy artifact: [paste verbatim]. The artifact is [framework name or "generic strategy doc"]. Reply in [language].
169
+
170
+ The critic returns critiques, not rewrites. Present the critic's `three_questions_to_ask_the_writer` to the user verbatim. Do not soften them. If the user revises in response, re-invoke the critic on the revised version.
171
+
172
+ ### When to delegate to `pre-mortem-runner`
173
+
174
+ Delegate at these steps:
175
+
176
+ - **Full Mode**: S10 (after MVP scoping is complete)
177
+ - **Build Mode**: S4 (architecture-grounded pre-mortem)
178
+ - **Revision Mode**: S8
179
+ - **Feature Extension Mode**: S3 (risk assessment)
180
+ - Any time the user explicitly requests pre-mortem / risk analysis / "what could go wrong"
181
+
182
+ How to invoke:
183
+
184
+ > Use the `pre-mortem-runner` subagent to pre-mortem the following [product | feature | strategy]: [paste verbatim]. Mode: [build_mode_architecture_grounded | standard | feature_extension]. If build mode, available architecture context: [paste relevant file contents or summary]. Reply in [language].
185
+
186
+ The runner returns 15+ scenarios. In the user-facing output, lead with the `priority_three` and the `pre_launch_experiments`. Surface the full scenario list in a collapsible section or as an attached file.
187
+
188
+ ### Delegation hygiene
189
+
190
+ 1. **One sub-agent per step**. Do not chain sub-agents in a single turn — let the user confirm intermediate output before invoking the next specialist.
191
+ 2. **Pass language explicitly**. Sub-agents detect language from your prompt; if your prompt is in English but the user is working in 繁體中文, the sub-agent will reply in English. Always specify the user's working language.
192
+ 3. **Respect `status: out_of_scope`**. If a sub-agent refuses a request, take the routing recommendation seriously — the sub-agent's scope refusal is a feature, not a failure.
193
+ 4. **Hard Gate inheritance**. Sub-agents inherit the no-code-during-planning rule. They will refuse to write files or run bash even if you ask them to. This is intentional.
194
+ 5. **Quality self-check still applies**. After integrating sub-agent output into a step, run the existing quality self-check from `references/rules-quality-review.md` — the sub-agent did its own self-check, but the main agent owns the user-facing step output.
195
+
196
+ ---
197
+
137
198
  ## 🔗 Global Rule: Persona-Journey Bundling
138
199
 
139
200
  **Whenever a mode includes a Persona step, Journey Map is included by DEFAULT in the very next step.** Persona defines Who; Journey Map describes the journey Who experiences. This applies equally to 0-to-1 and existing products — the relevant variable is whether the user's Job spans multiple stages, not whether the product already exists. (Teresa Torres, Indi Young, and Amazon Working Backwards all treat Journey Map as essential during 0-to-1.)
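The step-based delegation rules in the SKILL.md hunk above amount to a lookup from (mode, step) to a specialist. The following is a minimal Python sketch under stated assumptions: the step numbers and sub-agent names are copied from the rules, while the mode keys (`"full"`, `"revision"`, `"build"`, `"feature_extension"`) and the helpers `ROUTES`, `pick_subagent`, and `build_critic_prompt` are hypothetical and not part of the package. Custom Mode and the event-based triggers (for example an explicit user request for a pre-mortem) are left out.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    subagent: str
    timing: str  # "during" the step, or "after" the step's artifact is finalised


# Hypothetical routing table built from the step lists in the delegation rules.
ROUTES = {}
for step in ("S2", "S3", "S4", "S5", "S6"):
    ROUTES[("full", step)] = Route("discovery-specialist", "during")
for step in ("S2", "S3", "S4"):
    ROUTES[("revision", step)] = Route("discovery-specialist", "during")
ROUTES[("build", "S2")] = Route("discovery-specialist", "during")
for step in ("S7", "S8", "S9"):
    ROUTES[("full", step)] = Route("strategy-critic", "after")
ROUTES[("full", "S10")] = Route("pre-mortem-runner", "during")
ROUTES[("build", "S4")] = Route("pre-mortem-runner", "during")
ROUTES[("revision", "S8")] = Route("pre-mortem-runner", "during")
ROUTES[("feature_extension", "S3")] = Route("pre-mortem-runner", "during")


def pick_subagent(mode: str, step: str) -> Optional[Route]:
    """Return the route for a (mode, step) pair, or None if the step stays inline."""
    return ROUTES.get((mode.lower(), step.upper()))


def build_critic_prompt(artifact: str, framework: str = "generic strategy doc",
                        language: str = "") -> str:
    """Fill the strategy-critic invocation template, requiring an explicit language."""
    if not language:
        # Delegation hygiene rule 2: always pass the user's working language.
        raise ValueError("language must be specified explicitly")
    return (
        "Use the `strategy-critic` subagent to critique the following strategy "
        f"artifact: {artifact}. The artifact is {framework}. Reply in {language}."
    )
```

Keeping at most one route per (mode, step) key mirrors the "one sub-agent per step" hygiene rule: a lookup can never return two specialists for the same step.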
package/commands/product-feature.md CHANGED
@@ -11,3 +11,5 @@ Execution mode: 🔧 Feature Extension Mode
11
11
  Feature description: $ARGUMENTS
12
12
 
13
13
  Follow the Feature Extension step sequence (S1 → S4). Load product context first per rules-context.md. Display a progress indicator at each step.
14
+
15
+ **S0 → S1 sequencing (important)**: If Context Bootstrap (S0) is triggered because `.product-context.md` is missing, you MUST complete Bootstrap and S1 in the **same turn**, then pause **after S1 completion** awaiting user confirmation before S2. Do NOT pause between S0 and S1 — even if some Bootstrap fields are still missing, write a baseline `.product-context.md` with placeholders, enter S1, and ask for the missing fields as part of the S1 confirmation question. See `references/rules-context.md` "Bootstrap 與 S1 的順序" for details.
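The S0 → S1 sequencing rule in the hunk above can be read as a small planning function for the first turn. The following is a minimal Python sketch, assuming the rule as stated; `plan_first_turn` and the action strings are hypothetical and only illustrate the ordering, they are not part of the package.

```python
from typing import List


def plan_first_turn(context_file_exists: bool, missing_fields: List[str]) -> List[str]:
    """Return the ordered actions for the first turn of Feature Extension Mode."""
    actions = []
    bootstrapped_with_gaps = False
    if context_file_exists:
        actions.append("load .product-context.md")
    elif missing_fields:
        # Bootstrap (S0) with gaps: write a baseline file instead of pausing.
        actions.append(
            "write baseline .product-context.md with placeholders for: "
            + ", ".join(missing_fields)
        )
        bootstrapped_with_gaps = True
    else:
        actions.append("write .product-context.md")
    # S1 always runs in the same turn; there is no pause between S0 and S1.
    actions.append("run S1")
    question = "pause: confirm S1 output before S2"
    if bootstrapped_with_gaps:
        # Missing Bootstrap fields are requested inside the S1 confirmation.
        question += " and supply: " + ", ".join(missing_fields)
    actions.append(question)
    return actions
```

In every branch the single pause is the last action, which is the property the rule is enforcing.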
package/i18n/en/SKILL.md CHANGED
@@ -132,6 +132,67 @@ When product context read/write is triggered, read `references/rules-context.md`
132
132
 
133
133
  ---
134
134
 
135
+ ## 🤝 Sub-Agent Delegation Rules
136
+
137
+ The Product Playbook ships with three specialist subagents that operate in isolated context windows. Delegate to them at the right step rather than handling everything in this main agent's context — specialists produce sharper output because they carry only the framework knowledge they need.
138
+
139
+ ### When to delegate to `discovery-specialist`
140
+
141
+ Delegate at these steps:
142
+
143
+ - **Full Mode**: S2 (Persona) → S3 (JTBD) → S4 (OST) → S5 (Journey Map) → S6 (Continuous Discovery hypotheses)
144
+ - **Revision Mode**: S2 (current user analysis) → S3 (pain point synthesis) → S4 (opportunity identification)
145
+ - **Build Mode**: S2 (problem clarification with JTBD lens)
146
+ - **Custom Mode**: any step that selects Persona / JTBD / OST / Journey Map / Continuous Discovery
147
+
148
+ How to invoke:
149
+
150
+ > Use the `discovery-specialist` subagent to produce [Persona | JTBD | OST | Journey Map] for [product description]. Target audience: [B2C / B2B / B2B2C]. Available research data: [list uploaded files, or "none — flag low confidence"]. Reply in [language].
151
+
152
+ Integrate the returned YAML into the current step's output. Surface the specialist's `open_questions` to the user as part of the step's confirmation prompt.
153
+
154
+ ### When to delegate to `strategy-critic`
155
+
156
+ Delegate **immediately after** the user finalises any strategy artifact:
157
+
158
+ - After Strategy Blocks completion (Full Mode S7)
159
+ - After Rumelt Good Strategy Kernel completion (Full Mode S8)
160
+ - After DHM Model completion (Full Mode S9)
161
+ - After Empowered Teams charter (any mode that includes it)
162
+ - Any time the user writes "this is our strategy" in plain prose without a named framework
163
+
164
+ How to invoke:
165
+
166
+ > Use the `strategy-critic` subagent to critique the following strategy artifact: [paste verbatim]. The artifact is [framework name or "generic strategy doc"]. Reply in [language].
167
+
168
+ The critic returns critiques, not rewrites. Present the critic's `three_questions_to_ask_the_writer` to the user verbatim. Do not soften them. If the user revises in response, re-invoke the critic on the revised version.
169
+
170
+ ### When to delegate to `pre-mortem-runner`
171
+
172
+ Delegate at these steps:
173
+
174
+ - **Full Mode**: S10 (after MVP scoping is complete)
175
+ - **Build Mode**: S4 (architecture-grounded pre-mortem)
176
+ - **Revision Mode**: S8
177
+ - **Feature Extension Mode**: S3 (risk assessment)
178
+ - Any time the user explicitly requests pre-mortem / risk analysis / "what could go wrong"
179
+
180
+ How to invoke:
181
+
182
+ > Use the `pre-mortem-runner` subagent to pre-mortem the following [product | feature | strategy]: [paste verbatim]. Mode: [build_mode_architecture_grounded | standard | feature_extension]. If build mode, available architecture context: [paste relevant file contents or summary]. Reply in [language].
183
+
184
+ The runner returns 15+ scenarios. In the user-facing output, lead with the `priority_three` and the `pre_launch_experiments`. Surface the full scenario list in a collapsible section or as an attached file.
185
+
186
+ ### Delegation hygiene
187
+
188
+ 1. **One sub-agent per step**. Do not chain sub-agents in a single turn — let the user confirm intermediate output before invoking the next specialist.
189
+ 2. **Pass language explicitly**. Sub-agents detect language from your prompt; if your prompt is in English but the user is working in 繁體中文, the sub-agent will reply in English. Always specify the user's working language.
190
+ 3. **Respect `status: out_of_scope`**. If a sub-agent refuses a request, take the routing recommendation seriously — the sub-agent's scope refusal is a feature, not a failure.
191
+ 4. **Hard Gate inheritance**. Sub-agents inherit the no-code-during-planning rule. They will refuse to write files or run bash even if you ask them to. This is intentional.
192
+ 5. **Quality self-check still applies**. After integrating sub-agent output into a step, run the existing quality self-check from `references/rules-quality-review.md` — the sub-agent did its own self-check, but the main agent owns the user-facing step output.
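The delegation lists above can be read as a lookup from (mode, step) to a specialist. The following is a minimal sketch under that reading, not part of the package: the names `DELEGATION`, `subagent_for`, and `build_prompt`, and the lowercase mode keys, are hypothetical and chosen only for illustration. Triggers that depend on user wording (for example "what could go wrong") are not modelled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup transcribed from the delegation lists above.
# Keys are (mode, step); values are the subagent named in the rules.
DELEGATION: Dict[Tuple[str, str], str] = {
    # discovery-specialist
    **{("full", s): "discovery-specialist" for s in ("S2", "S3", "S4", "S5", "S6")},
    **{("revision", s): "discovery-specialist" for s in ("S2", "S3", "S4")},
    ("build", "S2"): "discovery-specialist",
    # strategy-critic runs after the user finalises the artifact at these steps
    **{("full", s): "strategy-critic" for s in ("S7", "S8", "S9")},
    # pre-mortem-runner
    ("full", "S10"): "pre-mortem-runner",
    ("build", "S4"): "pre-mortem-runner",
    ("revision", "S8"): "pre-mortem-runner",
    ("feature_extension", "S3"): "pre-mortem-runner",
}


def subagent_for(mode: str, step: str) -> Optional[str]:
    """Return the specialist for this step, or None if the main agent handles it."""
    return DELEGATION.get((mode, step))


def build_prompt(subagent: str, task: str, language: str) -> str:
    """Always pass the user's working language explicitly (hygiene rule 2)."""
    return f"Use the `{subagent}` subagent to {task}. Reply in {language}."
```

Returning at most one specialist per (mode, step) mirrors the "one sub-agent per step" rule: the caller cannot chain specialists without going through another step.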
193
+
194
+ ---
195
+
135
196
  ## 🔗 Global Rule: Persona-Journey Bundling
136
197
 
137
198
  **Whenever a mode includes a Persona step, Journey Map is included by DEFAULT in the very next step.** Persona defines Who; Journey Map describes the journey Who experiences. This applies equally to 0-to-1 and existing products — the relevant variable is whether the user's Job spans multiple stages, not whether the product already exists. (Teresa Torres, Indi Young, and Amazon Working Backwards all treat Journey Map as essential during 0-to-1.)
@@ -11,3 +11,5 @@ Execution mode: 🔧 Feature Extension Mode
11
11
  Feature description: $ARGUMENTS
12
12
 
13
13
  Follow the Feature Extension step sequence (S1 → S4). Load product context first per rules-context.md. Display a progress indicator at each step.
14
+
15
+ **S0 → S1 sequencing (important)**: If Context Bootstrap (S0) is triggered because `.product-context.md` is missing, you MUST complete Bootstrap and S1 in the **same turn**, then pause **after S1 completion** awaiting user confirmation before S2. Do NOT pause between S0 and S1 — even if some Bootstrap fields are still missing, write a baseline `.product-context.md` with placeholders, enter S1, and ask for the missing fields as part of the S1 confirmation question. See `references/rules-context.md` "Bootstrap → S1 Sequencing" for details.
@@ -4,17 +4,29 @@
4
4
 
5
5
  > "The unit of analysis is not the consumer, but the job the consumer is trying to get done." — Clayton Christensen
6
6
 
7
- **JTBD Statement Formula:**
7
+ **JTBD Canonical Form (Hard Gate — three-clause structure required):**
8
+
9
+ Every JTBD statement (Primary, Functional, Emotional, Social — every layer) MUST be written as a complete three-clause sentence in the canonical form. All three clauses are required:
10
+
8
11
  ```
9
- [Target customer] + wants to, in [what job context] + get [what job] done
12
+ When [situation], I want to [motivation], so [outcome].
10
13
  ```
11
14
 
12
- Example: A first-time homebuyer comparing mortgage options wants to quickly estimate monthly payments late at night when they can't reach a bank, so they can walk their partner through their financial plan.
15
+ **Failing examples** (fragments inside a table cell, missing clauses):
16
+ - ❌ "Quickly capture key takeaways" (missing When; missing so)
17
+ - ❌ "Jot down ideas during commute" (missing I want to; missing so outcome)
18
+
19
+ **Passing example** (all three clauses present):
20
+ - ✅ "**When** I've just finished reading an article and the key insight is still fresh, **I want to** capture one takeaway in 5 seconds, **so** weeks later I can still find it and connect it to a new idea."
21
+
22
+ Example: **When** they are comparing mortgage options late at night and can't reach a bank, a first-time homebuyer **wants to** quickly estimate monthly payments, **so** they can walk their partner through their financial plan.
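The three-clause gate can be approximated with a keyword check. The sketch below is an illustration only, assuming the first-person canonical wording ("When … I want to … so …"); the name `missing_clauses` is hypothetical. It looks for the three markers in order and cannot judge whether a clause is merely implied, so it may flag more missing clauses than a human reviewer would.

```python
import re
from typing import List

# Clause markers of the canonical form, in the order they must appear.
CLAUSES = [
    ("When", r"\bwhen\b"),
    ("I want to", r"\bi want to\b"),
    ("so", r"\bso\b"),
]


def missing_clauses(statement: str) -> List[str]:
    """Return the clause markers that do not appear, in order, in the statement."""
    # Drop Markdown bold markers and ignore case before matching.
    text = statement.replace("*", "").lower()
    missing = []
    pos = 0
    for label, pattern in CLAUSES:
        match = re.compile(pattern).search(text, pos)
        if match is None:
            missing.append(label)
        else:
            # Later clauses must come after this one.
            pos = match.end()
    return missing
```

An empty result means all three markers are present in order; any non-empty result would correspond to a ❌ on the canonical-form checklist item.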
13
23
 
14
24
  **JTBD Four-Type Analysis Table:**
15
25
 
26
+ Every cell (Persona 1 / Persona 2) MUST contain a complete three-clause JTBD sentence. Descriptive phrases without "When / I want to / so" structure are not acceptable.
27
+
16
28
  ```
17
- | JTBD Type | Definition | Persona 1 | Persona 2 |
29
+ | JTBD Type | Definition | Persona 1 (must use "When … I want to … so …" full form) | Persona 2 (same) |
18
30
  |-----------|------------|-----------|-----------|
19
31
  | Functional Job | Completing a specific task or achieving a functional goal | | |
20
32
  | Emotional Job | How they feel or want to feel | | |
@@ -37,17 +49,17 @@ Example: A first-time homebuyer comparing mortgage options wants to quickly esti
37
49
  ### 📝 JTBD Quality Checklist
38
50
 
39
51
  Claude must self-check after producing JTBD output (each item must be marked ✅ or ❌; ❌ items must include how to improve):
52
+ - [ ] Are **all three layers** (Functional / Emotional / Social) written in the full "When … I want to … so …" canonical form? (Any layer missing a clause → mark ❌)
40
53
  - [ ] Does it include a specific context? (Not "anytime, anywhere" — but "late at night when they can't reach a bank")
41
54
  - [ ] Does it focus on a single core job? (Not three jobs crammed into one sentence)
42
- - [ ] Are functional, emotional, and social jobs all identified?
43
55
  - [ ] Can it be used to evaluate "Does this solution actually address this job?"
44
56
  - [ ] Does it include "current workarounds" and "gap"? (Gap = opportunity)
45
57
  - [ ] Does Q5 of the Deep-Dive reach emotional motivation / professional identity / psychological fear? (Not just functional descriptions)
46
58
 
47
59
  **Execution Rules (Hard Gate):**
48
60
  - Must mark each item ✅ or ❌ — blank [ ] or unexplained ✅ lists are not allowed
49
- - If all items are ✅, must additionally state "What is the weakest part of this analysis and how to strengthen it"
50
- - ❌ Common issues: too abstract, too many jobs merged, missing context, substituting product features for job descriptions, Q5 staying at the functional level
61
+ - **The checklist MUST contain at least one ❌** (see `references/rules-quality-review.md` "Mandatory Critique" rule). A ⚠️ warning marker cannot replace ❌; a "weakest aspect" note appended outside the checklist cannot replace a ❌ inside it. If after honest review every item still feels like a pass, apply a stricter standard and find the item most worth marking ❌, then specify how to strengthen it.
62
+ - ❌ Common issues: incomplete three-clause form (missing When / I want to / so), too abstract, too many jobs merged, missing context, substituting product features for job descriptions, Q5 staying at the functional level
51
63
 
52
64
  ---
53
65
 
@@ -134,9 +134,18 @@ Is this correct? Anything to add or correct?
134
134
 
135
135
  Only write to `.product-context.md` after the user confirms.
136
136
 
137
+ ### Bootstrap → S1 Sequencing (Hard Gate — Bootstrap does NOT block the flow)
138
+
139
+ Bootstrap is Step 0; its purpose is to collect baseline context. **Bootstrap itself is not a pause point**:
140
+
141
+ - **Default behavior**: Bootstrap and S1 MUST execute in the **same turn** as S0 → S1. The pause point is fixed at **after S1 completion**, awaiting user confirmation before S2 — not between S0 and S1.
142
+ - **If the user message already provides the required fields** (per Section 7 mode requirements — e.g., Feature Extension requires Identity + Architecture & Tech Stack) → confirm the known fields in a table without re-asking, and proceed directly into S1.
143
+ - **If some fields are missing** → Bootstrap surfaces a "known / pending" table in the same turn, then **immediately enters S1** with placeholders for unconfirmed fields, and folds the pending fields into the **S1 confirmation question** (so the user fills them when confirming S1).
144
+ - **Forbidden**: pausing between S0 and S1 to wait for Round 1 answers. If your response shows S1 still as `⬜ pending` while the flow stops to wait for user input, you have failed this rule.
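The sequencing rule above can be sketched as a small planning function. This is a hedged illustration, not the package's implementation: `TurnPlan`, `plan_first_turn`, `REQUIRED_FIELDS`, and the `<pending>` placeholder string are hypothetical names, and only the Feature Extension requirement (Identity + Architecture & Tech Stack) is taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

PLACEHOLDER = "<pending>"  # hypothetical placeholder value for unconfirmed fields

# Required fields per mode; only Feature Extension is stated in the rules above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "feature_extension": ["Identity", "Architecture & Tech Stack"],
}


@dataclass
class TurnPlan:
    context: Dict[str, str]  # baseline written to .product-context.md
    pending: List[str]       # fields folded into the S1 confirmation question
    steps: List[str]         # steps executed in this single turn
    pause_after: str = "S1"  # the pause point is fixed after S1


def plan_first_turn(mode: str, provided: Dict[str, str]) -> TurnPlan:
    """Plan S0 and S1 for the same turn, using placeholders for missing fields."""
    required = REQUIRED_FIELDS[mode]
    context = {name: provided.get(name, PLACEHOLDER) for name in required}
    pending = [name for name in required if name not in provided]
    # Bootstrap never blocks: S1 is always part of the same turn.
    return TurnPlan(context=context, pending=pending, steps=["S0", "S1"])
```

Because `steps` always contains both S0 and S1, a plan that stops after S0 cannot be expressed, which is the failure the "Forbidden" bullet describes.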
145
+
137
146
  ### After Bootstrap Completion
138
147
 
139
- Write the collected information to `.product-context.md`, leave uncollected sections empty (using placeholders), then proceed to the mode's formal S1.
148
+ Write the collected information to `.product-context.md` (write a baseline even when some fields are placeholders — they will be overwritten after the user confirms during S1), then enter that mode's S1 in the same turn and produce the S1 content.
140
149
 
141
150
  ---
142
151
 
@@ -10,13 +10,16 @@ After producing the output for each step, Claude must execute the following revi
10
10
 
11
11
  1. Find the quality checklist corresponding to the current step (see the "Review Criteria" section below)
12
12
  2. Mark each item as ✅ or ❌
13
- 3. Items marked ❌ must include a specific explanation of "how to improve"
13
+ 3. Items marked ❌ must include a specific explanation of "how to improve", and must state how the gap will block a downstream step or artifact (e.g., "this blocks PR-FAQ writing because without a concrete scenario the Lead paragraph can't be written")
14
14
 
15
- ### Step 2: Mandatory Critique
15
+ ### Step 2: Mandatory Critique (Hard Gate)
16
16
 
17
- - **Not all items may be ✅**: If all items pass, you must proactively identify "the weakest aspect of this output" and explain how to strengthen it
18
- - This isn't nitpicking; it ensures that the self-review doesn't become a rubber stamp
19
- - Following the Amazon PR-FAQ spirit: quality comes from finding problems, not from confirming there are none
17
+ - **At least one item MUST be marked ❌**: a self-check that is all ✅ does not pass. Claude must honestly identify at least one item that does not fully meet the bar and mark it ❌
18
+ - ⚠️ **A warning marker (⚠️) does NOT substitute for ❌**: ⚠️ is an auxiliary marker (e.g., on a "weakest aspect" supplementary note) and cannot replace a ❌ inside the checklist itself
19
+ - **No bypass via appendix**: writing the "weakest aspect" as a separate note outside the checklist while keeping every checklist item ✅ counts as failing the self-check
20
+ - ❌ must point at a **substantive content gap**, not surface-level concerns like "formatting could be prettier" or "wording could be tighter"
21
+ - If after honest review every item still feels like a pass, apply a stricter standard and re-review — any planning artifact has some dimension that is weakest and has room to improve; mark that one ❌ and specify the concrete fix
22
+ - Design intent: following the Amazon PR-FAQ critique culture — quality comes from proactively finding problems, not from passively confirming there are none
20
23
 
21
24
  ### Step 3: Presentation Format
22
25
 
@@ -24,10 +27,13 @@ After producing the output for each step, Claude must execute the following revi
24
27
  📝 Quality Self-Check:
25
28
  - ✅ [Check item]
26
29
  - ✅ [Check item]
27
- - ❌ [Check item] → Improvement direction: [specific explanation]
28
- ⚠️ Weakest aspect: [description] → Strengthening suggestion: [specific action]
30
+ - ✅ [Check item]
31
+ - ❌ [Check item] → Content gap: [specific] → Downstream impact: [which step/artifact this blocks] → Improvement direction: [specific action]
32
+ ⚠️ Supplementary note (optional): [additional context — does NOT replace the ❌ above]
29
33
  ```
30
34
 
35
+ **Self-check on the self-check**: if your output has no ❌, go back to Step 2 and redo.
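The Mandatory Critique gate and the presentation format can be expressed as a validator over checklist items. The following is a sketch under assumed names (`CheckItem`, `critique_problems`); it only checks that the required parts are present and cannot tell whether a stated gap is substantive rather than cosmetic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckItem:
    label: str
    passed: bool                             # True renders as ✅, False as ❌
    content_gap: Optional[str] = None        # required for ❌ items
    downstream_impact: Optional[str] = None  # which step or artifact is blocked
    improvement: Optional[str] = None        # concrete fix


def critique_problems(items: List[CheckItem]) -> List[str]:
    """Return reasons the self-check fails the Hard Gate; empty means it passes."""
    problems = []
    failing = [item for item in items if not item.passed]
    if not failing:
        # An all-✅ checklist does not pass, whatever notes are appended outside it.
        problems.append("no ❌ item: go back to Step 2 and redo")
    for item in failing:
        for name in ("content_gap", "downstream_impact", "improvement"):
            if not getattr(item, name):
                problems.append(f"{item.label}: missing {name}")
    return problems
```

A supplementary ⚠️ note has no field here on purpose: it is optional and never counts toward satisfying the gate.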
36
+
31
37
  ---
32
38
 
33
39
  ## Review Criteria (By Step)