product-playbook 1.2.2 → 1.2.5

Files changed (58)
  1. package/.claude-plugin/marketplace.json +1 -1
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/README.es.md +31 -0
  4. package/README.ja.md +31 -0
  5. package/README.ko.md +31 -0
  6. package/README.md +56 -0
  7. package/README.zh-CN.md +31 -0
  8. package/README.zh-TW.md +31 -0
  9. package/SKILL.md +104 -156
  10. package/commands/product-feature.md +2 -0
  11. package/i18n/en/SKILL.md +104 -156
  12. package/i18n/en/commands/product-feature.md +2 -0
  13. package/i18n/en/references/02b-jtbd.md +19 -7
  14. package/i18n/en/references/rules-context-template.md +177 -0
  15. package/i18n/en/references/rules-context.md +74 -242
  16. package/i18n/en/references/rules-quality-review.md +26 -141
  17. package/i18n/en/references/rules-subagent-dispatch.md +61 -0
  18. package/i18n/es/SKILL.md +103 -155
  19. package/i18n/es/commands/product-feature.md +2 -0
  20. package/i18n/es/references/02b-jtbd.md +24 -7
  21. package/i18n/es/references/rules-context-template.md +177 -0
  22. package/i18n/es/references/rules-context.md +70 -238
  23. package/i18n/es/references/rules-quality-review.md +25 -140
  24. package/i18n/es/references/rules-subagent-dispatch.md +61 -0
  25. package/i18n/ja/SKILL.md +106 -158
  26. package/i18n/ja/commands/product-feature.md +2 -0
  27. package/i18n/ja/references/02b-jtbd.md +24 -7
  28. package/i18n/ja/references/rules-context-template.md +177 -0
  29. package/i18n/ja/references/rules-context.md +74 -242
  30. package/i18n/ja/references/rules-quality-review.md +27 -142
  31. package/i18n/ja/references/rules-subagent-dispatch.md +61 -0
  32. package/i18n/ko/SKILL.md +99 -151
  33. package/i18n/ko/commands/product-feature.md +2 -0
  34. package/i18n/ko/references/02b-jtbd.md +24 -7
  35. package/i18n/ko/references/rules-context-template.md +177 -0
  36. package/i18n/ko/references/rules-context.md +72 -240
  37. package/i18n/ko/references/rules-quality-review.md +24 -139
  38. package/i18n/ko/references/rules-subagent-dispatch.md +61 -0
  39. package/i18n/zh-CN/SKILL.md +96 -148
  40. package/i18n/zh-CN/commands/product-feature.md +2 -0
  41. package/i18n/zh-CN/references/02b-jtbd.md +24 -7
  42. package/i18n/zh-CN/references/rules-context-template.md +177 -0
  43. package/i18n/zh-CN/references/rules-context.md +75 -243
  44. package/i18n/zh-CN/references/rules-quality-review.md +24 -139
  45. package/i18n/zh-CN/references/rules-subagent-dispatch.md +61 -0
  46. package/i18n/zh-TW/SKILL.md +80 -132
  47. package/i18n/zh-TW/commands/product-feature.md +10 -8
  48. package/i18n/zh-TW/references/02b-jtbd.md +24 -7
  49. package/i18n/zh-TW/references/rules-context-template.md +177 -0
  50. package/i18n/zh-TW/references/rules-context.md +62 -230
  51. package/i18n/zh-TW/references/rules-quality-review.md +25 -140
  52. package/i18n/zh-TW/references/rules-subagent-dispatch.md +64 -0
  53. package/package.json +1 -1
  54. package/references/02b-jtbd.md +24 -7
  55. package/references/rules-context-template.md +177 -0
  56. package/references/rules-context.md +80 -248
  57. package/references/rules-quality-review.md +27 -142
  58. package/references/rules-subagent-dispatch.md +61 -0
package/.claude-plugin/marketplace.json CHANGED
@@ -7,7 +7,7 @@
  {
  "name": "product-playbook",
  "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
- "version": "1.2.2",
+ "version": "1.2.5",
  "source": "./."
  }
  ]
package/.claude-plugin/plugin.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "product-playbook",
  "description": "MUST use when user wants to plan or strategize a product/feature. 22 PM frameworks, 6 modes, multi-language, from idea to dev handoff",
- "version": "1.2.2",
+ "version": "1.2.5",
  "author": {
  "name": "Charles Chen"
  },
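Both manifest hunks above make the same 1.2.2 to 1.2.5 bump, so a release check only has to confirm that every manifest declares one version. A minimal Python sketch, not part of the package: it assumes plugin.json keeps a top-level `"version"` and that the marketplace entry shown above sits in a top-level `"plugins"` list. The `collect_versions`/`mismatched` helpers and the sample manifests are hypothetical; package.json is deliberately left stale in the sample to show a mismatch.

```python
import json

def collect_versions(manifests: dict[str, str]) -> dict[str, str]:
    """Map each manifest (or manifest#plugin entry) to the version it declares."""
    found: dict[str, str] = {}
    for filename, text in manifests.items():
        data = json.loads(text)
        # Top-level "version" field (assumed layout for plugin.json / package.json).
        if isinstance(data.get("version"), str):
            found[filename] = data["version"]
        # Per-plugin "version" inside an assumed top-level "plugins" list.
        for entry in data.get("plugins", []):
            if "version" in entry:
                found[f"{filename}#{entry.get('name', '?')}"] = entry["version"]
    return found

def mismatched(manifests: dict[str, str], expected: str) -> dict[str, str]:
    """Return only the entries whose version differs from `expected`."""
    return {k: v for k, v in collect_versions(manifests).items() if v != expected}

# Hypothetical, trimmed-down manifests; package.json is intentionally stale.
example = {
    "marketplace.json": '{"plugins": [{"name": "product-playbook", "version": "1.2.5", "source": "./."}]}',
    "plugin.json": '{"name": "product-playbook", "version": "1.2.5"}',
    "package.json": '{"name": "product-playbook", "version": "1.2.2"}',
}

print(mismatched(example, "1.2.5"))  # → {'package.json': '1.2.2'}
```

An empty result means every manifest agrees on the expected version.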
package/README.es.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you

  - 🧭 **6 execution modes** — from 30-minute rapid validation to complete product plans (including a fast track for feature expansion)
  - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver chain
+ - 🤝 **3 specialist sub-agents** — Discovery, Strategy Critique, and Pre-mortem operate as isolated context windows with framework-specific expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or documents; the AI automatically integrates them into the relevant step
  - 🔗 **Development handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md for a smooth handoff to development in Claude Code
@@ -155,6 +156,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate development handoff package
+ ├── agents/ # Specialist sub-agents (loaded automatically by the Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Strategy critic with a Rumelt lens
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Comparing response quality between "with Skill guidance" and "without Skill

  > See [`evals/`](./evals/) for detailed methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 dispatch-relevant evaluations, 22 expectations)
+
+ A focused A/B run measuring the marginal quality contribution of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) introduced in v1.2.0+. Same skill version (v1.2.3), same prompts, two arms:
+
+ - **WITH sub-agent**: the executor reads the corresponding `agents/*.md` file and follows the specialist's declared output schema + self-checks; the dispatch is marked in the response.
+ - **WITHOUT sub-agent**: the executor is forbidden from reading any `agents/*.md` or mentioning delegation; it must handle the step inline as the orchestrator using only `SKILL.md` + `commands/` + `references/`.
+
+ | Evaluation | With Sub-agent | Without Sub-agent | Delta |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (risk evaluation in Build Mode)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token consumption is practically identical in both arms (151K vs 154K): keeping a specialist costs no more than handling the step inline.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it, the orchestrator produces a thin, future-tense risk list, missing the scenario count (≥15), 5-category coverage, leading-indicator discipline, low-cost pre-launch experiments, and the past-tense "shipped and failed" narrative framing. The specialist's structured schema does real work that `references/` alone does not reproduce.
+ - **Discovery-specialist and strategy-critic are moderate contributors** (+14–17%): the orchestrator can produce reasonable Persona+JTBD analyses and strategy critiques inline. The only assertion that diverges between the arms is the dispatch contract itself, not structural quality.
+ - **Implication**: of the 3 specialists, pre-mortem-runner delivers the largest quality lift on its own and is the most justified by these results. The other two could in principle be folded back into the orchestrator with stronger reference pages, although there is no cost incentive to do so (tokens are equal).
+
+ **Harness caveat**: the `general-purpose` executor used in this eval harness does not expose nested `Task` dispatch, so the WITH arm approximates real dispatch by reading the specialist's `agents/*.md` and following its schema inline (with an explicit dispatch marker). The structural contrast vs WITHOUT is real, but a top-session run would be needed to verify end-to-end Task-tool dispatch quality.
+
+ > Raw artifacts and per-assertion divergence in [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
+
  ---

  ## 💬 Available Commands
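The percentages and deltas in the Iteration 5 table follow directly from the pass counts. A minimal Python check with the counts copied from that table; the helper names are illustrative. The Delta column is a difference in percentage points, computed here from exact fractions and rounded to one decimal only at the end.

```python
from fractions import Fraction

# (passed, total) per eval, copied from the Iteration 5 table.
WITH = {"Discovery": (7, 7), "Strategy Critic": (6, 6), "Pre-mortem": (9, 9)}
WITHOUT = {"Discovery": (6, 7), "Strategy Critic": (5, 6), "Pre-mortem": (2, 9)}

def pct(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal like the table."""
    return round(100 * passed / total, 1)

def delta_pp(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Difference a - b in percentage points, rounded only at the end."""
    return round(float(100 * (Fraction(*a) - Fraction(*b))), 1)

def totals(arm: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Pool all expectations of one arm into a single (passed, total) pair."""
    return sum(p for p, _ in arm.values()), sum(t for _, t in arm.values())

for name in WITH:
    print(f"{name}: {pct(*WITH[name])}% vs {pct(*WITHOUT[name])}% "
          f"(+{delta_pp(WITH[name], WITHOUT[name])} pp)")
print(f"TOTAL: {pct(*totals(WITH))}% vs {pct(*totals(WITHOUT))}% "
      f"(+{delta_pp(totals(WITH), totals(WITHOUT))} pp)")
```

The pooled total (13/22) weights each eval by its number of expectations, which is why the +40.9 point total is not the plain average of the three per-eval deltas (about 36.3).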
package/README.ja.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook systematically guides product planning from zero to one

  - 🧭 **6 execution modes** — from 30-minute rapid validation to full-scale product planning (including a feature expansion fast track)
  - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver pipeline
+ - 🤝 **3 specialist sub-agents** — Discovery, strategy critique, and Pre-mortem run in independent context windows, each with framework-specific expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or documents and the AI automatically integrates them into the relevant step
  - 🔗 **Development handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md to connect seamlessly to Claude Code development
@@ -156,6 +157,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate development handoff package
+ ├── agents/ # Specialist sub-agents (auto-loaded by the Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Strategy critic from a Rumelt perspective
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -419,6 +424,32 @@ Claude Code automatically:

  > See [`evals/`](./evals/) for detailed methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 dispatch-related evals, 22 expectations)
+
+ A focused A/B evaluation measuring the marginal contribution to quality of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) introduced in v1.2.0+. Same skill version (v1.2.3), same prompts, 2 arms:
+
+ - **With sub-agent**: the executor reads the relevant `agents/*.md` and follows the output schema and self-checks declared by the specialist agent. A dispatch marker is recorded in the response.
+ - **Without sub-agent**: the executor does not read any `agents/*.md` and does not mention delegation. The orchestrator handles the step inline using only `SKILL.md` + `commands/` + `references/`.
+
+ | Eval item | With Sub-agent | Without Sub-agent | Delta |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (Build Mode risk assessment)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **Total** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token consumption is almost the same in both arms (151K vs 154K): keeping the specialist agent is no more expensive than inline handling.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it, the orchestrator only generates a thin, future-tense risk list and loses the scenario count (≥15), 5-category coverage, leading-indicator discipline, low-cost pre-launch experiments, and the past-tense "shipped and failed" narrative framing. The structured specialist schema is doing real work that `references/` alone cannot reconstruct.
+ - **Discovery-specialist and strategy-critic are moderate contributors** (+14–17%): the orchestrator alone handles Persona+JTBD and strategy critique at a reasonable level. The assertion that diverges between the arms is the dispatch contract itself, not structural quality.
+ - **Implication**: of the 3 specialists, pre-mortem-runner gives the largest standalone quality gain and is the most justified to keep. The other two could in principle be folded back into the orchestrator with strengthened reference pages, but since the token cost is the same there is no incentive to cut them.
+
+ **Harness caveat**: the `general-purpose` executor in this evaluation environment does not expose nested `Task`, so the "With sub-agent" arm approximates real dispatch by reading the specialist's `agents/*.md`, adding a dispatch marker, and following the schema inline. The structural contrast is real, but a top-session run is needed to fully verify end-to-end Task-tool dispatch.
+
+ > See [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) for raw artifacts and per-assertion divergence.
+
  ---

  ## 💬 Available Commands
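The only difference between the two Iteration 5 arms is which skill files the executor may consult. A minimal sketch of that rule as a path filter, assuming paths relative to the skill root; the function name `may_read` and the arm labels `"with"`/`"without"` are hypothetical, not part of the actual harness.

```python
from pathlib import PurePosixPath

# Sources both arms may use, per the arm descriptions (paths relative to the skill root).
SHARED = {"SKILL.md", "commands", "references"}

def may_read(path: str, arm: str) -> bool:
    """Hypothetical access rule: 'with' adds agents/*.md on top of the shared sources."""
    if arm not in ("with", "without"):
        raise ValueError(f"unknown arm: {arm!r}")
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    if parts[0] in SHARED:
        return True
    # agents/*.md: a direct child of agents/ with a .md suffix, WITH arm only.
    if parts[0] == "agents" and len(parts) == 2 and parts[1].endswith(".md"):
        return arm == "with"
    return False
```

A filter like this covers only the file-access half of the WITHOUT arm; the "no mention of delegation" rule would need a separate check on the response text.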
package/README.ko.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook guides the entire product planning process from zero to one

  - 🧭 **6 execution modes** — from 30-minute rapid validation to full product planning (including a feature expansion fast track)
  - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver pipeline
+ - 🤝 **3 specialist sub-agents** — Discovery, strategy critique, and Pre-mortem operate in isolated context windows, each with framework-specific expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or documents and the AI automatically integrates them into the relevant step
  - 🔗 **Development handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md for a smooth connection to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate development handoff package
+ ├── agents/ # Specialist sub-agents (auto-loaded by the Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Strategy critic from a Rumelt perspective
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code automatically:

  > See [`evals/`](./evals/) for detailed methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 dispatch-related evals, 22 expectations)
+
+ A focused A/B evaluation measuring the marginal contribution to quality of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) introduced in v1.2.0+. Same skill version (v1.2.3), same prompts, 2 arms:
+
+ - **With sub-agent**: the executor reads the relevant `agents/*.md` file and follows the output schema and self-checks declared by the specialist. A dispatch marker is recorded in the response.
+ - **Without sub-agent**: the executor may not read any `agents/*.md` and may not mention delegation. The orchestrator handles the step inline using only `SKILL.md` + `commands/` + `references/`.
+
+ | Eval item | With Sub-agent | Without Sub-agent | Difference |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (Build Mode risk assessment)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **Total** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token consumption is nearly identical across the two arms (151K vs 154K): keeping the specialist is no more expensive than inline handling.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it, the orchestrator produces only a thin, future-tense risk list and misses the scenario count (≥15), 5-category coverage, leading-indicator discipline, low-cost pre-launch experiments, and the past-tense "shipped and then failed" narrative frame. The structured specialist schema is doing real work that `references/` alone cannot reconstruct.
+ - **Discovery-specialist and strategy-critic are moderate contributors** (+14–17%): the orchestrator on its own can handle Persona+JTBD and strategy critique at a reasonable level. The only assertion that diverges between the two arms is the dispatch contract itself, not structural quality.
+ - **Implication**: of the 3 specialists, pre-mortem-runner has the largest standalone quality gain and is the most justified to keep. The other 2 could in principle be merged into the orchestrator with strengthened reference pages, but since the token cost is identical there is no motivation to cut back.
+
+ **Harness caveat**: the `general-purpose` executor in this evaluation environment does not expose nested `Task`, so the "With sub-agent" arm approximates real dispatch by reading the specialist's `agents/*.md`, adding a dispatch marker, and following the schema inline. The structural contrast is real, but a top-session run is needed to fully verify end-to-end Task-tool dispatch.
+
+ > See [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) for raw artifacts and per-assertion divergence.
+
  ---

  ## 💬 Available Commands
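Several of the pre-mortem expectations named in the Key Findings are countable: at least 15 scenarios, coverage of all 5 categories, and a leading indicator per scenario. A minimal sketch of how such checks could be scored, assuming a hypothetical `Scenario` record. The five category names are placeholders because the README does not list them, and the past-tense framing and pre-launch experiment expectations are left out since they call for judgment rather than a count.

```python
from dataclasses import dataclass

# Placeholder category names; the real five categories come from the agent prompt.
CATEGORIES = ("market", "product", "technical", "go-to-market", "team")

@dataclass
class Scenario:
    category: str           # which failure category the scenario belongs to
    narrative: str          # written as if the launch already shipped and failed
    leading_indicator: str  # early signal that this failure is under way

def check_premortem(scenarios: list[Scenario], categories=CATEGORIES,
                    min_scenarios: int = 15) -> dict[str, bool]:
    """Score the countable pre-mortem expectations; each value is pass/fail."""
    covered = {s.category for s in scenarios}
    return {
        "scenario_count": len(scenarios) >= min_scenarios,
        "category_coverage": set(categories) <= covered,
        "leading_indicators": all(s.leading_indicator.strip() for s in scenarios),
    }

# Hypothetical output: 16 scenarios spread round-robin over the 5 categories.
sample = [Scenario(CATEGORIES[i % 5], f"We shipped and it failed because of cause {i}.", f"signal {i}")
          for i in range(16)]
print(check_premortem(sample))
# → {'scenario_count': True, 'category_coverage': True, 'leading_indicators': True}
```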
package/README.md CHANGED
@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you thr

  - 🧭 **6 execution modes** — from 30-minute rapid validation to full-blown product plans (including a feature expansion fast track)
  - 📐 **22 product frameworks** — covering the entire Discovery → Define → Develop → Deliver pipeline
+ - 🤝 **3 specialist sub-agents** — Discovery, Strategy Critique, and Pre-mortem run as isolated context windows with framework-specific expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or documents; the AI automatically integrates them into the relevant step
  - 🔗 **Dev handoff** — generates CLAUDE.md + TASKS.md + TICKETS.md for seamless handoff to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate dev handoff package
+ ├── agents/ # Specialist sub-agents (auto-loaded by Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Rumelt-lens strategy critic
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,57 @@ By comparing response quality between "with Skill guidance" and "without Skill g

  > See [`evals/`](./evals/) for detailed methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 dispatch-relevant evals, 22 expectations)
+
+ A focused A/B run measuring the marginal quality contribution of the 3 specialist sub-agents (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`) shipped in v1.2.0+. Same skill version (v1.2.3), same prompts, two arms:
+
+ - **WITH sub-agent**: executor reads the specialist's `agents/*.md` file and follows its declared output schema + self-checks; dispatch is marked in the response.
+ - **WITHOUT sub-agent**: executor is forbidden from reading any `agents/*.md` or mentioning delegation; must handle the step inline as the orchestrator using only `SKILL.md` + `commands/` + `references/`.
+
+ | Eval | With Sub-agent | Without Sub-agent | Delta |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (Build Mode risk)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **TOTAL** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token cost is essentially identical across arms (151K vs 154K) — keeping a specialist costs no more than handling the step inline.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it the orchestrator produces a thin, future-tense risk list and misses scenario count (≥15), 5-category coverage, leading-indicator discipline, cheap pre-launch experiments, and past-tense "shipped-and-failed" framing. The structured specialist schema is doing real work that `references/` alone does not reproduce.
+ - **Discovery-specialist and strategy-critic are modest contributors** (+14–17%): the orchestrator can produce reasonable Persona+JTBD analyses and strategy critiques inline. The diverging assertion in each case is the dispatch contract itself, not the structural quality.
+ - **Implication**: of the 3 specialists, the pre-mortem-runner gives the largest standalone quality lift and is the most justified by these results. The other two could in principle be folded back into the orchestrator with stronger reference pages, though there is no cost incentive to do so (tokens are a wash).
+
+ **Harness caveat**: the `general-purpose` executor used in this eval harness does not expose nested `Task` dispatch, so the WITH arm approximates real dispatch by reading the specialist's `agents/*.md` and following its schema inline (with an explicit dispatch marker). The structural contrast vs WITHOUT is real, but a true top-session run would be needed to verify end-to-end Task-tool dispatch quality.
+
+ > Raw artifacts and per-assertion divergence in [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/).
+
+ ### Iteration 6: Token Optimization Pass (v1.2.5)
+
+ A token-reduction iteration: the same skill content semantics with a smaller footprint per session. Goal: ≥25% token reduction while holding quality at 100%.
+
+ **Changes shipped:**
+ - **SKILL.md slim** — extracted Sub-Agent Delegation Rules to lazy `rules-subagent-dispatch.md`; tightened Hard Gate descriptions; consolidated Mode Overview duplication. **6,188 → 2,877 tokens (-54%)** for the eager entry point.
+ - **rules-context.md split** — kept decision logic eager (1,594 tokens); moved verbose YAML templates + Bootstrap procedure + Conflict UX scripts to lazy `rules-context-template.md` (1,849 tokens, loaded only on trigger).
+ - **rules-quality-review.md slim** — distilled from 1,040 to 817 tokens with a compact 3-step protocol + 1-line per-framework checklists.
+ - **Specialist agents slim** — removed embedded framework knowledge that duplicated `references/*.md`, replaced with on-demand pointers. **discovery-specialist −25%, strategy-critic −18%, pre-mortem-runner −20%** per dispatch.
+
+ **Estimated savings per 9-step Full Mode session:**
+
+ | Source | Before | After | Saved |
+ |--------|:------:|:-----:|:-----:|
+ | Eager (SKILL + context + progress) | ~8,800 | ~5,500 | **−3,300** |
+ | Quality review (×9 step loads) | ~9,360 | ~7,353 | **−2,007** |
+ | Sub-agent dispatches (3 specialists) | ~9,005 | ~7,106 | **−1,899** |
+ | **Total per session** | **~27,200** | **~18,900** | **−8,300 (−30%)** |
+
+ **Quality validation:** pre-mortem-runner (the most quality-sensitive specialist per Iteration 5) re-ran eval-12 on the v1.2.5 slimmed content. Result: **9/9 assertions PASS**: 16 scenarios across all 5 categories, 5 architecture-grounded scenarios citing real stack components, 5 cheap pre-launch experiments with binary decision rules, and past-tense framing maintained. A static cross-check confirmed that the eval-10/11 assertions (13 total) all have explicit support in the slim agent prompts.
+
+ **Token cost trade-off:** the split adds 2 new lazy files (`rules-subagent-dispatch.md` 978 tokens, `rules-context-template.md` 1,849 tokens) that load only when triggered. In the most common session paths, they never load. In Bootstrap-or-Conflict paths, the eager savings still come out net positive.
+
+ **Mirrored to 5 i18n locales** (zh-TW, zh-CN, ja, es, ko) preserving existing translations — structural slim applied identically per language.
+
  ---

  ## 💬 Available Commands
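The per-row figures in the Iteration 6 savings table can be re-summed as a quick check. A minimal Python sketch with the numbers copied from that section; the `summarize` and `net_after_lazy_loads` helpers are illustrative, not part of the skill. Summing only the three itemized rows gives 27,165 before and 19,959 after, about 7,206 tokens (26.5%) saved, which is less than the ~8,300 (-30%) stated in the totals row.

```python
# (before, after) token estimates per 9-step Full Mode session, from the Iteration 6 table.
ROWS = {
    "eager (SKILL + context + progress)": (8_800, 5_500),
    "quality review (x9 step loads)": (9_360, 7_353),
    "sub-agent dispatches (3 specialists)": (9_005, 7_106),
}

# Lazy files introduced by the split, loaded only when their trigger fires.
LAZY_TOKENS = {"rules-subagent-dispatch.md": 978, "rules-context-template.md": 1_849}

def summarize(rows: dict[str, tuple[int, int]]) -> tuple[int, int, int, float]:
    """Return (before, after, saved, percent saved) over the itemized rows."""
    before = sum(b for b, _ in rows.values())
    after = sum(a for _, a in rows.values())
    saved = before - after
    return before, after, saved, round(100 * saved / before, 1)

def net_after_lazy_loads(eager_saved: int, loaded: list[str]) -> int:
    """Eager savings left over after a session path loads the given lazy files."""
    return eager_saved - sum(LAZY_TOKENS[name] for name in loaded)

# The quality-review row is the per-load size (1,040 -> 817) times 9 step loads.
assert (1_040 * 9, 817 * 9) == ROWS["quality review (x9 step loads)"]

print(summarize(ROWS))  # → (27165, 19959, 7206, 26.5)
# A session path that ends up loading both lazy files.
print(net_after_lazy_loads(3_300, list(LAZY_TOKENS)))  # → 473
```

Even when both lazy files load, the ~3,300-token eager saving still leaves about 473 tokens net, which matches the trade-off claim that Bootstrap-or-Conflict paths stay net positive.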
package/README.zh-CN.md CHANGED
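@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you

  - 🧭 **6 execution modes** — from 30-minute rapid validation to a complete plan (including a fast path for feature expansion)
  - 📐 **22 product frameworks** — covering the full Discovery → Define → Develop → Deliver flow
+ - 🤝 **3 specialist sub-agents** — Discovery, strategy critique, and Pre-mortem operate in independent context windows, each carrying its own framework expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or files and the AI integrates them into the corresponding step automatically
  - 🔗 **Development handoff** — produces CLAUDE.md + TASKS.md + TICKETS.md for a seamless handoff to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate development handoff package
+ ├── agents/ # Specialist sub-agents (auto-loaded by the Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Strategy critic from a Rumelt perspective
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code automatically:

  > See the [`evals/`](./evals/) directory for detailed evaluation methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 specialist-related evals, 22 expectations)
+
+ A focused A/B test of the 3 specialist sub-agents introduced in v1.2.0+ (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`), quantifying their marginal contribution to quality. Same skill version (v1.2.3), same prompts, two arms:
+
+ - **With sub-agent**: the executor may read the corresponding `agents/*.md` and follows that specialist's declared output schema and self-checks; dispatch is marked in the response.
+ - **Without sub-agent**: the executor may not read any `agents/*.md` or mention delegation; the orchestrator must handle the step inline on its own using only `SKILL.md` + `commands/` + `references/`.
+
+ | Eval item | With Sub-agent | Without Sub-agent | Difference |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (Build Mode risk assessment)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **Total** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token consumption in the two arms is nearly identical (151K vs 154K): keeping the specialists is no more expensive than inline handling.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it, the orchestrator can only produce a thin, future-tense risk list, missing the scenario count (≥15), five-category coverage, leading-indicator discipline, low-cost pre-launch experiments, and the past-tense "already launched and failed" narrative framing. The structured specialist schema is doing real work that `references/` alone cannot reconstruct.
+ - **Discovery-specialist and strategy-critic are moderate contributors** (+14–17%): when the orchestrator handles Persona+JTBD and strategy critique itself, it already reaches a reasonable level. The only assertion on which the two arms diverge is the dispatch contract itself, not structural quality.
+ - **Implication**: of the 3 specialists, pre-mortem-runner contributes the most to quality and is the most worth keeping; the other two could in principle be folded back into the orchestrator by strengthening the reference files, but because the token cost is the same there is no incentive to cut them.
+
+ **Harness caveat**: the `general-purpose` executor in this evaluation environment does not expose nested `Task`, so the "With sub-agent" arm approximates real dispatch by reading the specialist's `agents/*.md`, marking the dispatch, and following the schema inline. The structural contrast is real, but fully verifying end-to-end Task-tool dispatch still requires a top-session test.
+
+ > See [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) for raw artifacts and per-assertion divergence.
+
  ---

  ## 💬 Available Commands Overview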
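The eager/lazy split described in the Iteration 6 notes of the README.md diff can be pictured as a loader that always pulls in eager files and adds a lazy file only the first time one of its triggers fires. A minimal sketch: the token sizes come from those notes, but the trigger names (`dispatch`, `bootstrap`, `conflict`) and the `RuleFile`/`Session` classes are hypothetical. The progress file is left out because its size is not given.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RuleFile:
    name: str
    tokens: int
    triggers: frozenset = frozenset()  # empty = eager, loaded at session start

FILES = [
    RuleFile("SKILL.md", 2_877),
    RuleFile("rules-context.md", 1_594),
    RuleFile("rules-subagent-dispatch.md", 978, frozenset({"dispatch"})),
    RuleFile("rules-context-template.md", 1_849, frozenset({"bootstrap", "conflict"})),
]

@dataclass
class Session:
    files: list
    loaded: dict = field(default_factory=dict)  # file name -> tokens spent loading it

    def start(self) -> None:
        # Eager files are loaded unconditionally.
        for f in self.files:
            if not f.triggers:
                self.loaded[f.name] = f.tokens

    def fire(self, trigger: str) -> None:
        # Load each matching lazy file at most once, however often its triggers fire.
        for f in self.files:
            if trigger in f.triggers:
                self.loaded.setdefault(f.name, f.tokens)

    @property
    def total_tokens(self) -> int:
        return sum(self.loaded.values())

s = Session(FILES)
s.start()
print(s.total_tokens)  # → 4471 (eager files only)
s.fire("conflict")
s.fire("bootstrap")    # same template file, not loaded twice
print(s.total_tokens)  # → 6320
```

Keying `loaded` by file name is what keeps a file that answers to several triggers from being counted twice.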
package/README.zh-TW.md CHANGED
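@@ -21,6 +21,7 @@ The Product Playbook is a **Claude AI Skill** that systematically guides you

  - 🧭 **6 execution modes** — from 30-minute rapid validation to a complete plan (including a fast path for feature expansion)
  - 📐 **22 product frameworks** — covering the full Discovery → Define → Develop → Deliver flow
+ - 🤝 **3 specialist sub-agents** — Discovery, strategy critique, and Pre-mortem operate in independent context windows, each carrying its own framework expertise
  - 🔄 **Change propagation engine** — modify any step and all downstream outputs update automatically
  - 📎 **Smart file integration** — upload data, screenshots, or files and the AI integrates them into the corresponding step automatically
  - 🔗 **Development handoff** — produces CLAUDE.md + TASKS.md + TICKETS.md for a seamless handoff to Claude Code development
@@ -155,6 +156,10 @@ product-playbook/
  │ ├── product-prd.md # /product-prd — Generate PRD
  │ ├── product-report.md # /product-report — Generate HTML report
  │ └── product-dev.md # /product-dev — Generate development handoff package
+ ├── agents/ # Specialist sub-agents (auto-loaded by the Claude Code plugin)
+ │ ├── discovery-specialist.md # Persona / JTBD / OST / Journey Map specialist
+ │ ├── strategy-critic.md # Strategy critic from a Rumelt perspective
+ │ └── pre-mortem-runner.md # 15+ failure scenarios + leading indicators
  └── references/
  ├── 00-opportunity-check.md # Opportunity assessment + DHM Model
  ├── 01-strategy.md # Strategy Blocks + Rumelt + OKR
@@ -418,6 +423,32 @@ Claude Code automatically:

  > See the [`evals/`](./evals/) directory for detailed evaluation methodology and data.

+ ### Iteration 5: Sub-agent A/B Comparison (3 specialist-related evals, 22 expectations)
+
+ A focused A/B test of the 3 specialist sub-agents introduced in v1.2.0+ (`discovery-specialist`, `strategy-critic`, `pre-mortem-runner`), quantifying their marginal contribution to quality. Same skill version (v1.2.3), same prompts, two arms:
+
+ - **With sub-agent**: the executor may read the corresponding `agents/*.md` and follows that specialist's declared output schema and self-checks; dispatch is marked in the response.
+ - **Without sub-agent**: the executor may not read any `agents/*.md` or mention delegation; the orchestrator must handle the step inline on its own using only `SKILL.md` + `commands/` + `references/`.
+
+ | Eval item | With Sub-agent | Without Sub-agent | Difference |
+ |-----------|:--------:|:------------:|:-----:|
+ | Discovery (Persona + JTBD) | 100% (7/7) | 85.7% (6/7) | +14.3% |
+ | Strategy Critic | 100% (6/6) | 83.3% (5/6) | +16.7% |
+ | **Pre-mortem (Build Mode risk assessment)** | **100% (9/9)** | **22.2% (2/9)** | **+77.8% ✅** |
+ | **Total** | **100% (22/22)** | **59.1% (13/22)** | **+40.9%** |
+
+ Token consumption in the two arms is nearly identical (151K vs 154K): keeping the specialists is no more expensive than inline handling.
+
+ **Key Findings**
+
+ - **Pre-mortem-runner is load-bearing** (+77.8%): without it, the orchestrator can only produce a thin, future-tense risk list, missing the scenario count (≥15), five-category coverage, leading-indicator discipline, low-cost pre-launch experiments, and the past-tense "already launched and failed" narrative framing. The structured specialist schema is doing real work that `references/` alone cannot reconstruct.
+ - **Discovery-specialist and strategy-critic are moderate contributors** (+14–17%): when the orchestrator handles Persona+JTBD and strategy critique itself, it already reaches a reasonable level. The only assertion on which the two arms diverge is the dispatch contract itself, not structural quality.
+ - **Implication**: of the 3 specialists, pre-mortem-runner contributes the most to quality and is the most worth keeping; the other two could in principle be folded back into the orchestrator by strengthening the reference files, but because the token cost is the same there is no incentive to cut them.
+
+ **Harness caveat**: the `general-purpose` executor in this evaluation environment does not expose nested `Task`, so the "With sub-agent" arm approximates real dispatch by reading the specialist's `agents/*.md`, marking the dispatch, and following the schema inline. The structural contrast is real, but fully verifying end-to-end Task-tool dispatch still requires a top-session test.
+
+ > See [`~/product-playbook-workspace/iteration-3/benchmark.md`](./evals/) for raw artifacts and per-assertion divergence.
+
  ---

  ## 💬 Available Commands Overview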