@tgoodington/intuition 8.1.3 → 9.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (154)
  1. package/README.md +9 -9
  2. package/docs/project_notes/.project-memory-state.json +100 -0
  3. package/docs/project_notes/branches/.gitkeep +0 -0
  4. package/docs/project_notes/bugs.md +41 -0
  5. package/docs/project_notes/decisions.md +147 -0
  6. package/docs/project_notes/issues.md +101 -0
  7. package/docs/project_notes/key_facts.md +88 -0
  8. package/docs/project_notes/trunk/.gitkeep +0 -0
  9. package/docs/project_notes/trunk/.planning_research/decision_file_naming.md +15 -0
  10. package/docs/project_notes/trunk/.planning_research/decisions_log.md +32 -0
  11. package/docs/project_notes/trunk/.planning_research/orientation.md +51 -0
  12. package/docs/project_notes/trunk/audit/plan-rename-hitlist.md +654 -0
  13. package/docs/project_notes/trunk/blueprint-conflicts.md +109 -0
  14. package/docs/project_notes/trunk/blueprints/database-architect.md +416 -0
  15. package/docs/project_notes/trunk/blueprints/devops-infrastructure.md +514 -0
  16. package/docs/project_notes/trunk/blueprints/technical-writer.md +788 -0
  17. package/docs/project_notes/trunk/build_brief.md +119 -0
  18. package/docs/project_notes/trunk/build_report.md +250 -0
  19. package/docs/project_notes/trunk/detail_brief.md +94 -0
  20. package/docs/project_notes/trunk/plan.md +182 -0
  21. package/docs/project_notes/trunk/planning_brief.md +96 -0
  22. package/docs/project_notes/trunk/prompt_brief.md +60 -0
  23. package/docs/project_notes/trunk/prompt_output.json +98 -0
  24. package/docs/project_notes/trunk/scratch/database-architect-decisions.json +72 -0
  25. package/docs/project_notes/trunk/scratch/database-architect-research-plan.md +10 -0
  26. package/docs/project_notes/trunk/scratch/database-architect-stage1.md +226 -0
  27. package/docs/project_notes/trunk/scratch/devops-infrastructure-decisions.json +71 -0
  28. package/docs/project_notes/trunk/scratch/devops-infrastructure-research-plan.md +7 -0
  29. package/docs/project_notes/trunk/scratch/devops-infrastructure-stage1.md +164 -0
  30. package/docs/project_notes/trunk/scratch/technical-writer-decisions.json +88 -0
  31. package/docs/project_notes/trunk/scratch/technical-writer-research-plan.md +7 -0
  32. package/docs/project_notes/trunk/scratch/technical-writer-stage1.md +266 -0
  33. package/docs/project_notes/trunk/team_assignment.json +108 -0
  34. package/docs/project_notes/trunk/test_brief.md +75 -0
  35. package/docs/project_notes/trunk/test_report.md +26 -0
  36. package/docs/project_notes/trunk/verification/devops-infrastructure-verification.md +172 -0
  37. package/docs/v9/decision-framework-direction.md +142 -0
  38. package/docs/v9/decision-framework-implementation.md +114 -0
  39. package/docs/v9/domain-adaptive-team-architecture.md +1016 -0
  40. package/docs/v9/test/SESSION_SUMMARY.md +117 -0
  41. package/docs/v9/test/TEST_PLAN.md +119 -0
  42. package/docs/v9/test/blueprints/legal-analyst.md +166 -0
  43. package/docs/v9/test/output/07_cover_letter.md +41 -0
  44. package/docs/v9/test/phase2/mock_plan.md +89 -0
  45. package/docs/v9/test/phase2/producers.json +32 -0
  46. package/docs/v9/test/phase2/specialists/database-architect.specialist.md +10 -0
  47. package/docs/v9/test/phase2/specialists/financial-analyst.specialist.md +10 -0
  48. package/docs/v9/test/phase2/specialists/legal-analyst.specialist.md +10 -0
  49. package/docs/v9/test/phase2/specialists/technical-writer.specialist.md +10 -0
  50. package/docs/v9/test/phase2/team_assignment.json +61 -0
  51. package/docs/v9/test/phase3/blueprints/legal-analyst.md +840 -0
  52. package/docs/v9/test/phase3/legal-analyst-full.specialist.md +111 -0
  53. package/docs/v9/test/phase3/project_context/nh_landlord_tenant_notes.md +35 -0
  54. package/docs/v9/test/phase3/project_context/property_facts.md +32 -0
  55. package/docs/v9/test/phase3b/blueprints/legal-analyst.md +1715 -0
  56. package/docs/v9/test/phase3b/legal-analyst.specialist.md +153 -0
  57. package/docs/v9/test/phase3b/scratch/legal-analyst-stage1.md +270 -0
  58. package/docs/v9/test/phase4/TEST_PLAN.md +32 -0
  59. package/docs/v9/test/phase4/blueprints/financial-analyst-T2.md +538 -0
  60. package/docs/v9/test/phase4/blueprints/legal-analyst-T4.md +253 -0
  61. package/docs/v9/test/phase4/cross-blueprint-check.md +280 -0
  62. package/docs/v9/test/phase4/scratch/financial-analyst-T2-stage1.md +67 -0
  63. package/docs/v9/test/phase4/scratch/legal-analyst-T4-stage1.md +54 -0
  64. package/docs/v9/test/phase4/specialists/financial-analyst.specialist.md +156 -0
  65. package/docs/v9/test/phase4/specialists/legal-analyst.specialist.md +153 -0
  66. package/docs/v9/test/phase5/TEST_PLAN.md +35 -0
  67. package/docs/v9/test/phase5/blueprints/code-architect-hw-vetter.md +375 -0
  68. package/docs/v9/test/phase5/output/04_compliance_checklist.md +149 -0
  69. package/docs/v9/test/phase5/output/hardware-vetter-SKILL-v2.md +561 -0
  70. package/docs/v9/test/phase5/output/hardware-vetter-SKILL.md +459 -0
  71. package/docs/v9/test/phase5/producers/code-writer.producer.md +49 -0
  72. package/docs/v9/test/phase5/producers/document-writer.producer.md +62 -0
  73. package/docs/v9/test/phase5/regression-comparison-v2.md +60 -0
  74. package/docs/v9/test/phase5/regression-comparison.md +197 -0
  75. package/docs/v9/test/phase5/review-5A-specialist.md +213 -0
  76. package/docs/v9/test/phase5/specialist-test/TEST_PLAN.md +60 -0
  77. package/docs/v9/test/phase5/specialist-test/blueprint-comparison.md +252 -0
  78. package/docs/v9/test/phase5/specialist-test/blueprints/code-architect-hw-vetter.md +916 -0
  79. package/docs/v9/test/phase5/specialist-test/scratch/code-architect-stage1.md +427 -0
  80. package/docs/v9/test/phase5/specialists/code-architect.specialist.md +168 -0
  81. package/docs/v9/test/phase5b/TEST_PLAN.md +219 -0
  82. package/docs/v9/test/phase5b/blueprints/5B-10-stage2-with-decisions.md +286 -0
  83. package/docs/v9/test/phase5b/decisions/5B-2-accept-all-decisions.json +68 -0
  84. package/docs/v9/test/phase5b/decisions/5B-3-promote-decisions.json +70 -0
  85. package/docs/v9/test/phase5b/decisions/5B-4-individual-decisions.json +68 -0
  86. package/docs/v9/test/phase5b/decisions/5B-5-triage-decisions.json +110 -0
  87. package/docs/v9/test/phase5b/decisions/5B-6-fallback-decisions.json +40 -0
  88. package/docs/v9/test/phase5b/decisions/5B-8-partial-decisions.json +46 -0
  89. package/docs/v9/test/phase5b/decisions/5B-9-complete-decisions.json +54 -0
  90. package/docs/v9/test/phase5b/scratch/code-architect-stage1.md +133 -0
  91. package/docs/v9/test/phase5b/specialists/code-architect.specialist.md +202 -0
  92. package/docs/v9/test/phase5b/stage1-many-decisions.md +139 -0
  93. package/docs/v9/test/phase5b/stage1-no-assumptions.md +70 -0
  94. package/docs/v9/test/phase5b/stage1-with-assumptions.md +86 -0
  95. package/docs/v9/test/phase5b/test-5B-1-results.md +157 -0
  96. package/docs/v9/test/phase5b/test-5B-10-results.md +130 -0
  97. package/docs/v9/test/phase5b/test-5B-2-results.md +75 -0
  98. package/docs/v9/test/phase5b/test-5B-3-results.md +104 -0
  99. package/docs/v9/test/phase5b/test-5B-4-results.md +114 -0
  100. package/docs/v9/test/phase5b/test-5B-5-results.md +126 -0
  101. package/docs/v9/test/phase5b/test-5B-6-results.md +60 -0
  102. package/docs/v9/test/phase5b/test-5B-7-results.md +141 -0
  103. package/docs/v9/test/phase5b/test-5B-8-results.md +115 -0
  104. package/docs/v9/test/phase5b/test-5B-9-results.md +76 -0
  105. package/docs/v9/test/producers/document-writer.producer.md +62 -0
  106. package/docs/v9/test/specialists/legal-analyst.specialist.md +58 -0
  107. package/package.json +4 -2
  108. package/producers/code-writer/code-writer.producer.md +86 -0
  109. package/producers/data-file-writer/data-file-writer.producer.md +116 -0
  110. package/producers/document-writer/document-writer.producer.md +117 -0
  111. package/producers/form-filler/form-filler.producer.md +99 -0
  112. package/producers/presentation-creator/presentation-creator.producer.md +109 -0
  113. package/producers/spreadsheet-builder/spreadsheet-builder.producer.md +107 -0
  114. package/scripts/install-skills.js +97 -9
  115. package/scripts/uninstall-skills.js +7 -2
  116. package/skills/intuition-agent-advisor/SKILL.md +327 -220
  117. package/skills/intuition-assemble/SKILL.md +261 -0
  118. package/skills/intuition-build/SKILL.md +379 -319
  119. package/skills/intuition-debugger/SKILL.md +390 -390
  120. package/skills/intuition-design/SKILL.md +385 -381
  121. package/skills/intuition-detail/SKILL.md +377 -0
  122. package/skills/intuition-engineer/SKILL.md +307 -303
  123. package/skills/intuition-handoff/SKILL.md +264 -222
  124. package/skills/intuition-handoff/references/handoff_core.md +54 -54
  125. package/skills/intuition-initialize/SKILL.md +21 -6
  126. package/skills/intuition-initialize/references/agents_template.md +118 -118
  127. package/skills/intuition-initialize/references/claude_template.md +134 -134
  128. package/skills/intuition-initialize/references/intuition_readme_template.md +4 -4
  129. package/skills/intuition-initialize/references/state_template.json +17 -2
  130. package/skills/{intuition-plan → intuition-outline}/SKILL.md +561 -481
  131. package/skills/{intuition-plan → intuition-outline}/references/magellan_core.md +16 -16
  132. package/skills/{intuition-plan → intuition-outline}/references/templates/plan_template.md +6 -6
  133. package/skills/intuition-prompt/SKILL.md +374 -312
  134. package/skills/intuition-start/SKILL.md +46 -13
  135. package/skills/intuition-start/references/start_core.md +60 -60
  136. package/skills/intuition-test/SKILL.md +345 -0
  137. package/specialists/api-designer/api-designer.specialist.md +291 -0
  138. package/specialists/business-analyst/business-analyst.specialist.md +270 -0
  139. package/specialists/copywriter/copywriter.specialist.md +268 -0
  140. package/specialists/database-architect/database-architect.specialist.md +275 -0
  141. package/specialists/devops-infrastructure/devops-infrastructure.specialist.md +314 -0
  142. package/specialists/financial-analyst/financial-analyst.specialist.md +269 -0
  143. package/specialists/frontend-component/frontend-component.specialist.md +293 -0
  144. package/specialists/instructional-designer/instructional-designer.specialist.md +285 -0
  145. package/specialists/legal-analyst/legal-analyst.specialist.md +260 -0
  146. package/specialists/marketing-strategist/marketing-strategist.specialist.md +281 -0
  147. package/specialists/project-manager/project-manager.specialist.md +266 -0
  148. package/specialists/research-analyst/research-analyst.specialist.md +273 -0
  149. package/specialists/security-auditor/security-auditor.specialist.md +354 -0
  150. package/specialists/technical-writer/technical-writer.specialist.md +275 -0
  151. /package/skills/{intuition-plan → intuition-outline}/references/sub_agents.md +0 -0
  152. /package/skills/{intuition-plan → intuition-outline}/references/templates/confidence_scoring.md +0 -0
  153. /package/skills/{intuition-plan → intuition-outline}/references/templates/plan_format.md +0 -0
  154. /package/skills/{intuition-plan → intuition-outline}/references/templates/planning_process.md +0 -0
@@ -0,0 +1,916 @@
# Blueprint: Hardware Vetter Claude Code Skill

**Specialist:** Code Architect
**Task:** Task 2 — Build the Hardware Vetter Claude Code Skill
**Date:** 2026-02-27
**Status:** Complete — Ready for Producer

---
## 1. Task Reference

**Plan Task:** Task 2 — Build the Hardware Vetter Claude Code Skill

**Acceptance Criteria:**
1. Skill accepts all four hardware change types: CPU upgrade, GPU addition, RAM change, and entire system replacement
2. Skill reads current hardware specs and model data from `docs/model_catalog.json` and `src/pipeline/config.py` without requiring manual data entry
3. For each registered and candidate model, the skill produces a fit/no-fit verdict on proposed hardware with VRAM/RAM resource estimates
4. Skill delivers a before/after comparison between current and proposed hardware showing performance impact per model
5. Skill identifies which candidate models (beyond the current 3 registered) become feasible on the proposed hardware
6. Output report includes an upgrade recommendation with clear rationale
7. When benchmark data is unavailable for a specific hardware + model combination, the skill estimates from specs and flags the result as "projected" rather than "verified"
8. Report is written to a timestamped markdown file in the project

**Dependencies:** Task 1 (Augment Model Catalog for Hardware Evaluation) — provides GPU VRAM fields in `docs/model_catalog.json`. CPU/RAM-only evaluation works without Task 1; GPU analysis requires it.

---
## 2. Research Findings

### Model Catalog (`docs/model_catalog.json`)

- 1003 lines, schema v1.2, 11 models total
- Top-level structure: `{ "schema_version", "hardware_profile", "infrastructure_options", "models": { ... } }`
- `hardware_profile` fields (exact):

```json
{
  "server_name": "aos-ai-beta",
  "ram_gb": 192,
  "cpu_model": "2x Intel Xeon E5-2620 v4",
  "cpu_cores": 32,
  "cpu_clock_ghz_base": 2.1,
  "cpu_clock_ghz_turbo": 2.5,
  "cpu_architecture": "Broadwell",
  "storage_type": "NVMe",
  "storage_gb": "TBD",
  "gpu": "none",
  "notes": "CPU-only inference, no GPU available. 32 threads total (16 per socket)."
}
```

- Per-model `hardware_requirements` fields (exact):

```json
{
  "ram_gb_q4": number,
  "ram_gb_q8": number,
  "ram_gb_fp16": number,
  "recommended_ram_gb": number,
  "recommended_quantization": "q4_K_M",
  "gpu_vram_gb_q4": number,
  "gpu_vram_gb_q8": number,
  "gpu_offload_support": "cpu_only_viable|full_offload|partial_offload"
}
```

- Per-model other key fields: `display_name`, `ollama_id`, `parameter_count`, `size_tier`, `default_quantization`, `strengths`, `weaknesses`, `raw_benchmarks`, `category_scores`, `feasibility`
- **Field warning:** `gpu_vram_gb_fp16` does NOT exist in the catalog — never reference it
- **Field warning:** The RAM field is `ram_gb` (not `total_ram_gb`)
### Pipeline Config (`src/pipeline/config.py`)

- Pydantic Settings class with 3 registered model IDs:
  - `default_model: str = "qwen2.5:14b"` → catalog key `qwen2.5-14b`
  - `fast_model: str = "qwen2.5:7b"` → catalog key `qwen2.5-7b`
  - `chat_model: str = "llama3.1:8b"` → catalog key `llama3.1-8b`
- Ollama ID format: the config uses colons (`qwen2.5:14b`) and catalog keys use hyphens (`qwen2.5-14b`), but each catalog entry's `ollama_id` field uses the same colon format as the config. Match via the `ollama_id` field.
### All 11 Catalog Models

| Key | Display Name | Parameters | Registered |
|-----|-------------|-----------|------------|
| tinyllama-1.1b | TinyLlama 1.1B | 1.1B | No |
| llama3.2-1b | Llama 3.2 1B | 1B | No |
| llama3.2-3b | Llama 3.2 3B | 3B | No |
| phi3-3.8b | Phi-3 3.8B | 3.8B | No |
| qwen2.5-7b | Qwen 2.5 7B | 7B | Yes (fast_model) |
| mistral-7b | Mistral 7B | 7B | No |
| gemma2-9b | Gemma 2 9B | 9B | No |
| llama3.1-8b | Llama 3.1 8B | 8B | Yes (chat_model) |
| qwen2.5-14b | Qwen 2.5 14B | 14.7B | Yes (default_model) |
| llama3.3-70b | Llama 3.3 70B | 70B | No |
| mixtral-8x22b | Mixtral 8x22B | 141B | No |

### Report Output Convention

- Path: `docs/reports/hardware_eval_YYYY-MM-DD_[slug].md`
- Slug derived from change type (see Section 5 for exact rules)

---
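The path convention can be sketched as a small helper. This is a hedged illustration: the slug values below are placeholders of my own choosing, since the blueprint defers the exact slug rules to Section 5.

```python
from datetime import date

# Hypothetical slug mapping -- the blueprint defines the exact rules elsewhere.
SLUGS = {
    "cpu_upgrade": "cpu-upgrade",
    "gpu_addition": "gpu-addition",
    "ram_change": "ram-change",
    "full_system": "full-system",
}

def report_path(change_type: str, on: date) -> str:
    """Build the timestamped report path under docs/reports/."""
    return f"docs/reports/hardware_eval_{on.isoformat()}_{SLUGS[change_type]}.md"

print(report_path("gpu_addition", date(2026, 2, 27)))
# docs/reports/hardware_eval_2026-02-27_gpu-addition.md
```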
## 3. Approach

Build a single-file Claude Code skill (`SKILL.md`) that:

1. **Loads data automatically** — reads `docs/model_catalog.json` and `src/pipeline/config.py` to extract current hardware specs, all model definitions, and registered model IDs, with no manual data entry from the user.
2. **Runs an interactive question flow** — uses `AskUserQuestion` to collect the proposed hardware change through a unified question sequence that handles all 4 change types (CPU upgrade, GPU addition, RAM change, full system replacement).
3. **Analyzes every model** — evaluates all 11 catalog models against both current and proposed hardware, computing feasibility tiers, resource headroom, throughput estimates, and concurrent capacity.
4. **Separates verified from projected** — uses catalog benchmark data when available (Verified), falls back to parameter-based estimation formulas when not (Projected), with clear inline labeling.
5. **Generates a structured markdown report** — writes a before/after comparison report with executive summary, per-model analysis tables, newly feasible model identification, and upgrade recommendation to `docs/reports/`.
6. **Degrades gracefully** — handles missing GPU fields (CPU-only path), missing benchmarks (Projected estimates), and unreadable config (all models treated as candidates).

---
## 4. Decisions Made

| # | Decision | Alternatives Considered | Choice & Rationale |
|---|----------|------------------------|---------------------|
| 1 | Blueprint scope | Patch existing skill vs. full rewrite | **Full rewrite from scratch** — user decision. Spec all 8 sections completely without referencing existing skill. |
| 2 | Question flow architecture | Separate flows per change type vs. unified flow | **Unified flow** — "Full system" subsumes individual selections. All change types ask the same base questions, with component-specific follow-ups appearing sequentially as needed. Simpler logic, less branching. |
| 3 | Comparison metrics | Subset of metrics vs. all 4 | **All 4 metrics** — feasibility tier, headroom percentage, throughput estimate, concurrent capacity. Executive summary provides high-level verdict. |
| 4 | Verified vs. Projected distinction | Color coding, separate tables, inline labels | **Inline labels** — "(Verified)" / "(Projected)" next to each data point. Methodology section has aggregate breakdown. Markdown has no color support. |
| 5 | Schema validation depth | Strict JSON Schema vs. lightweight existence checks | **Lightweight** — check existence of `hardware_profile`, `models` keys. Degrade gracefully for missing GPU fields. No strict schema enforcement. |
| 6 | Concurrent model loading | In scope vs. out of scope | **Out of scope** — not included in this blueprint. |
| 7 | Unified memory handling | In scope vs. out of scope | **Out of scope** — not included in this blueprint. |
| 8 | Glob tool usage | Include vs. exclude | **Exclude** — all file paths are known. No glob needed. |
| 9 | WebSearch for benchmarks | Required vs. degradable | **Degradable** — attempt WebSearch for benchmark data on proposed hardware. If unavailable or search fails, fall back to Projected estimates. Never block on search failure. |

---
## 5. Deliverable Specification

### 5.1 File Details

- **File:** `.claude/skills/hardware-vetter/SKILL.md`
- **Type:** Claude Code skill file (markdown with imperative instructions)
- **Target length:** 600–700 lines
- **Tools used:** `Read`, `AskUserQuestion`, `Write`, `WebSearch` (degradable)

### 5.2 Skill Metadata Block

```markdown
---
name: hardware-vetter
description: Evaluate proposed hardware changes against registered and candidate AI models
model: opus
---
```
### 5.3 Overall Skill Structure (8 sections in order)

1. Critical Rules (top of file — highest attention)
2. Purpose and Scope
3. Data Loading
4. Question Flow
5. Analysis Engine
6. Benchmark Search (degradable)
7. Report Generation
8. Completion
### 5.4 Section 1: Critical Rules

Place these rules at the absolute top of SKILL.md, before any other content (after the metadata block):

```
## CRITICAL RULES

- You MUST read `docs/model_catalog.json` and `src/pipeline/config.py` BEFORE asking any questions. NEVER ask the user for data that exists in these files.
- The RAM field in hardware_profile is `ram_gb`. NEVER reference `total_ram_gb`.
- The GPU VRAM fields are `gpu_vram_gb_q4` and `gpu_vram_gb_q8` ONLY. NEVER reference `gpu_vram_gb_fp16` — it does not exist.
- Match registered models from config.py to catalog via the `ollama_id` field. Config uses colon format (e.g., `qwen2.5:14b`), catalog uses `ollama_id` with the same colon format.
- Every numeric estimate MUST be labeled "(Verified)" if from catalog benchmarks or "(Projected)" if calculated from formulas.
- If `docs/model_catalog.json` cannot be read or parsed, STOP and tell the user. Do not proceed without the catalog.
- If `src/pipeline/config.py` cannot be read, treat ALL catalog models as candidates (no registered/candidate distinction). Warn the user and continue.
- NEVER ask the user to manually enter model specs, hardware specs, or benchmark numbers that exist in the catalog.
```
### 5.5 Section 2: Purpose and Scope

Brief section (10–15 lines) stating:
- This skill evaluates proposed hardware changes for the AI inference server
- Supports 4 change types: CPU upgrade, GPU addition, RAM change, full system replacement
- Reads all data from project files automatically
- Produces a before/after comparison report with per-model analysis
- Writes report to `docs/reports/`
### 5.6 Section 3: Data Loading

#### Step 3.1: Read Model Catalog

```
Use the Read tool to read `docs/model_catalog.json`.
Parse the JSON and extract:
- `hardware_profile` — the current server hardware specs
- `models` — all model entries (the object, not an array)
- `infrastructure_options` — for reference context

If the file cannot be read or JSON parsing fails, STOP immediately. Tell the user:
"Cannot read or parse docs/model_catalog.json. This file is required. Please ensure it exists and contains valid JSON."
```
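A minimal Python sketch of this fail-fast behavior, assuming only the lightweight existence checks chosen in Decision 5 (no strict schema validation); the sample JSON string is an abbreviated stand-in for the real catalog:

```python
import json

REQUIRED_KEYS = ("hardware_profile", "models")

def load_catalog(text: str) -> dict:
    """Parse catalog JSON and check only that the required top-level keys exist."""
    catalog = json.loads(text)  # raises ValueError on malformed JSON -> STOP path
    missing = [k for k in REQUIRED_KEYS if k not in catalog]
    if missing:
        raise ValueError(f"model_catalog.json is missing required keys: {missing}")
    return catalog

sample = '{"schema_version": "1.2", "hardware_profile": {"ram_gb": 192, "gpu": "none"}, "models": {}}'
catalog = load_catalog(sample)
print(catalog["hardware_profile"]["ram_gb"])  # 192
```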
#### Step 3.2: Read Pipeline Config

```
Use the Read tool to read `src/pipeline/config.py`.
Extract the three registered model Ollama IDs by finding these patterns:
- `default_model` assignment → extract the string value
- `fast_model` assignment → extract the string value
- `chat_model` assignment → extract the string value

These will be in Ollama colon format (e.g., "qwen2.5:14b").

If the file cannot be read or the patterns are not found:
- Warn the user: "Could not read src/pipeline/config.py. Proceeding with all models as candidates (no registered/candidate distinction)."
- Set registered_models to an empty list
- Continue execution
```
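The pattern extraction above could be done with a regex over the config source. This is a sketch under the assumption that the three fields are declared as annotated string defaults; the `Settings` snippet is a trimmed stand-in, not the actual contents of `src/pipeline/config.py`:

```python
import re

ROLES = ("default_model", "fast_model", "chat_model")

def extract_registered_ids(config_source: str) -> dict:
    """Pull the registered Ollama IDs from the Pydantic settings source.
    Returns {} when nothing matches, so the caller can treat all models as candidates."""
    ids = {}
    for role in ROLES:
        m = re.search(rf'{role}\s*:\s*str\s*=\s*["\']([^"\']+)["\']', config_source)
        if m:
            ids[role] = m.group(1)
    return ids

source = '''
class Settings(BaseSettings):
    default_model: str = "qwen2.5:14b"
    fast_model: str = "qwen2.5:7b"
    chat_model: str = "llama3.1:8b"
'''
print(extract_registered_ids(source))
```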
#### Step 3.3: Classify Models

```
For each model in the catalog `models` object:
- If its `ollama_id` field matches any of the three extracted config IDs → classify as "registered"
- Otherwise → classify as "candidate"

Store two lists:
- registered_models: list of {key, display_name, ollama_id, role} where role is "default_model", "fast_model", or "chat_model"
- candidate_models: list of {key, display_name, ollama_id}
```
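The classification step can be sketched as follows. The two model entries are illustrative only (in particular, the `mistral:7b` Ollama ID is my assumption, not taken from the catalog):

```python
def classify_models(models: dict, registered_ids: dict) -> tuple[list, list]:
    """Split catalog models into registered/candidate lists by matching the
    `ollama_id` field against the IDs extracted from config.py (both colon format)."""
    id_to_role = {ollama_id: role for role, ollama_id in registered_ids.items()}
    registered, candidates = [], []
    for key, model in models.items():
        entry = {"key": key, "display_name": model["display_name"], "ollama_id": model["ollama_id"]}
        role = id_to_role.get(model["ollama_id"])
        if role:
            registered.append({**entry, "role": role})
        else:
            candidates.append(entry)
    return registered, candidates

models = {
    "qwen2.5-14b": {"display_name": "Qwen 2.5 14B", "ollama_id": "qwen2.5:14b"},
    "mistral-7b": {"display_name": "Mistral 7B", "ollama_id": "mistral:7b"},  # illustrative ID
}
reg, cand = classify_models(models, {"default_model": "qwen2.5:14b"})
```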
#### Step 3.4: Detect GPU Capability

```
Check whether GPU VRAM fields exist in any model's hardware_requirements:
- Look for `gpu_vram_gb_q4` in at least one model entry
- If present → set gpu_data_available = true
- If absent in ALL models → set gpu_data_available = false

Also check hardware_profile.gpu:
- If "none" → current_has_gpu = false
- Otherwise → current_has_gpu = true, parse GPU details
```
### 5.7 Section 4: Question Flow

All questions use `AskUserQuestion`. Present the data summary before the first question.

#### Step 4.1: Present Current Hardware Summary

Before asking anything, display to the user:

```
**Current Hardware Profile:**
- Server: {hardware_profile.server_name}
- CPU: {hardware_profile.cpu_model} ({hardware_profile.cpu_cores} cores, {hardware_profile.cpu_clock_ghz_base}–{hardware_profile.cpu_clock_ghz_turbo} GHz)
- RAM: {hardware_profile.ram_gb} GB
- GPU: {hardware_profile.gpu}
- Storage: {hardware_profile.storage_type}

**Registered Models (from pipeline config):**
- Default: {display_name} ({ollama_id})
- Fast: {display_name} ({ollama_id})
- Chat: {display_name} ({ollama_id})

**Candidate Models (catalog, not currently registered):**
- {display_name} ({parameter_count}) — for each candidate
```
#### Step 4.2: Ask Change Type

```
AskUserQuestion:
"What type of hardware change are you evaluating?

1. CPU upgrade — replacing the processor(s)
2. GPU addition — adding a GPU to the system
3. RAM change — increasing or changing system memory
4. Full system replacement — replacing the entire server

Enter a number (1–4):"
```

Valid responses: "1", "2", "3", "4" (or the text labels). Map to internal enum: `cpu_upgrade`, `gpu_addition`, `ram_change`, `full_system`.

If `full_system` is selected, ask ALL component questions below in sequence (4.3, 4.4, 4.5).
If a single component is selected, ask ONLY that component's question.
#### Step 4.3: CPU Details (asked for `cpu_upgrade` or `full_system`)

```
AskUserQuestion:
"Describe the proposed CPU:

Please provide:
- CPU model name (e.g., Intel Xeon w5-3435X, AMD EPYC 9354)
- Core count
- Base clock speed (GHz)
- Turbo/boost clock speed (GHz)
- Architecture (e.g., Sapphire Rapids, Zen 4)

You can paste a spec line or describe it naturally."
```

Parse the response to extract:
- `proposed_cpu_model`: string (full model name)
- `proposed_cpu_cores`: integer
- `proposed_cpu_clock_base`: float (GHz)
- `proposed_cpu_clock_turbo`: float (GHz)
- `proposed_cpu_architecture`: string

If any field cannot be parsed, ask a targeted follow-up:
```
AskUserQuestion:
"I couldn't determine the {missing_field}. Could you specify the {human_readable_field_name}?"
```
#### Step 4.4: GPU Details (asked for `gpu_addition` or `full_system`)

```
AskUserQuestion:
"Describe the proposed GPU:

Please provide:
- GPU model name (e.g., NVIDIA RTX 4090, NVIDIA A100 80GB)
- VRAM amount (GB)
- Memory bandwidth (GB/s) if known
- Number of GPUs if adding multiple

You can paste a spec line or describe it naturally."
```

Parse the response to extract:
- `proposed_gpu_model`: string
- `proposed_gpu_vram_gb`: float
- `proposed_gpu_bandwidth_gbs`: float or null
- `proposed_gpu_count`: integer (default 1)

If VRAM cannot be parsed, ask a targeted follow-up. Bandwidth is optional (used only for throughput estimates).

If `gpu_data_available` is false, warn:
```
"Note: The model catalog does not yet include GPU VRAM requirements for the models. GPU analysis will use parameter-based estimates and all GPU results will be marked as Projected. For verified GPU analysis, complete Task 1 (Augment Model Catalog for Hardware Evaluation) first."
```
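For the VRAM field, one possible extraction sketch is a naive regex that grabs the first "number GB" token from the free-form description; this is an illustration of the fallback-to-follow-up behavior, not a prescribed parser:

```python
import re

def parse_gpu_vram(text: str):
    """Best-effort VRAM extraction from a free-form GPU description.
    Returns GB as a float, or None to trigger the targeted follow-up question.
    Naive: matches the first '<number> GB' token ('GB/s' will not match
    because the digits must immediately precede 'GB')."""
    m = re.search(r'(\d+(?:\.\d+)?)\s*GB\b', text, re.IGNORECASE)
    return float(m.group(1)) if m else None
```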
#### Step 4.5: RAM Details (asked for `ram_change` or `full_system`)

```
AskUserQuestion:
"Describe the proposed RAM configuration:

Please provide:
- Total RAM amount (GB)
- RAM type/speed if known (e.g., DDR5-4800, DDR4-3200)

You can enter just the amount (e.g., '256 GB') or include details."
```

Parse:
- `proposed_ram_gb`: float
- `proposed_ram_type`: string or null

If the amount cannot be parsed, ask a targeted follow-up.
#### Step 4.6: Build Proposed Hardware Profile

Construct `proposed_hardware` by copying the current `hardware_profile` and overriding changed fields:

- For `cpu_upgrade`: override cpu_model, cpu_cores, cpu_clock_ghz_base, cpu_clock_ghz_turbo, cpu_architecture
- For `gpu_addition`: override gpu with proposed GPU info string (e.g., "NVIDIA RTX 4090 24GB x1"), add gpu_vram_gb, gpu_count, gpu_bandwidth_gbs
- For `ram_change`: override ram_gb, optionally add ram_type
- For `full_system`: override all changed fields. Any component NOT specified stays as current.
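The copy-and-override rule amounts to a shallow dict merge, sketched here with an abbreviated profile:

```python
def build_proposed_profile(current: dict, overrides: dict) -> dict:
    """Copy the current hardware_profile and override only the changed fields;
    anything not specified stays as current."""
    proposed = dict(current)   # shallow copy keeps the original untouched
    proposed.update(overrides)
    return proposed

current = {"server_name": "aos-ai-beta", "ram_gb": 192, "cpu_cores": 32, "gpu": "none"}
proposed = build_proposed_profile(
    current,
    {"gpu": "NVIDIA RTX 4090 24GB x1", "gpu_vram_gb": 24.0, "gpu_count": 1},
)
```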
#### Step 4.7: Confirm Proposed Configuration

```
AskUserQuestion:
"Here is the proposed hardware configuration:

**Proposed Hardware:**
- Server: {server_name} {or 'New System' if full replacement}
- CPU: {proposed or current}
- RAM: {proposed or current} GB
- GPU: {proposed or current}

**Changes from current:**
- {list only changed components with before → after}

Does this look correct? (yes/no)"
```

If "no", ask which component to correct, then re-ask that component's question. Loop until confirmed.
### 5.8 Section 5: Analysis Engine

#### 5.8.1 Core Data Structures

For each model, compute two assessment objects — one for current hardware, one for proposed hardware:

```
ModelAssessment:
  model_key: string
  display_name: string
  parameter_count: string (e.g., "14.7B")
  is_registered: boolean
  role: string or null ("default_model", "fast_model", "chat_model")

  current_hardware:
    feasibility_tier: "Comfortable" | "Tight" | "Infeasible"
    ram_required_gb: float (at recommended quantization)
    ram_headroom_pct: float (positive = spare, negative = deficit)
    gpu_fits: boolean or null (null if no GPU)
    gpu_vram_required_gb: float or null
    gpu_vram_headroom_pct: float or null
    throughput_estimate: string (e.g., "~15 tok/s") or null
    throughput_source: "Verified" | "Projected"
    concurrent_capacity: integer (how many instances fit in memory)

  proposed_hardware:
    (same fields as current_hardware)

  delta:
    feasibility_changed: boolean
    feasibility_before: string
    feasibility_after: string
    headroom_delta_pct: float
    throughput_delta: string or null
    concurrent_delta: integer
    newly_feasible: boolean (was Infeasible, now Comfortable or Tight)
```
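The delta block can be derived mechanically from the two per-hardware assessments. A sketch using plain dicts with the field names above (`throughput_delta` omitted for brevity, since its format depends on the throughput labels):

```python
def compute_delta(before: dict, after: dict) -> dict:
    """Build the ModelAssessment delta from the current and proposed assessments."""
    return {
        "feasibility_changed": before["feasibility_tier"] != after["feasibility_tier"],
        "feasibility_before": before["feasibility_tier"],
        "feasibility_after": after["feasibility_tier"],
        "headroom_delta_pct": round(after["ram_headroom_pct"] - before["ram_headroom_pct"], 1),
        "concurrent_delta": after["concurrent_capacity"] - before["concurrent_capacity"],
        # was Infeasible, now Comfortable or Tight
        "newly_feasible": before["feasibility_tier"] == "Infeasible"
                          and after["feasibility_tier"] in ("Comfortable", "Tight"),
    }

before = {"feasibility_tier": "Infeasible", "ram_headroom_pct": -20.0, "concurrent_capacity": 0}
after = {"feasibility_tier": "Tight", "ram_headroom_pct": 12.5, "concurrent_capacity": 1}
delta = compute_delta(before, after)
```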
426
+ #### 5.8.2 Feasibility Tier Calculation
427
+
428
+ For each model on each hardware configuration:
429
+
430
+ **Step A: Determine RAM requirement**
431
+ ```
432
+ ram_required = model.hardware_requirements.recommended_ram_gb
433
+ If recommended_ram_gb is missing, fall back to ram_gb_q4.
434
+ If ram_gb_q4 is also missing, estimate: ram_required = parameter_count_billions * 0.6
435
+ (This covers q4 quantization: ~0.5 bytes/param + overhead)
436
+ Mark as Projected if estimated.
437
+ ```
438
+
439
+ **Step B: Determine GPU VRAM requirement (if GPU present in hardware)**
440
+ ```
441
+ If hardware has GPU:
442
+ gpu_vram_required = model.hardware_requirements.gpu_vram_gb_q4
443
+ If gpu_vram_gb_q4 is missing:
444
+ gpu_vram_required = parameter_count_billions * 0.55
445
+ (q4 VRAM: ~0.5 bytes/param + small overhead)
446
+ Mark as Projected.
447
+ ```
448
+
449
+ **Step C: Compute feasibility tier**
450
+ ```
451
+ If hardware has GPU AND model.gpu_offload_support != "cpu_only_viable":
452
+ # GPU path: model can use GPU
453
+ If gpu_vram_required <= total_gpu_vram:
454
+ # Full GPU offload possible
455
+ resource_ratio = gpu_vram_required / total_gpu_vram
456
+ Else:
457
+ # Partial offload or CPU fallback — use RAM
458
+ resource_ratio = ram_required / hardware_ram_gb
459
+ Else:
460
+ # CPU-only path
461
+ resource_ratio = ram_required / hardware_ram_gb
462
+
463
+ Tier assignment:
464
+ resource_ratio <= 0.70 → "Comfortable" (30%+ headroom)
465
+ resource_ratio <= 0.95 → "Tight" (5–30% headroom)
466
+ resource_ratio > 0.95 → "Infeasible" (less than 5% headroom or deficit)
467
+ ```
468
+
469
+ **Step D: Compute headroom percentage**
470
+ ```
471
+ If GPU path:
472
+ headroom_pct = ((total_gpu_vram - gpu_vram_required) / total_gpu_vram) * 100
473
+ Else:
474
+ headroom_pct = ((hardware_ram_gb - ram_required) / hardware_ram_gb) * 100
475
+
476
+ Round to 1 decimal place.
477
+ Negative values mean deficit.
478
+ ```
479
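Steps C and D combine into one small function. A minimal sketch, assuming the RAM/VRAM requirement figures were already resolved in Steps A and B (function and argument names are illustrative):

```python
def assess_feasibility(ram_required, ram_gb, gpu_vram_required=None,
                       gpu_vram=None, cpu_only=False):
    """Tier (Step C) and headroom percentage (Step D) for one model
    on one hardware configuration."""
    gpu_path = (gpu_vram is not None and gpu_vram_required is not None
                and not cpu_only)
    if gpu_path and gpu_vram_required <= gpu_vram:
        # Full GPU offload possible: ratio and headroom against VRAM
        ratio = gpu_vram_required / gpu_vram
        headroom = (gpu_vram - gpu_vram_required) / gpu_vram * 100
    else:
        # CPU-only path, or partial offload / CPU fallback: use RAM
        ratio = ram_required / ram_gb
        headroom = (ram_gb - ram_required) / ram_gb * 100

    if ratio <= 0.70:
        tier = "Comfortable"   # 30%+ headroom
    elif ratio <= 0.95:
        tier = "Tight"         # 5-30% headroom
    else:
        tier = "Infeasible"    # under 5% headroom, or deficit
    return tier, round(headroom, 1)  # negative headroom means deficit
```

For example, 12 GB required against 192 GB RAM yields ("Comfortable", 93.8), matching the worked example in 5.13.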
+
480
+ #### 5.8.3 Throughput Estimation
481
+
482
+ **Verified path:**
483
+ ```
484
+ If model.raw_benchmarks exists AND contains throughput data (tokens_per_second or equivalent):
485
+ Use the benchmark value directly.
486
+ Label: "(Verified)"
487
+ ```
488
+
489
+ **Projected path (CPU-only):**
490
+ ```
491
+ If no benchmark data available:
492
+ # Base formula: smaller models are faster, more cores help
493
+ base_tokens_per_sec = (hardware_cpu_cores * hardware_cpu_clock_turbo) / (parameter_count_billions * 0.5)
494
+
495
+ # Apply architecture generation multiplier
496
+ # Newer architectures have better IPC
497
+ arch_multiplier = lookup from:
498
+ "Broadwell" → 1.0
499
+ "Skylake" / "Cascade Lake" → 1.15
500
+ "Ice Lake" → 1.25
501
+ "Sapphire Rapids" → 1.40
502
+ "Zen 3" → 1.20
503
+ "Zen 4" → 1.35
504
+ "Zen 5" → 1.45
505
+ unknown → 1.0
506
+
507
+ estimated_tokens_per_sec = base_tokens_per_sec * arch_multiplier
508
+ Round to nearest integer.
509
+ Label: "(Projected)"
510
+ Format: "~{N} tok/s"
511
+ ```
512
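The CPU-only projection can be sketched as follows, with the architecture table inlined (names are illustrative; unknown architectures fall back to 1.0 as specified):

```python
ARCH_MULTIPLIER = {
    # IPC uplift by CPU architecture generation
    "Broadwell": 1.0,
    "Skylake": 1.15, "Cascade Lake": 1.15,
    "Ice Lake": 1.25,
    "Sapphire Rapids": 1.40,
    "Zen 3": 1.20, "Zen 4": 1.35, "Zen 5": 1.45,
}

def projected_cpu_throughput(cores, clock_turbo_ghz, params_b, arch):
    """Order-of-magnitude CPU-only tokens/s projection."""
    base = (cores * clock_turbo_ghz) / (params_b * 0.5)
    tok_s = round(base * ARCH_MULTIPLIER.get(arch, 1.0))
    return f"~{tok_s} tok/s"
```

With the worked-example inputs (32 cores, 2.5 GHz turbo, 14.7B params, Broadwell) this reproduces the ~11 tok/s figure in 5.13.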
+
513
+ **Projected path (GPU):**
514
+ ```
515
+ If GPU present and model can offload:
516
+ If gpu_bandwidth_gbs is known:
517
+ # Memory bandwidth is the primary bottleneck for LLM inference
518
+ bytes_per_param_q4 = 0.5
519
+ model_size_gb = parameter_count_billions * bytes_per_param_q4
520
+ estimated_tokens_per_sec = gpu_bandwidth_gbs / (model_size_gb * 2)
521
+ # Factor of 2 accounts for KV cache and overhead
522
+ Else:
523
+ # Rough estimate without bandwidth
524
+ estimated_tokens_per_sec = (proposed_gpu_vram_gb * 10) / parameter_count_billions
525
+
526
+ Round to nearest integer.
527
+ Label: "(Projected)"
528
+ Format: "~{N} tok/s"
529
+ ```
530
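The GPU path can be sketched the same way (illustrative names; assumes q4 at ~0.5 bytes/param as above):

```python
def projected_gpu_throughput(params_b, gpu_vram_gb, bandwidth_gbs=None):
    """Order-of-magnitude GPU tokens/s projection: bandwidth-bound
    estimate when bandwidth is known, VRAM-based fallback otherwise."""
    if bandwidth_gbs is not None:
        model_size_gb = params_b * 0.5               # q4: ~0.5 bytes/param
        tok_s = bandwidth_gbs / (model_size_gb * 2)  # x2: KV cache + overhead
    else:
        tok_s = (gpu_vram_gb * 10) / params_b        # rough fallback
    return f"~{round(tok_s)} tok/s"
```

Without bandwidth data, 14.7B params on a 24 GB card gives ~16 tok/s, matching the worked example in 5.13.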
+
531
+ **Important:** These formulas are rough order-of-magnitude estimates. The Projected label communicates this. Exact throughput depends on quantization method, batch size, context length, and other factors not modeled here.
532
+
533
+ #### 5.8.4 Concurrent Capacity
534
+
535
+ ```
536
+ If GPU path:
537
+ concurrent = floor(total_gpu_vram / gpu_vram_required)
538
+ Else:
539
+ # Reserve 20% of RAM for OS and system processes
540
+ available_ram = hardware_ram_gb * 0.80
541
+ concurrent = floor(available_ram / ram_required)
542
+
543
+ Minimum: 0 (if model doesn't fit at all)
544
+ ```
545
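As a sketch (names illustrative):

```python
import math

def concurrent_capacity(ram_gb, ram_required, gpu_vram=None, gpu_vram_required=None):
    """How many instances fit in memory. On the CPU path, 20% of RAM
    is reserved for OS and system processes."""
    if gpu_vram is not None and gpu_vram_required:
        return max(0, math.floor(gpu_vram / gpu_vram_required))
    available_ram = ram_gb * 0.80
    return max(0, math.floor(available_ram / ram_required))
```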
+
546
+ #### 5.8.5 Delta Computation
547
+
548
+ For each model, compute deltas between current and proposed assessments:
549
+ ```
550
+ feasibility_changed = (current.feasibility_tier != proposed.feasibility_tier)
551
+ headroom_delta_pct = proposed.ram_headroom_pct - current.ram_headroom_pct
552
+ (or GPU headroom if GPU path on proposed)
553
+ concurrent_delta = proposed.concurrent_capacity - current.concurrent_capacity
554
+ newly_feasible = (current.feasibility_tier == "Infeasible") AND (proposed.feasibility_tier != "Infeasible")
555
+
556
+ throughput_delta:
557
+ If both throughput values exist as numbers:
558
+ delta = proposed_tok_s - current_tok_s
559
+ Format: "+{N} tok/s" or "-{N} tok/s"
560
+ Else: null
561
+ ```
562
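A minimal sketch of the delta computation, using plain dicts keyed by shorthand names rather than the full assessment objects from 5.8.1:

```python
def compute_delta(cur, prop):
    """Deltas between current (`cur`) and proposed (`prop`) assessments."""
    delta = {
        "feasibility_changed": cur["tier"] != prop["tier"],
        "feasibility_before": cur["tier"],
        "feasibility_after": prop["tier"],
        "headroom_delta_pct": round(prop["headroom_pct"] - cur["headroom_pct"], 1),
        "concurrent_delta": prop["concurrent"] - cur["concurrent"],
        "newly_feasible": cur["tier"] == "Infeasible" and prop["tier"] != "Infeasible",
        "throughput_delta": None,
    }
    # Only when both throughput values exist as numbers
    if cur.get("tok_s") is not None and prop.get("tok_s") is not None:
        delta["throughput_delta"] = f"{prop['tok_s'] - cur['tok_s']:+d} tok/s"
    return delta
```

Fed the worked-example numbers from 5.13 (93.8% to 62.5% headroom, 11 to 16 tok/s, 12 to 2 instances), this yields the -31.3%, "+5 tok/s", and -10 deltas shown there.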
+
563
+ ### 5.9 Section 6: Benchmark Search (Degradable)
564
+
565
+ ```
566
+ For each model where throughput_source would be "Projected" on the proposed hardware:
567
+ Attempt a WebSearch query:
568
+ "{model_display_name} {proposed_cpu_or_gpu_model} inference benchmark tokens per second"
569
+
570
+ If WebSearch returns relevant results:
571
+ Extract tokens/s figure from results.
572
+ Update the proposed assessment:
573
+ throughput_estimate = the found value
574
+ throughput_source = "Verified"
575
+
576
+ If WebSearch fails, returns no results, or results are not clearly relevant:
577
+ Keep the Projected estimate. Do NOT block on search failure.
578
+ Do NOT ask the user for benchmark data.
579
+
580
+ Limit: Perform at most 5 WebSearch calls total (prioritize registered models first, then largest candidate models). Skip search for models under 3B parameters (estimates are sufficient for small models).
581
+ ```
582
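The selection rule (registered models first, then largest candidates, skip models under 3B, cap at 5) can be sketched as:

```python
def plan_benchmark_searches(models, limit=5):
    """Pick up to `limit` models for WebSearch. `models` is a list of dicts
    with throughput_source, params_b, and is_registered (illustrative shape)."""
    eligible = [m for m in models
                if m["throughput_source"] == "Projected" and m["params_b"] >= 3]
    # Registered models first, then by descending parameter count
    eligible.sort(key=lambda m: (not m["is_registered"], -m["params_b"]))
    return eligible[:limit]
```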
+
583
+ ### 5.10 Section 7: Report Generation
584
+
585
+ #### 5.10.1 File Naming
586
+
587
+ ```
588
+ Date: current date in YYYY-MM-DD format
589
+
590
+ Slug rules by change type:
591
+ cpu_upgrade → "{server_name}-cpu-upgrade" (lowercase, spaces to hyphens)
592
+ gpu_addition → "{server_name}-{gpu_model_short}-addition" (e.g., "aos-ai-beta-rtx4090-addition")
593
+ ram_change → "{server_name}-ram-{proposed_gb}gb"
594
+ full_system → "{new_identifier}-full-replacement" (use server_name or "new-system" if unnamed)
595
+
596
+ All slugs: lowercase, alphanumeric and hyphens only, truncate to 60 chars.
597
+
598
+ Full path: docs/reports/hardware_eval_{date}_{slug}.md
599
+ ```
600
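A sketch of the naming rules, assuming the type-specific suffix (GPU model short name, proposed RAM size) is passed in precomputed as `extra`:

```python
import re
from datetime import date

def report_path(change_type, server_name, extra=""):
    """Build docs/reports/hardware_eval_{date}_{slug}.md.
    For full_system, pass "new-system" as server_name if unnamed."""
    slug_map = {
        "cpu_upgrade": f"{server_name}-cpu-upgrade",
        "gpu_addition": f"{server_name}-{extra}-addition",
        "ram_change": f"{server_name}-ram-{extra}gb",
        "full_system": f"{server_name}-full-replacement",
    }
    slug = slug_map[change_type].lower().replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)[:60]   # alphanumeric + hyphens, max 60
    return f"docs/reports/hardware_eval_{date.today():%Y-%m-%d}_{slug}.md"
```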
+
601
+ #### 5.10.2 Report Template Structure
602
+
603
+ The complete report has these sections in order:
604
+
605
+ ```markdown
606
+ # Hardware Evaluation Report: {Change Description}
607
+
608
+ **Date:** {YYYY-MM-DD}
609
+ **Evaluator:** Hardware Vetter Skill (Automated)
610
+ **Change Type:** {CPU Upgrade | GPU Addition | RAM Change | Full System Replacement}
611
+
612
+ ---
613
+
614
+ ## Executive Summary
615
+
616
+ {2–4 sentence summary covering:}
617
+ - What change was evaluated
618
+ - Overall verdict (Recommended / Recommended with Caveats / Not Recommended)
619
+ - Key impact (how many models affected, biggest gain)
620
+ - One critical caveat if any
621
+
622
+ ---
623
+
624
+ ## Hardware Comparison
625
+
626
+ | Component | Current | Proposed | Change |
627
+ |-----------|---------|----------|--------|
628
+ | CPU | {current} | {proposed or "No change"} | {description or "—"} |
629
+ | GPU | {current} | {proposed or "No change"} | {description or "—"} |
630
+ | RAM | {current} GB | {proposed} GB or "No change" | {"+N GB" or "—"} |
631
+ | Storage | {current} | {proposed or "No change"} | {description or "—"} |
632
+
633
+ ---
634
+
635
+ ## Registered Model Analysis
636
+
637
+ These models are currently configured in the pipeline (`src/pipeline/config.py`).
638
+
639
+ ### {Model Display Name} ({role})
640
+
641
+ **Ollama ID:** `{ollama_id}`
642
+ **Parameters:** {parameter_count}
643
+ **Recommended Quantization:** {recommended_quantization}
644
+
645
+ | Metric | Current Hardware | Proposed Hardware | Delta |
646
+ |--------|-----------------|-------------------|-------|
647
+ | Feasibility | {tier} | {tier} | {changed or "No change"} |
648
+ | RAM Required | {N} GB | {N} GB | — |
649
+ | RAM Headroom | {N}% | {N}% | {+/-N}% |
650
+ | GPU VRAM Required | {N} GB or N/A | {N} GB or N/A | {+/-N}% or N/A |
651
+ | GPU VRAM Headroom | {N}% or N/A | {N}% or N/A | {+/-N}% or N/A |
652
+ | Est. Throughput | {~N tok/s} ({source}) | {~N tok/s} ({source}) | {+/-N tok/s} |
653
+ | Concurrent Instances | {N} | {N} | {+/-N} |
654
+
655
+ {1–2 sentence interpretation of this model's results.}
656
+
657
+ {Repeat this subsection for each registered model — there will be 3.}
658
+
659
+ ---
660
+
661
+ ## Candidate Model Analysis
662
+
663
+ These models are in the catalog but not currently registered in the pipeline.
664
+
665
+ | Model | Params | Current Tier | Proposed Tier | Newly Feasible? | Headroom (Proposed) | Est. Throughput (Proposed) |
666
+ |-------|--------|-------------|---------------|-----------------|--------------------|----|
667
+ | {display_name} | {param_count} | {tier} | {tier} | {Yes/No} | {N}% | {~N tok/s} ({source}) |
668
+ {... one row per candidate model}
669
+
670
+ ### Newly Feasible Models
671
+
672
+ {If any models have newly_feasible == true:}
673
+
674
+ The following models become feasible with the proposed hardware:
675
+
676
+ - **{display_name}** ({parameter_count}) — moves from Infeasible to {tier}. {1 sentence on what this model offers, from its strengths field.}
677
+
678
+ {If no models become newly feasible:}
679
+
680
+ No additional models become feasible with the proposed hardware change.
681
+
682
+ ---
683
+
684
+ ## Upgrade Recommendation
685
+
686
+ **Verdict:** {Recommended | Recommended with Caveats | Not Recommended}
687
+
688
+ **Rationale:**
689
+ {3–5 bullet points covering:}
690
+ - Impact on registered models (performance gains or losses)
691
+ - Number of newly feasible candidate models
692
+ - Resource headroom changes
693
+ - Cost-benefit consideration (if full system, note scale of change)
694
+ - Any risks or caveats
695
+
696
+ **Suggested Next Steps:**
697
+ {2–3 actionable bullet points, e.g.:}
698
+ - Update model_catalog.json hardware_profile to reflect new hardware after installation
699
+ - Consider registering {model_name} as {role} given its feasibility on the new hardware
700
+ - Run actual benchmarks after installation to replace Projected estimates
701
+
702
+ ---
703
+
704
+ ## Methodology
705
+
706
+ **Data Sources:**
707
+ - Model specifications: `docs/model_catalog.json` (schema v{version})
708
+ - Registered models: `src/pipeline/config.py`
709
+ - Benchmark data: {list sources — "catalog benchmarks" and/or "web search results" and/or "none available"}
710
+
711
+ **Estimation Approach:**
712
+ - Feasibility tiers: Comfortable (30%+ headroom), Tight (5–30%), Infeasible (<5% headroom or deficit)
713
+ - RAM reservation: 20% of total RAM reserved for OS/system processes in concurrent capacity calculations
714
+ - Throughput estimates use {CPU formula description and/or GPU formula description}
715
+
716
+ **Data Confidence:**
717
+ - Verified data points: {N} of {total}
718
+ - Projected data points: {N} of {total}
719
+ - Projected estimates are order-of-magnitude approximations. Actual performance may vary significantly based on quantization method, context length, batch size, and system load.
720
+ ```
721
+
722
+ #### 5.10.3 Verdict Logic
723
+
724
+ ```
725
+ Compute verdict based on these rules (evaluate in order, first match wins):
726
+
727
+ "Not Recommended":
728
+ - Any registered model moves from Comfortable/Tight to Infeasible on proposed hardware
729
+ - OR all registered models show negative headroom_delta AND no newly feasible candidates
730
+
731
+ "Recommended with Caveats":
732
+ - Any registered model moves from Comfortable to Tight
733
+ - OR newly feasible count == 0 AND average headroom improvement < 10%
734
+ - OR gpu_data_available == false AND change_type == gpu_addition (can't fully evaluate the GPU without VRAM data)
735
+
736
+ "Recommended":
737
+ - All registered models stay at same or better tier
738
+ - AND (headroom improves OR newly feasible models > 0 OR throughput improves)
739
+ ```
740
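The first-match-wins rules can be sketched as follows. The tuple encoding of registered-model results is an assumption for illustration, not mandated by the spec:

```python
def compute_verdict(registered, newly_feasible_count, avg_headroom_gain_pct,
                    gpu_data_available=True, change_type=None):
    """`registered` holds one (tier_before, tier_after, headroom_delta_pct)
    tuple per registered model. Rules are evaluated in order."""
    # "Not Recommended"
    if any(b != "Infeasible" and a == "Infeasible" for b, a, _ in registered):
        return "Not Recommended"
    if all(d < 0 for _, _, d in registered) and newly_feasible_count == 0:
        return "Not Recommended"
    # "Recommended with Caveats"
    if any(b == "Comfortable" and a == "Tight" for b, a, _ in registered):
        return "Recommended with Caveats"
    if newly_feasible_count == 0 and avg_headroom_gain_pct < 10:
        return "Recommended with Caveats"
    if not gpu_data_available and change_type == "gpu_addition":
        return "Recommended with Caveats"
    # Otherwise tiers held or improved and something got better
    return "Recommended"
```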
+
741
+ #### 5.10.4 Write Report
742
+
743
+ ```
744
+ Use the Write tool to write the complete report to the computed file path.
745
+ After writing, inform the user:
746
+ "Report written to: {file_path}"
747
+ ```
748
+
749
+ ### 5.11 Section 8: Completion
750
+
751
+ ```
752
+ After writing the report, present a brief summary to the user:
753
+
754
+ "Hardware evaluation complete.
755
+
756
+ **Verdict:** {verdict}
757
+ **Report:** {file_path}
758
+
759
+ Key findings:
760
+ - {1-line summary for each registered model}
761
+ - {N} candidate model(s) become newly feasible
762
+ - {Any critical caveat}
763
+
764
+ The full report is available at the path above."
765
+ ```
766
+
767
+ ### 5.12 Error Handling Specification
768
+
769
+ | Error Condition | Behavior |
770
+ |----------------|----------|
771
+ | `docs/model_catalog.json` missing or unparseable | STOP. Inform user. Do not proceed. |
772
+ | `src/pipeline/config.py` missing or unparseable | WARN. Continue with all models as candidates. |
773
+ | `hardware_profile` key missing in catalog | STOP. Inform user catalog is malformed. |
774
+ | `models` key missing in catalog | STOP. Inform user catalog is malformed. |
775
+ | `gpu_vram_gb_q4` field missing on a model | Set gpu_vram_required to Projected estimate for that model. |
776
+ | `recommended_ram_gb` missing on a model | Fall back to `ram_gb_q4`. If also missing, use parameter-based estimate. Mark Projected. |
777
+ | `raw_benchmarks` missing or empty on a model | Use Projected throughput formula. |
778
+ | User enters unparseable hardware specs | Ask targeted follow-up for the specific missing field. Max 2 retries per field, then ask user to provide the complete spec in structured format. |
779
+ | WebSearch fails or returns irrelevant results | Silently keep Projected estimate. No user notification needed. |
780
+ | `docs/reports/` directory does not exist | Create it before writing the report. |
781
+ | Change type response not recognized | Re-ask the change type question with the numbered list. |
782
+ | User says "no" to confirmation | Ask which component to correct, re-ask that component's question. |
783
+
784
+ ### 5.13 Worked Example
785
+
786
+ **Scenario:** GPU Addition — NVIDIA RTX 4090 (24 GB VRAM) added to the current system (32-core Broadwell CPU at 2.5 GHz turbo, 192 GB RAM, no GPU).
787
+
788
+ **Model: qwen2.5-14b (14.7B params, registered as default_model)**
789
+
790
+ Current hardware (CPU-only):
791
+ ```
792
+ ram_required = recommended_ram_gb (from catalog, say 12 GB)
793
+ resource_ratio = 12 / 192 = 0.0625
794
+ feasibility_tier = "Comfortable" (0.0625 < 0.70)
795
+ headroom_pct = ((192 - 12) / 192) * 100 = 93.8%
796
+ concurrent = floor(192 * 0.80 / 12) = floor(12.8) = 12
797
+ throughput (Projected, CPU): base = (32 * 2.5) / (14.7 * 0.5) = 80 / 7.35 = 10.88
798
+ arch_multiplier (Broadwell) = 1.0
799
+ estimated = 10.88 * 1.0 = ~11 tok/s (Projected)
800
+ ```
801
+
802
+ Proposed hardware (GPU added, 24GB VRAM):
803
+ ```
804
+ gpu_offload_support = check catalog (say "full_offload")
805
+ gpu_vram_required = gpu_vram_gb_q4 (from catalog, say 9 GB)
806
+ 24 >= 9 → full GPU offload
807
+ resource_ratio = 9 / 24 = 0.375
808
+ feasibility_tier = "Comfortable" (0.375 < 0.70)
809
+ headroom_pct = ((24 - 9) / 24) * 100 = 62.5%
810
+ concurrent (GPU) = floor(24 / 9) = 2
811
+ throughput (Projected, GPU, no bandwidth given):
812
+ estimated = (24 * 10) / 14.7 = 16.3 → ~16 tok/s (Projected)
813
+ ```
814
+
815
+ Delta:
816
+ ```
817
+ feasibility_changed = false (Comfortable → Comfortable)
818
+ headroom_delta = 62.5 - 93.8 = -31.3% (NOTE: GPU headroom vs RAM headroom — different resource pools)
819
+ throughput_delta = +5 tok/s
820
+ concurrent_delta = 2 - 12 = -10 (but GPU vs RAM — different meaning)
821
+ newly_feasible = false
822
+ ```
823
+
824
+ **Note on the delta:** When comparing GPU headroom vs RAM headroom, the numbers describe different resource pools. The report should contextualize this: "GPU offload provides ~16 tok/s throughput vs ~11 tok/s CPU-only, but GPU VRAM limits concurrent instances to 2 (vs 12 in RAM). RAM remains available for additional CPU-only instances."
825
+
826
+ **Model: llama3.3-70b (70B params, candidate — often assumed infeasible)**
827
+
828
+ Current hardware (CPU-only):
829
+ ```
830
+ ram_required = recommended_ram_gb (say 45 GB)
831
+ resource_ratio = 45 / 192 = 0.234
832
+ feasibility_tier = "Comfortable" (fits in RAM even at 70B with q4)
833
+ Note: This model is actually feasible in 192GB RAM — it's the throughput that's impractical, not the memory.
834
+ ```
835
+
836
+ This example illustrates that feasibility is about memory fit, not performance. The report's throughput column handles the performance dimension.
837
+
838
+ ### 5.14 Complete Question Flow Summary
839
+
840
+ ```
841
+ 1. Load data (silent — no user interaction)
842
+ 2. Present hardware summary
843
+ 3. Ask change type (1 question)
844
+ 4. Ask component details (1–3 questions depending on type)
845
+ 5. Present confirmation (1 question, may loop)
846
+ 6. Run analysis (silent)
847
+ 7. Run benchmark search (silent, degradable)
848
+ 8. Generate and write report (silent)
849
+ 9. Present completion summary
850
+ ```
851
+
852
+ Total user interactions: 3–6 questions (minimum for single component, maximum for full system with a correction).
853
+
854
+ ---
855
+
856
+ ## 6. Acceptance Mapping
857
+
858
+ | AC # | Criterion | How Addressed |
859
+ |------|-----------|---------------|
860
+ | 1 | Accepts all four hardware change types | Section 5.7, Step 4.2: explicit 4-option question with enum mapping. Full system asks all component questions sequentially. |
861
+ | 2 | Reads current specs from catalog and config without manual entry | Section 5.6: Data Loading reads both files automatically. Critical Rule: NEVER ask for data in files. |
862
+ | 3 | Fit/no-fit verdict with VRAM/RAM estimates per model | Section 5.8.2: feasibility tier (Comfortable/Tight/Infeasible) with exact RAM and VRAM requirement figures per model. |
863
+ | 4 | Before/after comparison with performance impact | Section 5.10.2: Hardware Comparison table + per-model tables with Current/Proposed/Delta columns. Delta computation in 5.8.5. |
864
+ | 5 | Identifies newly feasible candidate models | Section 5.8.5: `newly_feasible` flag computed per model. Section 5.10.2: "Newly Feasible Models" subsection. |
865
+ | 6 | Upgrade recommendation with rationale | Section 5.10.2: Upgrade Recommendation section with verdict + rationale bullets. Verdict logic in 5.10.3. |
866
+ | 7 | Projected vs Verified labeling | Section 5.8.3: all throughput labeled (Verified)/(Projected). Section 5.8.2: RAM/VRAM estimates labeled when from formula. Section 5.10.2: Methodology section with confidence counts. |
867
+ | 8 | Timestamped markdown file | Section 5.10.1: file naming with date + slug. Section 5.10.4: Write tool output. |
868
+
869
+ ---
870
+
871
+ ## 7. Integration Points
872
+
873
+ | Integration | File Path | Fields/Format | Direction |
874
+ |-------------|-----------|---------------|-----------|
875
+ | Model catalog | `docs/model_catalog.json` | JSON: `hardware_profile` (object), `models` (object of objects), each model has `hardware_requirements`, `ollama_id`, `display_name`, `parameter_count`, `raw_benchmarks`, `feasibility`, `strengths` | Read |
876
+ | Pipeline config | `src/pipeline/config.py` | Python: `default_model`, `fast_model`, `chat_model` string assignments in Pydantic Settings class. Values in Ollama colon format. | Read |
877
+ | Report output | `docs/reports/hardware_eval_{date}_{slug}.md` | Markdown file. Complete structure in Section 5.10.2. | Write |
878
+ | User interaction | AskUserQuestion tool | String prompts, string responses. Question flow in Section 5.7. | Interactive |
879
+ | Benchmark search | WebSearch tool | Query strings per Section 5.9. Returns web results. Degradable — failure is silent. | Read (external) |
880
+
881
+ ---
882
+
883
+ ## 8. Open Items
884
+
885
+ | Item | Type | Resolution |
886
+ |------|------|------------|
887
+ | Exact `recommended_ram_gb` values for each model | [VERIFY] at execution time | Read from catalog at runtime. Values documented in catalog; skill reads them dynamically. |
888
+ | Exact `raw_benchmarks` field contents | [VERIFY] at execution time | Skill checks for existence and extracts throughput if present. Format varies by model. |
889
+ | GPU VRAM fields may not yet exist (Task 1 dependency) | Execution-time degradation | Skill detects missing fields and falls back to Projected. Warning issued per Section 5.7 Step 4.4. |
890
+ | Architecture multiplier table completeness | Execution-time | Unknown architectures default to 1.0 multiplier. User's proposed CPU architecture is parsed and matched. |
891
+
892
+ No blocking open items remain. All items above are runtime verifications or graceful degradations already specified in the blueprint.
893
+
894
+ ---
895
+
896
+ ## 9. Producer Handoff
897
+
898
+ **Output format:** Single markdown file containing a Claude Code skill
899
+ **Producer:** Code Writer
900
+ **Filename:** `.claude/skills/hardware-vetter/SKILL.md`
901
+ **Target line count:** 600–700 lines
902
+
903
+ **Content blocks in order:**
904
+
905
+ 1. **Metadata block** (lines 1–4): YAML front matter with name, description, model
906
+ 2. **Critical Rules** (lines 5–20): 8 imperative rules, all-caps keywords, highest attention position
907
+ 3. **Purpose and Scope** (lines 21–35): Brief orientation, 4 change types, file paths
908
+ 4. **Data Loading** (lines 36–95): Steps 3.1–3.4, Read tool instructions, model classification, GPU detection
909
+ 5. **Question Flow** (lines 96–220): Steps 4.1–4.7, AskUserQuestion prompts (verbatim from blueprint), parsing instructions, confirmation loop
910
+ 6. **Analysis Engine** (lines 221–370): All formulas from 5.8.1–5.8.5, tier thresholds, throughput estimation (CPU + GPU paths), concurrent capacity, delta computation
911
+ 7. **Benchmark Search** (lines 371–400): WebSearch instructions, 5-query limit, priority order, failure handling
912
+ 8. **Report Generation** (lines 401–640): Complete report template from 5.10.2 (verbatim structure), file naming rules, verdict logic, Write tool instruction
913
+ 9. **Completion** (lines 641–660): Summary presentation format
914
+ 10. **Error Handling** (lines 661–700): Error table from 5.12, all conditions and behaviors
915
+
916
+ **Instruction tone guidance:** Write all instructions as second-person imperative directives to Claude. Use "You MUST", "NEVER", "ALWAYS" for non-negotiable rules. Use "You should" for preferred behaviors. No persona names, no first-person. Every instruction tells Claude exactly what to do, in what order, with what tools.