@tgoodington/intuition 8.1.3 → 9.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (154)
  1. package/README.md +9 -9
  2. package/docs/project_notes/.project-memory-state.json +100 -0
  3. package/docs/project_notes/branches/.gitkeep +0 -0
  4. package/docs/project_notes/bugs.md +41 -0
  5. package/docs/project_notes/decisions.md +147 -0
  6. package/docs/project_notes/issues.md +101 -0
  7. package/docs/project_notes/key_facts.md +88 -0
  8. package/docs/project_notes/trunk/.gitkeep +0 -0
  9. package/docs/project_notes/trunk/.planning_research/decision_file_naming.md +15 -0
  10. package/docs/project_notes/trunk/.planning_research/decisions_log.md +32 -0
  11. package/docs/project_notes/trunk/.planning_research/orientation.md +51 -0
  12. package/docs/project_notes/trunk/audit/plan-rename-hitlist.md +654 -0
  13. package/docs/project_notes/trunk/blueprint-conflicts.md +109 -0
  14. package/docs/project_notes/trunk/blueprints/database-architect.md +416 -0
  15. package/docs/project_notes/trunk/blueprints/devops-infrastructure.md +514 -0
  16. package/docs/project_notes/trunk/blueprints/technical-writer.md +788 -0
  17. package/docs/project_notes/trunk/build_brief.md +119 -0
  18. package/docs/project_notes/trunk/build_report.md +250 -0
  19. package/docs/project_notes/trunk/detail_brief.md +94 -0
  20. package/docs/project_notes/trunk/plan.md +182 -0
  21. package/docs/project_notes/trunk/planning_brief.md +96 -0
  22. package/docs/project_notes/trunk/prompt_brief.md +60 -0
  23. package/docs/project_notes/trunk/prompt_output.json +98 -0
  24. package/docs/project_notes/trunk/scratch/database-architect-decisions.json +72 -0
  25. package/docs/project_notes/trunk/scratch/database-architect-research-plan.md +10 -0
  26. package/docs/project_notes/trunk/scratch/database-architect-stage1.md +226 -0
  27. package/docs/project_notes/trunk/scratch/devops-infrastructure-decisions.json +71 -0
  28. package/docs/project_notes/trunk/scratch/devops-infrastructure-research-plan.md +7 -0
  29. package/docs/project_notes/trunk/scratch/devops-infrastructure-stage1.md +164 -0
  30. package/docs/project_notes/trunk/scratch/technical-writer-decisions.json +88 -0
  31. package/docs/project_notes/trunk/scratch/technical-writer-research-plan.md +7 -0
  32. package/docs/project_notes/trunk/scratch/technical-writer-stage1.md +266 -0
  33. package/docs/project_notes/trunk/team_assignment.json +108 -0
  34. package/docs/project_notes/trunk/test_brief.md +75 -0
  35. package/docs/project_notes/trunk/test_report.md +26 -0
  36. package/docs/project_notes/trunk/verification/devops-infrastructure-verification.md +172 -0
  37. package/docs/v9/decision-framework-direction.md +142 -0
  38. package/docs/v9/decision-framework-implementation.md +114 -0
  39. package/docs/v9/domain-adaptive-team-architecture.md +1016 -0
  40. package/docs/v9/test/SESSION_SUMMARY.md +117 -0
  41. package/docs/v9/test/TEST_PLAN.md +119 -0
  42. package/docs/v9/test/blueprints/legal-analyst.md +166 -0
  43. package/docs/v9/test/output/07_cover_letter.md +41 -0
  44. package/docs/v9/test/phase2/mock_plan.md +89 -0
  45. package/docs/v9/test/phase2/producers.json +32 -0
  46. package/docs/v9/test/phase2/specialists/database-architect.specialist.md +10 -0
  47. package/docs/v9/test/phase2/specialists/financial-analyst.specialist.md +10 -0
  48. package/docs/v9/test/phase2/specialists/legal-analyst.specialist.md +10 -0
  49. package/docs/v9/test/phase2/specialists/technical-writer.specialist.md +10 -0
  50. package/docs/v9/test/phase2/team_assignment.json +61 -0
  51. package/docs/v9/test/phase3/blueprints/legal-analyst.md +840 -0
  52. package/docs/v9/test/phase3/legal-analyst-full.specialist.md +111 -0
  53. package/docs/v9/test/phase3/project_context/nh_landlord_tenant_notes.md +35 -0
  54. package/docs/v9/test/phase3/project_context/property_facts.md +32 -0
  55. package/docs/v9/test/phase3b/blueprints/legal-analyst.md +1715 -0
  56. package/docs/v9/test/phase3b/legal-analyst.specialist.md +153 -0
  57. package/docs/v9/test/phase3b/scratch/legal-analyst-stage1.md +270 -0
  58. package/docs/v9/test/phase4/TEST_PLAN.md +32 -0
  59. package/docs/v9/test/phase4/blueprints/financial-analyst-T2.md +538 -0
  60. package/docs/v9/test/phase4/blueprints/legal-analyst-T4.md +253 -0
  61. package/docs/v9/test/phase4/cross-blueprint-check.md +280 -0
  62. package/docs/v9/test/phase4/scratch/financial-analyst-T2-stage1.md +67 -0
  63. package/docs/v9/test/phase4/scratch/legal-analyst-T4-stage1.md +54 -0
  64. package/docs/v9/test/phase4/specialists/financial-analyst.specialist.md +156 -0
  65. package/docs/v9/test/phase4/specialists/legal-analyst.specialist.md +153 -0
  66. package/docs/v9/test/phase5/TEST_PLAN.md +35 -0
  67. package/docs/v9/test/phase5/blueprints/code-architect-hw-vetter.md +375 -0
  68. package/docs/v9/test/phase5/output/04_compliance_checklist.md +149 -0
  69. package/docs/v9/test/phase5/output/hardware-vetter-SKILL-v2.md +561 -0
  70. package/docs/v9/test/phase5/output/hardware-vetter-SKILL.md +459 -0
  71. package/docs/v9/test/phase5/producers/code-writer.producer.md +49 -0
  72. package/docs/v9/test/phase5/producers/document-writer.producer.md +62 -0
  73. package/docs/v9/test/phase5/regression-comparison-v2.md +60 -0
  74. package/docs/v9/test/phase5/regression-comparison.md +197 -0
  75. package/docs/v9/test/phase5/review-5A-specialist.md +213 -0
  76. package/docs/v9/test/phase5/specialist-test/TEST_PLAN.md +60 -0
  77. package/docs/v9/test/phase5/specialist-test/blueprint-comparison.md +252 -0
  78. package/docs/v9/test/phase5/specialist-test/blueprints/code-architect-hw-vetter.md +916 -0
  79. package/docs/v9/test/phase5/specialist-test/scratch/code-architect-stage1.md +427 -0
  80. package/docs/v9/test/phase5/specialists/code-architect.specialist.md +168 -0
  81. package/docs/v9/test/phase5b/TEST_PLAN.md +219 -0
  82. package/docs/v9/test/phase5b/blueprints/5B-10-stage2-with-decisions.md +286 -0
  83. package/docs/v9/test/phase5b/decisions/5B-2-accept-all-decisions.json +68 -0
  84. package/docs/v9/test/phase5b/decisions/5B-3-promote-decisions.json +70 -0
  85. package/docs/v9/test/phase5b/decisions/5B-4-individual-decisions.json +68 -0
  86. package/docs/v9/test/phase5b/decisions/5B-5-triage-decisions.json +110 -0
  87. package/docs/v9/test/phase5b/decisions/5B-6-fallback-decisions.json +40 -0
  88. package/docs/v9/test/phase5b/decisions/5B-8-partial-decisions.json +46 -0
  89. package/docs/v9/test/phase5b/decisions/5B-9-complete-decisions.json +54 -0
  90. package/docs/v9/test/phase5b/scratch/code-architect-stage1.md +133 -0
  91. package/docs/v9/test/phase5b/specialists/code-architect.specialist.md +202 -0
  92. package/docs/v9/test/phase5b/stage1-many-decisions.md +139 -0
  93. package/docs/v9/test/phase5b/stage1-no-assumptions.md +70 -0
  94. package/docs/v9/test/phase5b/stage1-with-assumptions.md +86 -0
  95. package/docs/v9/test/phase5b/test-5B-1-results.md +157 -0
  96. package/docs/v9/test/phase5b/test-5B-10-results.md +130 -0
  97. package/docs/v9/test/phase5b/test-5B-2-results.md +75 -0
  98. package/docs/v9/test/phase5b/test-5B-3-results.md +104 -0
  99. package/docs/v9/test/phase5b/test-5B-4-results.md +114 -0
  100. package/docs/v9/test/phase5b/test-5B-5-results.md +126 -0
  101. package/docs/v9/test/phase5b/test-5B-6-results.md +60 -0
  102. package/docs/v9/test/phase5b/test-5B-7-results.md +141 -0
  103. package/docs/v9/test/phase5b/test-5B-8-results.md +115 -0
  104. package/docs/v9/test/phase5b/test-5B-9-results.md +76 -0
  105. package/docs/v9/test/producers/document-writer.producer.md +62 -0
  106. package/docs/v9/test/specialists/legal-analyst.specialist.md +58 -0
  107. package/package.json +4 -2
  108. package/producers/code-writer/code-writer.producer.md +86 -0
  109. package/producers/data-file-writer/data-file-writer.producer.md +116 -0
  110. package/producers/document-writer/document-writer.producer.md +117 -0
  111. package/producers/form-filler/form-filler.producer.md +99 -0
  112. package/producers/presentation-creator/presentation-creator.producer.md +109 -0
  113. package/producers/spreadsheet-builder/spreadsheet-builder.producer.md +107 -0
  114. package/scripts/install-skills.js +97 -9
  115. package/scripts/uninstall-skills.js +7 -2
  116. package/skills/intuition-agent-advisor/SKILL.md +327 -220
  117. package/skills/intuition-assemble/SKILL.md +261 -0
  118. package/skills/intuition-build/SKILL.md +379 -319
  119. package/skills/intuition-debugger/SKILL.md +390 -390
  120. package/skills/intuition-design/SKILL.md +385 -381
  121. package/skills/intuition-detail/SKILL.md +377 -0
  122. package/skills/intuition-engineer/SKILL.md +307 -303
  123. package/skills/intuition-handoff/SKILL.md +264 -222
  124. package/skills/intuition-handoff/references/handoff_core.md +54 -54
  125. package/skills/intuition-initialize/SKILL.md +21 -6
  126. package/skills/intuition-initialize/references/agents_template.md +118 -118
  127. package/skills/intuition-initialize/references/claude_template.md +134 -134
  128. package/skills/intuition-initialize/references/intuition_readme_template.md +4 -4
  129. package/skills/intuition-initialize/references/state_template.json +17 -2
  130. package/skills/{intuition-plan → intuition-outline}/SKILL.md +561 -481
  131. package/skills/{intuition-plan → intuition-outline}/references/magellan_core.md +16 -16
  132. package/skills/{intuition-plan → intuition-outline}/references/templates/plan_template.md +6 -6
  133. package/skills/intuition-prompt/SKILL.md +374 -312
  134. package/skills/intuition-start/SKILL.md +46 -13
  135. package/skills/intuition-start/references/start_core.md +60 -60
  136. package/skills/intuition-test/SKILL.md +345 -0
  137. package/specialists/api-designer/api-designer.specialist.md +291 -0
  138. package/specialists/business-analyst/business-analyst.specialist.md +270 -0
  139. package/specialists/copywriter/copywriter.specialist.md +268 -0
  140. package/specialists/database-architect/database-architect.specialist.md +275 -0
  141. package/specialists/devops-infrastructure/devops-infrastructure.specialist.md +314 -0
  142. package/specialists/financial-analyst/financial-analyst.specialist.md +269 -0
  143. package/specialists/frontend-component/frontend-component.specialist.md +293 -0
  144. package/specialists/instructional-designer/instructional-designer.specialist.md +285 -0
  145. package/specialists/legal-analyst/legal-analyst.specialist.md +260 -0
  146. package/specialists/marketing-strategist/marketing-strategist.specialist.md +281 -0
  147. package/specialists/project-manager/project-manager.specialist.md +266 -0
  148. package/specialists/research-analyst/research-analyst.specialist.md +273 -0
  149. package/specialists/security-auditor/security-auditor.specialist.md +354 -0
  150. package/specialists/technical-writer/technical-writer.specialist.md +275 -0
  151. /package/skills/{intuition-plan → intuition-outline}/references/sub_agents.md +0 -0
  152. /package/skills/{intuition-plan → intuition-outline}/references/templates/confidence_scoring.md +0 -0
  153. /package/skills/{intuition-plan → intuition-outline}/references/templates/plan_format.md +0 -0
  154. /package/skills/{intuition-plan → intuition-outline}/references/templates/planning_process.md +0 -0
@@ -0,0 +1,427 @@
# Code Architect Stage 1: Hardware Vetter Skill — Research Findings

**Task:** Task 2 — Build the Hardware Vetter Claude Code Skill
**Date:** 2026-02-27
**Status:** Research complete, ready for Stage 2 code spec authoring

---

## Codebase Research Summary

### Files Read

| File | Path | Key Data |
|------|------|----------|
| Model Catalog | `C:/Projects/District_AI_Agent_Implementation/docs/model_catalog.json` | 1003 lines, schema v1.2, 11 models, hardware_profile, infrastructure_options |
| Pipeline Config | `C:/Projects/District_AI_Agent_Implementation/src/pipeline/config.py` | Pydantic Settings class, 3 registered Ollama model IDs |
| Existing Skill Stub | `C:/Projects/District_AI_Agent_Implementation/.claude/skills/hardware-vetter/SKILL.md` | 708 lines — fully written skill already exists |
| Existing Report | `C:/Projects/District_AI_Agent_Implementation/docs/reports/hardware_eval_2026-02-20_thinkstation-pgx-addition.md` | Prior evaluation output — demonstrates report format in practice |
| Project Settings | `C:/Projects/District_AI_Agent_Implementation/.claude/settings.local.json` | Permissions config, tool allowlist |
| Intuition Skill Pattern | `C:/Projects/Intuition/skills/intuition-start/SKILL.md` | Frontmatter format reference |

---

## CRITICAL FINDING: Skill Already Exists

The file `C:/Projects/District_AI_Agent_Implementation/.claude/skills/hardware-vetter/SKILL.md` is **already a complete, 708-line skill implementation**. It is not a stub. It contains:

- Full YAML frontmatter (`name: hardware-vetter`, `model: sonnet`, `tools: Read, WebSearch, AskUserQuestion, Write, Glob`)
- 8 numbered sections covering the entire workflow
- Detailed data loading with exact field names from the catalog
- Question flow with AskUserQuestion for all 4 change types
- Complete analysis methodology for GPU-primary and CPU-only paths
- Benchmark search protocol with 8-call cap
- Full report template with all required sections
- 7 error handling scenarios with graceful degradation
- Completion message format

**This means Task 2's primary deliverable already exists.** The engineering work should focus on review, verification against acceptance criteria, and any gaps — not building from scratch.

---

## ECD Exploration

### Elements (E) — Building Blocks

#### E1: Skill File Structure

**Single file:** `C:/Projects/District_AI_Agent_Implementation/.claude/skills/hardware-vetter/SKILL.md`

**Frontmatter (YAML):**

```yaml
---
name: hardware-vetter
description: Evaluate proposed hardware changes against the AI server's model lineup
model: sonnet
tools: Read, WebSearch, AskUserQuestion, Write, Glob
---
```

No additional files in the skill directory — everything is in SKILL.md per the Intuition project's "Reference File Problem" rule (only SKILL.md is injected into context).

#### E2: Data Structures — Model Catalog (`docs/model_catalog.json`)

**Schema version:** `1.2`

**Top-level keys:**

- `schema_version` (string): "1.2"
- `generated_at` (string, ISO 8601)
- `description` (string)
- `task_categories` (array of 5 strings): `reasoning_analysis`, `code_generation`, `instruction_following_tool_use`, `conversational_quality`, `summarization`
- `benchmarks` (object): 7 benchmark definitions with `full_name`, `scale`, `higher_is_better`, optional `normalization`
- `normalization_method` (string)
- `hardware_profile` (object) — current server specs
- `infrastructure_options` (object) — evaluated PGX configs
- `recommended_models` (array of 11 strings) — model keys in recommendation order
- `models` (object) — 11 model entries keyed by slug

**`hardware_profile` exact fields:**

```json
{
  "server_name": "aos-ai-beta",
  "ram_gb": 192,
  "cpu_model": "2x Intel Xeon E5-2620 v4",
  "cpu_cores": 32,
  "cpu_clock_ghz_base": 2.1,
  "cpu_clock_ghz_turbo": 2.5,
  "cpu_architecture": "Broadwell",
  "storage_type": "NVMe",
  "storage_gb": "TBD",
  "gpu": "none",
  "notes": "CPU-only inference, no GPU available. 32 threads total (16 per socket)."
}
```

Note: The skill references `total_ram_gb` in section 2.1, but the actual catalog field is `ram_gb`. This is a **field name mismatch** that should be flagged.

**Per-model `hardware_requirements` exact fields (all 11 models have these):**

```json
{
  "ram_gb_q4": <number>,
  "ram_gb_q8": <number>,
  "ram_gb_fp16": <number>,
  "recommended_ram_gb": <number>,
  "recommended_quantization": "q4_K_M",
  "gpu_vram_gb_q4": <number>,
  "gpu_vram_gb_q8": <number>,
  "gpu_offload_support": "<string>"
}
```

**GPU fields are ALREADY PRESENT in all 11 models.** The task dependency note says "CPU/RAM-only evaluation works without it" and Task 1 "adds GPU VRAM fields," but the catalog already contains `gpu_vram_gb_q4`, `gpu_vram_gb_q8`, and `gpu_offload_support` for every model. Either Task 1 was already completed, or these fields were added during initial catalog creation.

**`gpu_offload_support` values observed:**

- `"cpu_only_viable"` — tinyllama-1.1b (too small for GPU to help)
- `"full_offload"` — llama3.2-1b, llama3.2-3b, phi3-3.8b, qwen2.5-7b, mistral-7b, gemma2-9b, qwen2.5-14b, llama3.1-8b
- `"partial_offload"` — mixtral-8x22b (141B), llama3.3-70b (70B)

**Per-model `feasibility` values observed:**

- `"runs_comfortably"` — ALL 11 models (on the current 192 GB RAM server for CPU inference)

**Other per-model fields used by the skill:**

- `display_name` (string)
- `ollama_id` (string) — used for matching to config.py
- `parameter_count` (string, e.g., "7B", "14.7B", "141B")
- `size_tier` (string): "small", "medium", "large", "xlarge"
- `default_quantization` (string) — same as `recommended_quantization` in `hardware_requirements`
- `strengths` (array of strings)
- `weaknesses` (array of strings)
- `raw_benchmarks` (object) — per-benchmark scores
- `category_scores` (object) — per-category normalized scores

#### E3: Data Structures — Pipeline Config (`src/pipeline/config.py`)

**Registered model identifiers (exact lines from file):**

```python
default_model: str = "qwen2.5:14b"
fast_model: str = "qwen2.5:7b"
chat_model: str = "llama3.1:8b"
```

**Mapping to catalog entries:**

| Config Field | Ollama ID | Catalog Key | Display Name |
|-------------|-----------|-------------|--------------|
| `default_model` | `qwen2.5:14b` | `qwen2.5-14b` | Qwen 2.5 14B |
| `fast_model` | `qwen2.5:7b` | `qwen2.5-7b` | Qwen 2.5 7B |
| `chat_model` | `llama3.1:8b` | `llama3.1-8b` | Llama 3.1 8B |

**Important:** The Ollama IDs in config.py use colon format (`qwen2.5:14b`) while catalog keys use hyphen format (`qwen2.5-14b`); the catalog's `ollama_id` field uses the same colon format as config.py. The skill correctly instructs matching via the `ollama_id` field.

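The `ollama_id` matching rule is simple enough to sketch in a few lines of Python (illustrative only; the skill expresses this step as prose instructions, and `classify_models` is a hypothetical name):

```python
def classify_models(catalog: dict, registered_ids: set) -> dict:
    """Split catalog models into registered vs candidate entries.

    Matching must go through each model's "ollama_id" field (colon
    format, e.g. "qwen2.5:14b"); the catalog keys themselves use
    hyphen format ("qwen2.5-14b") and would never match config.py.
    """
    registered, candidates = {}, {}
    for key, model in catalog["models"].items():
        bucket = registered if model.get("ollama_id") in registered_ids else candidates
        bucket[key] = model
    return {"registered": registered, "candidates": candidates}

# Stand-in catalog with 2 of the 11 models.
catalog = {"models": {
    "qwen2.5-14b": {"ollama_id": "qwen2.5:14b"},
    "mistral-7b": {"ollama_id": "mistral:7b"},
}}
ids = {"qwen2.5:14b", "qwen2.5:7b", "llama3.1:8b"}  # from config.py
result = classify_models(catalog, ids)
print(sorted(result["registered"]))  # ['qwen2.5-14b']
print(sorted(result["candidates"]))  # ['mistral-7b']
```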
#### E4: Report Output Structure

**Output directory:** `C:/Projects/District_AI_Agent_Implementation/docs/reports/`
**Naming convention:** `hardware_eval_YYYY-MM-DD_[slug].md`
**Existing report:** `hardware_eval_2026-02-20_thinkstation-pgx-addition.md`

#### E5: Model Count

The catalog contains exactly **11 models:**

1. tinyllama-1.1b (1.1B)
2. llama3.2-1b (1B)
3. llama3.2-3b (3B)
4. phi3-3.8b (3.8B)
5. qwen2.5-7b (7B) — **registered** as `fast_model`
6. mistral-7b (7B)
7. gemma2-9b (9B)
8. llama3.1-8b (8B) — **registered** as `chat_model`
9. qwen2.5-14b (14.7B) — **registered** as `default_model`
10. llama3.3-70b (70B)
11. mixtral-8x22b (141B)

3 registered, 8 candidates.

---

### Connections (C) — How Components Relate

#### C1: Data Flow

```
model_catalog.json ──Read──→ Skill extracts hardware_profile + 11 model entries
config.py ──Read──→ Skill extracts 3 registered Ollama IDs

Match Ollama IDs to catalog entries

AskUserQuestion → collect proposed hardware

Analysis engine (GPU-primary or CPU-only path)

WebSearch → benchmark lookup (up to 8 calls)

Write → docs/reports/hardware_eval_YYYY-MM-DD_[slug].md
```

#### C2: Field Reference Chain

The skill references catalog fields at these specific points:

- **Section 2.1:** `hardware_profile.total_ram_gb` → [MISMATCH] actual field is `hardware_profile.ram_gb`
- **Section 2.1:** `hardware_profile.cpu_model`, `cpu_cores`, `cpu_clock_ghz_base`, `cpu_clock_ghz_turbo`, `cpu_architecture`, `gpu`, `storage_type`
- **Section 2.2:** `models[*].ollama_id` → matched against config.py values
- **Section 4.2a:** `models[*].hardware_requirements.recommended_quantization`, `gpu_vram_gb_q4`, `gpu_vram_gb_q8`, `ram_gb_q4`
- **Section 4.2b:** `gpu_offload_support` checked for partial offload viability
- **Section 4.3a:** `models[*].hardware_requirements.ram_gb_q4`, `ram_gb_q8`, `ram_gb_fp16`
- **Section 4.4:** `models[*].feasibility` for current tier
- **Section 4.5:** `models[*].feasibility` for expansion analysis
- **Section 6.2:** `schema_version` for report metadata

#### C3: Skill ←→ Project Relationship

The skill is a Claude Code skill registered in `C:/Projects/District_AI_Agent_Implementation/.claude/skills/hardware-vetter/`. It runs within the District AI Agent Implementation project context, reading project files and writing reports to that project's `docs/reports/` directory.

#### C4: Tool Dependencies

| Tool | Usage | Critical? |
|------|-------|-----------|
| `Read` | Load model_catalog.json and config.py | Yes — skill cannot function without it |
| `AskUserQuestion` | Collect hardware change details from user | Yes — interactive skill |
| `WebSearch` | Find published benchmarks | No — degrades to "Projected" estimates |
| `Write` | Output the report | Yes — primary deliverable |
| `Glob` | Listed in frontmatter but not used in skill body | No — [VERIFY] may be unused |

#### C5: infrastructure_options Section

The catalog contains a detailed `infrastructure_options` section (lines 64-192) with specific ThinkStation PGX configurations and verified throughput data. The current skill **does not explicitly reference** this section. The existing report (`hardware_eval_2026-02-20_thinkstation-pgx-addition.md`) used this data extensively. This means:

- For PGX-related evaluations, the `verified_throughput_per_unit` data could be used instead of WebSearch
- The skill's benchmark search could be enhanced to check the catalog's own throughput data first
- This is a potential improvement but not a gap — the skill works without it

---

### Dynamics (D) — Execution Flow & Edge Cases

#### D1: Step-by-Step Execution Flow

1. **Data Loading** (Steps 2.1, 2.2)
   - Read `docs/model_catalog.json` → extract `hardware_profile` + all 11 models
   - Read `src/pipeline/config.py` → extract 3 registered Ollama IDs
   - Match IDs to catalog → classify 3 registered + 8 candidate
   - Validation gates: catalog must exist, hardware_profile must be present

2. **Question Flow** (Steps 3.1-3.3)
   - AskUserQuestion (multiSelect) → change type: CPU/GPU/RAM/Full system
   - Sequential component follow-ups based on selection
   - Confirmation table → user approves or revises

3. **Analysis** (Steps 4.1-4.5)
   - Determine path: GPU-primary (if GPU proposed + GPU fields exist) or CPU-only
   - For each of 11 models: calculate feasibility tier, resource usage, headroom, concurrent capacity
   - Before/After comparison for 3 registered models
   - Candidate expansion analysis for 8 non-registered models

4. **Benchmark Search** (Steps 5.1-5.4)
   - Prioritize: registered models first, then candidates by parameter count
   - Up to 8 WebSearch calls
   - Upgrade estimates from "Projected" to "Verified" when a match is found

5. **Report Writing** (Steps 6.1-6.3)
   - Construct filename with date + slug
   - Write full report to `docs/reports/`

6. **Completion** (Step 8)
   - Display summary message with report path and key findings

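The per-model arithmetic in the Analysis step can be sketched as follows. This is a minimal illustration: the tier names `runs_comfortably` and `runs_constrained` appear in the catalog and skill, but the percentage cutoffs and the `does_not_fit` label here are assumptions, not values taken from SKILL.md.

```python
def cpu_feasibility(model_req: dict, server_ram_gb: float, quant: str = "q4") -> dict:
    """Illustrative CPU-only feasibility check for one model.

    model_req is a catalog `hardware_requirements` object; quant selects
    the ram_gb_q4 / ram_gb_q8 / ram_gb_fp16 field to compare against.
    """
    need = model_req[f"ram_gb_{quant}"]
    usage_pct = 100 * need / server_ram_gb
    # Assumed cutoffs for illustration; the skill's real thresholds
    # live in SKILL.md Section 4.
    if usage_pct <= 50:
        tier = "runs_comfortably"
    elif usage_pct <= 80:
        tier = "runs_constrained"
    else:
        tier = "does_not_fit"
    return {"tier": tier,
            "usage_pct": round(usage_pct, 1),
            "headroom_pct": round(100 - usage_pct, 1)}

# A qwen2.5-7b-sized q4 footprint on the current 192 GB server (value illustrative).
print(cpu_feasibility({"ram_gb_q4": 11}, 192))
```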
#### D2: Error/Edge Cases Handled in Skill

The skill already handles 7 error scenarios:

1. **Model catalog missing** → STOP with error
2. **GPU fields missing from catalog** → CPU-only analysis path + header note
3. **Config file unreadable** → All 11 models as candidates
4. **Benchmark search returns nothing** → "Projected" estimates
5. **Unrecognized hardware** → Ask user for specs directly
6. **User proposes downgrade** → Analyze without judgment, flag in recommendation
7. **Partial offload complexity** → Conservative estimates, always "runs_constrained"

#### D3: Edge Cases NOT Handled

1. **Unified memory architectures** (like the PGX/DGX Spark) — the existing report (`hardware_eval_2026-02-20_thinkstation-pgx-addition.md`) handled this by adding a framing note about unified memory, but the skill's analysis methodology assumes discrete GPU VRAM separate from system RAM. The skill does not have instructions for handling `memory_type: "LPDDR5x unified"` or similar.

2. **Multi-GPU configurations** — the skill asks for "a GPU" (singular). No path for multi-GPU setups, tensor parallelism, or NVLink configurations. The catalog's infrastructure_options section has detailed multi-unit configs, but the skill cannot evaluate these.

3. **Adding a new dedicated node vs upgrading existing** — the existing report evaluated a ThinkStation PGX as an **additive node**, not a replacement. The skill's question flow only covers upgrades to the existing server, not "add a separate inference machine."

4. **Model not in catalog** — the skill only evaluates the 11 models in the catalog. If the user wants to evaluate a model not in the catalog, there's no path for that.

5. **Concurrent model loading** — the feasibility analysis evaluates models individually. It doesn't analyze the memory impact of loading multiple models simultaneously (which is the real-world scenario for the Ollama server).

6. **`gpu_vram_gb_fp16` field** — the skill references this in Step 4.2a but it does not exist in the catalog. Only `ram_gb_fp16` exists. For FP16 GPU inference, the skill would need to use `ram_gb_fp16` as a proxy or note the gap.

#### D4: Graceful Degradation Path

```
Full analysis (GPU + CPU + benchmarks)
    ↓ GPU fields missing
CPU-only analysis + benchmarks
    ↓ Benchmarks return nothing
CPU-only analysis with projected estimates
    ↓ Config.py unreadable
All 11 models as candidates, projected estimates
    ↓ Model catalog missing
STOP — cannot proceed
```

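Sketched as code, the ladder reads top to bottom (purely illustrative; the real skill encodes this as prose instructions, and `load_inputs` with its `read_file` callback is a hypothetical shape):

```python
def load_inputs(read_file):
    """Apply the skill's degradation ladder while loading inputs.

    read_file(path) returns file text, or None if unreadable.
    """
    notes = []
    catalog = read_file("docs/model_catalog.json")
    if catalog is None:
        # Bottom rung: without the catalog there is nothing to analyze.
        raise RuntimeError("model_catalog.json missing: STOP, cannot proceed")
    config = read_file("src/pipeline/config.py")
    if config is None:
        # Degrade: no registered/candidate split; everything is a candidate.
        notes.append("config.py unreadable: treating all 11 models as candidates")
    return catalog, config, notes

# Simulate an unreadable config file.
catalog, config, notes = load_inputs(
    lambda p: "{}" if p.endswith(".json") else None
)
print(notes)  # ['config.py unreadable: treating all 11 models as candidates']
```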
---

## Engineering Questions — Answers

### Q1: How should the skill's question flow adapt when the user specifies an entire system replacement vs a single component upgrade?

**Answer from codebase research:** The existing skill (Section 3.1) handles this with clear branching logic:

- "Full system replacement" selected → ask ALL component follow-ups (CPU, GPU, RAM) sequentially
- Multiple individual components → ask follow-ups for each selected component in order: CPU, GPU, RAM
- Single component → ask only that component's follow-up

The question flow does NOT change structurally — it just includes/excludes component follow-up blocks based on selection. This is simple and correct.

**One gap:** For full system replacement, the user might want to specify a complete pre-built system (e.g., "Dell PowerEdge R760xa with dual A6000 GPUs") rather than specifying each component separately. The current flow forces component-by-component entry even for known systems. Consider adding a "known system" shortcut path.

### Q2: What comparison metrics should the before/after analysis prioritize?

**Answer from codebase research:** The existing skill (Section 4.4) defines 4 metrics:

1. **Feasibility tier** — most important (determines if the model can run at all)
2. **Resource headroom %** — capacity margin
3. **Estimated throughput (tok/s)** — user-visible performance
4. **Concurrent capacity** — multi-user scaling

The existing report demonstrates these work well in practice. Throughput is the most user-visible metric. The prioritization order should be: feasibility tier first (binary gate), then throughput (user experience), then concurrent capacity (scaling), then headroom (engineering margin).

### Q3: How should the report visually distinguish "verified" benchmark data from "projected" spec-based estimates?

**Answer from codebase research:** The existing skill uses two mechanisms:

1. A `Confidence` column in the feasibility matrix with values "Verified" or "Projected"
2. **Bold** formatting on verified throughput values in the feasibility matrix (seen in the existing report: `**38 tok/s**` for verified vs `~50 tok/s` for projected)

The existing report also uses the `**Verified**` / `Projected` labels in per-model comparison tables. This approach works well and is already consistent across the skill.

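Reduced to code, the cell-formatting convention is a single conditional (hypothetical helper; the skill specifies this in its report template, not as code):

```python
def fmt_throughput(tok_s: float, verified: bool) -> str:
    """Render a feasibility-matrix throughput cell: bold for verified
    benchmark data, tilde-prefixed for projected estimates."""
    return f"**{tok_s:g} tok/s**" if verified else f"~{tok_s:g} tok/s"

print(fmt_throughput(38, True))   # **38 tok/s**
print(fmt_throughput(50, False))  # ~50 tok/s
```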
### Q4: What file naming convention and output directory should reports use?

**Answer from codebase research:** Already specified in the skill (Section 6.1):

- **Directory:** `docs/reports/`
- **Format:** `hardware_eval_YYYY-MM-DD_[slug].md`
- **Slug derivation rules:**
  - GPU addition: `[gpu-model]-addition`
  - RAM upgrade: `ram-[capacity]gb-upgrade`
  - CPU upgrade: `cpu-[family-shorthand]-upgrade`
  - Full system: `full-system-[key-identifier]`
  - Multiple components: combine key elements
- **Existing precedent:** `hardware_eval_2026-02-20_thinkstation-pgx-addition.md`

This convention is good. No changes needed.

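The convention is mechanical enough to sketch (illustrative; `report_filename` and its change-type keys are invented names, and the multiple-components rule is omitted):

```python
from datetime import date

def report_filename(change_type: str, slug_core: str, on: date) -> str:
    """Build a report filename per the skill's Section 6.1 convention.

    Slug templates mirror the derivation rules above; slug_core is the
    caller-supplied gpu model / capacity / family shorthand.
    """
    templates = {
        "gpu_addition": "{}-addition",
        "ram_upgrade": "ram-{}gb-upgrade",
        "cpu_upgrade": "cpu-{}-upgrade",
        "full_system": "full-system-{}",
    }
    slug = templates[change_type].format(slug_core)
    return f"hardware_eval_{on.isoformat()}_{slug}.md"

print(report_filename("gpu_addition", "thinkstation-pgx", date(2026, 2, 20)))
# hardware_eval_2026-02-20_thinkstation-pgx-addition.md
```

Note that this reproduces the existing report's name exactly, which is a useful sanity check on the convention.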
350
+ ### Q5: Should the skill validate the schema before proceeding, or trust the data?
351
+
352
+ **Answer from codebase research:** The existing skill uses a **lightweight validation** approach (Section 2.1 "Validation gates"):
353
+ - Checks if `model_catalog.json` exists and is readable
354
+ - Checks if `hardware_profile` section is present
355
+ - Checks if GPU fields exist (degrades gracefully if not)
356
+ - Does NOT validate schema version, field types, or data integrity
357
+
358
+ This is the right approach for a Claude Code skill. Full JSON schema validation would require code execution (not available). The skill reads data and gracefully degrades when fields are missing. Adding `schema_version` checking would be low-value — if the schema changes significantly, the skill's field references will naturally fail and the error handling will catch it.

---

## Acceptance Criteria Verification

| # | Criterion | Status | Evidence |
|---|-----------|--------|----------|
| 1 | Accepts all four hardware change types | MET | Section 3.1: CPU upgrade, GPU addition, RAM change, full system replacement via multiSelect |
| 2 | Reads current hardware specs and model data from catalog + config.py | MET | Section 2.1 reads model_catalog.json, Section 2.2 reads config.py. One field name mismatch: `total_ram_gb` vs actual `ram_gb` |
| 3 | Fit/no-fit verdict per model with VRAM/RAM estimates | MET | Sections 4.2/4.3: 3-tier feasibility with resource usage % and headroom % |
| 4 | Before/after comparison for registered models | MET | Section 4.4: 4-metric comparison table per registered model |
| 5 | Identifies candidate models that become feasible | MET | Section 4.5: explicit expansion analysis |
| 6 | Upgrade recommendation with rationale | MET | Section 6.2 report template: Executive Summary + Recommendation section |
| 7 | Projected vs Verified labeling | MET | "Projected" default, "Verified" when benchmark matches (Section 5.3) |
| 8 | Timestamped markdown report | MET | Section 6.1: `hardware_eval_YYYY-MM-DD_[slug].md` in `docs/reports/` |

---

## Issues Found

### Issue 1: Field Name Mismatch (Minor)

**Location:** SKILL.md Section 2.1
**Problem:** References `hardware_profile.total_ram_gb`, but the actual catalog field is `hardware_profile.ram_gb`.
**Impact:** Claude (sonnet) may adapt at runtime since it reads the actual JSON, but the instruction is technically wrong and could cause confusion if Claude follows it literally and looks for a field that doesn't exist.
**Fix:** Change `total_ram_gb` to `ram_gb` in Section 2.1.

### Issue 2: `gpu_vram_gb_fp16` Reference (Minor)

**Location:** SKILL.md Section 4.2a
**Problem:** References `gpu_vram_gb_fp16` as a possible field, but this field does not exist in any model's `hardware_requirements`. Only `ram_gb_fp16` exists for FP16.
**Impact:** Low. No model uses FP16 as its `recommended_quantization`, so this code path would never trigger, but it is technically incorrect.
**Fix:** Remove the FP16 case from the GPU path, or note that FP16 GPU inference uses `ram_gb_fp16` as the VRAM estimate.

### Issue 3: `Glob` Tool Listed but Unused (Trivial)

**Location:** SKILL.md frontmatter
**Problem:** `Glob` is listed in the tools but never referenced in the skill body.
**Impact:** No functional impact, just a slightly larger tool surface than needed.
**Fix:** Remove `Glob` from the tools list, or add a use case (e.g., checking whether the `docs/reports/` directory exists before writing).

### Issue 4: No Handling of Unified Memory Architectures

**Location:** Section 4.2 (GPU-Primary Analysis Path)
**Problem:** The analysis assumes a discrete GPU with separate VRAM and system RAM. Systems like the NVIDIA DGX Spark/ThinkStation PGX use unified memory, where there is no VRAM/RAM distinction. The existing report handled this with manual framing notes, but the skill has no built-in logic for it.
**Impact:** The skill will produce incorrect analysis for unified memory systems: it would compare the model's `gpu_vram_gb_q4` against the unified memory pool when it should compare `ram_gb_q4` (since all memory is GPU-accessible).
**Fix:** Add a sub-path in Section 4.2 for unified memory architectures. When the user specifies a unified memory system, use total unified memory as the GPU capacity and skip partial offload calculations entirely.
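
The proposed sub-path could look like the following sketch. The field names `gpu_vram_gb_q4` and `ram_gb_q4` come from the catalog as described; the function itself is hypothetical, not part of the skill.

```python
def fits_after_upgrade(model_req: dict, gpu_memory_gb: float,
                       unified_memory: bool) -> bool:
    """Decide fit/no-fit for one model at Q4 quantization.

    model_req is one model's hardware_requirements entry; gpu_memory_gb is
    the card VRAM (discrete) or the total unified pool (unified).
    """
    if unified_memory:
        # All memory is GPU-accessible: compare the model's total RAM
        # requirement against the unified pool and skip offload math.
        return model_req["ram_gb_q4"] <= gpu_memory_gb
    # Discrete GPU: compare the VRAM requirement against card VRAM.
    return model_req["gpu_vram_gb_q4"] <= gpu_memory_gb
```

The two branches make the Issue 4 failure mode concrete: on a unified system, checking `gpu_vram_gb_q4` instead of `ram_gb_q4` understates the requirement.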

### Issue 5: No Multi-Model Concurrent Loading Analysis

**Location:** Sections 4.2-4.3
**Problem:** Each model is evaluated independently. The real-world scenario involves loading multiple models simultaneously (the server runs 3 registered models concurrently). The skill never asks "can all 3 registered models fit in memory at the same time?"
**Impact:** Medium. A configuration could show every model as `runs_comfortably` individually yet be unable to fit all registered models simultaneously.
**Fix:** Add a "concurrent loading check" subsection that sums the RAM/VRAM requirements of the 3 registered models and checks the total against capacity.
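
A sketch of such a check, assuming each registered model exposes a `ram_gb_q4` requirement; the `overhead_gb` constant for runtime/KV-cache overhead is an assumption, not a measured value.

```python
def concurrent_fit(registered: list[dict], capacity_gb: float,
                   overhead_gb: float = 4.0) -> tuple[bool, float]:
    """Return (fits, headroom_pct) for loading all registered models at once.

    Sums per-model Q4 memory requirements plus a fixed runtime overhead
    (hypothetical constant) and compares against total capacity.
    """
    total = sum(m["ram_gb_q4"] for m in registered) + overhead_gb
    headroom_pct = round((capacity_gb - total) / capacity_gb * 100, 1)
    return total <= capacity_gb, headroom_pct
```

This is the check that catches the "all models fit individually, but not together" case described above.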

---

## Recommendations for Code Spec

Given that the skill already exists and is comprehensive, the code spec should focus on:

1. **Fix the `ram_gb` field name mismatch** (Issue 1)
2. **Fix the `gpu_vram_gb_fp16` reference** (Issue 2)
3. **[OPTIONAL] Add unified memory architecture handling** (Issue 4) — only if this is considered in scope
4. **[OPTIONAL] Add a concurrent model loading check** (Issue 5) — high value, moderate complexity
5. **Verify the skill works end-to-end** by mentally tracing the execution with a sample scenario (e.g., "RTX 4090 addition" or "RAM upgrade to 256 GB")
6. **No new files need to be created** — the skill is a single SKILL.md file

The existing skill is production-quality. The main risk is not missing functionality but the minor data-field mismatches that could cause runtime confusion for the sonnet model executing the skill.
@@ -0,0 +1,168 @@
---
name: code-architect
display_name: Code Architect
domain: code
description: >
  Analyzes software requirements, designs code architecture, and produces
  implementation blueprints for code artifacts. Replaces the design + engineer
  phases for code-domain tasks.

exploration_methodology: ECD
supported_depths: [Deep, Standard, Light]
default_depth: Deep

research_patterns:
  - "Find existing code patterns and conventions in the codebase"
  - "Locate configuration files and data schemas"
  - "Identify integration points with existing systems"
  - "Map dependencies between components"
  - "Find similar implementations to follow as patterns"

blueprint_sections:
  - "Architecture Overview"
  - "Data Flow"
  - "Implementation Detail"
  - "Error Handling"
  - "Integration Points"

default_producer: code-writer
default_output_format: code

review_criteria:
  - "All acceptance criteria addressable from the blueprint"
  - "No ambiguous implementation decisions left for the producer"
  - "Error handling covers all identified edge cases"
  - "Integration points fully specified with exact file paths and field names"
  - "Patterns match existing codebase conventions"
  - "Blueprint is self-contained — producer needs no external context"
mandatory_reviewers: []

model: opus
reviewer_model: sonnet
tools: [Read, Write, Glob, Grep, Task, AskUserQuestion]
---
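
For orientation, a definition file like the one above splits into YAML frontmatter and a markdown body. The following sketch is illustrative only (it is not part of this package) and handles just simple top-level `key: value` scalars; a real loader would use a YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a '---'-delimited frontmatter header from the markdown body.

    Only flat string scalars are extracted; lists, nested keys, and folded
    scalars are skipped. Purely a sketch for reading files shaped like this.
    """
    assert text.startswith("---\n"), "expected frontmatter delimiter"
    header, _, body = text[4:].partition("\n---\n")
    fields = {}
    for line in header.splitlines():
        # Skip indented continuations and list items; keep "key: value" lines.
        if line and not line.startswith((" ", "-")) and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields, body.lstrip("\n")
```

For the file above this would surface scalars such as `name`, `domain`, `model`, and `reviewer_model` for an orchestrator to route on.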

# Code Architect

## Stage 1: Exploration Protocol

You are a code architect conducting exploration for a code implementation task. Your job is to research the project codebase, explore the problem space using ECD, and produce structured findings for the orchestrator to present to the user.

### Research Phase

First, read all project context files and codebase artifacts provided to you. Extract:
- Existing code patterns and conventions
- Data schemas and configuration structures
- Integration points and dependencies
- Constraints from the plan and existing architecture

Use the research patterns above as guides: search for relevant files with Glob and Grep, and read key files to understand patterns.

### ECD Exploration

**Elements (E)** — What are the building blocks?
- What files/modules need to be created or modified?
- What data structures are involved?
- What interfaces exist between components?
- What configuration or schema requirements apply?
- What external dependencies are needed?

**Connections (C)** — How do they relate?
- How does data flow between components?
- What reads from what? What writes to what?
- How does this code interact with existing systems?
- What shared resources need coordination?

**Dynamics (D)** — How do they work/change over time?
- What is the execution flow (step by step)?
- What triggers each behavior?
- What are the error/edge cases?
- How does the system degrade gracefully?
- What happens under different input scenarios?

### Output Format

Write your findings to the specified stage1.md path with this structure:

```markdown
# Stage 1 Exploration: [Task Title]

## Research Findings
[Codebase patterns discovered, with file paths and line references]
[Schema structures found]
[Existing conventions to follow]

## ECD Analysis

### Elements
[Components, files, data structures identified]

### Connections
[Data flows, integration points, dependencies mapped]

### Dynamics
[Execution flows, edge cases, error scenarios identified]

## Key Decisions
[For each decision:]
### Decision N: [Title]
- **Options**: [A, B, C with trade-offs]
- **Recommendation**: [Your recommendation with technical rationale]
- **Risk if wrong**: [What happens if this decision is wrong]

## Risks Identified
[Each risk with severity and mitigation]

## Recommended Approach
[Your overall recommended architecture, summarizing structural choices]
```

For Standard depth: abbreviate to Research Findings + Recommended Approach + 1-2 Key Decisions only.
For Light depth: Research Findings + Proposed Approach only (no decisions — proceed autonomously).

## Stage 2: Specification Protocol

You are a code architect producing a detailed blueprint from approved exploration findings.

You will receive:
1. Your Stage 1 findings (the exploration you conducted)
2. The user's decisions on each key question

Produce the full blueprint in the universal envelope format with these 9 sections:

1. **Task Reference** — plan task numbers, acceptance criteria, dependencies
2. **Research Findings** — from your Stage 1 codebase research (file paths, patterns, schemas)
3. **Approach** — the approved direction (incorporating user decisions)
4. **Decisions Made** — every decision with alternatives considered and the user's choice
5. **Deliverable Specification** — the detailed implementation specification. This must contain enough detail that a code-writer producer can implement without making any architectural or design decisions. Include:
   - Exact file paths to create/modify
   - Complete data structures with field names and types
   - Full algorithm/logic specifications with formulas and thresholds
   - All error handling cases with exact behaviors
   - Worked examples for complex calculations
   - UI/interaction specifications (question flows, output formats)
   - Configuration values and constants
   - Template structures for generated outputs
   - Pattern references from the existing codebase
6. **Acceptance Mapping** — how each plan acceptance criterion is addressed
7. **Integration Points** — exact file paths, field names, and data formats for all integrations
8. **Open Items** — must be empty or contain only [VERIFY]/execution-time items
9. **Producer Handoff** — output format, producer name, filename, content blocks in order, target line count, instruction tone guidance

Write the completed blueprint to the specified blueprint path.

## Review Protocol

You are reviewing code produced from a blueprint you authored. Your job is to FIND PROBLEMS, not to approve.

Check each review criterion against the produced deliverable:
1. Read the blueprint to understand what was specified
2. Read the produced code/artifact
3. For each criterion: PASS or FAIL with specific evidence
4. Flag any invented functionality (present in the code but not in the blueprint)
5. Flag any omitted functionality (in the blueprint but missing from the code)
6. Flag any architectural decisions the producer made that should have been in the blueprint
7. Verify error handling covers all specified cases
8. Verify integration points match the exact specifications

Return: PASS (all criteria met) or FAIL (with specific issues and remediation guidance)