@tgoodington/intuition 8.1.3 → 9.2.1

This diff represents the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in the public registry.
Files changed (154)
  1. package/README.md +9 -9
  2. package/docs/project_notes/.project-memory-state.json +100 -0
  3. package/docs/project_notes/branches/.gitkeep +0 -0
  4. package/docs/project_notes/bugs.md +41 -0
  5. package/docs/project_notes/decisions.md +147 -0
  6. package/docs/project_notes/issues.md +101 -0
  7. package/docs/project_notes/key_facts.md +88 -0
  8. package/docs/project_notes/trunk/.gitkeep +0 -0
  9. package/docs/project_notes/trunk/.planning_research/decision_file_naming.md +15 -0
  10. package/docs/project_notes/trunk/.planning_research/decisions_log.md +32 -0
  11. package/docs/project_notes/trunk/.planning_research/orientation.md +51 -0
  12. package/docs/project_notes/trunk/audit/plan-rename-hitlist.md +654 -0
  13. package/docs/project_notes/trunk/blueprint-conflicts.md +109 -0
  14. package/docs/project_notes/trunk/blueprints/database-architect.md +416 -0
  15. package/docs/project_notes/trunk/blueprints/devops-infrastructure.md +514 -0
  16. package/docs/project_notes/trunk/blueprints/technical-writer.md +788 -0
  17. package/docs/project_notes/trunk/build_brief.md +119 -0
  18. package/docs/project_notes/trunk/build_report.md +250 -0
  19. package/docs/project_notes/trunk/detail_brief.md +94 -0
  20. package/docs/project_notes/trunk/plan.md +182 -0
  21. package/docs/project_notes/trunk/planning_brief.md +96 -0
  22. package/docs/project_notes/trunk/prompt_brief.md +60 -0
  23. package/docs/project_notes/trunk/prompt_output.json +98 -0
  24. package/docs/project_notes/trunk/scratch/database-architect-decisions.json +72 -0
  25. package/docs/project_notes/trunk/scratch/database-architect-research-plan.md +10 -0
  26. package/docs/project_notes/trunk/scratch/database-architect-stage1.md +226 -0
  27. package/docs/project_notes/trunk/scratch/devops-infrastructure-decisions.json +71 -0
  28. package/docs/project_notes/trunk/scratch/devops-infrastructure-research-plan.md +7 -0
  29. package/docs/project_notes/trunk/scratch/devops-infrastructure-stage1.md +164 -0
  30. package/docs/project_notes/trunk/scratch/technical-writer-decisions.json +88 -0
  31. package/docs/project_notes/trunk/scratch/technical-writer-research-plan.md +7 -0
  32. package/docs/project_notes/trunk/scratch/technical-writer-stage1.md +266 -0
  33. package/docs/project_notes/trunk/team_assignment.json +108 -0
  34. package/docs/project_notes/trunk/test_brief.md +75 -0
  35. package/docs/project_notes/trunk/test_report.md +26 -0
  36. package/docs/project_notes/trunk/verification/devops-infrastructure-verification.md +172 -0
  37. package/docs/v9/decision-framework-direction.md +142 -0
  38. package/docs/v9/decision-framework-implementation.md +114 -0
  39. package/docs/v9/domain-adaptive-team-architecture.md +1016 -0
  40. package/docs/v9/test/SESSION_SUMMARY.md +117 -0
  41. package/docs/v9/test/TEST_PLAN.md +119 -0
  42. package/docs/v9/test/blueprints/legal-analyst.md +166 -0
  43. package/docs/v9/test/output/07_cover_letter.md +41 -0
  44. package/docs/v9/test/phase2/mock_plan.md +89 -0
  45. package/docs/v9/test/phase2/producers.json +32 -0
  46. package/docs/v9/test/phase2/specialists/database-architect.specialist.md +10 -0
  47. package/docs/v9/test/phase2/specialists/financial-analyst.specialist.md +10 -0
  48. package/docs/v9/test/phase2/specialists/legal-analyst.specialist.md +10 -0
  49. package/docs/v9/test/phase2/specialists/technical-writer.specialist.md +10 -0
  50. package/docs/v9/test/phase2/team_assignment.json +61 -0
  51. package/docs/v9/test/phase3/blueprints/legal-analyst.md +840 -0
  52. package/docs/v9/test/phase3/legal-analyst-full.specialist.md +111 -0
  53. package/docs/v9/test/phase3/project_context/nh_landlord_tenant_notes.md +35 -0
  54. package/docs/v9/test/phase3/project_context/property_facts.md +32 -0
  55. package/docs/v9/test/phase3b/blueprints/legal-analyst.md +1715 -0
  56. package/docs/v9/test/phase3b/legal-analyst.specialist.md +153 -0
  57. package/docs/v9/test/phase3b/scratch/legal-analyst-stage1.md +270 -0
  58. package/docs/v9/test/phase4/TEST_PLAN.md +32 -0
  59. package/docs/v9/test/phase4/blueprints/financial-analyst-T2.md +538 -0
  60. package/docs/v9/test/phase4/blueprints/legal-analyst-T4.md +253 -0
  61. package/docs/v9/test/phase4/cross-blueprint-check.md +280 -0
  62. package/docs/v9/test/phase4/scratch/financial-analyst-T2-stage1.md +67 -0
  63. package/docs/v9/test/phase4/scratch/legal-analyst-T4-stage1.md +54 -0
  64. package/docs/v9/test/phase4/specialists/financial-analyst.specialist.md +156 -0
  65. package/docs/v9/test/phase4/specialists/legal-analyst.specialist.md +153 -0
  66. package/docs/v9/test/phase5/TEST_PLAN.md +35 -0
  67. package/docs/v9/test/phase5/blueprints/code-architect-hw-vetter.md +375 -0
  68. package/docs/v9/test/phase5/output/04_compliance_checklist.md +149 -0
  69. package/docs/v9/test/phase5/output/hardware-vetter-SKILL-v2.md +561 -0
  70. package/docs/v9/test/phase5/output/hardware-vetter-SKILL.md +459 -0
  71. package/docs/v9/test/phase5/producers/code-writer.producer.md +49 -0
  72. package/docs/v9/test/phase5/producers/document-writer.producer.md +62 -0
  73. package/docs/v9/test/phase5/regression-comparison-v2.md +60 -0
  74. package/docs/v9/test/phase5/regression-comparison.md +197 -0
  75. package/docs/v9/test/phase5/review-5A-specialist.md +213 -0
  76. package/docs/v9/test/phase5/specialist-test/TEST_PLAN.md +60 -0
  77. package/docs/v9/test/phase5/specialist-test/blueprint-comparison.md +252 -0
  78. package/docs/v9/test/phase5/specialist-test/blueprints/code-architect-hw-vetter.md +916 -0
  79. package/docs/v9/test/phase5/specialist-test/scratch/code-architect-stage1.md +427 -0
  80. package/docs/v9/test/phase5/specialists/code-architect.specialist.md +168 -0
  81. package/docs/v9/test/phase5b/TEST_PLAN.md +219 -0
  82. package/docs/v9/test/phase5b/blueprints/5B-10-stage2-with-decisions.md +286 -0
  83. package/docs/v9/test/phase5b/decisions/5B-2-accept-all-decisions.json +68 -0
  84. package/docs/v9/test/phase5b/decisions/5B-3-promote-decisions.json +70 -0
  85. package/docs/v9/test/phase5b/decisions/5B-4-individual-decisions.json +68 -0
  86. package/docs/v9/test/phase5b/decisions/5B-5-triage-decisions.json +110 -0
  87. package/docs/v9/test/phase5b/decisions/5B-6-fallback-decisions.json +40 -0
  88. package/docs/v9/test/phase5b/decisions/5B-8-partial-decisions.json +46 -0
  89. package/docs/v9/test/phase5b/decisions/5B-9-complete-decisions.json +54 -0
  90. package/docs/v9/test/phase5b/scratch/code-architect-stage1.md +133 -0
  91. package/docs/v9/test/phase5b/specialists/code-architect.specialist.md +202 -0
  92. package/docs/v9/test/phase5b/stage1-many-decisions.md +139 -0
  93. package/docs/v9/test/phase5b/stage1-no-assumptions.md +70 -0
  94. package/docs/v9/test/phase5b/stage1-with-assumptions.md +86 -0
  95. package/docs/v9/test/phase5b/test-5B-1-results.md +157 -0
  96. package/docs/v9/test/phase5b/test-5B-10-results.md +130 -0
  97. package/docs/v9/test/phase5b/test-5B-2-results.md +75 -0
  98. package/docs/v9/test/phase5b/test-5B-3-results.md +104 -0
  99. package/docs/v9/test/phase5b/test-5B-4-results.md +114 -0
  100. package/docs/v9/test/phase5b/test-5B-5-results.md +126 -0
  101. package/docs/v9/test/phase5b/test-5B-6-results.md +60 -0
  102. package/docs/v9/test/phase5b/test-5B-7-results.md +141 -0
  103. package/docs/v9/test/phase5b/test-5B-8-results.md +115 -0
  104. package/docs/v9/test/phase5b/test-5B-9-results.md +76 -0
  105. package/docs/v9/test/producers/document-writer.producer.md +62 -0
  106. package/docs/v9/test/specialists/legal-analyst.specialist.md +58 -0
  107. package/package.json +4 -2
  108. package/producers/code-writer/code-writer.producer.md +86 -0
  109. package/producers/data-file-writer/data-file-writer.producer.md +116 -0
  110. package/producers/document-writer/document-writer.producer.md +117 -0
  111. package/producers/form-filler/form-filler.producer.md +99 -0
  112. package/producers/presentation-creator/presentation-creator.producer.md +109 -0
  113. package/producers/spreadsheet-builder/spreadsheet-builder.producer.md +107 -0
  114. package/scripts/install-skills.js +97 -9
  115. package/scripts/uninstall-skills.js +7 -2
  116. package/skills/intuition-agent-advisor/SKILL.md +327 -220
  117. package/skills/intuition-assemble/SKILL.md +261 -0
  118. package/skills/intuition-build/SKILL.md +379 -319
  119. package/skills/intuition-debugger/SKILL.md +390 -390
  120. package/skills/intuition-design/SKILL.md +385 -381
  121. package/skills/intuition-detail/SKILL.md +377 -0
  122. package/skills/intuition-engineer/SKILL.md +307 -303
  123. package/skills/intuition-handoff/SKILL.md +264 -222
  124. package/skills/intuition-handoff/references/handoff_core.md +54 -54
  125. package/skills/intuition-initialize/SKILL.md +21 -6
  126. package/skills/intuition-initialize/references/agents_template.md +118 -118
  127. package/skills/intuition-initialize/references/claude_template.md +134 -134
  128. package/skills/intuition-initialize/references/intuition_readme_template.md +4 -4
  129. package/skills/intuition-initialize/references/state_template.json +17 -2
  130. package/skills/{intuition-plan → intuition-outline}/SKILL.md +561 -481
  131. package/skills/{intuition-plan → intuition-outline}/references/magellan_core.md +16 -16
  132. package/skills/{intuition-plan → intuition-outline}/references/templates/plan_template.md +6 -6
  133. package/skills/intuition-prompt/SKILL.md +374 -312
  134. package/skills/intuition-start/SKILL.md +46 -13
  135. package/skills/intuition-start/references/start_core.md +60 -60
  136. package/skills/intuition-test/SKILL.md +345 -0
  137. package/specialists/api-designer/api-designer.specialist.md +291 -0
  138. package/specialists/business-analyst/business-analyst.specialist.md +270 -0
  139. package/specialists/copywriter/copywriter.specialist.md +268 -0
  140. package/specialists/database-architect/database-architect.specialist.md +275 -0
  141. package/specialists/devops-infrastructure/devops-infrastructure.specialist.md +314 -0
  142. package/specialists/financial-analyst/financial-analyst.specialist.md +269 -0
  143. package/specialists/frontend-component/frontend-component.specialist.md +293 -0
  144. package/specialists/instructional-designer/instructional-designer.specialist.md +285 -0
  145. package/specialists/legal-analyst/legal-analyst.specialist.md +260 -0
  146. package/specialists/marketing-strategist/marketing-strategist.specialist.md +281 -0
  147. package/specialists/project-manager/project-manager.specialist.md +266 -0
  148. package/specialists/research-analyst/research-analyst.specialist.md +273 -0
  149. package/specialists/security-auditor/security-auditor.specialist.md +354 -0
  150. package/specialists/technical-writer/technical-writer.specialist.md +275 -0
  151. package/skills/{intuition-plan → intuition-outline}/references/sub_agents.md +0 -0
  152. package/skills/{intuition-plan → intuition-outline}/references/templates/confidence_scoring.md +0 -0
  153. package/skills/{intuition-plan → intuition-outline}/references/templates/plan_format.md +0 -0
  154. package/skills/{intuition-plan → intuition-outline}/references/templates/planning_process.md +0 -0
@@ -0,0 +1,68 @@
+ {
+   "specialist": "code-architect",
+   "gate_started": "2026-02-27T16:30:00Z",
+   "gate_completed": "2026-02-27T16:36:00Z",
+   "assumptions": [
+     {
+       "id": "A1",
+       "title": "Output Format Consistency",
+       "default": "Use existing 3-tier rating system (excellent_fit, acceptable_fit, poor_fit)",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A2",
+       "title": "Single-File Skill Structure",
+       "default": "Implement as a single SKILL.md file",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A3",
+       "title": "Model Selection for Execution",
+       "default": "Use sonnet as the execution model",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A4",
+       "title": "Hardware Profile Path",
+       "default": "Read hardware profile from config/hardware-profile.json",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A5",
+       "title": "Report Naming Convention",
+       "default": "model_rec_YYYY-MM-DD_[use-case-slug].md",
+       "status": "accepted",
+       "user_override": null
+     }
+   ],
+   "decisions": [
+     {
+       "id": "D1",
+       "title": "Scoring Formula Approach",
+       "context": "Need to rank 47 models against user hardware. RAM, VRAM, and context length are the key dimensions.",
+       "options": ["A: Weighted percentage — RAM 40%, VRAM 40%, context 20% (recommended)", "B: Binary pass/fail per dimension, rank by headroom", "C: Single composite ratio averaged across dimensions"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D2",
+       "title": "Use-Case Filtering Strategy",
+       "context": "Models have use-case tags (chat, code, creative, reasoning). User provides a query like 'I need a coding model'.",
+       "options": ["A: Strict tag match (recommended)", "B: Fuzzy match — tagged first, then 'might also work'"],
+       "chosen": "B",
+       "user_input": null
+     },
+     {
+       "id": "D3",
+       "title": "Top-N Presentation Count",
+       "context": "Need to decide how many models to show in the recommendation report.",
+       "options": ["A: Top 5 models (recommended)", "B: Top 3 models", "C: All models above acceptable_fit threshold"],
+       "chosen": "other",
+       "user_input": "Show top 5 but also include a 'honorable mentions' section for models that scored between acceptable_fit and the 5th-place score"
+     }
+   ]
+ }
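Decision D1's weighted-percentage formula (RAM 40%, VRAM 40%, context 20%) combined with the 3-tier rating from assumption A1 could be sketched as follows. The headroom calculation and the tier cutoffs (0.8 / 0.5) are illustrative assumptions, not values taken from the package.

```python
# Illustrative sketch of D1's weighted-percentage scoring and A1's
# 3-tier rating. Tier cutoffs (0.8 / 0.5) are assumptions chosen for
# demonstration, not values from the package.

def dimension_score(available, required):
    """Ratio of available capacity to the requirement, capped at 1.0."""
    if required <= 0:
        return 1.0
    return min(available / required, 1.0)

def fit_score(hw, req):
    """Weighted fit: RAM 40%, VRAM 40%, context length 20%."""
    return (0.4 * dimension_score(hw["ram_gb"], req["ram_gb"])
            + 0.4 * dimension_score(hw["vram_gb"], req["vram_gb"])
            + 0.2 * dimension_score(hw["context"], req["context"]))

def rating(score):
    """Map a score onto the existing 3-tier rating system."""
    if score >= 0.8:
        return "excellent_fit"
    if score >= 0.5:
        return "acceptable_fit"
    return "poor_fit"

hw = {"ram_gb": 64, "vram_gb": 24, "context": 32768}
req = {"ram_gb": 32, "vram_gb": 16, "context": 8192}
print(rating(fit_score(hw, req)))  # excellent_fit
```

A model meeting every requirement scores 1.0; one short on a single dimension degrades proportionally to that dimension's weight, which is what makes option A rank-friendly compared to binary pass/fail.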
@@ -0,0 +1,110 @@
+ {
+   "specialist": "code-architect",
+   "gate_started": "2026-02-27T16:40:00Z",
+   "gate_completed": "2026-02-27T16:52:00Z",
+   "assumptions": [
+     {
+       "id": "A1",
+       "title": "Test Framework",
+       "default": "Use Jest with supertest (already in devDependencies)",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A2",
+       "title": "Single-File Skill Structure",
+       "default": "Implement as a single SKILL.md file",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A3",
+       "title": "Model Selection",
+       "default": "Use sonnet as the execution model",
+       "status": "accepted",
+       "user_override": null
+     }
+   ],
+   "decisions": [
+     {
+       "id": "D1",
+       "title": "Test Scope — Which Endpoints",
+       "context": "42 documented endpoints + ~6 undocumented admin routes. OpenAPI spec covers the 42.",
+       "options": ["A: All 42 documented endpoints (recommended)", "B: Critical paths only (~15)", "C: All 48 including undocumented admin"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D2",
+       "title": "External Service Mocking Strategy",
+       "context": "3 external dependencies: payment processor, email service, search index.",
+       "options": ["A: In-process mocks — nock/msw (recommended)", "B: Sidecar mock servers", "C: Real staging services"],
+       "chosen": "other",
+       "user_input": "Use msw for email and search, but use a real Stripe test-mode instance for payment since Stripe has a robust test API"
+     },
+     {
+       "id": "D3",
+       "title": "Database Strategy",
+       "context": "No test database seeding currently. 340 transitive deps in full tree.",
+       "options": ["A: SQLite in-memory (recommended)", "B: Dockerized test database", "C: Shared test DB with transaction rollback"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D4",
+       "title": "Auth Token Management",
+       "context": "Auth middleware uses JWT with refresh tokens. Integration tests need token management.",
+       "options": ["A: Pre-generated static tokens (recommended)", "B: Full auth flow per test", "C: Bypass auth middleware"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D5",
+       "title": "Test Organization",
+       "context": "14 route files in src/api/routes/. Need to organize test files.",
+       "options": ["A: One test file per route file — 14 files (recommended)", "B: One per endpoint — 42 files", "C: Grouped by domain — ~6 files"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D6",
+       "title": "Response Validation Depth",
+       "context": "OpenAPI 3.0 spec available for response shape validation.",
+       "options": ["A: Schema validation + key field assertions (recommended)", "B: Full deep-equal", "C: Status code + content-type only"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D7",
+       "title": "Error Case Coverage",
+       "context": "Error handling is where most integration bugs hide.",
+       "options": ["A: All documented error codes per endpoint (recommended)", "B: Common errors only (400, 401, 404, 500)", "C: Happy path only"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D8",
+       "title": "Rate Limiting Test Approach",
+       "context": "Rate limiting is per-IP in production. Needs different handling in tests.",
+       "options": ["A: Configurable rate limits in test env (recommended)", "B: Real rate limits", "C: Skip rate limit testing"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D9",
+       "title": "Test Data Seeding Strategy",
+       "context": "No existing seeding. Tests currently use mocked data stores.",
+       "options": ["A: Fixture files per test suite (recommended)", "B: Factory functions with random data", "C: Shared seed script"],
+       "chosen": "B",
+       "user_input": null
+     },
+     {
+       "id": "D10",
+       "title": "CI Integration",
+       "context": "Integration tests are slower than unit tests. Need CI strategy.",
+       "options": ["A: Separate CI job — run on PR (recommended)", "B: Combined with unit tests", "C: Manual trigger only"],
+       "chosen": "A",
+       "user_input": null
+     }
+   ]
+ }
@@ -0,0 +1,40 @@
+ {
+   "specialist": "code-architect",
+   "gate_started": "2026-02-27T16:00:00Z",
+   "gate_completed": "2026-02-27T16:03:00Z",
+   "assumptions": [],
+   "decisions": [
+     {
+       "id": "D1",
+       "title": "Scope of Audit",
+       "context": "Existing tests are unit-level only. No integration tests. Options range from vuln-only to full audit.",
+       "options": ["A: Vulnerabilities only (recommended)", "B: Vulnerabilities + license compliance", "C: Vulnerabilities + license compliance + version staleness"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D2",
+       "title": "Transitive Dependency Depth",
+       "context": "340 transitive dependencies in the tree. Direct-only covers 20 packages.",
+       "options": ["A: Full tree (recommended)", "B: Direct only"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D3",
+       "title": "Output Verbosity",
+       "context": "npm audit output is verbose and hard to read. Need to balance detail with readability.",
+       "options": ["A: Summary with expandable details (recommended)", "B: Full verbose", "C: Executive summary only"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D4",
+       "title": "Remediation Suggestions",
+       "context": "Users need actionable output. Fix commands risk suggesting breaking changes.",
+       "options": ["A: Include fix commands (recommended)", "B: Flag issues only"],
+       "chosen": "B",
+       "user_input": null
+     }
+   ]
+ }
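D3's "summary with expandable details" could be built on top of `npm audit --json`, whose `metadata.vulnerabilities` object carries per-severity counts. A minimal sketch, using a fabricated sample payload rather than real audit output:

```python
import json

# Sketch of D3's summary-first output: condense npm audit's JSON report
# (metadata.vulnerabilities holds per-severity counts) into one headline
# line, leaving per-package details for the expandable section.
# The sample payload below is fabricated for illustration.
sample = json.loads("""{
  "metadata": {"vulnerabilities":
    {"info": 0, "low": 3, "moderate": 1, "high": 2, "critical": 0}}
}""")

def summarize(report):
    counts = report["metadata"]["vulnerabilities"]
    total = sum(counts.values())
    # Severities ordered worst-first; pick the worst with a nonzero count.
    worst = [s for s in ("critical", "high", "moderate", "low", "info")
             if counts.get(s)]
    headline = worst[0] if worst else "none"
    return f"{total} vulnerabilities (worst: {headline})"

print(summarize(sample))  # 6 vulnerabilities (worst: high)
```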
@@ -0,0 +1,46 @@
+ {
+   "specialist": "code-architect",
+   "gate_started": "2026-02-27T17:10:00Z",
+   "gate_completed": null,
+   "assumptions": [
+     {
+       "id": "A1",
+       "title": "Output Format Consistency",
+       "default": "Use existing 3-tier rating system (excellent_fit, acceptable_fit, poor_fit)",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A2",
+       "title": "Single-File Skill Structure",
+       "default": "Implement as a single SKILL.md file",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A3",
+       "title": "Model Selection for Execution",
+       "default": "Use sonnet as the execution model",
+       "status": "accepted",
+       "user_override": null
+     }
+   ],
+   "decisions": [
+     {
+       "id": "D1",
+       "title": "Scoring Formula Approach",
+       "context": "Need to rank 47 models against user hardware. RAM, VRAM, and context length are the key dimensions.",
+       "options": ["A: Weighted percentage — RAM 40%, VRAM 40%, context 20% (recommended)", "B: Binary pass/fail per dimension, rank by headroom", "C: Single composite ratio averaged across dimensions"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D2",
+       "title": "Use-Case Filtering Strategy",
+       "context": "Models have use-case tags (chat, code, creative, reasoning). User provides a query like 'I need a coding model'.",
+       "options": ["A: Strict tag match (recommended)", "B: Fuzzy match — tagged first, then 'might also work'"],
+       "chosen": "B",
+       "user_input": null
+     }
+   ]
+ }
@@ -0,0 +1,54 @@
+ {
+   "specialist": "code-architect",
+   "gate_started": "2026-02-27T17:10:00Z",
+   "gate_completed": "2026-02-27T17:18:00Z",
+   "assumptions": [
+     {
+       "id": "A1",
+       "title": "Output Format Consistency",
+       "default": "Use existing 3-tier rating system (excellent_fit, acceptable_fit, poor_fit)",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A2",
+       "title": "Single-File Skill Structure",
+       "default": "Implement as a single SKILL.md file",
+       "status": "accepted",
+       "user_override": null
+     },
+     {
+       "id": "A3",
+       "title": "Model Selection for Execution",
+       "default": "Use sonnet as the execution model",
+       "status": "accepted",
+       "user_override": null
+     }
+   ],
+   "decisions": [
+     {
+       "id": "D1",
+       "title": "Scoring Formula Approach",
+       "context": "Need to rank 47 models against user hardware.",
+       "options": ["A: Weighted percentage (recommended)", "B: Binary pass/fail", "C: Single composite ratio"],
+       "chosen": "A",
+       "user_input": null
+     },
+     {
+       "id": "D2",
+       "title": "Use-Case Filtering Strategy",
+       "context": "Models have use-case tags.",
+       "options": ["A: Strict tag match (recommended)", "B: Fuzzy match"],
+       "chosen": "B",
+       "user_input": null
+     },
+     {
+       "id": "D3",
+       "title": "Top-N Presentation Count",
+       "context": "Need to decide how many models to show.",
+       "options": ["A: Top 5 (recommended)", "B: Top 3", "C: All above threshold"],
+       "chosen": "A",
+       "user_input": null
+     }
+   ]
+ }
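Decision D2's fuzzy match (exact-tag models first, then a "might also work" tier) could look like the sketch below. The catalog entries reuse model IDs mentioned elsewhere in this diff, but the tag assignments and the related-tag mapping are illustrative assumptions.

```python
# Sketch of D2's fuzzy use-case filtering: exact-tag matches first,
# then a "might also work" tier drawn from related tags. The tag
# assignments and RELATED mapping are illustrative assumptions.
CATALOG = {
    "qwen2.5-14b": ["code", "reasoning"],
    "qwen2.5-7b": ["chat", "code"],
    "llama3.1-8b": ["chat", "creative"],
}
RELATED = {"code": ["reasoning"], "creative": ["chat"]}

def filter_models(use_case):
    """Return (exact matches, 'might also work' matches) for a use case."""
    exact = [m for m, tags in CATALOG.items() if use_case in tags]
    nearby = [m for m, tags in CATALOG.items()
              if m not in exact
              and any(t in tags for t in RELATED.get(use_case, []))]
    return exact, nearby

exact, maybe = filter_models("creative")
print(exact)  # ['llama3.1-8b']
print(maybe)  # ['qwen2.5-7b']
```

Compared with option A's strict match, the second tier surfaces candidates a strict filter would silently drop, at the cost of a slightly longer report.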
@@ -0,0 +1,133 @@
1
+ # Stage 1 Exploration: Task 2 — Build the Hardware Vetter Claude Code Skill
2
+
3
+ ## Research Findings
4
+
5
+ ### Files Examined
6
+
7
+ | File | Path | Key Data |
8
+ |------|------|----------|
9
+ | Model Catalog | `C:/Projects/District_AI_Agent_Implementation/docs/model_catalog.json` | Schema v1.2, 11 models, `hardware_profile`, `infrastructure_options` |
10
+ | Pipeline Config | `C:/Projects/District_AI_Agent_Implementation/src/pipeline/config.py` | Pydantic Settings, 3 registered Ollama model IDs: `qwen2.5:14b`, `qwen2.5:7b`, `llama3.1:8b` |
11
+ | Existing Skill | `C:/Projects/District_AI_Agent_Implementation/.claude/skills/hardware-vetter/SKILL.md` | 708-line complete skill implementation already exists |
12
+
13
+ ### Critical Finding: Skill Already Exists
14
+
15
+ The file `.claude/skills/hardware-vetter/SKILL.md` is a **complete, 708-line implementation** — not a stub. It includes full YAML frontmatter, 8 workflow sections, detailed question flow via AskUserQuestion, GPU-primary and CPU-only analysis paths, benchmark search protocol (8-call cap), full report template, 7 error handling scenarios, and a completion message format.
16
+
17
+ ### Codebase Conventions
18
+
19
+ - Single-file skill pattern: everything in `SKILL.md` (no reference files loaded by Claude Code)
20
+ - Frontmatter format: `name`, `description`, `model`, `tools` fields
21
+ - Report output directory: `docs/reports/` with naming convention `hardware_eval_YYYY-MM-DD_[slug].md`
22
+ - Catalog field `hardware_profile.ram_gb` (not `total_ram_gb` as referenced in current skill)
23
+ - Ollama IDs use colon format in config.py (`qwen2.5:14b`) but hyphen format as catalog keys (`qwen2.5-14b`); matching via `ollama_id` field
24
+
25
+ ### Data Field Issues Found
26
+
27
+ 1. **Field name mismatch:** Skill Section 2.1 references `hardware_profile.total_ram_gb` but catalog uses `hardware_profile.ram_gb`
28
+ 2. **Nonexistent field reference:** Skill Section 4.2a references `gpu_vram_gb_fp16` which does not exist in any model's `hardware_requirements`; only `ram_gb_fp16` exists
29
+ 3. **Unused tool:** `Glob` listed in frontmatter but never referenced in skill body
30
+
31
+ ## ECD Analysis
32
+
33
+ ### Elements
34
+
35
+ - **Skill file:** Single `SKILL.md` at `.claude/skills/hardware-vetter/SKILL.md`
36
+ - **Data source 1:** `docs/model_catalog.json` — 11 models, hardware profile, infrastructure options
37
+ - **Data source 2:** `src/pipeline/config.py` — 3 registered model IDs (default, fast, chat)
38
+ - **Output artifact:** Markdown report at `docs/reports/hardware_eval_YYYY-MM-DD_[slug].md`
39
+ - **Model catalog schema fields per model:** `display_name`, `ollama_id`, `parameter_count`, `size_tier`, `hardware_requirements` (with `ram_gb_q4`, `ram_gb_q8`, `ram_gb_fp16`, `recommended_ram_gb`, `gpu_vram_gb_q4`, `gpu_vram_gb_q8`, `gpu_offload_support`), `feasibility`, `raw_benchmarks`, `category_scores`
40
+
41
+ ### Connections
42
+
43
+ - Skill reads `model_catalog.json` to extract hardware profile and all 11 model entries
44
+ - Skill reads `config.py` to extract 3 registered Ollama IDs, then matches via `ollama_id` field to catalog entries
45
+ - AskUserQuestion collects proposed hardware changes from user (4 change types: CPU, GPU, RAM, full system)
46
+ - Analysis engine branches on GPU presence: GPU-primary path or CPU-only path
47
+ - WebSearch (up to 8 calls) finds published benchmarks to upgrade estimates from "Projected" to "Verified"
48
+ - Write tool outputs the final report to `docs/reports/`
49
+
50
+ ### Dynamics
51
+
52
+ - **Execution flow:** Data loading → question flow → analysis → benchmark search → report writing → completion message
53
+ - **Graceful degradation:** GPU fields missing → CPU-only path; benchmarks return nothing → projected estimates; config.py unreadable → all 11 as candidates; catalog missing → STOP
54
+ - **Unhandled edge cases:** Unified memory architectures (e.g., DGX Spark), multi-GPU configurations, adding a separate node (vs upgrading existing), models not in catalog, concurrent multi-model loading analysis
55
+
56
+ ## Assumptions
57
+
58
+ ### A1: Single-File Skill Structure
59
+ - **Default**: Keep the entire skill as a single `SKILL.md` file with no companion files
60
+ - **Rationale**: Claude Code only injects `SKILL.md` into context (the "Reference File Problem"). All existing Intuition skills follow this pattern. Splitting into multiple files would break skill loading.
61
+
62
+ ### A2: Fix the `total_ram_gb` Field Name Mismatch
63
+ - **Default**: Change `total_ram_gb` to `ram_gb` in Section 2.1 to match the actual catalog field
64
+ - **Rationale**: The catalog field is definitively `ram_gb`. The current reference is incorrect and could cause runtime confusion for the sonnet model executing the skill.
65
+
66
+ ### A3: Fix the `gpu_vram_gb_fp16` Nonexistent Field Reference
67
+ - **Default**: Remove or correct the FP16 case in the GPU analysis path (Section 4.2a), since `gpu_vram_gb_fp16` does not exist in any model's hardware_requirements
68
+ - **Rationale**: No model uses FP16 as its recommended quantization, and the field does not exist. The reference is dead code that could confuse the executor.
69
+
70
+ ### A4: Preserve Existing Report Format and Naming Convention
71
+ - **Default**: Keep the existing `hardware_eval_YYYY-MM-DD_[slug].md` naming convention and report structure
72
+ - **Rationale**: An existing report (`hardware_eval_2026-02-20_thinkstation-pgx-addition.md`) already demonstrates this format works well. Changing it would create inconsistency with prior reports.
73
+
74
+ ### A5: Match via `ollama_id` Field for Config-to-Catalog Linking
75
+ - **Default**: Continue matching config.py model IDs to catalog entries via the `ollama_id` field (colon format)
76
+ - **Rationale**: The skill already implements this correctly. Catalog keys use hyphen format but `ollama_id` uses colon format matching config.py exactly.
77
+
78
+ ### A6: Keep `sonnet` as the Execution Model
79
+ - **Default**: Retain `model: sonnet` in frontmatter for skill execution
80
+ - **Rationale**: The skill is data-reading, question-asking, and report-writing — tasks well-suited to sonnet. Opus would be overkill for the structured analysis and report generation this skill performs.
81
+
82
+ ### A7: Lightweight Schema Validation (Existence Checks Only)
83
+ - **Default**: Validate only that `model_catalog.json` exists, is readable, and `hardware_profile` is present — no deep schema validation
84
+ - **Rationale**: Full JSON schema validation would require code execution tools not available to the skill. The existing approach of reading data and gracefully degrading when fields are missing is the correct pattern for a Claude Code skill.
85
+
86
+ ## Key Decisions
87
+
88
+ ### D1: Scope of Changes — Fix Only vs Enhancement
89
+ - **Options**:
90
+ - A) Fix data field issues only (Issues 1-3) — recommended: Minimal, low-risk changes to a working skill. Corrects the `ram_gb` mismatch, removes `gpu_vram_gb_fp16` dead reference, optionally removes unused `Glob` tool. Does not change functionality.
91
+ - B) Fix issues + add unified memory architecture support: Adds a sub-path in Section 4.2 for systems like DGX Spark where GPU VRAM and system RAM are unified. Medium scope increase.
92
+ - C) Fix issues + add unified memory + add concurrent model loading analysis: Also adds a check that sums RAM/VRAM requirements for all 3 registered models loaded simultaneously. Largest scope.
93
+ - **Recommendation**: A, because the skill is already production-quality and complete. The acceptance criteria are already met. Scope creep into new features (unified memory, concurrent loading) should be separate tasks with their own planning.
94
+ - **Risk if wrong**: If option A is chosen but unified memory systems are evaluated soon, the skill will produce incorrect VRAM/RAM split analysis for those architectures. However, this can be addressed in a follow-up task.

### D2: Remove or Keep Unused `Glob` Tool in Frontmatter
- **Options**:
  - A) Remove `Glob` from the tools list — recommended: Reduces the tool surface to only what the skill actually uses (Read, WebSearch, AskUserQuestion, Write).
  - B) Keep `Glob` and add a use case: Add a step to check if `docs/reports/` directory exists before writing, giving Glob a purpose.
- **Recommendation**: A, because the Write tool will create the file regardless, and adding a directory check adds complexity for negligible benefit. Smaller tool surface means fewer tokens spent on tool descriptions.
- **Risk if wrong**: Negligible either way. If a future skill revision needs Glob, it can be re-added.

### D3: How to Handle the Existing Skill — Review-and-Patch vs Rewrite
- **Options**:
  - A) Review-and-patch — recommended: Treat the existing 708-line skill as the baseline. Apply targeted fixes (field name corrections, dead reference removal). Verify against acceptance criteria. Minimal diff.
  - B) Rewrite from scratch: Produce a new SKILL.md from the blueprint, incorporating lessons learned but potentially losing working edge case handling.
- **Recommendation**: A, because the existing skill handles 7 error scenarios, has a well-structured question flow, and covers all 8 acceptance criteria. A rewrite risks losing subtle handling that the existing implementation got right.
- **Risk if wrong**: If the existing skill has deeper structural problems beyond the data field issues, patching may be insufficient. However, research found no structural issues — only data reference mismatches.

## Risks Identified

### Risk 1: Runtime Field Name Confusion (Low Severity)
- **Description**: Even after fixing the `total_ram_gb` reference, the sonnet model executing the skill reads the actual JSON. If the instruction says one thing and the data says another, sonnet may adapt — but inconsistency between instruction text and data structure creates ambiguity.
- **Mitigation**: Fix the field name references to exactly match the catalog. This eliminates the ambiguity entirely.

### Risk 2: Future Catalog Schema Changes (Low Severity)
- **Description**: The skill hardcodes field names from schema v1.2. If the catalog schema changes, field references will break.
- **Mitigation**: The lightweight validation approach means the skill will gracefully degrade (missing fields trigger fallback paths). No action needed now.
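That graceful degradation can be made concrete with a small helper. A minimal sketch, assuming a hypothetical catalog entry (the exact schema v1.2 shape is not reproduced here):

```python
def read_field(entry: dict, field: str, fallback=None):
    """Return a catalog field if present, else a fallback.
    A missing field triggers the fallback path instead of a hard
    failure, so a schema change degrades the analysis gracefully."""
    value = entry.get(field)
    return value if value is not None else fallback

# Hypothetical catalog entry; field names are illustrative.
entry = {"ram_gb": 128, "gpu_vram_gb": 24}

ram = read_field(entry, "ram_gb")                                 # 128
fp16 = read_field(entry, "gpu_vram_gb_fp16", fallback="unknown")  # absent, falls back
```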

### Risk 3: Existing Skill Untested End-to-End (Medium Severity)
- **Description**: The 708-line skill has never been run. Subtle issues in question flow branching, analysis calculations, or report formatting may exist but are invisible until runtime.
- **Mitigation**: The code spec should include a mental walkthrough trace of at least one scenario (e.g., "Add RTX 4090 GPU") to verify the logic flow.

## Recommended Approach

The existing Hardware Vetter skill at `.claude/skills/hardware-vetter/SKILL.md` is a comprehensive, production-quality implementation that meets all 8 acceptance criteria. The engineering work should be a **review-and-patch** operation:

1. Fix the `total_ram_gb` → `ram_gb` field name mismatch in Section 2.1
2. Fix or remove the `gpu_vram_gb_fp16` nonexistent field reference in Section 4.2a
3. Remove `Glob` from the tools list (if decided)
4. Verify all remaining field references against the actual catalog schema
5. Mental walkthrough of at least one complete scenario to validate logic flow
6. No new files to create — single-file SKILL.md pattern maintained
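Step 4 lends itself to a quick mechanical sweep. A sketch under stated assumptions: the regex treats backticked snake_case tokens as field references (a rough heuristic), and the catalog path and JSON shape in the `__main__` block are hypothetical:

```python
import json
import re

def backticked_identifiers(text: str) -> set[str]:
    """Collect backticked snake_case tokens as candidate field references."""
    return {m for m in re.findall(r"`([a-z][a-z0-9_]*)`", text) if "_" in m}

def unknown_references(skill_text: str, catalog_fields: set[str]) -> set[str]:
    """Field-like references in the skill that the catalog does not define."""
    return backticked_identifiers(skill_text) - catalog_fields

if __name__ == "__main__":
    with open(".claude/skills/hardware-vetter/SKILL.md") as f:
        skill = f.read()
    with open("catalog.json") as f:                     # hypothetical path
        fields = set(json.load(f)["models"][0].keys())  # assumed catalog shape
    for name in sorted(unknown_references(skill, fields)):
        print(f"unknown field reference: {name}")
```

Against the current skill text, a sweep like this would surface `total_ram_gb` and `gpu_vram_gb_fp16`, the two mismatches already identified.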
@@ -0,0 +1,202 @@
---
name: code-architect
display_name: Code Architect
domain: code
description: >
  Analyzes software requirements, designs code architecture, and produces
  implementation blueprints for code artifacts. Replaces the design + engineer
  phases for code-domain tasks.

exploration_methodology: ECD
supported_depths: [Deep, Standard, Light]
default_depth: Deep

research_patterns:
  - "Find existing code patterns and conventions in the codebase"
  - "Locate configuration files and data schemas"
  - "Identify integration points with existing systems"
  - "Map dependencies between components"
  - "Find similar implementations to follow as patterns"

blueprint_sections:
  - "Architecture Overview"
  - "Data Flow"
  - "Implementation Detail"
  - "Error Handling"
  - "Integration Points"

default_producer: code-writer
default_output_format: code

review_criteria:
  - "All acceptance criteria addressable from the blueprint"
  - "No ambiguous implementation decisions left for the producer"
  - "Error handling covers all identified edge cases"
  - "Integration points fully specified with exact file paths and field names"
  - "Patterns match existing codebase conventions"
  - "Blueprint is self-contained — producer needs no external context"
mandatory_reviewers: []

model: opus
reviewer_model: sonnet
tools: [Read, Write, Glob, Grep, Task, AskUserQuestion]
---

# Code Architect

## Stage 1: Exploration Protocol

You are a code architect conducting exploration for a code implementation task. Your job is to research the project codebase, explore the problem space using ECD (Elements, Connections, Dynamics), and produce structured findings for the orchestrator to present to the user.

### Research Phase

First, read all project context files and codebase artifacts provided to you. Extract:
- Existing code patterns and conventions
- Data schemas and configuration structures
- Integration points and dependencies
- Constraints from the plan and existing architecture

Use the research patterns above as guides — search for relevant files using Glob and Grep, read key files to understand patterns.

### ECD Exploration

**Elements (E)** — What are the building blocks?
- What files/modules need to be created or modified?
- What data structures are involved?
- What interfaces exist between components?
- What configuration or schema requirements apply?
- What external dependencies are needed?

**Connections (C)** — How do they relate?
- How does data flow between components?
- What reads from what? What writes to what?
- How does this code interact with existing systems?
- What shared resources need coordination?

**Dynamics (D)** — How do they work/change over time?
- What is the execution flow (step by step)?
- What triggers each behavior?
- What are the error/edge cases?
- How does the system degrade gracefully?
- What happens under different input scenarios?

### Assumptions vs Key Decisions Classification

After your ECD exploration, you MUST classify every architectural item into one of two categories:

**Assumptions** — Items where there is a clear best practice, an obvious default, or only one reasonable approach given the codebase context. These are things you would do without asking. Examples:
- Following an existing naming convention in the codebase
- Using the same error handling pattern as adjacent code
- Matching an established data format or schema
- Using the project's existing dependency/library for a task

**Key Decisions** — Items where multiple valid approaches exist and the choice meaningfully affects the outcome. These require user input. Examples:
- Choosing between two viable architectures with different trade-offs
- Deciding scope boundaries (include feature X or defer it?)
- Selecting a strategy when the codebase has no established precedent
- Trade-offs between correctness, performance, and complexity

**Classification rule:** If you are uncertain whether something is an assumption or a decision, classify it as a **Key Decision**. It is better to ask unnecessarily than to assume incorrectly.

### Output Format — FORMAT COMPLIANCE IS MANDATORY

Write your findings to the specified stage1.md path. You MUST use exactly the heading levels and field labels specified below. Do not restructure, rename, or nest differently. The foreground skill parses stage1.md by these exact headings — creative reformatting will break the gate.

```markdown
# Stage 1 Exploration: [Task Title]

## Research Findings
[Facts from codebase research — file paths, schemas, patterns, constraints]

## ECD Analysis

### Elements
[Components, files, data structures identified]

### Connections
[Data flows, integration points, dependencies mapped]

### Dynamics
[Execution flows, edge cases, error scenarios identified]

## Assumptions
### A1: [Title]
- **Default**: [what you will do]
- **Rationale**: [why this is the obvious choice]

### A2: [Title]
- **Default**: [what you will do]
- **Rationale**: [why this is the obvious choice]

## Key Decisions
### D1: [Title]
- **Options**:
  - A) [option — recommended]: [rationale]
  - B) [option]: [rationale]
  - C) [option]: [rationale]
- **Recommendation**: A, because [reason]
- **Risk if wrong**: [what happens if this decision is wrong]

### D2: [Title]
- **Options**:
  - A) [option — recommended]: [rationale]
  - B) [option]: [rationale]
- **Recommendation**: A, because [reason]
- **Risk if wrong**: [what happens if this decision is wrong]

## Risks Identified
[Each risk with severity and mitigation]

## Recommended Approach
[Overall recommendation summarizing the proposed direction]
```

For Standard depth: abbreviate to Research Findings + Recommended Approach + Assumptions + 1-2 Key Decisions only.
For Light depth: Research Findings + Recommended Approach only (no decisions — proceed autonomously).
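The reason heading compliance is load-bearing can be shown with a toy parser. This is not the foreground skill's actual implementation, only a sketch of the failure mode: a parser keyed on exact `## ` headings silently loses any section that is renamed or re-nested.

```python
import re

def split_sections(stage1_md: str) -> dict[str, str]:
    """Split a stage1.md document into bodies keyed by its '## ' headings."""
    sections: dict[str, str] = {}
    current = None
    for line in stage1_md.splitlines():
        m = re.match(r"^## (.+)$", line)
        if m:
            # A new top-level section begins; deeper headings (###) fall through.
            current = m.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections

doc = "# Stage 1 Exploration: Demo\n## Research Findings\nfacts\n## Key Decisions\n### D1: X\n"
sections = split_sections(doc)  # keys: 'Research Findings', 'Key Decisions'
```

A section retitled to, say, `## Findings` would simply not appear under the expected key, which is exactly how "creative reformatting" breaks the gate.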

## Stage 2: Specification Protocol

You are a code architect producing a detailed blueprint from approved exploration findings.

You will receive:
1. Your Stage 1 findings (the exploration you conducted)
2. The user's decisions on each key question

Produce the full blueprint in the universal envelope format with these 9 sections:

1. **Task Reference** — plan task numbers, acceptance criteria, dependencies
2. **Research Findings** — from your Stage 1 codebase research (file paths, patterns, schemas)
3. **Approach** — the approved direction (incorporating user decisions)
4. **Decisions Made** — every decision with alternatives considered and user's choice
5. **Deliverable Specification** — the detailed implementation specification. This must contain enough detail that a code-writer producer can implement without making any architectural or design decisions. Include:
   - Exact file paths to create/modify
   - Complete data structures with field names and types
   - Full algorithm/logic specifications with formulas and thresholds
   - All error handling cases with exact behaviors
   - Worked examples for complex calculations
   - UI/interaction specifications (question flows, output formats)
   - Configuration values and constants
   - Template structures for generated outputs
   - Pattern references from existing codebase
6. **Acceptance Mapping** — how each plan acceptance criterion is addressed
7. **Integration Points** — exact file paths, field names, and data formats for all integrations
8. **Open Items** — must be empty or contain only [VERIFY]/execution-time items
9. **Producer Handoff** — output format, producer name, filename, content blocks in order, target line count, instruction tone guidance

Write the completed blueprint to the specified blueprint path.

## Review Protocol

You are reviewing code produced from a blueprint you authored. Your job is to FIND PROBLEMS, not approve.

Check each review criterion against the produced deliverable:
1. Read the blueprint to understand what was specified
2. Read the produced code/artifact
3. For each criterion: PASS or FAIL with specific evidence
4. Flag any invented functionality (present in code but not in blueprint)
5. Flag any omitted functionality (in blueprint but missing from code)
6. Flag any architectural decisions the producer made that should have been in the blueprint
7. Verify error handling covers all specified cases
8. Verify integration points match exact specifications

Return: PASS (all criteria met) or FAIL (with specific issues and remediation guidance)
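One way to picture the per-criterion verdict this protocol asks for (the structure and names below are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass
class CriterionResult:
    criterion: str
    passed: bool
    evidence: str  # specific evidence, not a bare verdict

@dataclass
class ReviewVerdict:
    results: list[CriterionResult]

    @property
    def verdict(self) -> str:
        # PASS only when every criterion passed; any single failure fails the review.
        return "PASS" if all(r.passed for r in self.results) else "FAIL"

review = ReviewVerdict([
    CriterionResult("Error handling covers all identified edge cases", True,
                    "all specified scenarios present"),
    CriterionResult("Integration points match exact specifications", False,
                    "reads total_ram_gb where the blueprint specifies ram_gb"),
])
```

Here `review.verdict` is `"FAIL"`; remediation guidance would travel alongside the failing evidence.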