prjct-cli 1.8.0 → 1.9.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,107 @@
1
1
  # Changelog
2
2
 
3
+ ## [1.9.0] - 2026-02-07
4
+
5
+ ### Features
6
+
7
+ - add structured output schema to all LLM prompts (PRJ-264) (#150)
8
+ - add mandatory model specification to AI provider (PRJ-265) (#149)
9
+
10
+ ### Bug Fixes
11
+
12
+ - replace keyword domain detection with LLM semantic classification (PRJ-299) (#148)
13
+
14
+
15
+ ## [1.10.0] - 2026-02-07
16
+
17
+ ### Features
18
+ - **Add structured output schema to all LLM prompts (PRJ-264)**: LLM prompts now include explicit JSON output schemas. Responses are validated with Zod before use. An invalid response triggers a re-prompt that carries structured error feedback.
19
+
20
+ ### Implementation Details
21
+ - New `core/schemas/llm-output.ts`: Zod schemas for task classification, agent assignment, and subtask breakdown responses. Schema registry (`OUTPUT_SCHEMAS`) with examples that self-validate. `renderSchemaForPrompt()` serializes schemas as markdown format instructions for prompt injection.
22
+ - New `core/agentic/response-validator.ts`: `validateLLMResponse()` handles JSON parsing (plain and markdown-wrapped `\`\`\`json` fences), Zod validation, and typed results. `buildReprompt()` generates retry messages with specific validation errors.
23
+ - Replaced manual field-by-field validation in `domain-classifier.ts` with `TaskClassificationSchema.safeParse()` — the schema existed (PRJ-299) but was unused.
24
+ - Added output schema injection to `prompt-builder.ts` `build()` method with `getSchemaTypeForCommand()` mapping commands to schemas.
25
+ - 20 new unit tests in `core/__tests__/agentic/response-validator.test.ts`
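The parse, validate, and re-prompt flow described in these notes can be sketched roughly as follows. This is a minimal illustration, not the code in `core/agentic/response-validator.ts`: the function names are borrowed from the changelog, but their signatures, the `Validation` type, the two-field `TaskClassification` shape, and the hand-rolled `checkClassification()` (standing in for a Zod `safeParse()` call so the sketch needs no dependencies) are all assumptions.

```typescript
// Hypothetical sketch: names mirror the changelog, bodies and signatures are assumed.
type Validation<T> = { ok: true; data: T } | { ok: false; errors: string[] }

interface TaskClassification {
  primaryDomain: string
  confidence: number
}

// Built at runtime so this example contains no literal code-fence marker.
const FENCE = '`'.repeat(3)
const FENCED_JSON = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE)

// Unwrap an optional markdown fence so plain and fenced replies parse the same way.
function extractJson(raw: string): string {
  const fenced = raw.match(FENCED_JSON)
  return (fenced?.[1] ?? raw).trim()
}

// Dependency-free stand-in for a schema's safeParse(): collects every field error.
function checkClassification(value: unknown): Validation<TaskClassification> {
  if (typeof value !== 'object' || value === null) {
    return { ok: false, errors: ['response must be a JSON object'] }
  }
  const { primaryDomain, confidence } = value as Record<string, unknown>
  const errors: string[] = []
  if (typeof primaryDomain !== 'string') errors.push('primaryDomain: expected string')
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    errors.push('confidence: expected number between 0 and 1')
  }
  if (typeof primaryDomain === 'string' && typeof confidence === 'number' && errors.length === 0) {
    return { ok: true, data: { primaryDomain, confidence } }
  }
  return { ok: false, errors }
}

function validateLLMResponse(raw: string): Validation<TaskClassification> {
  let parsed: unknown
  try {
    parsed = JSON.parse(extractJson(raw))
  } catch {
    return { ok: false, errors: ['response is not valid JSON'] }
  }
  return checkClassification(parsed)
}

// Turn validation errors into a retry message for the model.
function buildReprompt(errors: string[]): string {
  return ['Your previous response was invalid:', ...errors.map((e) => `- ${e}`), 'Reply with JSON only.'].join('\n')
}
```

Returning a tagged union instead of throwing keeps the retry decision with the caller, which can feed `errors` straight into `buildReprompt()`.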
26
+
27
+ ### Test Plan
28
+
29
+ #### For QA
30
+ 1. Run `bun test core/__tests__/agentic/response-validator.test.ts` — all 20 tests pass
31
+ 2. Run `bun test` — full suite (677 tests) passes with no regressions
32
+ 3. Run `bun run build` — build succeeds cleanly
33
+ 4. Verify `renderSchemaForPrompt('classification')` returns markdown with OUTPUT FORMAT header
34
+ 5. Verify `validateLLMResponse()` handles plain JSON, markdown-wrapped JSON, and rejects non-JSON
35
+ 6. Verify OUTPUT_SCHEMAS registry examples validate against their own schemas
36
+
37
+ #### For Users
38
+ **What changed:** LLM prompts include explicit JSON output schemas. Domain classifier uses Zod validation. Response validator provides structured error handling with re-prompt.
39
+ **How to use:** Automatic — schemas injected into prompts and validation runs transparently.
40
+ **Breaking changes:** None — all changes are additive.
41
+
42
+ ## [1.9.0] - 2026-02-07
43
+
44
+ ### Features
45
+ - **Add mandatory model specification to AI provider (PRJ-265)**: Provider configs now include `defaultModel`, `supportedModels`, and `minCliVersion` fields. Analysis and task metadata can record which model was used, enabling consistency tracking and mismatch warnings.
46
+
47
+ ### Implementation Details
48
+ - New `core/schemas/model.ts`: Zod schemas defining supported models per provider (Claude: opus/sonnet/haiku, Gemini: 2.5-pro/2.5-flash/2.0-flash), default model resolution, semver comparison utilities, minimum CLI version validation, and model mismatch detection
49
+ - Extended `AIProviderConfig` interface in `core/types/provider.ts` with `defaultModel`, `supportedModels`, `minCliVersion` fields
50
+ - All 5 provider configs (Claude, Gemini, Cursor, Windsurf, Antigravity) updated with model specification fields
51
+ - Added `modelMetadata` (optional) to `CurrentTaskSchema` in `core/schemas/state.ts` and `AnalysisSchema` in `core/schemas/analysis.ts`
52
+ - Added `preferredModel` to `ProjectSettings` in `core/types/config.ts`
53
+ - Added `validateCliVersion()` to `core/infrastructure/ai-provider.ts` with version warning integration into `detectProvider()`
54
+ - Added `versionWarning` field to `ProviderDetectionResult`
55
+ - 32 new unit tests in `core/__tests__/schemas/model.test.ts`
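For illustration, the semver comparison and minimum CLI version check mentioned above might look something like this. The helper names `parseVersion` and `compareSemver`, the argument list given to `validateCliVersion()`, and the warning text are assumptions; the real utilities in `core/schemas/model.ts` and `core/infrastructure/ai-provider.ts` may differ. The sketch also ignores pre-release and build tags.

```typescript
// Hypothetical helpers: only the idea (numeric semver comparison, minimum
// version warning) comes from the changelog; names and signatures are assumed.
function parseVersion(version: string): [number, number, number] {
  const match = version.trim().match(/^v?(\d+)\.(\d+)\.(\d+)/)
  if (!match) throw new Error(`Not a semver string: ${version}`)
  return [Number(match[1]), Number(match[2]), Number(match[3])]
}

// Negative when a < b, zero when equal, positive when a > b.
function compareSemver(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseVersion(a)
  const [bMajor, bMinor, bPatch] = parseVersion(b)
  if (aMajor !== bMajor) return aMajor - bMajor
  if (aMinor !== bMinor) return aMinor - bMinor
  return aPatch - bPatch
}

// Returns a warning string when the installed CLI is too old, otherwise null.
function validateCliVersion(installed: string, minCliVersion: string | null): string | null {
  if (minCliVersion === null) return null
  return compareSemver(installed, minCliVersion) < 0
    ? `CLI ${installed} is older than the minimum supported version ${minCliVersion}`
    : null
}
```

Comparing the parts as numbers rather than strings matters for versions such as `1.10.0` and `1.9.0`, which sort the wrong way round lexicographically.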
56
+
57
+ ### Test Plan
58
+
59
+ #### For QA
60
+ 1. Verify `ClaudeProvider.defaultModel` is `'sonnet'` and `supportedModels` includes `['opus', 'sonnet', 'haiku']`
61
+ 2. Verify `GeminiProvider.defaultModel` is `'2.5-flash'` and `supportedModels` includes `['2.5-pro', '2.5-flash', '2.0-flash']`
62
+ 3. Verify multi-model IDEs (Cursor, Windsurf) have `null` defaultModel and empty supportedModels
63
+ 4. Run `bun test core/__tests__/schemas/model.test.ts` — all 32 tests pass
64
+ 5. Run `bun test` — full suite (657 tests) passes with no regressions
65
+ 6. Run `bun run build` — build succeeds cleanly
66
+
67
+ #### For Users
68
+ **What changed:** Provider configs now include model specification fields. Analysis and task metadata can record which model was used. Version validation warns if the CLI is outdated.
69
+ **How to use:** Existing configs work unchanged — model fields have sensible defaults. New `preferredModel` setting available in project settings.
70
+ **Breaking changes:** None — all new fields are optional or have defaults.
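One way the defaults and the `preferredModel` setting could interact is sketched below. The `defaultModel` and `supportedModels` values are taken from the QA steps above, but the `ProviderModelSpec` type, the `resolveModel()` helper, and its precedence rules are assumptions made for illustration; the changelog does not spell out how a preference is resolved.

```typescript
// Hypothetical resolution logic: field names come from the changelog,
// the helper and its precedence rules are assumed.
interface ProviderModelSpec {
  defaultModel: string | null
  supportedModels: string[]
}

const claude: ProviderModelSpec = {
  defaultModel: 'sonnet',
  supportedModels: ['opus', 'sonnet', 'haiku'],
}

// Multi-model IDEs such as Cursor declare no default and no supported list.
const cursor: ProviderModelSpec = { defaultModel: null, supportedModels: [] }

function resolveModel(spec: ProviderModelSpec, preferredModel?: string): string | null {
  // No preference: fall back to the provider default (may be null).
  if (preferredModel === undefined) return spec.defaultModel
  // Assumption: with an empty supported list, any preference passes through.
  if (spec.supportedModels.length === 0) return preferredModel
  // Assumption: an unsupported preference falls back to the default.
  return spec.supportedModels.includes(preferredModel) ? preferredModel : spec.defaultModel
}
```

Under these assumptions, a project with no `preferredModel` behaves exactly as before, which matches the "existing configs work unchanged" note.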
71
+
72
+ ## [1.8.1] - 2026-02-07
73
+
74
+ ### Bug Fixes
75
+ - **Replace keyword domain detection with LLM semantic classification (PRJ-299)**: Eliminated substring false positives in domain classification. "author" no longer matches "auth" → backend, "Build responsive dashboard" correctly routes to frontend.
76
+
77
+ ### Implementation Details
78
+ - New `core/agentic/domain-classifier.ts`: LLM-based classifier with 4-level fallback chain (cache → confirmed history → Claude Haiku API → word-boundary heuristic)
79
+ - New `core/schemas/classification.ts`: Zod schemas for TaskClassification, cache entries, and confirmed patterns
80
+ - Replaced substring `includes()` matching in `smart-context.ts` and `orchestrator-executor.ts` with word-boundary regex (`\b`)
81
+ - Removed ~230 lines of hardcoded keyword lists from both files
82
+ - Classification results cached per (project + description hash) with 1-hour TTL
83
+ - Successful classifications auto-persisted as confirmed patterns via `confirmClassification()`
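The per-project, per-description cache with a one-hour TTL could be sketched like this. The class name, the `projectId:hash` key format, and the injectable clock are assumptions for the sake of a testable example; only the key components and the TTL come from the notes above.

```typescript
// Hypothetical in-memory cache: the changelog states the key components and
// the 1-hour TTL, everything else here is assumed.
const ONE_HOUR_MS = 60 * 60 * 1000

class ClassificationCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>()
  private readonly ttlMs: number
  private readonly now: () => number

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number = ONE_HOUR_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs
    this.now = now
  }

  private key(projectId: string, descriptionHash: string): string {
    return `${projectId}:${descriptionHash}`
  }

  get(projectId: string, descriptionHash: string): T | undefined {
    const key = this.key(projectId, descriptionHash)
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (entry.expiresAt <= this.now()) {
      // Expired entries are evicted lazily on read.
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(projectId: string, descriptionHash: string, value: T): void {
    this.entries.set(this.key(projectId, descriptionHash), {
      value,
      expiresAt: this.now() + this.ttlMs,
    })
  }
}
```

A cache hit is the first step of the fallback chain, so a miss (or an expired entry) simply lets the lookup continue to confirmed history, the API call, and finally the heuristic.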
84
+
85
+ ### Learnings
86
+ - Word-boundary regex (`\b`) correctly rejects "author" matching "auth" because there's no boundary between "auth" and "or" in "author"
87
+ - Using raw `fetch` against the Claude API avoids adding the `@anthropic-ai/sdk` dependency while keeping the design vendor-neutral
88
+ - Centralized classifier in `domain-classifier.ts` consumed by both `smart-context.ts` and `orchestrator-executor.ts` eliminates duplication
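The first learning is easy to reproduce in isolation. The helpers below are illustrative only (their names are not from the codebase); they contrast substring matching with a word-boundary regular expression on the "author" versus "auth" case.

```typescript
// Illustrative helpers, not the project's code.
// Escape regex metacharacters so arbitrary keywords are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Old behaviour: any occurrence of the keyword inside a longer word matches.
function matchesSubstring(description: string, keyword: string): boolean {
  return description.toLowerCase().includes(keyword.toLowerCase())
}

// New behaviour: the keyword must be delimited by word boundaries on both sides.
function matchesWord(description: string, keyword: string): boolean {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, 'i').test(description)
}

const task = 'Fix the author display on profile page'
console.log(matchesSubstring(task, 'auth')) // true: "auth" is a substring of "author"
console.log(matchesWord(task, 'auth')) // false: no boundary between "auth" and "or"
console.log(matchesWord('Fix the auth middleware', 'auth')) // true: standalone word
```

One consequence of strict boundaries is that `matchesWord('Implement authentication', 'auth')` is also false, so longer forms such as "authentication" would need their own keyword entries.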
89
+
90
+ ### Test Plan
91
+
92
+ #### For QA
93
+ 1. Run `bun test` — all 625 tests should pass
94
+ 2. Verify `detectDomain('Fix the author display on profile page')` returns `frontend` (not `backend`)
95
+ 3. Verify `detectDomain('Build responsive dashboard')` returns `frontend` (not `general`)
96
+ 4. Verify `detectDomain('Fix the auth middleware')` returns `backend` (standalone "auth" still works)
97
+ 5. Verify `classifyWithHeuristic` returns `general` with confidence 0.3 for unrecognizable tasks
98
+ 6. Run `bun run build` — build should succeed
99
+
100
+ #### For Users
101
+ **What changed:** Domain classification now uses LLM semantic classification with a word-boundary heuristic as the last fallback, eliminating substring false positives.
102
+ **How to use:** No user-facing changes — classification happens automatically during `p. task`.
103
+ **Breaking changes:** None for end users.
104
+
3
105
  ## [1.8.0] - 2026-02-07
4
106
 
5
107
  ### Features
@@ -0,0 +1,330 @@
1
+ /**
2
+ * Domain Classifier Tests
3
+ * PRJ-299: LLM-based domain classification with fallback chain
4
+ */
5
+
6
+ import { describe, expect, it } from 'bun:test'
7
+ import {
8
+ classifyWithHeuristic,
9
+ hashDescription,
10
+ type ProjectContext,
11
+ } from '../../agentic/domain-classifier'
12
+
13
+ // Default project context for testing (all domains available)
14
+ const fullContext: ProjectContext = {
15
+ domains: {
16
+ hasFrontend: true,
17
+ hasBackend: true,
18
+ hasDatabase: true,
19
+ hasTesting: true,
20
+ hasDocker: true,
21
+ },
22
+ agents: ['frontend', 'backend', 'database', 'testing', 'devops'],
23
+ stack: { language: 'TypeScript', framework: 'Hono' },
24
+ }
25
+
26
+ // Backend-only project context
27
+ const backendOnlyContext: ProjectContext = {
28
+ domains: {
29
+ hasFrontend: false,
30
+ hasBackend: true,
31
+ hasDatabase: false,
32
+ hasTesting: false,
33
+ hasDocker: false,
34
+ },
35
+ agents: ['backend'],
36
+ stack: { language: 'TypeScript', framework: 'Hono' },
37
+ }
38
+
39
+ describe('DomainClassifier PRJ-299', () => {
40
+ describe('classifyWithHeuristic', () => {
41
+ // =================================================================
42
+ // Substring Trap Tests (the whole reason for PRJ-299)
43
+ // =================================================================
44
+ describe('substring traps (critical fixes)', () => {
45
+ it('should NOT match "author" to "auth" domain', () => {
46
+ const result = classifyWithHeuristic('Fix the author display on profile page', fullContext)
47
+ // "author" should NOT trigger backend (auth)
48
+ // "profile page" and "display" should trigger frontend
49
+ expect(result.primaryDomain).not.toBe('backend')
50
+ expect(result.primaryDomain).toBe('frontend')
51
+ })
52
+
53
+ it('should match standalone "auth" to backend', () => {
54
+ const result = classifyWithHeuristic(
55
+ 'Fix the auth middleware for JWT validation',
56
+ fullContext
57
+ )
58
+ expect(result.primaryDomain).toBe('backend')
59
+ })
60
+
61
+ it('should NOT match "testament" to "test" domain', () => {
62
+ const result = classifyWithHeuristic(
63
+ 'Update the testament of the old testament module',
64
+ fullContext
65
+ )
66
+ expect(result.primaryDomain).not.toBe('testing')
67
+ })
68
+
69
+ it('should NOT match "button" to "but" in other domains', () => {
70
+ const result = classifyWithHeuristic('Add a button component', fullContext)
71
+ expect(result.primaryDomain).toBe('frontend')
72
+ })
73
+
74
+ it('should NOT match "configure" to "config" in devops', () => {
75
+ // "configure" without a devops context word should not go to devops
76
+ const result = classifyWithHeuristic('Configure the React component props', fullContext)
77
+ expect(result.primaryDomain).toBe('frontend')
78
+ })
79
+ })
80
+
81
+ // =================================================================
82
+ // Correct Classification Tests
83
+ // =================================================================
84
+ describe('frontend detection', () => {
85
+ it('should detect "Build responsive dashboard" as frontend', () => {
86
+ const result = classifyWithHeuristic('Build responsive dashboard', fullContext)
87
+ expect(result.primaryDomain).toBe('frontend')
88
+ })
89
+
90
+ it('should detect React component tasks', () => {
91
+ const result = classifyWithHeuristic('Create a modal dialog for user settings', fullContext)
92
+ expect(result.primaryDomain).toBe('frontend')
93
+ })
94
+
95
+ it('should detect CSS/styling tasks', () => {
96
+ const result = classifyWithHeuristic(
97
+ 'Fix the layout for mobile responsive view',
98
+ fullContext
99
+ )
100
+ expect(result.primaryDomain).toBe('frontend')
101
+ })
102
+
103
+ it('should detect page/navigation tasks', () => {
104
+ const result = classifyWithHeuristic(
105
+ 'Add sidebar navigation with dropdown menus',
106
+ fullContext
107
+ )
108
+ expect(result.primaryDomain).toBe('frontend')
109
+ })
110
+ })
111
+
112
+ describe('backend detection', () => {
113
+ it('should detect API endpoint tasks', () => {
114
+ const result = classifyWithHeuristic(
115
+ 'Create REST API endpoint for user management',
116
+ fullContext
117
+ )
118
+ expect(result.primaryDomain).toBe('backend')
119
+ })
120
+
121
+ it('should detect middleware tasks', () => {
122
+ const result = classifyWithHeuristic('Add rate limiting middleware', fullContext)
123
+ expect(result.primaryDomain).toBe('backend')
124
+ })
125
+
126
+ it('should detect authentication tasks', () => {
127
+ const result = classifyWithHeuristic('Implement JWT authentication flow', fullContext)
128
+ expect(result.primaryDomain).toBe('backend')
129
+ })
130
+ })
131
+
132
+ describe('database detection', () => {
133
+ it('should detect schema/migration tasks', () => {
134
+ const result = classifyWithHeuristic(
135
+ 'Create database migration for users table',
136
+ fullContext
137
+ )
138
+ expect(result.primaryDomain).toBe('database')
139
+ })
140
+
141
+ it('should detect connection pooling as database (not schema)', () => {
142
+ const result = classifyWithHeuristic('Optimize database connection pooling', fullContext)
143
+ expect(result.primaryDomain).toBe('database')
144
+ })
145
+
146
+ it('should detect ORM/Prisma tasks', () => {
147
+ const result = classifyWithHeuristic('Update Prisma schema with new entity', fullContext)
148
+ expect(result.primaryDomain).toBe('database')
149
+ })
150
+ })
151
+
152
+ describe('devops detection', () => {
153
+ it('should detect Docker tasks', () => {
154
+ const result = classifyWithHeuristic(
155
+ 'Create Docker container for production deployment',
156
+ fullContext
157
+ )
158
+ expect(result.primaryDomain).toBe('devops')
159
+ })
160
+
161
+ it('should detect CI/CD tasks', () => {
162
+ const result = classifyWithHeuristic(
163
+ 'Fix the CI pipeline for automated deployment',
164
+ fullContext
165
+ )
166
+ expect(result.primaryDomain).toBe('devops')
167
+ })
168
+ })
169
+
170
+ describe('testing detection', () => {
171
+ it('should detect test writing tasks', () => {
172
+ const result = classifyWithHeuristic('Add unit tests for the payment service', fullContext)
173
+ expect(result.primaryDomain).toBe('testing')
174
+ })
175
+
176
+ it('should detect coverage improvement tasks', () => {
177
+ const result = classifyWithHeuristic('Improve test coverage for auth module', fullContext)
178
+ expect(result.primaryDomain).toBe('testing')
179
+ })
180
+ })
181
+
182
+ // =================================================================
183
+ // Multi-domain Tasks
184
+ // =================================================================
185
+ describe('multi-domain tasks', () => {
186
+ it('should detect secondary domains', () => {
187
+ const result = classifyWithHeuristic(
188
+ 'Add API endpoint with React frontend component',
189
+ fullContext
190
+ )
191
+ expect(result.secondaryDomains.length).toBeGreaterThan(0)
192
+ })
193
+
194
+ it('should limit secondary domains to 2', () => {
195
+ const result = classifyWithHeuristic(
196
+ 'Add API endpoint with React component and Docker deploy with test coverage and database migration',
197
+ fullContext
198
+ )
199
+ expect(result.secondaryDomains.length).toBeLessThanOrEqual(2)
200
+ })
201
+ })
202
+
203
+ // =================================================================
204
+ // Project Context Filtering
205
+ // =================================================================
206
+ describe('project context filtering', () => {
207
+ it('should not classify as frontend when project has no frontend', () => {
208
+ const result = classifyWithHeuristic(
209
+ 'Add a button component with responsive layout',
210
+ backendOnlyContext
211
+ )
212
+ // Can't be frontend since project doesn't have it
213
+ // Falls through to general or docs (always available)
214
+ expect(result.primaryDomain).not.toBe('frontend')
215
+ })
216
+
217
+ it('should respect available agents', () => {
218
+ const result = classifyWithHeuristic('Create REST API endpoint', backendOnlyContext)
219
+ expect(result.primaryDomain).toBe('backend')
220
+ })
221
+ })
222
+
223
+ // =================================================================
224
+ // Confidence Scoring
225
+ // =================================================================
226
+ describe('confidence scoring', () => {
227
+ it('should have higher confidence for strong signals than multi-domain', () => {
228
+ // Single-domain (strong frontend signal) vs multi-domain (split between frontend and backend)
229
+ const strong = classifyWithHeuristic(
230
+ 'Create React component with jsx tsx ui button form modal',
231
+ fullContext
232
+ )
233
+ const split = classifyWithHeuristic(
234
+ 'Add API endpoint with React component and database query',
235
+ fullContext
236
+ )
237
+ expect(strong.confidence).toBeGreaterThanOrEqual(split.confidence)
238
+ })
239
+
240
+ it('should cap confidence at 0.85 for heuristic', () => {
241
+ const result = classifyWithHeuristic(
242
+ 'ui component react vue angular css style button form modal layout responsive animation',
243
+ fullContext
244
+ )
245
+ expect(result.confidence).toBeLessThanOrEqual(0.85)
246
+ })
247
+
248
+ it('should return 0.3 confidence for unknown domains', () => {
249
+ const result = classifyWithHeuristic(
250
+ 'Do something completely unrelated to any domain',
251
+ fullContext
252
+ )
253
+ expect(result.confidence).toBe(0.3)
254
+ expect(result.primaryDomain).toBe('general')
255
+ })
256
+ })
257
+
258
+ // =================================================================
259
+ // Edge Cases
260
+ // =================================================================
261
+ describe('edge cases', () => {
262
+ it('should handle empty description', () => {
263
+ const result = classifyWithHeuristic('', fullContext)
264
+ expect(result.primaryDomain).toBe('general')
265
+ expect(result.confidence).toBe(0.3)
266
+ })
267
+
268
+ it('should handle very long descriptions', () => {
269
+ const longDesc = 'Fix the bug in the component '.repeat(100)
270
+ const result = classifyWithHeuristic(longDesc, fullContext)
271
+ expect(result.primaryDomain).toBeDefined()
272
+ })
273
+
274
+ it('should be case-insensitive', () => {
275
+ const lower = classifyWithHeuristic('add react component', fullContext)
276
+ const upper = classifyWithHeuristic('ADD REACT COMPONENT', fullContext)
277
+ expect(lower.primaryDomain).toBe(upper.primaryDomain)
278
+ })
279
+ })
280
+ })
281
+
282
+ // =================================================================
283
+ // Hash Function
284
+ // =================================================================
285
+ describe('hashDescription', () => {
286
+ it('should produce consistent hashes', () => {
287
+ const hash1 = hashDescription('Fix the auth middleware')
288
+ const hash2 = hashDescription('Fix the auth middleware')
289
+ expect(hash1).toBe(hash2)
290
+ })
291
+
292
+ it('should be case-insensitive', () => {
293
+ const hash1 = hashDescription('Fix the Auth Middleware')
294
+ const hash2 = hashDescription('fix the auth middleware')
295
+ expect(hash1).toBe(hash2)
296
+ })
297
+
298
+ it('should trim whitespace', () => {
299
+ const hash1 = hashDescription(' Fix the auth middleware ')
300
+ const hash2 = hashDescription('Fix the auth middleware')
301
+ expect(hash1).toBe(hash2)
302
+ })
303
+
304
+ it('should produce different hashes for different descriptions', () => {
305
+ const hash1 = hashDescription('Fix frontend component')
306
+ const hash2 = hashDescription('Fix backend service')
307
+ expect(hash1).not.toBe(hash2)
308
+ })
309
+
310
+ it('should return a 16-character hex string', () => {
311
+ const hash = hashDescription('Test description')
312
+ expect(hash).toMatch(/^[a-f0-9]{16}$/)
313
+ })
314
+ })
315
+
316
+ // =================================================================
317
+ // File Patterns
318
+ // =================================================================
319
+ describe('file patterns', () => {
320
+ it('should return frontend file patterns for frontend domain', () => {
321
+ const result = classifyWithHeuristic('Add React component', fullContext)
322
+ expect(result.filePatterns.length).toBeGreaterThan(0)
323
+ })
324
+
325
+ it('should return relevant agents', () => {
326
+ const result = classifyWithHeuristic('Create REST API endpoint', fullContext)
327
+ expect(result.relevantAgents).toContain('backend')
328
+ })
329
+ })
330
+ })
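The behaviour these tests pin down (context filtering, a 0.3 floor for unknown tasks, a 0.85 confidence cap, and at most two secondary domains) can be approximated by a small scoring function. This is a sketch under stated assumptions, not the body of `classifyWithHeuristic`: the keyword table, the `classifySketch` name, the simplified list-of-domains context, and the confidence formula are all hypothetical.

```typescript
// Hypothetical heuristic: only the 0.3 floor, the 0.85 cap and the limit of two
// secondary domains are taken from the tests; keywords and formula are assumed.
type SpecificDomain = 'frontend' | 'backend' | 'database' | 'testing' | 'devops'
type Domain = SpecificDomain | 'general'

const KEYWORDS: Record<SpecificDomain, string[]> = {
  frontend: ['react', 'component', 'button', 'layout', 'responsive', 'dashboard'],
  backend: ['api', 'endpoint', 'middleware', 'auth'],
  database: ['database', 'migration', 'schema'],
  testing: ['test', 'tests', 'coverage'],
  devops: ['docker', 'ci', 'pipeline', 'deploy'],
}

interface Classification {
  primaryDomain: Domain
  secondaryDomains: Domain[]
  confidence: number
}

function classifySketch(description: string, available: SpecificDomain[]): Classification {
  const scored = (Object.keys(KEYWORDS) as SpecificDomain[])
    .filter((domain) => available.includes(domain)) // project context filtering
    .map((domain) => ({
      domain,
      // Count keywords that appear as whole words, case-insensitively.
      hits: KEYWORDS[domain].filter((k) => new RegExp(`\\b${k}\\b`, 'i').test(description)).length,
    }))
    .filter((score) => score.hits > 0)
    .sort((a, b) => b.hits - a.hits)

  const best = scored[0]
  if (best === undefined) {
    return { primaryDomain: 'general', secondaryDomains: [], confidence: 0.3 }
  }

  // Confidence grows with the winner's share of all hits, capped at 0.85.
  const totalHits = scored.reduce((sum, score) => sum + score.hits, 0)
  const confidence = Math.min(0.85, 0.5 + 0.35 * (best.hits / totalHits))
  const secondaryDomains = scored.slice(1, 3).map((score) => score.domain)
  return { primaryDomain: best.domain, secondaryDomains, confidence }
}
```

With this table, a description that splits its hits across several domains scores lower than one whose hits all land in a single domain, which is the ordering the confidence tests check for.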