task-o-matic 0.0.7 → 0.0.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +286 -23
- package/dist/commands/benchmark.d.ts +3 -0
- package/dist/commands/benchmark.d.ts.map +1 -0
- package/dist/commands/benchmark.js +569 -0
- package/dist/commands/prd.d.ts.map +1 -1
- package/dist/commands/prd.js +203 -9
- package/dist/commands/tasks/execute-loop.d.ts +3 -0
- package/dist/commands/tasks/execute-loop.d.ts.map +1 -0
- package/dist/commands/tasks/execute-loop.js +118 -0
- package/dist/commands/tasks/index.d.ts +1 -0
- package/dist/commands/tasks/index.d.ts.map +1 -1
- package/dist/commands/tasks/index.js +1 -0
- package/dist/commands/tasks.d.ts.map +1 -1
- package/dist/commands/tasks.js +1 -0
- package/dist/commands/workflow.d.ts.map +1 -1
- package/dist/commands/workflow.js +491 -331
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +2 -0
- package/dist/lib/ai-service/ai-operations.d.ts +5 -0
- package/dist/lib/ai-service/ai-operations.d.ts.map +1 -1
- package/dist/lib/ai-service/ai-operations.js +167 -0
- package/dist/lib/benchmark/registry.d.ts +11 -0
- package/dist/lib/benchmark/registry.d.ts.map +1 -0
- package/dist/lib/benchmark/registry.js +89 -0
- package/dist/lib/benchmark/runner.d.ts +6 -0
- package/dist/lib/benchmark/runner.d.ts.map +1 -0
- package/dist/lib/benchmark/runner.js +150 -0
- package/dist/lib/benchmark/storage.d.ts +13 -0
- package/dist/lib/benchmark/storage.d.ts.map +1 -0
- package/dist/lib/benchmark/storage.js +99 -0
- package/dist/lib/benchmark/types.d.ts +104 -0
- package/dist/lib/benchmark/types.d.ts.map +1 -0
- package/dist/lib/benchmark/types.js +2 -0
- package/dist/lib/index.d.ts +9 -0
- package/dist/lib/index.d.ts.map +1 -1
- package/dist/lib/index.js +7 -1
- package/dist/lib/prompt-registry.d.ts.map +1 -1
- package/dist/lib/prompt-registry.js +23 -0
- package/dist/lib/task-loop-execution.d.ts +25 -0
- package/dist/lib/task-loop-execution.d.ts.map +1 -0
- package/dist/lib/task-loop-execution.js +473 -0
- package/dist/prompts/index.d.ts +7 -6
- package/dist/prompts/index.d.ts.map +1 -1
- package/dist/prompts/index.js +1 -0
- package/dist/prompts/prd-question.d.ts +3 -0
- package/dist/prompts/prd-question.d.ts.map +1 -0
- package/dist/prompts/prd-question.js +40 -0
- package/dist/services/benchmark.d.ts +12 -0
- package/dist/services/benchmark.d.ts.map +1 -0
- package/dist/services/benchmark.js +18 -0
- package/dist/services/prd.d.ts +25 -0
- package/dist/services/prd.d.ts.map +1 -1
- package/dist/services/prd.js +224 -29
- package/dist/services/tasks.d.ts.map +1 -1
- package/dist/services/tasks.js +90 -3
- package/dist/services/workflow-benchmark.d.ts +34 -0
- package/dist/services/workflow-benchmark.d.ts.map +1 -0
- package/dist/services/workflow-benchmark.js +317 -0
- package/dist/services/workflow.d.ts +85 -0
- package/dist/services/workflow.d.ts.map +1 -0
- package/dist/services/workflow.js +476 -0
- package/dist/test/task-loop-git.test.d.ts +2 -0
- package/dist/test/task-loop-git.test.d.ts.map +1 -0
- package/dist/test/task-loop-git.test.js +62 -0
- package/dist/types/index.d.ts +53 -0
- package/dist/types/index.d.ts.map +1 -1
- package/dist/types/options.d.ts +2 -1
- package/dist/types/options.d.ts.map +1 -1
- package/dist/types/options.js +16 -0
- package/dist/types/results.d.ts +29 -1
- package/dist/types/results.d.ts.map +1 -1
- package/dist/types/workflow-options.d.ts +45 -0
- package/dist/types/workflow-options.d.ts.map +1 -0
- package/dist/types/workflow-options.js +2 -0
- package/dist/types/workflow-results.d.ts +82 -0
- package/dist/types/workflow-results.d.ts.map +1 -0
- package/dist/types/workflow-results.js +2 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -6,6 +6,8 @@ AI-powered task management for CLI, TUI, and web applications. Parse PRDs, enhan
 
 - 🤖 **AI-Powered**: Parse PRDs and enhance tasks using multiple AI providers
 - 🎭 **Interactive Workflow**: Guided setup from project init to task generation with AI assistance
+- ❓ **PRD Question/Refine**: AI generates clarifying questions and can answer them automatically
+- 🧠 **AI Reasoning Support**: Enable advanced reasoning for better PRD refinement
 - 📦 **Multi-Purpose Package**: Use as CLI tool, library, or MCP server
 - 📁 **Project-Local Storage**: All data stored locally in `.task-o-matic/` directory
 - 🎯 **Task Management**: Full CRUD operations with AI enhancement
@@ -14,6 +16,8 @@ AI-powered task management for CLI, TUI, and web applications. Parse PRDs, enhan
 - 🔧 **Multi-Provider AI**: Support for OpenAI, Anthropic, OpenRouter, and custom providers
 - 📊 **Smart Breakdown**: AI-powered task decomposition into subtasks
 - 🌊 **Real-time Streaming**: Watch AI responses generate live with streaming output
+- 📊 **Model Benchmarking**: Compare performance and quality across different AI models
+- 🏁 **Workflow Benchmarking**: Test complete workflows across multiple models and compare results
 - 🏠 **Single-Project Focus**: Self-contained within each project directory
 - 💻 **Framework-Agnostic**: Easily integrate into TUI, web apps, or any Node.js project
 
@@ -55,14 +59,14 @@ task-o-matic/
 ├── dist/            # Compiled output (published)
 │   ├── lib/         # Library entry point + core exports
 │   ├── cli/         # CLI binary
-│   ├── services/    # Business logic layer
+│   ├── services/    # Business logic layer (WorkflowService, PRDService, TaskService)
 │   ├── commands/    # CLI commands
 │   ├── mcp/         # MCP server
 │   └── types/       # TypeScript definitions
 ├── src/
 │   ├── lib/         # Core library (Storage, Config, AI, etc.)
 │   │   └── index.ts # Main library exports
-│   ├── services/    #
+│   ├── services/    # WorkflowService, PRDService, TaskService (framework-agnostic)
 │   ├── cli/         # CLI-specific logic
 │   │   └── bin.ts   # CLI binary entry point
 │   ├── commands/    # Commander.js command implementations
@@ -75,8 +79,8 @@ task-o-matic/
 
 ### Core Components
 
-- **Service Layer** (`
-- **AI Service**: Uses Vercel AI SDK for multi-provider support
+- **Service Layer** (`WorkflowService`, `PRDService`, `TaskService`): Framework-agnostic business logic
+- **AI Service**: Uses Vercel AI SDK for multi-provider support with reasoning capabilities
 - **Local Storage**: JSON-based file storage in `.task-o-matic/` directory
 - **Configuration**: Project-local config with AI provider settings
 - **Prompt Templates**: Structured AI prompts for consistent results
@@ -104,24 +108,45 @@ npm install task-o-matic
 
 ```typescript
 import {
+  WorkflowService,
   TaskService,
   PRDService,
   type Task,
   type AIConfig,
 } from "task-o-matic";
 
-//
+// Use the workflow service for complete project setup
+const workflowService = new WorkflowService();
+
+const result = await workflowService.initializeProject({
+  projectName: "my-app",
+  initMethod: "quick",
+  bootstrap: true,
+  aiOptions: {
+    aiProvider: "anthropic",
+    aiModel: "claude-3-5-sonnet",
+    aiKey: process.env.ANTHROPIC_API_KEY,
+  },
+  callbacks: {
+    onProgress: (event) => {
+      console.log(`Progress: ${event.message}`);
+    },
+  },
+});
+
+console.log("Project initialized:", result.projectName);
+
+// Or use task service directly
 const taskService = new TaskService();
 
-
-const result = await taskService.createTask({
+const taskResult = await taskService.createTask({
   title: "Implement user authentication",
   content: "Add login and signup functionality",
   aiEnhance: true,
   aiOptions: {
-
-
-
+    aiProvider: "anthropic",
+    aiModel: "claude-3-5-sonnet",
+    aiKey: process.env.ANTHROPIC_API_KEY,
   },
   callbacks: {
     onProgress: (event) => {
@@ -130,7 +155,7 @@ const result = await taskService.createTask({
   },
 });
 
-console.log("Task created:",
+console.log("Task created:", taskResult.task);
 ```
 
 #### TUI Integration Example
@@ -167,20 +192,28 @@ const result = await taskService.createTask({
 });
 ```
 
-#### PRD
+#### PRD Question/Refine Example
 
 ```typescript
 import { PRDService } from "task-o-matic";
 
 const prdService = new PRDService();
 
-
+// Generate questions and refine PRD with AI answering
+const result = await prdService.refinePRDWithQuestions({
   file: "./requirements.md",
+  questionMode: "ai", // or "user" for interactive
+  questionAIOptions: {
+    // Optional: use a different AI for answering
+    aiProvider: "openrouter",
+    aiModel: "anthropic/claude-3-opus",
+    aiReasoning: "enabled", // Enable reasoning for better answers
+  },
   workingDirectory: process.cwd(),
   aiOptions: {
-
-
-
+    aiProvider: "anthropic",
+    aiModel: "claude-3-5-sonnet",
+    aiKey: process.env.ANTHROPIC_API_KEY,
   },
   callbacks: {
     onProgress: (event) => {
@@ -189,9 +222,10 @@ const result = await prdService.parsePRD({
   },
 });
 
-console.log(`
-result.
-console.log(
+console.log(`Refined PRD with ${result.questions.length} questions`);
+result.questions.forEach((q, i) => {
+  console.log(`Q${i + 1}: ${q}`);
+  console.log(`A${i + 1}: ${result.answers[q]}`);
 });
 ```
 
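Editor's note on the example above: the diffed README indexes `result.answers` by question text (`result.answers[q]`), which implies a result shape roughly like the sketch below. This is an illustration inferred from the snippet, not the package's published typings; `RefinePRDQuestionsResult` and `printQA` are hypothetical names.

```typescript
// Hypothetical shape implied by the refinePRDWithQuestions example above.
// NOT a task-o-matic export -- an editorial sketch only.
interface RefinePRDQuestionsResult {
  questions: string[];
  // result.answers[q] in the README suggests a map keyed by question text:
  answers: Record<string, string>;
}

// Equivalent of the README's printing loop, returning the lines it would log.
function printQA(result: RefinePRDQuestionsResult): string[] {
  return result.questions.flatMap((q, i) => [
    `Q${i + 1}: ${q}`,
    `A${i + 1}: ${result.answers[q]}`,
  ]);
}
```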
@@ -227,6 +261,18 @@ import type {
   CreateTaskOptions,
   PRDParseResult,
   TaskAIMetadata,
+  // Workflow types
+  WorkflowService,
+  InitializeResult,
+  DefinePRDResult,
+  RefinePRDResult,
+  GenerateTasksResult,
+  SplitTasksResult,
+  // Benchmark types
+  WorkflowBenchmarkInput,
+  WorkflowBenchmarkResult,
+  BenchmarkConfig,
+  BenchmarkResult,
 } from "task-o-matic";
 ```
 
@@ -310,15 +356,23 @@ task-o-matic workflow
 
 # With streaming AI output
 task-o-matic workflow --stream
+
+# Want to test multiple AI models? Try workflow benchmarking:
+task-o-matic benchmark workflow --models "openai:gpt-4o,anthropic:claude-3-5-sonnet"
 ```
 
 **The workflow will guide you through:**
 
 1. **Project Initialization** - Choose quick start, custom, or AI-assisted configuration
 2. **PRD Definition** - Upload file, write manually, or use AI to generate from description
-3. **PRD
-
-
+3. **PRD Question/Refine** (NEW) - AI generates clarifying questions and refines PRD
+   - User can answer questions interactively
+   - OR AI can answer with PRD + stack context
+   - Optional: Use different AI model for answering (e.g., smarter model)
+   - Optional: Enable reasoning for better answers
+4. **PRD Manual Refinement** - Optional additional AI-assisted improvements
+5. **Task Generation** - Parse PRD into actionable tasks
+6. **Task Splitting** - Break down complex tasks into subtasks
 
 **AI Assistance at Every Step:**
 
@@ -344,6 +398,7 @@ task-o-matic workflow --stream
 - [AI Integration](docs/ai-integration.md) - AI providers and prompt engineering
 - [Project Initialization](docs/projects.md) - Project setup and bootstrapping
 - [Streaming Output](docs/streaming.md) - Real-time AI streaming capabilities
+- [Model Benchmarking](docs/benchmarking.md) - Compare AI models and workflow performance
 
 ## 🎯 Common Workflows
 
@@ -395,7 +450,74 @@ task-o-matic tasks create --title "Add payment system" --ai-enhance --stream
 task-o-matic tasks split --task-id <task-id>
 ```
 
-### Workflow 3:
+### Workflow 3: Benchmarking Models
+
+Compare different AI models for performance, cost, and quality.
+
+```bash
+# 1. Run a benchmark for PRD parsing
+task-o-matic benchmark run prd-parse \
+  --file requirements.md \
+  --models "openai:gpt-4o,openrouter:anthropic/claude-3.5-sonnet" \
+  --concurrency 5
+
+# 2. Compare results
+task-o-matic benchmark compare <run-id>
+
+# 3. View detailed metrics (Tokens, BPS, Size)
+task-o-matic benchmark show <run-id>
+```
+
+### Workflow 3b: Complete Workflow Benchmarking
+
+Test entire workflows across multiple AI models and automatically set up your project with the best results.
+
+```bash
+# 1. Basic workflow benchmark with interactive setup
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet,openrouter:qwen/qwen-2.5-72b-instruct" \
+  --concurrency 2 \
+  --delay 1000
+
+# 2. Automated workflow benchmark
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet" \
+  --project-name "my-saas-app" \
+  --project-description "Team collaboration platform with real-time chat" \
+  --init-method ai \
+  --prd-method ai \
+  --auto-accept \
+  --skip-all
+
+# 3. Benchmark with specific workflow options
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet" \
+  --project-name "e-commerce-app" \
+  --init-method custom \
+  --frontend next \
+  --backend hono \
+  --database postgres \
+  --prd-method ai \
+  --prd-description "Modern e-commerce platform with AI recommendations" \
+  --refine-feedback "Focus on scalability and security" \
+  --split-all
+
+# Results include:
+# - Comprehensive comparison table (duration, tasks, PRD size, costs)
+# - Detailed per-model breakdowns with timing and token metrics
+# - Interactive selection to choose the best model
+# - Automatic project setup with selected model's results
+```
+
+**Workflow Benchmark Features:**
+
+- **Two-Phase Execution**: Interactive question collection, then parallel execution
+- **Complete Workflow**: Project init → PRD creation → task generation → task splitting
+- **Comprehensive Metrics**: Performance, cost, quality, and output comparison
+- **Model Selection**: Choose the best performer and auto-setup your project
+- **Identical Conditions**: All models receive the same inputs for fair comparison
+
+### Workflow 4: Project Bootstrapping
 
 ```bash
 # Option 1: One-step setup (recommended)
@@ -414,6 +536,130 @@ task-o-matic init bootstrap my-app
 task-o-matic tasks create --title "Set up development environment" --ai-enhance --stream
 ```
 
+## 📊 Benchmarking Commands
+
+### Basic Model Benchmarking
+
+Compare different AI models on specific operations:
+
+```bash
+# Benchmark PRD parsing across multiple models
+task-o-matic benchmark run prd-parse \
+  --file requirements.md \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet,openrouter:qwen/qwen-2.5-72b-instruct" \
+  --concurrency 3 \
+  --delay 1000
+
+# Benchmark task splitting
+task-o-matic benchmark run task-breakdown \
+  --task-id <task-id> \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet" \
+  --concurrency 2
+
+# View benchmark results
+task-o-matic benchmark list
+task-o-matic benchmark show <run-id>
+task-o-matic benchmark compare <run-id>
+```
+
+### Complete Workflow Benchmarking
+
+Test entire project workflows across multiple AI models:
+
+```bash
+# Interactive workflow benchmark (recommended)
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet,openrouter:qwen/qwen-2.5-72b-instruct"
+```
+
+**What happens:**
+1. **Phase 1**: You answer workflow questions once (project setup, PRD creation, etc.)
+2. **Phase 2**: All models execute the identical workflow in parallel
+3. **Results**: Comprehensive comparison table with metrics and model selection
+
+**Full automation example:**
+
+```bash
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet" \
+  --project-name "my-saas-platform" \
+  --project-description "Team collaboration platform with real-time messaging" \
+  --init-method ai \
+  --prd-method ai \
+  --auto-accept \
+  --refine-feedback "Add more technical details and security considerations" \
+  --generate-instructions "Focus on MVP features and break into small tasks" \
+  --split-all \
+  --concurrency 2 \
+  --delay 2000
+```
+
+**Output includes:**
+
+```
+📊 Workflow Benchmark Results
+
+Model                                    | Duration   | Tasks | PRD Size   | Steps | Cost
+---------------------------------------- | ---------- | ----- | ---------- | ----- | ----------
+openai:gpt-4o                            | 45234ms    | 12    | 2843 chars | 5/5   | $0.023400
+anthropic:claude-3-5-sonnet              | 42156ms    | 15    | 3021 chars | 5/5   | $0.019800
+
+🔍 Detailed Comparison
+
+[1] openai:gpt-4o
+    Duration: 45234ms
+    Steps Completed: 5/5
+    Init: 2341ms
+    PRD Generation: 12456ms
+    Task Generation: 8234ms
+    Task Splitting: 3421ms
+    Tasks Created: 12
+    PRD Size: 2843 characters
+    Tokens: 4521 (Prompt: 2341, Completion: 2180)
+    Cost: $0.023400
+
+🎯 Model Selection
+Would you like to select a model and set up your project with its results? (y/N)
+```
+
+### Benchmark Options
+
+All benchmark commands support:
+
+- `--models <list>`: Comma-separated model list (required)
+- `--concurrency <number>`: Max parallel requests (default: 3)
+- `--delay <ms>`: Delay between requests (default: 1000ms)
+
+**Model format:** `provider:model[:reasoning=<tokens>]`
+
+**Examples:**
+- `openai:gpt-4o`
+- `anthropic:claude-3-5-sonnet`
+- `openrouter:anthropic/claude-3.5-sonnet`
+- `openrouter:openai/o1-preview:reasoning=50000`
+
+### Workflow Benchmark Inheritance
+
+The `benchmark workflow` command supports ALL workflow command options:
+
+```bash
+# All these workflow options work in benchmarks:
+--project-name, --init-method, --project-description
+--frontend, --backend, --database, --auth/--no-auth
+--prd-method, --prd-file, --prd-description, --prd-content
+--refine-feedback, --generate-instructions
+--split-tasks, --split-all, --split-instructions
+--skip-init, --skip-prd, --skip-refine, --skip-generate, --skip-split
+--stream, --auto-accept, --config-file
+```
+
+This allows you to:
+- **Pre-configure workflow steps** via command-line options
+- **Skip interactive questions** for automated benchmarking
+- **Compare identical workflows** across different models
+- **Test specific scenarios** (e.g., only AI vs only custom stack)
+```
+
 ## 🔧 Environment Variables
 
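Editor's note: the `provider:model[:reasoning=<tokens>]` spec format documented in the hunk above can be parsed in a few lines. The helper below is an editorial sketch, not a task-o-matic export; it only relies on the format and examples shown in the README (OpenRouter model names contain `/` but never `:`).

```typescript
// Editorial sketch: parse "provider:model[:reasoning=<tokens>]" spec strings.
// parseModelSpec is a hypothetical helper, not part of the package.
function parseModelSpec(spec: string): {
  provider: string;
  model: string;
  reasoningTokens?: number;
} {
  const parts = spec.split(":");
  if (parts.length < 2) throw new Error(`invalid model spec: ${spec}`);
  const provider = parts[0];
  let rest = parts.slice(1);
  let reasoningTokens: number | undefined;
  const last = rest[rest.length - 1];
  // The optional trailing segment carries the reasoning token budget.
  if (rest.length > 1 && last.startsWith("reasoning=")) {
    reasoningTokens = Number(last.slice("reasoning=".length));
    rest = rest.slice(0, -1);
  }
  return { provider, model: rest.join(":"), reasoningTokens };
}
```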
@@ -445,6 +691,23 @@ AI_TEMPERATURE=0.7
 - **PRD Parsing**: `claude-3.5-sonnet` or `gpt-4`
 - **Task Enhancement**: `claude-3-haiku` or `gpt-3.5-turbo`
 - **Task Breakdown**: `claude-3.5-sonnet` for complex tasks
+- **Workflow Benchmarking**: Test 2-3 models to find optimal performance for your use case
+
+### Choosing the Right Model
+
+Not sure which model to use? Try workflow benchmarking:
+
+```bash
+# Test your specific workflow across multiple models
+task-o-matic benchmark workflow \
+  --models "openai:gpt-4o,anthropic:claude-3-5-sonnet,openrouter:qwen/qwen-2.5-72b-instruct" \
+  --project-description "Your project description here"
+
+# The benchmark will show you:
+# - Performance (speed, tokens, cost)
+# - Quality (tasks created, PRD completeness)
+# - Best model for your specific needs
+```
 
 ## 📁 Storage Structure
 
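Editor's note: the benchmark comparison table shown earlier reports duration, task count, and cost per model. One simple way to rank such results is cost per task, sketched below with the sample numbers from that table. `ModelRun` and `pickByCostPerTask` are illustrative names, not task-o-matic exports, and cost-per-task is just one possible ranking metric.

```typescript
// Editorial sketch: rank per-model benchmark results by cost per task.
// Not part of the package; field names mirror the comparison table columns.
interface ModelRun {
  model: string;
  durationMs: number;
  tasksCreated: number;
  costUsd: number;
}

function pickByCostPerTask(runs: ModelRun[]): ModelRun {
  if (runs.length === 0) throw new Error("no runs to compare");
  // Lower cost per created task wins.
  return runs.reduce((best, r) =>
    r.costUsd / r.tasksCreated < best.costUsd / best.tasksCreated ? r : best
  );
}
```

With the sample table above (gpt-4o: $0.0234 for 12 tasks; claude-3-5-sonnet: $0.0198 for 15 tasks), the sonnet run wins on cost per task.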
package/dist/commands/benchmark.d.ts.map
ADDED
@@ -0,0 +1 @@
+{"version":3,"file":"benchmark.d.ts","sourceRoot":"","sources":["../../src/commands/benchmark.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAapC,eAAO,MAAM,gBAAgB,SAE5B,CAAC"}