@corbat-tech/coco 1.0.2 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +178 -373
- package/dist/cli/index.js +2403 -368
- package/dist/cli/index.js.map +1 -1
- package/dist/index.d.ts +12 -4
- package/dist/index.js +10052 -8297
- package/dist/index.js.map +1 -1
- package/package.json +36 -23
package/README.md
CHANGED
@@ -1,462 +1,267 @@
-# 🥥 Corbat-Coco
+# 🥥 Corbat-Coco
 
-**The
+**The open-source coding agent that iterates until your code is actually good.**
 
 [](https://www.typescriptlang.org/)
 [](https://nodejs.org/)
 [](./LICENSE)
-[](./)
+[](./)
 
 ---
 
-##
-
-Most AI coding assistants generate code and hope for the best. Coco is different:
-
-1. **Generates** code with your favorite LLM (Claude, GPT-4, Gemini)
-2. **Measures** quality with real metrics (coverage, security, complexity)
-3. **Analyzes** test failures to find root causes
-4. **Fixes** issues with targeted changes
-5. **Repeats** until quality reaches 85+ (senior engineer level)
-
-All autonomous. All verifiable. All open source.
-
----
-
-## The Problem with AI Code Generation
-
-Current AI assistants:
-- Generate code that looks good but fails in production
-- Don't run tests or validate output
-- Make you iterate manually
-- Can't coordinate complex tasks
-
-**Result**: You spend hours debugging AI-generated code.
-
----
-
-## How Coco Solves It
-
-### 1. Real Quality Measurement
-
-Coco measures 12 dimensions of code quality:
-- **Test Coverage**: Runs your tests with c8/v8 instrumentation (not estimated)
-- **Security**: Scans for vulnerabilities with npm audit + OWASP checks
-- **Complexity**: Calculates cyclomatic complexity from AST
-- **Correctness**: Validates tests pass + builds succeed
-- **Maintainability**: Real metrics from code analysis
-- ... and 7 more
+## The Problem
 
-Current state: **58.3% real measurements** (up from 0%), with 41.7% still using safe defaults.
-
-### 2. Smart Iteration Loop
-
-When tests fail, Coco:
-- Parses stack traces to find the error location
-- Reads surrounding code for context
-- Diagnoses root cause (not just symptoms)
-- Generates targeted fix (not rewriting entire file)
-- Re-validates and repeats if needed
-
-**Target**: 70%+ of failures fixed in first iteration.
-
-### 3. Multi-Agent Coordination
-
-Complex tasks are decomposed and executed by specialized agents:
-- **Researcher**: Explores codebase, finds patterns
-- **Coder**: Writes production code
-- **Tester**: Generates comprehensive tests
-- **Reviewer**: Identifies issues
-- **Optimizer**: Reduces complexity
-
-Agents work in parallel where possible, coordinate when needed.
-
-### 4. AST-Aware Validation
-
-Before saving any file:
-- Parses AST to validate syntax
-- Checks TypeScript semantics
-- Analyzes imports
-- Verifies build succeeds
-
-**Result**: Zero broken builds from AI edits.
-
-### 5. Production Hardening
-
-- **Error Recovery**: Auto-recovers from 8 error types (syntax, timeout, dependencies, etc.)
-- **Checkpoint/Resume**: Ctrl+C saves state, resume anytime
-- **Resource Limits**: Prevents runaway costs with configurable quotas
-- **Streaming Output**: Real-time feedback as code generates
-
----
+AI coding assistants generate code and hope for the best. You paste it in, tests fail, you iterate manually, you lose an hour. Studies show **67% of AI-generated PRs get rejected** on first review.
 
-##
+## The Solution
 
-1. **Converge**: Gather requirements, create specification
-2. **Orchestrate**: Design architecture, create task backlog
-3. **Complete**: Execute tasks with quality iteration
-4. **Output**: Generate CI/CD, docs, deployment config
-
-### Quality Iteration Loop
+Coco doesn't stop at code generation. It runs your tests, measures quality across 12 dimensions, diagnoses failures, generates targeted fixes, and repeats, autonomously, until quality reaches a configurable threshold (default: 85/100).
 
 ```
-Generate
+Generate → Test → Measure → Diagnose → Fix → Repeat
+                              ↓
+                    Quality ≥ 85? → Done ✅
 ```
 
-- Quality ≥ 85/100 (minimum)
-- Score stable for 2+ iterations
-- Tests all passing
-- Or max 10 iterations reached
-
-### Real Analyzers
-
-| Analyzer | What It Measures | Data Source |
-|----------|------------------|-------------|
-| Coverage | Lines, branches, functions, statements | c8/v8 instrumentation |
-| Security | Vulnerabilities, dangerous patterns | npm audit + static analysis |
-| Complexity | Cyclomatic complexity, maintainability | AST traversal |
-| Duplication | Code similarity, redundancy | Token-based comparison |
-| Build | Compilation success | tsc/build execution |
-| Import | Missing dependencies, circular deps | AST + package.json |
+**This is the Quality Convergence Loop.** No other open-source coding agent does this.
 
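To make the loop the new README describes concrete, here is a minimal TypeScript sketch of a quality convergence driver. All names below are illustrative assumptions, not Coco's actual API.

```typescript
// Minimal sketch of the convergence loop described above.
// Every name here is illustrative; this is not Coco's actual API.
interface QualityReport {
  score: number;          // 0-100 aggregate quality score
  failingTests: string[]; // identifiers of failing tests, used for diagnosis
}

async function convergeOnQuality(
  generate: () => Promise<void>,
  measure: () => Promise<QualityReport>,
  fix: (report: QualityReport) => Promise<void>,
  threshold = 85,
  maxIterations = 10,
): Promise<QualityReport> {
  await generate();             // Generate
  let report = await measure(); // Test + Measure
  for (let i = 1; i < maxIterations && report.score < threshold; i++) {
    await fix(report);          // Diagnose + Fix
    report = await measure();   // Repeat
  }
  return report;                // Done: threshold reached or budget exhausted
}
```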
 ---
 
 ## Quick Start
 
-### Installation
-
 ```bash
 npm install -g corbat-coco
+coco init                                    # Configure your LLM provider
+coco "Build a REST API with authentication" # That's it
 ```
 
-
+Coco will generate code, run tests, iterate until quality passes, and generate CI/CD + docs.
 
|
-
|
|
142
|
-
coco init
|
|
143
|
-
```
|
|
41
|
+
---
|
|
144
42
|
|
|
145
|
-
|
|
146
|
-
- AI Provider (Anthropic, OpenAI, Google)
|
|
147
|
-
- API Key
|
|
148
|
-
- Project preferences
|
|
43
|
+
## What Makes Coco Different
|
|
149
44
|
|
|
150
|
-
###
|
|
45
|
+
### 1. Quality Convergence Loop (Unique Differentiator)
|
|
151
46
|
|
|
152
|
-
|
|
153
|
-
coco "Build a REST API with JWT authentication"
|
|
154
|
-
```
|
|
47
|
+
Other agents generate code once. Coco iterates:
|
|
155
48
|
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
49
|
+
| Iteration | Score | What Happened |
|
|
50
|
+
|-----------|-------|---------------|
|
|
51
|
+
| 1 | 52/100 | Generated code, 3 tests failing |
|
|
52
|
+
| 2 | 71/100 | Fixed test failures, found security issue |
|
|
53
|
+
| 3 | 84/100 | Fixed security, improved coverage |
|
|
54
|
+
| 4 | 91/100 | All tests pass, quality converged β
|
|
|
162
55
|
|
|
163
|
-
|
|
56
|
+
The loop stops when:
|
|
57
|
+
- Score β₯ 85/100 (configurable)
|
|
58
|
+
- Score stabilized (delta < 2 between iterations)
|
|
59
|
+
- All critical issues resolved
|
|
60
|
+
- Or max 10 iterations reached
|
|
164
61
|
|
|
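One plausible way those four stop conditions combine, as a sketch; the helper names and the exact combination logic are assumptions, not Coco's code.

```typescript
// Sketch of the stop conditions listed above; one plausible combination.
interface IterationState {
  scores: number[];       // score history, most recent last
  criticalIssues: number; // unresolved critical findings
}

function shouldStop(s: IterationState, threshold = 85, maxIterations = 10): boolean {
  const latest = s.scores[s.scores.length - 1] ?? 0;
  const previous = s.scores[s.scores.length - 2];
  const stabilized = previous !== undefined && Math.abs(latest - previous) < 2;
  if (latest >= threshold) return true;                  // score >= 85 (configurable)
  if (stabilized && s.criticalIssues === 0) return true; // delta < 2, nothing critical left
  return s.scores.length >= maxIterations;               // max 10 iterations reached
}
```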
-
-coco resume
-```
+### 2. 12-Dimension Quality Scoring
 
-
+Every iteration measures code across 12 dimensions:
 
-
+| Dimension | Method | Type |
+|-----------|--------|------|
+| **Test Coverage** | c8/v8 instrumentation | Instrumented |
+| **Security** | Pattern matching + optional Snyk | Instrumented |
+| **Complexity** | Cyclomatic complexity via AST | Instrumented |
+| **Duplication** | Line-based similarity detection | Instrumented |
+| **Correctness** | Test pass rate + build verification | Instrumented |
+| **Style** | oxlint/eslint/biome integration | Instrumented |
+| **Documentation** | JSDoc coverage analysis | Instrumented |
+| **Readability** | AST: naming quality, function length, nesting depth | Heuristic |
+| **Maintainability** | AST: file length, coupling, function count | Heuristic |
+| **Test Quality** | Assertion density, trivial ratio, edge cases | Heuristic |
+| **Completeness** | Export density + test file coverage ratio | Heuristic |
+| **Robustness** | Error handling pattern detection via AST | Heuristic |
 
-
+> **Transparency**: 7 dimensions use instrumented analysis (real measurements). 5 use heuristic-based static analysis (directional signals via pattern detection). We label which is which.
 
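A dimension report like the one above ultimately has to collapse into one number. A minimal sketch of a weighted aggregation, assuming each analyzer emits a 0-100 score; the weights below are invented for illustration and may differ from the package's real weighting.

```typescript
// Sketch: collapse 12 per-dimension scores (each 0-100) into one aggregate.
// The weights are illustrative assumptions, not Coco's actual weighting.
const weights: Record<string, number> = {
  testCoverage: 1.5, security: 1.5, complexity: 1, duplication: 1,
  correctness: 2, style: 0.5, documentation: 0.5, readability: 1,
  maintainability: 1, testQuality: 1, completeness: 1, robustness: 1,
};

function aggregateScore(dimensions: Record<string, number>): number {
  let weighted = 0;
  let total = 0;
  for (const [name, score] of Object.entries(dimensions)) {
    const w = weights[name] ?? 1;
    weighted += score * w;
    total += w;
  }
  return total === 0 ? 0 : Math.round(weighted / total);
}
```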
-
+### 3. Multi-Agent with Weighted Scoring Routing
 
-
+Six specialized agents, each with real LLM tool-use execution:
 
-
+| Agent | Primary Keywords (weight 3) | Tools |
+|-------|----------------------------|-------|
+| **Researcher** | research, analyze, explore, investigate | read_file, grep, glob |
+| **Coder** | (default) | read_file, write_file, edit_file, bash |
+| **Tester** | test, coverage, spec, mock | read_file, write_file, run_tests |
+| **Reviewer** | review, quality, audit, lint | read_file, calculate_quality, grep |
+| **Optimizer** | optimize, refactor, performance | read_file, write_file, analyze_complexity |
+| **Planner** | plan, design, architect, decompose | read_file, grep, glob, codebase_map |
 
-
-- Hardcoded metrics: 100% → **41.7%** ✅
-- New analyzers: **4** (coverage, security, complexity, duplication)
-- New tests: **62** (all passing)
-- E2E tests: **6** (full pipeline validation)
+Task routing scores each role against the task description. The highest-scoring role is selected; if no role clears the threshold, Coco defaults to "coder". Each agent runs a multi-turn tool-use loop via the LLM protocol.
 
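A sketch of that routing as code follows. The keyword lists and the weight of 3 mirror the table; the threshold value and function names are assumptions, not the package's implementation.

```typescript
// Sketch of weighted-keyword routing: score each role against the task text,
// pick the best match, fall back to "coder" below a threshold.
type Role = "researcher" | "coder" | "tester" | "reviewer" | "optimizer" | "planner";

const primaryKeywords: Record<Role, string[]> = {
  researcher: ["research", "analyze", "explore", "investigate"],
  coder: [], // default role, selected when nothing else scores high enough
  tester: ["test", "coverage", "spec", "mock"],
  reviewer: ["review", "quality", "audit", "lint"],
  optimizer: ["optimize", "refactor", "performance"],
  planner: ["plan", "design", "architect", "decompose"],
};

function routeTask(description: string, threshold = 3): Role {
  const text = description.toLowerCase();
  let best: Role = "coder";
  let bestScore = 0;
  for (const role of Object.keys(primaryKeywords) as Role[]) {
    // Each primary-keyword hit contributes weight 3, per the table above.
    const score = primaryKeywords[role]
      .reduce((acc, word) => acc + (text.includes(word) ? 3 : 0), 0);
    if (score > bestScore) {
      best = role;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : "coder";
}
```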
-
-```javascript
-// All hardcoded 😱
-dimensions: {
-  testCoverage: 80,  // Fake
-  security: 100,     // Fake
-  complexity: 90,    // Fake
-  // ... all fake
-}
-```
+### 4. Production Hardening
 
-**
-
-const complexity = await this.complexityAnalyzer.analyze(files);
-
-dimensions: {
-  testCoverage: coverage.lines.percentage,  // REAL
-  security: security.score,                 // REAL
-  complexity: complexity.score,             // REAL
-  // ... 7 more real metrics
-}
-```
+- **Error Recovery**: 9 error types with automatic retry strategies and exponential backoff
+- **Checkpoint/Resume**: Ctrl+C saves state. `coco resume` continues from where you left off
+- **Error Messages**: Every error includes an actionable suggestion for how to fix it
+- **Convergence Analysis**: Detects oscillation, diminishing returns, and stuck patterns
+- **AST Validation**: Parses and validates syntax before saving files
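The first bullet pairs retries with exponential backoff; here is a self-contained sketch of that shape. Coco's 9-type error taxonomy is not reproduced here, only the retry core, and the names are illustrative.

```typescript
// Sketch of retry with exponential backoff, the shape the first bullet describes.
async function withRetry<T>(
  operation: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= retries) throw err;        // budget exhausted: surface the error
      const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```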
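The last bullet, syntax validation before saving, can be approximated with the public TypeScript compiler API; `ts.transpileModule` with `reportDiagnostics` surfaces syntactic errors without a full type check. This is an approximation, not the package's implementation.

```typescript
// Approximation of "parse and validate syntax before saving" using the public
// TypeScript API. Not Coco's implementation.
import ts from "typescript";

function hasSyntaxErrors(code: string): boolean {
  const result = ts.transpileModule(code, {
    reportDiagnostics: true,
    compilerOptions: { target: ts.ScriptTarget.ES2022 },
  });
  return (result.diagnostics ?? []).some(
    (d) => d.category === ts.DiagnosticCategory.Error,
  );
}

// Usage: only persist files that parse cleanly.
// if (!hasSyntaxErrors(generated)) await fs.writeFile(path, generated);
```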
-
+---
 
-
+## Architecture: COCO Methodology
+
+Four phases, each with its own executor:
 
 ```
-
+ CONVERGE        ORCHESTRATE         COMPLETE           OUTPUT
+┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐
+│ Gather   │    │ Design       │    │ Execute with │    │ Generate │
+│ reqs     │───►│ architecture │───►│ quality      │───►│ CI/CD,   │
+│ + spec   │    │ + backlog    │    │ iteration    │    │ docs     │
+└──────────┘    └──────────────┘    └──────────────┘    └──────────┘
+                                         ▼    ▲
+                                     ┌─────────────┐
+                                     │ Convergence │
+                                     │ Loop        │
+                                     └─────────────┘
 ```
 
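Read as code, the four phases chain like the sketch below. The interface shapes are hypothetical, chosen only to show the data flow; they are not the package's exports.

```typescript
// Hypothetical typed sketch of the four-phase pipeline shown above.
interface Spec { requirements: string[] }
interface Backlog { tasks: string[] }
interface BuildResult { qualityScore: number }

interface Phases {
  converge(goal: string): Promise<Spec>;            // gather requirements + spec
  orchestrate(spec: Spec): Promise<Backlog>;        // design architecture + backlog
  complete(backlog: Backlog): Promise<BuildResult>; // execute with quality iteration
  output(result: BuildResult): Promise<void>;       // generate CI/CD + docs
}

async function runPipeline(phases: Phases, goal: string): Promise<void> {
  const spec = await phases.converge(goal);
  const backlog = await phases.orchestrate(spec);
  const result = await phases.complete(backlog); // the convergence loop lives here
  await phases.output(result);
}
```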
-
----
-
-## Development Roadmap
-
-### Phase 1: Foundation ✅ (Weeks 1-4) - COMPLETE
-
-- [x] Real quality scoring system
-- [x] AST-aware generation pipeline
-- [x] Smart iteration loop
-- [x] Test failure analyzer
-- [x] Build verifier
-- [x] Import analyzer
+### Technology Stack
 
-
+| Component | Technology |
+|-----------|-----------|
+| Language | TypeScript (ESM, strict mode) |
+| Runtime | Node.js 22+ |
+| Testing | Vitest (4,000+ tests) |
+| Linting | oxlint |
+| Build | tsup |
+| LLM Providers | Anthropic Claude, OpenAI GPT, Google Gemini, Ollama, LM Studio |
+| Auth | OAuth 2.0 PKCE (browser + device code flow) |
 
-
+---
 
-
-- [x] Parallel agent coordinator
-- [ ] Agent communication protocol
-- [ ] Semantic code search
-- [ ] Codebase knowledge graph
-- [ ] Smart task decomposition
-- [ ] Adaptive planning
+## Comparison with Alternatives
 
-**
+| Feature | Cursor | Aider | Goose | Devin | **Coco** |
+|---------|--------|-------|-------|-------|----------|
+| Quality Convergence Loop | ❌ | ❌ | ❌ | Partial¹ | **✅** |
+| Multi-Dimensional Scoring | ❌ | ❌ | ❌ | Internal | **12 dimensions** |
+| Multi-Agent | ❌ | ❌ | Via MCP | ✅ | **✅ (weighted routing)** |
+| AST Validation | ❌ | ❌ | ❌ | ✅ | **✅** |
+| Error Recovery + Resume | ❌ | ❌ | ❌ | ✅ | **✅ (9 error types)** |
+| Open Source | ❌ | ✅ | ✅ | ❌ | **✅** |
+| Price | $20/mo | Free² | Free² | $500/mo | **Free²** |
 
-
+¹ Devin iterates internally but doesn't expose a configurable quality scoring system.
+² Free beyond LLM API costs (bring your own keys).
 
-
--
--
--
-- [ ] Framework detection
-- [ ] Interactive dashboard
-- [ ] Streaming output
-- [ ] Performance optimization
+### Where Coco Excels
+- **Quality iteration**: The only open-source agent with a configurable multi-dimensional convergence loop
+- **Transparency**: Every score is computed, not estimated. You can inspect the analyzers
+- **Cost**: $0 subscription. ~$2-5 in API costs per project
 
-
+### Where Coco is Behind
+- **IDE integration**: CLI-only today. VS Code extension planned
+- **Maturity**: Earlier stage than Cursor (millions of users) or Devin (2+ years production)
+- **Speed**: Iteration takes time. For quick edits, use Cursor or Copilot
+- **Language support**: Best with TypeScript/JavaScript. Python/Go experimental
 
 ---
 
-##
-
-| Feature | Cursor | Aider | Cody | Devin | **Coco** |
-|---------|--------|-------|------|-------|----------|
-| IDE Integration | ✅ | ❌ | ✅ | ❌ | 🔜 (planned Q2) |
-| Real Quality Metrics | ❌ | ❌ | ❌ | ✅ | ✅ (58% real) |
-| Root Cause Analysis | ❌ | ❌ | ❌ | ✅ | ✅ |
-| Multi-Agent | ❌ | ❌ | ❌ | ✅ | ✅ |
-| AST Validation | ❌ | ❌ | ❌ | ✅ | ✅ |
-| Error Recovery | ❌ | ❌ | ❌ | ✅ | ✅ |
-| Checkpoint/Resume | ❌ | ❌ | ❌ | ✅ | ✅ |
-| Open Source | ❌ | ✅ | ❌ | ❌ | ✅ |
-| Price | $20/mo | Free | $9/mo | $500/mo | **Free** |
+## CLI Experience
 
-
----
+### Interactive REPL
 
-
+```bash
+coco   # Opens interactive REPL
+```
 
-
--
--
--
--
--
--
+**Slash commands**:
+- `/coco`: Toggle quality convergence mode (auto-test + iterate)
+- `/tutorial`: Quick 5-step guide for new users
+- `/init`: Initialize a new project
+- `/plan`: Design architecture and backlog
+- `/build`: Build with quality iteration
+- `/task <desc>`: Execute a single task
+- `/status`: Check project state
+- `/diff`: Review changes
+- `/commit`: Commit with message
+- `/help`: See all commands
+
+### Provider Support
+
+| Provider | Auth Method | Models |
+|----------|------------|--------|
+| Anthropic | API key or OAuth PKCE | Claude Opus, Sonnet, Haiku |
+| OpenAI | API key | GPT-4o, GPT-4, o1, o3 |
+| Google | API key or gcloud ADC | Gemini Pro, Flash |
+| Ollama | Local (no key) | Any local model |
+| LM Studio | Local (no key) | Any GGUF model |
+| Moonshot | API key | Kimi models |
 
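The table implies one abstraction over very different backends. Below is a minimal sketch assuming a chat-style interface; the interface shape is an assumption, while the HTTP endpoint shown is Ollama's documented local chat API (no key required).

```typescript
// Minimal provider abstraction consistent with the table above.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LLMProvider {
  name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Local providers need no API key; Ollama's chat endpoint runs on localhost.
const ollama: LLMProvider = {
  name: "ollama",
  async complete(messages) {
    const res = await fetch("http://localhost:11434/api/chat", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model: "llama3", messages, stream: false }),
    });
    if (!res.ok) throw new Error(`ollama: HTTP ${res.status}`);
    const data = await res.json();
    return data.message.content; // non-streaming responses carry one message
  },
};
```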
 ---
 
-##
+## Development
 
-
--
-- **Build**: tsup (fast ESM bundler)
+```bash
+git clone https://github.com/corbat/corbat-coco
+cd corbat-coco
+pnpm install
+pnpm dev    # Run in dev mode
+pnpm test   # Run 4,000+ tests
+pnpm check  # typecheck + lint + test
+```
 
 ### Project Structure
 
 ```
 corbat-coco/
 ├── src/
-│   ├── agents/        # Multi-agent coordination
-│   ├── cli/           #
-│   ├── orchestrator/  #
-│   ├── phases/        # COCO phases (
-│   ├── quality/       #
-│   └──
-│   └──
-
-└── docs/              # Documentation
+│   ├── agents/        # Multi-agent coordination + weighted routing
+│   ├── cli/           # REPL, commands, input handling
+│   ├── orchestrator/  # Phase coordinator + recovery
+│   ├── phases/        # COCO phases (converge/orchestrate/complete/output)
+│   ├── quality/       # 12 quality analyzers
+│   ├── providers/     # 6 LLM providers + OAuth
+│   ├── tools/         # 20+ tool implementations
+│   ├── hooks/         # Lifecycle hooks (safety, lint, format, audit)
+│   ├── mcp/           # MCP server for external integration
+│   └── config/        # Zod-validated configuration
+├── test/e2e/          # End-to-end pipeline tests
+└── docs/              # Architecture docs + ADRs
 ```
 
-
+---
 
-- **
-- **
-- **
-- **
+## Limitations (Honest)
+
+- **TypeScript/JavaScript first**: Other languages have basic support
+- **CLI-only**: No IDE integration yet
+- **Heuristic analyzers**: 5 of 12 dimensions use pattern matching, not deep semantic analysis
+- **Early stage**: Not yet battle-tested at enterprise scale
+- **Iteration takes time**: 2-5 minutes per task with convergence loop
+- **LLM-dependent**: Quality of generated code depends on the LLM you use
 
 ---
 
 ## Contributing
 
-- Bug reports
--
--
+MIT License. We welcome contributions:
+- Bug reports and feature requests
+- New quality analyzers
+- Additional LLM provider integrations
 - Documentation improvements
 - Real-world usage feedback
 
 See [CONTRIBUTING.md](./CONTRIBUTING.md).
 
-### Development
-
-```bash
-# Clone repo
-git clone https://github.com/corbat/corbat-coco
-cd corbat-coco
-
-# Install dependencies
-pnpm install
-
-# Run in dev mode
-pnpm dev
-
-# Run tests
-pnpm test
-
-# Run quality benchmark
-pnpm benchmark
-
-# Full check (typecheck + lint + test)
-pnpm check
-```
 
 ---
 
-##
+## About Corbat
 
-
+Corbat-Coco is built by [Corbat](https://corbat.tech), a boutique technology consultancy. We believe AI coding tools should be transparent, measurable, and open source.
 
-**
-
-**A**: Similar approach (autonomous iteration, quality metrics, multi-agent), but Coco is:
-- **Open source** (vs closed)
-- **Bring your own API keys** (vs $500/mo subscription)
-- **More transparent** (you can inspect every metric)
-- **Earlier stage** (Devin has 2+ years of production usage)
-
-### Q: Why are 41.7% of metrics still hardcoded?
-
-**A**: These are **safe defaults**, not fake metrics:
-- `style: 100` when no linter is configured (legitimate default)
-- `correctness`, `completeness`, `robustness`, `testQuality`, `documentation` are pending Week 2-4 implementations
-
-We're committed to reaching **0% hardcoded** by end of Phase 1 (Week 4).
-
-### Q: Can I use this with my company's code?
-
-**A**: Yes, but:
-- Code stays on your machine (not sent to third parties)
-- LLM calls go to your chosen provider (Anthropic/OpenAI/Google)
-- Review generated code before committing
-- Start with non-critical projects
-
-### Q: Does Coco replace human developers?
-
-**A**: No. Coco is a **force multiplier**, not a replacement:
-- Best for boilerplate, CRUD APIs, repetitive tasks
-- Requires human review and validation
-- Struggles with novel algorithms and complex business logic
-- Think "junior developer with infinite patience"
-
-### Q: What's the roadmap to 9.0/10?
-
-**A**: See [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md) for the complete 12-week plan.
+**Links**:
+- [GitHub](https://github.com/corbat/corbat-coco)
+- [corbat.tech](https://corbat.tech)
 
 ---
 
-MIT License - see [LICENSE](./LICENSE).
-
----
-
-## Credits
-
-**Built with**:
-- TypeScript + Node.js
-- Anthropic Claude, OpenAI GPT-4, Google Gemini
-- Vitest, oxc, tree-sitter, c8
-
-**Made with 🥥 by developers who are tired of debugging AI code.**
-
----
-
-## Links
-
-- **GitHub**: [github.com/corbat/corbat-coco](https://github.com/corbat/corbat-coco)
-- **Documentation**: [docs.corbat.dev](https://docs.corbat.dev)
-- **Roadmap**: [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md)
-- **Week 1 Report**: [WEEK_1_COMPLETE.md](./WEEK_1_COMPLETE.md)
-- **Discord**: [discord.gg/corbat](https://discord.gg/corbat) (coming soon)
-
----
-
-**Status**: 🚧 Week 1 Complete, Weeks 2-12 In Progress
-
-**Next Milestone**: Phase 1 Complete (Week 4) - Target Score 7.5/10
-
-**Current Score**: ~7.0/10 (honest, verifiable)
-
-**Honest motto**: "We're not #1 yet, but we're getting there. One real metric at a time." 🥥
+**Made with 🥥 by developers who measure before they ship.**