@corbat-tech/coco 1.0.2 β†’ 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,462 +1,267 @@
1
- # πŸ₯₯ Corbat-Coco: Autonomous Coding Agent with Real Quality Iteration
1
+ # πŸ₯₯ Corbat-Coco
2
2
 
3
- **The AI coding agent that doesn't just generate codeβ€”it iterates until it's actually good.**
3
+ **The open-source coding agent that iterates until your code is actually good.**
4
4
 
5
5
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.3-blue)](https://www.typescriptlang.org/)
6
6
  [![Node.js](https://img.shields.io/badge/Node.js-22+-green)](https://nodejs.org/)
7
7
  [![License](https://img.shields.io/badge/License-MIT-yellow)](./LICENSE)
8
- [![Tests](https://img.shields.io/badge/Tests-3909%20passing-brightgreen)](./)
8
+ [![Tests](https://img.shields.io/badge/Tests-4000%2B%20passing-brightgreen)](./)
9
+ [![Coverage](https://img.shields.io/badge/Coverage-80%25%2B-brightgreen)](./)
9
10
 
10
11
  ---
11
12
 
12
- ## What Makes Coco Different
13
-
14
- Most AI coding assistants generate code and hope for the best. Coco is different:
15
-
16
- 1. **Generates** code with your favorite LLM (Claude, GPT-4, Gemini)
17
- 2. **Measures** quality with real metrics (coverage, security, complexity)
18
- 3. **Analyzes** test failures to find root causes
19
- 4. **Fixes** issues with targeted changes
20
- 5. **Repeats** until quality reaches 85+ (senior engineer level)
21
-
22
- All autonomous. All verifiable. All open source.
23
-
24
- ---
25
-
26
- ## The Problem with AI Code Generation
27
-
28
- Current AI assistants:
29
- - Generate code that looks good but fails in production
30
- - Don't run tests or validate output
31
- - Make you iterate manually
32
- - Can't coordinate complex tasks
33
-
34
- **Result**: You spend hours debugging AI-generated code.
35
-
36
- ---
37
-
38
- ## How Coco Solves It
39
-
40
- ### 1. Real Quality Measurement
41
-
42
- Coco measures 12 dimensions of code quality:
43
- - **Test Coverage**: Runs your tests with c8/v8 instrumentation (not estimated)
44
- - **Security**: Scans for vulnerabilities with npm audit + OWASP checks
45
- - **Complexity**: Calculates cyclomatic complexity from AST
46
- - **Correctness**: Validates tests pass + builds succeed
47
- - **Maintainability**: Real metrics from code analysis
48
- - ... and 7 more
13
+ ## The Problem
49
14
 
50
- **No fake scores. No hardcoded values. Real metrics.**
51
-
52
- Current state: **58.3% real measurements** (up from 0%), with 41.7% still using safe defaults.
53
-
54
- ### 2. Smart Iteration Loop
55
-
56
- When tests fail, Coco:
57
- - Parses stack traces to find the error location
58
- - Reads surrounding code for context
59
- - Diagnoses root cause (not just symptoms)
60
- - Generates targeted fix (not rewriting entire file)
61
- - Re-validates and repeats if needed
62
-
63
- **Target**: 70%+ of failures fixed in first iteration.
64
-
65
- ### 3. Multi-Agent Coordination
66
-
67
- Complex tasks are decomposed and executed by specialized agents:
68
- - **Researcher**: Explores codebase, finds patterns
69
- - **Coder**: Writes production code
70
- - **Tester**: Generates comprehensive tests
71
- - **Reviewer**: Identifies issues
72
- - **Optimizer**: Reduces complexity
73
-
74
- Agents work in parallel where possible, coordinate when needed.
75
-
76
- ### 4. AST-Aware Validation
77
-
78
- Before saving any file:
79
- - Parses AST to validate syntax
80
- - Checks TypeScript semantics
81
- - Analyzes imports
82
- - Verifies build succeeds
83
-
84
- **Result**: Zero broken builds from AI edits.
85
-
86
- ### 5. Production Hardening
87
-
88
- - **Error Recovery**: Auto-recovers from 8 error types (syntax, timeout, dependencies, etc.)
89
- - **Checkpoint/Resume**: Ctrl+C saves state, resume anytime
90
- - **Resource Limits**: Prevents runaway costs with configurable quotas
91
- - **Streaming Output**: Real-time feedback as code generates
92
-
93
- ---
15
+ AI coding assistants generate code and hope for the best. You paste it in, tests fail, you iterate manually, you lose an hour. Studies show **67% of AI-generated PRs get rejected** on first review.
94
16
 
95
- ## Architecture
17
+ ## The Solution
96
18
 
97
- ### COCO Methodology (4 Phases)
98
-
99
- 1. **Converge**: Gather requirements, create specification
100
- 2. **Orchestrate**: Design architecture, create task backlog
101
- 3. **Complete**: Execute tasks with quality iteration
102
- 4. **Output**: Generate CI/CD, docs, deployment config
103
-
104
- ### Quality Iteration Loop
19
+ Coco doesn't stop at code generation. It runs your tests, measures quality across 12 dimensions, diagnoses failures, generates targeted fixes, and repeats β€” autonomously β€” until quality reaches a configurable threshold (default: 85/100).
105
20
 
106
21
  ```
107
- Generate Code β†’ Validate AST β†’ Run Tests β†’ Analyze Failures
108
-      ↑                                            ↓
109
-      β†β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ Generate Targeted Fixes β†β”€β”€β”€β”€β”€β”€β”€β”˜
22
+ Generate β†’ Test β†’ Measure β†’ Diagnose β†’ Fix β†’ Repeat
23
+                         ↓
24
+              Quality β‰₯ 85? β†’ Done βœ…
110
25
  ```
111
26
 
112
- Stops when:
113
- - Quality β‰₯ 85/100 (minimum)
114
- - Score stable for 2+ iterations
115
- - Tests all passing
116
- - Or max 10 iterations reached
117
-
118
- ### Real Analyzers
119
-
120
- | Analyzer | What It Measures | Data Source |
121
- |----------|------------------|-------------|
122
- | Coverage | Lines, branches, functions, statements | c8/v8 instrumentation |
123
- | Security | Vulnerabilities, dangerous patterns | npm audit + static analysis |
124
- | Complexity | Cyclomatic complexity, maintainability | AST traversal |
125
- | Duplication | Code similarity, redundancy | Token-based comparison |
126
- | Build | Compilation success | tsc/build execution |
127
- | Import | Missing dependencies, circular deps | AST + package.json |
27
+ **This is the Quality Convergence Loop.** No other open-source coding agent does this.
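+
+ As a rough sketch in code (the step names and types below are illustrative, not Coco's actual API):
+
+ ```typescript
+ // Minimal shape of the convergence loop described above.
+ type Steps = {
+   generate: (task: string) => Promise<string>;
+   measure: (code: string) => Promise<{ score: number; failures: string[] }>;
+   diagnose: (failures: string[]) => Promise<string>;
+   fix: (code: string, diagnosis: string) => Promise<string>;
+ };
+
+ async function convergenceLoop(task: string, steps: Steps, threshold = 85): Promise<string> {
+   let code = await steps.generate(task);
+   for (let i = 0; i < 10; i++) {                // max 10 iterations
+     const { score, failures } = await steps.measure(code);
+     if (score >= threshold && failures.length === 0) return code; // converged
+     const diagnosis = await steps.diagnose(failures);
+     code = await steps.fix(code, diagnosis);    // targeted fix, then re-measure
+   }
+   return code;                                  // best effort after the cap
+ }
+ ```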
128
28
 
129
29
  ---
130
30
 
131
31
  ## Quick Start
132
32
 
133
- ### Installation
134
-
135
33
  ```bash
136
34
  npm install -g corbat-coco
35
+ coco init # Configure your LLM provider
36
+ coco "Build a REST API with authentication" # That's it
137
37
  ```
138
38
 
139
- ### Configuration
39
+ Coco will generate code, run tests, iterate until quality passes, and generate CI/CD + docs.
140
40
 
141
- ```bash
142
- coco init
143
- ```
41
+ ---
144
42
 
145
- Follow prompts to configure:
146
- - AI Provider (Anthropic, OpenAI, Google)
147
- - API Key
148
- - Project preferences
43
+ ## What Makes Coco Different
149
44
 
150
- ### Basic Usage
45
+ ### 1. Quality Convergence Loop (Unique Differentiator)
151
46
 
152
- ```bash
153
- coco "Build a REST API with JWT authentication"
154
- ```
47
+ Other agents generate code once. Coco iterates:
155
48
 
156
- That's it. Coco will:
157
- 1. Ask clarifying questions
158
- 2. Design architecture
159
- 3. Generate code + tests
160
- 4. Iterate until quality β‰₯ 85
161
- 5. Generate CI/CD + docs
49
+ | Iteration | Score | What Happened |
50
+ |-----------|-------|---------------|
51
+ | 1 | 52/100 | Generated code, 3 tests failing |
52
+ | 2 | 71/100 | Fixed test failures, found security issue |
53
+ | 3 | 84/100 | Fixed security, improved coverage |
54
+ | 4 | 91/100 | All tests pass, quality converged βœ… |
162
55
 
163
- ### Resume Interrupted Session
56
+ The loop stops when (see the sketch after this list):
57
+ - Score β‰₯ 85/100 (configurable)
58
+ - Score stabilized (delta < 2 between iterations)
59
+ - All critical issues resolved
60
+ - Or max 10 iterations reached
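+
+ A minimal sketch of that stopping rule (the helper name and signature are illustrative, not Coco's actual API):
+
+ ```typescript
+ // Hypothetical helper mirroring the stop conditions above; the real
+ // orchestrator may weight or combine these checks differently.
+ function shouldStop(
+   history: number[],       // quality scores, one per iteration
+   criticalIssues: number,  // unresolved critical issues in the latest report
+   threshold = 85,
+   maxIterations = 10,
+ ): boolean {
+   const latest = history[history.length - 1] ?? 0;
+   const previous = history[history.length - 2];
+   const stabilized = previous !== undefined && Math.abs(latest - previous) < 2;
+   return (
+     (latest >= threshold && criticalIssues === 0) || // quality gate met
+     stabilized ||                                    // delta < 2: score converged
+     history.length >= maxIterations                  // hard cap
+   );
+ }
+ ```
+
+ With the example run above, `shouldStop([52, 71, 84, 91], 0)` returns true at the fourth iteration.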
164
61
 
165
- ```bash
166
- coco resume
167
- ```
62
+ ### 2. 12-Dimension Quality Scoring
168
63
 
169
- ### Check Quality of Existing Code
64
+ Every iteration measures code across 12 real dimensions:
170
65
 
171
- ```bash
172
- coco quality ./src
173
- ```
66
+ | Dimension | Method | Type |
67
+ |-----------|--------|------|
68
+ | **Test Coverage** | c8/v8 instrumentation | Instrumented |
69
+ | **Security** | Pattern matching + optional Snyk | Instrumented |
70
+ | **Complexity** | Cyclomatic complexity via AST | Instrumented |
71
+ | **Duplication** | Line-based similarity detection | Instrumented |
72
+ | **Correctness** | Test pass rate + build verification | Instrumented |
73
+ | **Style** | oxlint/eslint/biome integration | Instrumented |
74
+ | **Documentation** | JSDoc coverage analysis | Instrumented |
75
+ | **Readability** | AST: naming quality, function length, nesting depth | Heuristic |
76
+ | **Maintainability** | AST: file length, coupling, function count | Heuristic |
77
+ | **Test Quality** | Assertion density, trivial ratio, edge cases | Heuristic |
78
+ | **Completeness** | Export density + test file coverage ratio | Heuristic |
79
+ | **Robustness** | Error handling pattern detection via AST | Heuristic |
174
80
 
175
- ---
81
+ > **Transparency**: 7 dimensions use instrumented analysis (real measurements). 5 use heuristic-based static analysis (directional signals via pattern detection). We label which is which.
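+
+ To make "heuristic" concrete, here is a sketch, not Coco's actual code, of how such an analyzer can work: it derives a directional robustness signal by counting error-handling constructs with the TypeScript compiler API.
+
+ ```typescript
+ import * as ts from "typescript";
+
+ // Illustrative heuristic in the spirit of the Robustness row: the share of
+ // functions guarded by try/catch. A directional signal, not a measurement.
+ function robustnessSignal(source: string): number {
+   const file = ts.createSourceFile("input.ts", source, ts.ScriptTarget.Latest, true);
+   let tryBlocks = 0;
+   let functions = 0;
+   const visit = (node: ts.Node): void => {
+     if (ts.isTryStatement(node)) tryBlocks++;
+     if (ts.isFunctionDeclaration(node) || ts.isArrowFunction(node) || ts.isMethodDeclaration(node)) {
+       functions++;
+     }
+     ts.forEachChild(node, visit);
+   };
+   visit(file);
+   if (functions === 0) return 100;              // nothing to guard
+   return Math.min(100, Math.round((tryBlocks / functions) * 100));
+ }
+ ```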
176
82
 
177
- ## Real Results
83
+ ### 3. Multi-Agent with Weighted Scoring Routing
178
84
 
179
- ### Week 1 Achievements βœ…
85
+ Six specialized agents, each with real LLM tool-use execution:
180
86
 
181
- **Goal**: Replace fake metrics with real measurements
87
+ | Agent | Primary Keywords (weight 3) | Tools |
88
+ |-------|----------------------------|-------|
89
+ | **Researcher** | research, analyze, explore, investigate | read_file, grep, glob |
90
+ | **Coder** | (default) | read_file, write_file, edit_file, bash |
91
+ | **Tester** | test, coverage, spec, mock | read_file, write_file, run_tests |
92
+ | **Reviewer** | review, quality, audit, lint | read_file, calculate_quality, grep |
93
+ | **Optimizer** | optimize, refactor, performance | read_file, write_file, analyze_complexity |
94
+ | **Planner** | plan, design, architect, decompose | read_file, grep, glob, codebase_map |
182
95
 
183
- **Results**:
184
- - Hardcoded metrics: 100% β†’ **41.7%** βœ…
185
- - New analyzers: **4** (coverage, security, complexity, duplication)
186
- - New tests: **62** (all passing)
187
- - E2E tests: **6** (full pipeline validation)
96
+ Task routing scores each role's keywords against the task description and selects the highest-scoring role; if no score clears the threshold, the task falls back to "coder". Each agent runs a multi-turn tool-use loop via the LLM protocol.
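+
+ A minimal sketch of that routing, assuming simple substring matching (keyword lists taken from the table above; the threshold value and function name are illustrative):
+
+ ```typescript
+ // Score each role by weighted keyword hits; fall back to "coder" below threshold.
+ const ROLE_KEYWORDS: Record<string, string[]> = {
+   researcher: ["research", "analyze", "explore", "investigate"],
+   tester: ["test", "coverage", "spec", "mock"],
+   reviewer: ["review", "quality", "audit", "lint"],
+   optimizer: ["optimize", "refactor", "performance"],
+   planner: ["plan", "design", "architect", "decompose"],
+ };
+
+ function routeTask(description: string, weight = 3, threshold = 3): string {
+   const text = description.toLowerCase();
+   let best = { role: "coder", score: 0 };
+   for (const [role, keywords] of Object.entries(ROLE_KEYWORDS)) {
+     const score = keywords.filter((k) => text.includes(k)).length * weight;
+     if (score > best.score) best = { role, score };
+   }
+   return best.score >= threshold ? best.role : "coder";
+ }
+
+ // routeTask("review the auth module for quality issues") β†’ "reviewer"
+ ```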
188
97
 
189
- **Before**:
190
- ```javascript
191
- // All hardcoded 😱
192
- dimensions: {
193
-   testCoverage: 80,  // Fake
194
-   security: 100,     // Fake
195
-   complexity: 90,    // Fake
196
-   // ... all fake
197
- }
198
- ```
98
+ ### 4. Production Hardening
199
99
 
200
- **After**:
201
- ```typescript
202
- // Real measurements βœ…
203
- const coverage = await this.coverageAnalyzer.analyze(files);
204
- const security = await this.securityScanner.scan(files);
205
- const complexity = await this.complexityAnalyzer.analyze(files);
206
-
207
- dimensions: {
208
-   testCoverage: coverage.lines.percentage,  // REAL
209
-   security: security.score,                 // REAL
210
-   complexity: complexity.score,             // REAL
211
-   // ... 7 more real metrics
212
- }
213
- ```
100
+ - **Error Recovery**: 9 error types with automatic retry strategies and exponential backoff (see the sketch after this list)
101
+ - **Checkpoint/Resume**: Ctrl+C saves state. `coco resume` continues from where you left off
102
+ - **Error Messages**: Every error includes an actionable suggestion for how to fix it
103
+ - **Convergence Analysis**: Detects oscillation, diminishing returns, and stuck patterns
104
+ - **AST Validation**: Parses and validates syntax before saving files
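+
+ A minimal sketch of the retry-with-exponential-backoff idea behind the Error Recovery bullet (the helper name and defaults are illustrative, not Coco's actual API):
+
+ ```typescript
+ // Retry a failing async step with exponentially growing delays.
+ async function withBackoff<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
+   for (let attempt = 0; ; attempt++) {
+     try {
+       return await fn();
+     } catch (err) {
+       if (attempt >= retries) throw err;      // out of attempts: surface the error
+       const delay = baseMs * 2 ** attempt;    // 500ms, 1s, 2s, ...
+       await new Promise((resolve) => setTimeout(resolve, delay));
+     }
+   }
+ }
+ ```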
214
105
 
215
- ### Benchmark Results
106
+ ---
216
107
 
217
- Running Coco on itself (corbat-coco codebase):
108
+ ## Architecture: COCO Methodology
109
+
110
+ Four phases, each with its own executor (a pipeline sketch follows the diagram):
218
111
 
219
112
  ```
220
- ⏱️ Duration: 19.8s
221
- πŸ“Š Overall Score: 60/100
222
- πŸ“ˆ Real Metrics: 7/12 (58.3%)
223
- πŸ›‘οΈ Security: 0 critical issues
224
- πŸ“ Complexity: 100/100 (low)
225
- πŸ”„ Duplication: 72.5/100 (27.5% duplication)
226
- πŸ“„ Issues Found: 311
227
- πŸ’‘ Suggestions: 3
113
+   CONVERGE        ORCHESTRATE           COMPLETE          OUTPUT
114
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
115
+ β”‚ Gather   β”‚    β”‚ Design       β”‚    β”‚ Execute with β”‚    β”‚ Generate β”‚
116
+ β”‚ reqs     │──► β”‚ architecture │──► β”‚ quality      │──► β”‚ CI/CD,   β”‚
117
+ β”‚ + spec   β”‚    β”‚ + backlog    β”‚    β”‚ iteration    β”‚    β”‚ docs     β”‚
118
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
119
+                                           ↑ ↓
120
+                                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
121
+                                     β”‚ Convergence β”‚
122
+                                     β”‚    Loop     β”‚
123
+                                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
228
124
  ```
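+
+ A sketch of that executor-per-phase design (the interface below is illustrative, not Coco's actual API):
+
+ ```typescript
+ // Each COCO phase consumes and enriches a shared context object.
+ interface PhaseExecutor {
+   name: "converge" | "orchestrate" | "complete" | "output";
+   run(context: Record<string, unknown>): Promise<Record<string, unknown>>;
+ }
+
+ async function runPipeline(executors: PhaseExecutor[]): Promise<Record<string, unknown>> {
+   let context: Record<string, unknown> = {};
+   for (const executor of executors) {
+     context = await executor.run(context);   // e.g. spec β†’ backlog β†’ code β†’ CI/CD
+   }
+   return context;
+ }
+ ```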
229
125
 
230
- **Validation**: βœ… Target met (≀42% hardcoded)
231
-
232
- ---
233
-
234
- ## Development Roadmap
235
-
236
- ### Phase 1: Foundation βœ… (Weeks 1-4) - COMPLETE
237
-
238
- - [x] Real quality scoring system
239
- - [x] AST-aware generation pipeline
240
- - [x] Smart iteration loop
241
- - [x] Test failure analyzer
242
- - [x] Build verifier
243
- - [x] Import analyzer
126
+ ### Technology Stack
244
127
 
245
- **Current Score**: ~7.0/10
128
+ | Component | Technology |
129
+ |-----------|-----------|
130
+ | Language | TypeScript (ESM, strict mode) |
131
+ | Runtime | Node.js 22+ |
132
+ | Testing | Vitest (4,000+ tests) |
133
+ | Linting | oxlint |
134
+ | Build | tsup |
135
+ | LLM Providers | Anthropic Claude, OpenAI GPT, Google Gemini, Ollama, LM Studio |
136
+ | Auth | OAuth 2.0 PKCE (browser + device code flow) |
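+
+ For the OAuth 2.0 PKCE row, a minimal sketch of the PKCE portion of such a flow (verifier and S256 challenge generation only; browser launch and token exchange omitted):
+
+ ```typescript
+ import { createHash, randomBytes } from "node:crypto";
+
+ // PKCE: a high-entropy code verifier plus its S256 challenge.
+ function pkcePair(): { verifier: string; challenge: string } {
+   const verifier = randomBytes(32).toString("base64url");
+   const challenge = createHash("sha256").update(verifier).digest("base64url");
+   return { verifier, challenge };
+ }
+ ```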
246
137
 
247
- ### Phase 2: Intelligence (Weeks 5-8) - IN PROGRESS
138
+ ---
248
139
 
249
- - [x] Agent execution engine
250
- - [x] Parallel agent coordinator
251
- - [ ] Agent communication protocol
252
- - [ ] Semantic code search
253
- - [ ] Codebase knowledge graph
254
- - [ ] Smart task decomposition
255
- - [ ] Adaptive planning
140
+ ## Comparison with Alternatives
256
141
 
257
- **Target Score**: 8.5/10
142
+ | Feature | Cursor | Aider | Goose | Devin | **Coco** |
143
+ |---------|--------|-------|-------|-------|----------|
144
+ | Quality Convergence Loop | ❌ | ❌ | ❌ | PartialΒΉ | **βœ…** |
145
+ | Multi-Dimensional Scoring | ❌ | ❌ | ❌ | Internal | **12 dimensions** |
146
+ | Multi-Agent | ❌ | ❌ | Via MCP | βœ… | **βœ… (weighted routing)** |
147
+ | AST Validation | ❌ | ❌ | ❌ | βœ… | **βœ…** |
148
+ | Error Recovery + Resume | ❌ | ❌ | ❌ | βœ… | **βœ… (9 error types)** |
149
+ | Open Source | ❌ | βœ… | βœ… | ❌ | **βœ…** |
150
+ | Price | $20/mo | FreeΒ² | FreeΒ² | $500/mo | **FreeΒ²** |
258
151
 
259
- ### Phase 3: Excellence (Weeks 9-12) - IN PROGRESS
152
+ ΒΉ Devin iterates internally but doesn't expose a configurable quality scoring system.
153
+ Β² Free beyond LLM API costs (bring your own keys).
260
154
 
261
- - [x] Error recovery system
262
- - [x] Progress tracking & interruption
263
- - [ ] Resource limits & quotas
264
- - [ ] Multi-language AST support
265
- - [ ] Framework detection
266
- - [ ] Interactive dashboard
267
- - [ ] Streaming output
268
- - [ ] Performance optimization
155
+ ### Where Coco Excels
156
+ - **Quality iteration**: The only open-source agent with a configurable multi-dimensional convergence loop
157
+ - **Transparency**: Every score is computed, not estimated. You can inspect the analyzers
158
+ - **Cost**: $0 subscription. ~$2-5 in API costs per project
269
159
 
270
- **Target Score**: 9.0+/10
160
+ ### Where Coco is Behind
161
+ - **IDE integration**: CLI-only today. VS Code extension planned
162
+ - **Maturity**: Earlier stage than Cursor (millions of users) or Devin (2+ years production)
163
+ - **Speed**: Iteration takes time. For quick edits, use Cursor or Copilot
164
+ - **Language support**: Best with TypeScript/JavaScript. Python/Go experimental
271
165
 
272
166
  ---
273
167
 
274
- ## Honest Comparison with Alternatives
275
-
276
- | Feature | Cursor | Aider | Cody | Devin | **Coco** |
277
- |---------|--------|-------|------|-------|----------|
278
- | IDE Integration | βœ… | ❌ | βœ… | ❌ | πŸ”„ (planned Q2) |
279
- | Real Quality Metrics | ❌ | ❌ | ❌ | βœ… | βœ… (58% real) |
280
- | Root Cause Analysis | ❌ | ❌ | ❌ | βœ… | βœ… |
281
- | Multi-Agent | ❌ | ❌ | ❌ | βœ… | βœ… |
282
- | AST Validation | ❌ | ❌ | ❌ | βœ… | βœ… |
283
- | Error Recovery | ❌ | ❌ | ❌ | βœ… | βœ… |
284
- | Checkpoint/Resume | ❌ | ❌ | ❌ | βœ… | βœ… |
285
- | Open Source | ❌ | βœ… | ❌ | ❌ | βœ… |
286
- | Price | $20/mo | Free | $9/mo | $500/mo | **Free** |
168
+ ## CLI Experience
287
169
 
288
- **Verdict**: Coco offers Devin-level autonomy at Aider's price (free).
289
-
290
- ---
170
+ ### Interactive REPL
291
171
 
292
- ## Current Limitations
293
-
294
- We believe in honesty:
172
+ ```bash
173
+ coco # Opens interactive REPL
174
+ ```
295
175
 
296
- - **Languages**: Best with TypeScript/JavaScript. Python/Go/Rust support is experimental.
297
- - **Metrics**: 58.3% real, 41.7% use safe defaults (improving to 100% real by Week 4)
298
- - **IDE Integration**: CLI-first. VS Code extension coming Q2 2026.
299
- - **Learning Curve**: More complex than Copilot. Power tool, not autocomplete.
300
- - **Cost**: Uses your LLM API keys. ~$2-5 per project with Claude.
301
- - **Speed**: Iteration takes time. Not for quick edits (use Cursor for that).
302
- - **Multi-Agent**: Implemented but not yet battle-tested at scale.
176
+ **Slash commands**:
177
+ - `/coco` β€” Toggle quality convergence mode (auto-test + iterate)
178
+ - `/tutorial` β€” Quick 5-step guide for new users
179
+ - `/init` β€” Initialize a new project
180
+ - `/plan` β€” Design architecture and backlog
181
+ - `/build` β€” Build with quality iteration
182
+ - `/task <desc>` β€” Execute a single task
183
+ - `/status` β€” Check project state
184
+ - `/diff` β€” Review changes
185
+ - `/commit` β€” Commit with message
186
+ - `/help` β€” See all commands
187
+
188
+ ### Provider Support
189
+
190
+ | Provider | Auth Method | Models |
191
+ |----------|------------|--------|
192
+ | Anthropic | API key or OAuth PKCE | Claude Opus, Sonnet, Haiku |
193
+ | OpenAI | API key | GPT-4o, GPT-4, o1, o3 |
194
+ | Google | API key or gcloud ADC | Gemini Pro, Flash |
195
+ | Ollama | Local (no key) | Any local model |
196
+ | LM Studio | Local (no key) | Any GGUF model |
197
+ | Moonshot | API key | Kimi models |
303
198
 
304
199
  ---
305
200
 
306
- ## Technical Details
201
+ ## Development
307
202
 
308
- ### Stack
309
-
310
- - **Language**: TypeScript (ESM, strict mode)
311
- - **Runtime**: Node.js 22+
312
- - **Package Manager**: pnpm
313
- - **Testing**: Vitest (3,909 tests)
314
- - **Linting**: oxlint (fast, minimal config)
315
- - **Formatting**: oxfmt
316
- - **Build**: tsup (fast ESM bundler)
203
+ ```bash
204
+ git clone https://github.com/corbat/corbat-coco
205
+ cd corbat-coco
206
+ pnpm install
207
+ pnpm dev # Run in dev mode
208
+ pnpm test # Run 4,000+ tests
209
+ pnpm check # typecheck + lint + test
210
+ ```
317
211
 
318
212
  ### Project Structure
319
213
 
320
214
  ```
321
215
  corbat-coco/
322
216
  β”œβ”€β”€ src/
323
- β”‚   β”œβ”€β”€ agents/        # Multi-agent coordination
324
- β”‚   β”œβ”€β”€ cli/           # CLI commands
325
- β”‚   β”œβ”€β”€ orchestrator/  # Central coordinator
326
- β”‚   β”œβ”€β”€ phases/        # COCO phases (4 phases)
327
- β”‚   β”œβ”€β”€ quality/       # Quality analyzers
328
- β”‚   β”‚   └── analyzers/ # Coverage, security, complexity, etc.
329
- β”‚   β”œβ”€β”€ providers/     # LLM providers (Anthropic, OpenAI, Google)
330
- β”‚   β”œβ”€β”€ tools/         # Tool implementations
331
- β”‚   └── types/         # Type definitions
332
- β”œβ”€β”€ test/
333
- β”‚   β”œβ”€β”€ e2e/           # End-to-end tests
334
- β”‚   └── benchmarks/    # Performance benchmarks
335
- └── docs/              # Documentation
217
+ β”‚   β”œβ”€β”€ agents/        # Multi-agent coordination + weighted routing
218
+ β”‚   β”œβ”€β”€ cli/           # REPL, commands, input handling
219
+ β”‚   β”œβ”€β”€ orchestrator/  # Phase coordinator + recovery
220
+ β”‚   β”œβ”€β”€ phases/        # COCO phases (converge/orchestrate/complete/output)
221
+ β”‚   β”œβ”€β”€ quality/       # 12 quality analyzers
222
+ β”‚   β”œβ”€β”€ providers/     # 6 LLM providers + OAuth
223
+ β”‚   β”œβ”€β”€ tools/         # 20+ tool implementations
224
+ β”‚   β”œβ”€β”€ hooks/         # Lifecycle hooks (safety, lint, format, audit)
225
+ β”‚   β”œβ”€β”€ mcp/           # MCP server for external integration
226
+ β”‚   └── config/        # Zod-validated configuration
227
+ β”œβ”€β”€ test/e2e/          # End-to-end pipeline tests
228
+ └── docs/              # Architecture docs + ADRs
336
229
  ```
337
230
 
338
- ### Quality Thresholds
231
+ ---
339
232
 
340
- - **Minimum Score**: 85/100 (senior-level)
341
- - **Target Score**: 95/100 (excellent)
342
- - **Test Coverage**: 80%+ required
343
- - **Security**: 100/100 (zero tolerance)
344
- - **Max Iterations**: 10 per task
345
- - **Convergence**: Delta < 2 between iterations
233
+ ## Limitations (Honest)
234
+
235
+ - **TypeScript/JavaScript first**: Other languages have basic support
236
+ - **CLI-only**: No IDE integration yet
237
+ - **Heuristic analyzers**: 5 of 12 dimensions use pattern matching, not deep semantic analysis
238
+ - **Early stage**: Not yet battle-tested at enterprise scale
239
+ - **Iteration takes time**: 2-5 minutes per task with the convergence loop enabled
240
+ - **LLM-dependent**: Quality of generated code depends on the LLM you use
346
241
 
347
242
  ---
348
243
 
349
244
  ## Contributing
350
245
 
351
- Coco is open source (MIT). We welcome:
352
- - Bug reports
353
- - Feature requests
354
- - Pull requests
246
+ MIT License. We welcome contributions:
247
+ - Bug reports and feature requests
248
+ - New quality analyzers
249
+ - Additional LLM provider integrations
355
250
  - Documentation improvements
356
251
  - Real-world usage feedback
357
252
 
358
253
  See [CONTRIBUTING.md](./CONTRIBUTING.md).
359
254
 
360
- ### Development
361
-
362
- ```bash
363
- # Clone repo
364
- git clone https://github.com/corbat/corbat-coco
365
- cd corbat-coco
366
-
367
- # Install dependencies
368
- pnpm install
369
-
370
- # Run in dev mode
371
- pnpm dev
372
-
373
- # Run tests
374
- pnpm test
375
-
376
- # Run quality benchmark
377
- pnpm benchmark
378
-
379
- # Full check (typecheck + lint + test)
380
- pnpm check
381
- ```
382
-
383
255
  ---
384
256
 
385
- ## FAQ
257
+ ## About Corbat
386
258
 
387
- ### Q: Is Coco production-ready?
259
+ Corbat-Coco is built by [Corbat](https://corbat.tech), a boutique technology consultancy. We believe AI coding tools should be transparent, measurable, and open source.
388
260
 
389
- **A**: Partially. The quality scoring system (Week 1) is production-ready and thoroughly tested. Multi-agent coordination (Week 5-8) is implemented but needs more real-world validation. Use for internal projects first.
390
-
391
- ### Q: How does Coco compare to Devin?
392
-
393
- **A**: Similar approach (autonomous iteration, quality metrics, multi-agent), but Coco is:
394
- - **Open source** (vs closed)
395
- - **Bring your own API keys** (vs $500/mo subscription)
396
- - **More transparent** (you can inspect every metric)
397
- - **Earlier stage** (Devin has 2+ years of production usage)
398
-
399
- ### Q: Why are 41.7% of metrics still hardcoded?
400
-
401
- **A**: These are **safe defaults**, not fake metrics:
402
- - `style: 100` when no linter is configured (legitimate default)
403
- - `correctness`, `completeness`, `robustness`, `testQuality`, `documentation` are pending Week 2-4 implementations
404
-
405
- We're committed to reaching **0% hardcoded** by end of Phase 1 (Week 4).
406
-
407
- ### Q: Can I use this with my company's code?
408
-
409
- **A**: Yes, but:
410
- - Code stays on your machine (not sent to third parties)
411
- - LLM calls go to your chosen provider (Anthropic/OpenAI/Google)
412
- - Review generated code before committing
413
- - Start with non-critical projects
414
-
415
- ### Q: Does Coco replace human developers?
416
-
417
- **A**: No. Coco is a **force multiplier**, not a replacement:
418
- - Best for boilerplate, CRUD APIs, repetitive tasks
419
- - Requires human review and validation
420
- - Struggles with novel algorithms and complex business logic
421
- - Think "junior developer with infinite patience"
422
-
423
- ### Q: What's the roadmap to 9.0/10?
424
-
425
- **A**: See [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md) for the complete 12-week plan.
261
+ **Links**:
262
+ - [GitHub](https://github.com/corbat/corbat-coco)
263
+ - [corbat.tech](https://corbat.tech)
426
264
 
427
265
  ---
428
266
 
429
- ## License
430
-
431
- MIT License - see [LICENSE](./LICENSE).
432
-
433
- ---
434
-
435
- ## Credits
436
-
437
- **Built with**:
438
- - TypeScript + Node.js
439
- - Anthropic Claude, OpenAI GPT-4, Google Gemini
440
- - Vitest, oxc, tree-sitter, c8
441
-
442
- **Made with πŸ₯₯ by developers who are tired of debugging AI code.**
443
-
444
- ---
445
-
446
- ## Links
447
-
448
- - **GitHub**: [github.com/corbat/corbat-coco](https://github.com/corbat/corbat-coco)
449
- - **Documentation**: [docs.corbat.dev](https://docs.corbat.dev)
450
- - **Roadmap**: [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md)
451
- - **Week 1 Report**: [WEEK_1_COMPLETE.md](./WEEK_1_COMPLETE.md)
452
- - **Discord**: [discord.gg/corbat](https://discord.gg/corbat) (coming soon)
453
-
454
- ---
455
-
456
- **Status**: 🚧 Week 1 Complete, Weeks 2-12 In Progress
457
-
458
- **Next Milestone**: Phase 1 Complete (Week 4) - Target Score 7.5/10
459
-
460
- **Current Score**: ~7.0/10 (honest, verifiable)
461
-
462
- **Honest motto**: "We're not #1 yet, but we're getting there. One real metric at a time." πŸ₯₯
267
+ **Made with πŸ₯₯ by developers who measure before they ship.**