@corbat-tech/coco 1.0.2 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,462 +1,253 @@
1
- # 🥥 Corbat-Coco: Autonomous Coding Agent with Real Quality Iteration
1
+ <p align="center">
2
+ <img src="https://img.shields.io/badge/v1.2.0-stable-blueviolet?style=for-the-badge" alt="Version">
3
+ <img src="https://img.shields.io/badge/TypeScript-5.7-3178c6?style=for-the-badge&logo=typescript&logoColor=white" alt="TypeScript">
4
+ <img src="https://img.shields.io/badge/Node.js-22+-339933?style=for-the-badge&logo=nodedotjs&logoColor=white" alt="Node.js">
5
+ <img src="https://img.shields.io/badge/License-MIT-f5c542?style=for-the-badge" alt="MIT License">
6
+ <img src="https://img.shields.io/badge/Tests-4350%2B_passing-22c55e?style=for-the-badge" alt="Tests">
7
+ </p>
2
8
 
3
- **The AI coding agent that doesn't just generate code: it iterates until it's actually good.**
9
+ <h1 align="center">🥥 Corbat-Coco</h1>
4
10
 
5
- [![TypeScript](https://img.shields.io/badge/TypeScript-5.3-blue)](https://www.typescriptlang.org/)
6
- [![Node.js](https://img.shields.io/badge/Node.js-22+-green)](https://nodejs.org/)
7
- [![License](https://img.shields.io/badge/License-MIT-yellow)](./LICENSE)
8
- [![Tests](https://img.shields.io/badge/Tests-3909%20passing-brightgreen)](./)
11
+ <p align="center">
12
+ <strong>The open-source coding agent that iterates on your code until it's actually production-ready.</strong>
13
+ </p>
9
14
 
10
- ---
11
-
12
- ## What Makes Coco Different
13
-
14
- Most AI coding assistants generate code and hope for the best. Coco is different:
15
-
16
- 1. **Generates** code with your favorite LLM (Claude, GPT-4, Gemini)
17
- 2. **Measures** quality with real metrics (coverage, security, complexity)
18
- 3. **Analyzes** test failures to find root causes
19
- 4. **Fixes** issues with targeted changes
20
- 5. **Repeats** until quality reaches 85+ (senior engineer level)
21
-
22
- All autonomous. All verifiable. All open source.
23
-
24
- ---
25
-
26
- ## The Problem with AI Code Generation
27
-
28
- Current AI assistants:
29
- - Generate code that looks good but fails in production
30
- - Don't run tests or validate output
31
- - Make you iterate manually
32
- - Can't coordinate complex tasks
33
-
34
- **Result**: You spend hours debugging AI-generated code.
35
-
36
- ---
37
-
38
- ## How Coco Solves It
39
-
40
- ### 1. Real Quality Measurement
41
-
42
- Coco measures 12 dimensions of code quality:
43
- - **Test Coverage**: Runs your tests with c8/v8 instrumentation (not estimated)
44
- - **Security**: Scans for vulnerabilities with npm audit + OWASP checks
45
- - **Complexity**: Calculates cyclomatic complexity from AST
46
- - **Correctness**: Validates tests pass + builds succeed
47
- - **Maintainability**: Real metrics from code analysis
48
- - ... and 7 more
49
-
50
- **No fake scores. No hardcoded values. Real metrics.**
51
-
52
- Current state: **58.3% real measurements** (up from 0%), with 41.7% still using safe defaults.
53
-
54
- ### 2. Smart Iteration Loop
55
-
56
- When tests fail, Coco:
57
- - Parses stack traces to find the error location
58
- - Reads surrounding code for context
59
- - Diagnoses root cause (not just symptoms)
60
- - Generates targeted fix (not rewriting entire file)
61
- - Re-validates and repeats if needed
62
-
63
- **Target**: 70%+ of failures fixed in first iteration.
64
-
65
- ### 3. Multi-Agent Coordination
66
-
67
- Complex tasks are decomposed and executed by specialized agents:
68
- - **Researcher**: Explores codebase, finds patterns
69
- - **Coder**: Writes production code
70
- - **Tester**: Generates comprehensive tests
71
- - **Reviewer**: Identifies issues
72
- - **Optimizer**: Reduces complexity
73
-
74
- Agents work in parallel where possible, coordinate when needed.
75
-
76
- ### 4. AST-Aware Validation
77
-
78
- Before saving any file:
79
- - Parses AST to validate syntax
80
- - Checks TypeScript semantics
81
- - Analyzes imports
82
- - Verifies build succeeds
83
-
84
- **Result**: Zero broken builds from AI edits.
85
-
86
- ### 5. Production Hardening
87
-
88
- - **Error Recovery**: Auto-recovers from 8 error types (syntax, timeout, dependencies, etc.)
89
- - **Checkpoint/Resume**: Ctrl+C saves state, resume anytime
90
- - **Resource Limits**: Prevents runaway costs with configurable quotas
91
- - **Streaming Output**: Real-time feedback as code generates
15
+ <p align="center">
16
+ <em>Generate → Test → Measure → Fix → Repeat, autonomously.</em>
17
+ </p>
92
18
 
93
19
  ---
94
20
 
95
- ## Architecture
96
-
97
- ### COCO Methodology (4 Phases)
21
+ ## Why Coco?
98
22
 
99
- 1. **Converge**: Gather requirements, create specification
100
- 2. **Orchestrate**: Design architecture, create task backlog
101
- 3. **Complete**: Execute tasks with quality iteration
102
- 4. **Output**: Generate CI/CD, docs, deployment config
23
+ Most AI coding tools generate code and hand it to you. If something breaks (tests fail, types don't match, a security issue slips in), that's your problem.
103
24
 
104
- ### Quality Iteration Loop
25
+ Coco takes a different approach. After generating code, it **runs your tests, measures quality across 12 dimensions, diagnoses what's wrong, and fixes it**, repeating autonomously until the code actually meets the quality bar you define.
105
26
 
106
27
  ```
107
- Generate Code → Validate AST → Run Tests → Analyze Failures
108
-       ↑                                               ↓
109
-       ←────────── Generate Targeted Fixes ←───────────┘
28
+ ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
29
+ │ Generate │ ──► │   Test   │ ──► │ Measure  │ ──► │   Fix    │
30
+ └──────────┘     └──────────┘     └──────────┘     └──────────┘
31
+                                                         │
32
+                       Score < 85?                       │ ──► Loop back
33
+                       Score ≥ 85?                       │ ──► Done ✅
110
34
  ```
111
35
 
112
- Stops when:
113
- - Quality ≥ 85/100 (minimum)
114
- - Score stable for 2+ iterations
115
- - Tests all passing
116
- - Or max 10 iterations reached
117
-
118
- ### Real Analyzers
119
-
120
- | Analyzer | What It Measures | Data Source |
121
- |----------|------------------|-------------|
122
- | Coverage | Lines, branches, functions, statements | c8/v8 instrumentation |
123
- | Security | Vulnerabilities, dangerous patterns | npm audit + static analysis |
124
- | Complexity | Cyclomatic complexity, maintainability | AST traversal |
125
- | Duplication | Code similarity, redundancy | Token-based comparison |
126
- | Build | Compilation success | tsc/build execution |
127
- | Import | Missing dependencies, circular deps | AST + package.json |
36
+ This is the **Quality Convergence Loop**, Coco's core differentiator.
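+
+ In code, the loop reduces to something like the sketch below. It is a minimal illustration only; `generate`, `measure`, and the parameter names are invented for the example, not Coco's actual API:
+
+ ```typescript
+ // Minimal sketch of a quality convergence loop (illustrative, not Coco's API).
+ type QualityReport = { score: number; issues: string[] };
+
+ async function convergeOnQuality(
+   task: string,
+   generate: (task: string, feedback: string[]) => Promise<string>,
+   measure: (code: string) => Promise<QualityReport>,
+   targetScore = 85,
+   maxIterations = 10,
+ ): Promise<string> {
+   let code = "";
+   let feedback: string[] = [];
+   for (let i = 0; i < maxIterations; i++) {
+     code = await generate(task, feedback); // generate, or fix based on feedback
+     const report = await measure(code);    // run tests + analyzers
+     if (report.score >= targetScore) return code; // converged
+     feedback = report.issues;              // feed failures into the next pass
+   }
+   return code; // best effort once the iteration cap is hit
+ }
+ ```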
128
37
 
129
38
  ---
130
39
 
131
40
  ## Quick Start
132
41
 
133
- ### Installation
134
-
135
- ```bash
136
- npm install -g corbat-coco
137
- ```
138
-
139
- ### Configuration
140
-
141
42
  ```bash
142
- coco init
43
+ npm install -g @corbat-tech/coco
44
+ coco  # Opens the interactive REPL (guided setup on first run)
143
45
  ```
144
46
 
145
- Follow prompts to configure:
146
- - AI Provider (Anthropic, OpenAI, Google)
147
- - API Key
148
- - Project preferences
149
-
150
- ### Basic Usage
151
-
152
- ```bash
153
- coco "Build a REST API with JWT authentication"
154
- ```
155
-
156
- That's it. Coco will:
157
- 1. Ask clarifying questions
158
- 2. Design architecture
159
- 3. Generate code + tests
160
- 4. Iterate until quality ≥ 85
161
- 5. Generate CI/CD + docs
162
-
163
- ### Resume Interrupted Session
47
+ That's it. Coco walks you through provider configuration on first launch.
164
48
 
165
49
  ```bash
166
- coco resume
167
- ```
168
-
169
- ### Check Quality of Existing Code
170
-
171
- ```bash
172
- coco quality ./src
50
+ # Or use it directly:
51
+ coco "Add a REST API endpoint for user authentication with tests"
173
52
  ```
174
53
 
175
54
  ---
176
55
 
177
- ## Real Results
56
+ ## What Coco Does Well
178
57
 
179
- ### Week 1 Achievements ✅
58
+ ### Quality Convergence Loop
180
59
 
181
- **Goal**: Replace fake metrics with real measurements
60
+ Coco doesn't just generate code; it iterates until quality converges:
182
61
 
183
- **Results**:
184
- - Hardcoded metrics: 100% → **41.7%** ✅
185
- - New analyzers: **4** (coverage, security, complexity, duplication)
186
- - New tests: **62** (all passing)
187
- - E2E tests: **6** (full pipeline validation)
62
+ | Iteration | Score | What happened |
63
+ |:---------:|:-----:|---------------|
64
+ | 1 | 52 | Code generated; 3 tests failing, no error handling |
65
+ | 2 | 71 | Tests fixed, security vulnerability found |
66
+ | 3 | 84 | Security patched, coverage improved to 82% |
67
+ | 4 | 91 | All green; quality converged ✅ |
188
68
 
189
- **Before**:
190
- ```javascript
191
- // All hardcoded 😱
192
- dimensions: {
193
- testCoverage: 80, // Fake
194
- security: 100, // Fake
195
- complexity: 90, // Fake
196
- // ... all fake
197
- }
198
- ```
69
+ The loop is configurable: target score, max iterations, convergence threshold, security requirements. You control the bar.
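+
+ For illustration, those knobs map naturally onto a small settings object. The field names below are assumptions for the example, not Coco's real schema:
+
+ ```typescript
+ // Hypothetical quality-loop settings (field names are illustrative).
+ const qualityLoop = {
+   targetScore: 85,           // stop once the overall score reaches this
+   maxIterations: 10,         // hard cap per task
+   convergenceDelta: 2,       // score changes below this count as "stable"
+   requireSecurityPass: true, // block convergence on critical security findings
+ };
+ ```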
199
70
 
200
- **After**:
201
- ```typescript
202
- // Real measurements ✅
203
- const coverage = await this.coverageAnalyzer.analyze(files);
204
- const security = await this.securityScanner.scan(files);
205
- const complexity = await this.complexityAnalyzer.analyze(files);
206
-
207
- dimensions: {
208
- testCoverage: coverage.lines.percentage, // REAL
209
- security: security.score, // REAL
210
- complexity: complexity.score, // REAL
211
- // ... 7 more real metrics
212
- }
213
- ```
71
+ ### 12-Dimension Quality Scoring
214
72
 
215
- ### Benchmark Results
73
+ Every iteration measures your code across 12 dimensions using real static analysis:
216
74
 
217
- Running Coco on itself (corbat-coco codebase):
75
+ | Dimension | How it's measured |
76
+ |-----------|-------------------|
77
+ | Test Coverage | c8/v8 instrumentation |
78
+ | Security | Pattern matching + optional Snyk |
79
+ | Complexity | Cyclomatic complexity via AST parsing |
80
+ | Duplication | Line-based similarity detection |
81
+ | Correctness | Test pass rate + build verification |
82
+ | Style | oxlint / eslint / biome integration |
83
+ | Documentation | JSDoc coverage analysis |
84
+ | Readability | AST: naming quality, function length, nesting |
85
+ | Maintainability | AST: file size, coupling, function count |
86
+ | Test Quality | Assertion density, edge case coverage |
87
+ | Completeness | Export density + test file coverage |
88
+ | Robustness | Error handling pattern detection |
218
89
 
219
- ```
220
- ⏱️ Duration: 19.8s
221
- 📊 Overall Score: 60/100
222
- 📈 Real Metrics: 7/12 (58.3%)
223
- 🛡️ Security: 0 critical issues
224
- 📏 Complexity: 100/100 (low)
225
- 🔄 Duplication: 72.5/100 (27.5% duplication)
226
- 📄 Issues Found: 311
227
- 💡 Suggestions: 3
228
- ```
229
-
230
- **Validation**: ✅ Target met (≤42% hardcoded)
231
-
232
- ---
233
-
234
- ## Development Roadmap
90
+ > **Transparency note**: 7 dimensions use instrumented measurements. 5 use heuristic-based static analysis. We label which is which; no black boxes.
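+
+ As a rough sketch of how per-dimension scores can roll up into a single number, here is a weighted average over the 12 dimensions. The weights are invented for illustration; Coco's actual aggregation may differ:
+
+ ```typescript
+ // Illustrative roll-up: weighted average over per-dimension scores (0-100 each).
+ const weights: Record<string, number> = {
+   testCoverage: 2, security: 2, correctness: 2, // instrumented, weighted higher here
+   complexity: 1, duplication: 1, style: 1,
+   documentation: 1, readability: 1, maintainability: 1,
+   testQuality: 1, completeness: 1, robustness: 1,
+ };
+
+ function overallScore(dimensions: Record<string, number>): number {
+   let weighted = 0;
+   let total = 0;
+   for (const [name, score] of Object.entries(dimensions)) {
+     const w = weights[name] ?? 1;
+     weighted += score * w;
+     total += w;
+   }
+   return total === 0 ? 0 : Math.round(weighted / total);
+ }
+ ```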
235
91
 
236
- ### Phase 1: Foundation ✅ (Weeks 1-4) - COMPLETE
92
+ ### Multi-Provider Support
237
93
 
238
- - [x] Real quality scoring system
239
- - [x] AST-aware generation pipeline
240
- - [x] Smart iteration loop
241
- - [x] Test failure analyzer
242
- - [x] Build verifier
243
- - [x] Import analyzer
94
+ Bring your own API keys. Coco works with:
244
95
 
245
- **Current Score**: ~7.0/10
96
+ | Provider | Auth | Models |
97
+ |----------|------|--------|
98
+ | **Anthropic** | API key / OAuth PKCE | Claude Opus, Sonnet, Haiku |
99
+ | **OpenAI** | API key | GPT-4o, o1, o3 |
100
+ | **Google** | API key / gcloud ADC | Gemini Pro, Flash |
101
+ | **Ollama** | Local | Any local model |
102
+ | **LM Studio** | Local | Any GGUF model |
103
+ | **Moonshot** | API key | Kimi models |
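+
+ Multi-provider support comes down to a small common surface. Below is a minimal sketch of such an abstraction, wired to Ollama's public REST API; the `LLMProvider` interface is illustrative, not Coco's actual types:
+
+ ```typescript
+ // Minimal provider abstraction (illustrative, not Coco's actual types).
+ interface LLMProvider {
+   readonly name: string;
+   complete(prompt: string, options?: { model?: string }): Promise<string>;
+ }
+
+ // Example backend: Ollama's local REST API (POST /api/generate).
+ const ollama: LLMProvider = {
+   name: "ollama",
+   async complete(prompt, options) {
+     const res = await fetch("http://localhost:11434/api/generate", {
+       method: "POST",
+       headers: { "Content-Type": "application/json" },
+       body: JSON.stringify({
+         model: options?.model ?? "llama3", // assumes this model is pulled locally
+         prompt,
+         stream: false,
+       }),
+     });
+     const data = (await res.json()) as { response: string };
+     return data.response;
+   },
+ };
+ ```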
246
104
 
247
- ### Phase 2: Intelligence (Weeks 5-8) - IN PROGRESS
105
+ ### Multi-Agent Architecture
248
106
 
249
- - [x] Agent execution engine
250
- - [x] Parallel agent coordinator
251
- - [ ] Agent communication protocol
252
- - [ ] Semantic code search
253
- - [ ] Codebase knowledge graph
254
- - [ ] Smart task decomposition
255
- - [ ] Adaptive planning
256
-
257
- **Target Score**: 8.5/10
258
-
259
- ### Phase 3: Excellence (Weeks 9-12) - IN PROGRESS
260
-
261
- - [x] Error recovery system
262
- - [x] Progress tracking & interruption
263
- - [ ] Resource limits & quotas
264
- - [ ] Multi-language AST support
265
- - [ ] Framework detection
266
- - [ ] Interactive dashboard
267
- - [ ] Streaming output
268
- - [ ] Performance optimization
269
-
270
- **Target Score**: 9.0+/10
271
-
272
- ---
107
+ Six specialized agents with weighted-scoring routing:
273
108
 
274
- ## Honest Comparison with Alternatives
109
+ - **Researcher**: Explores, analyzes, maps the codebase
110
+ - **Coder**: Writes and edits code (default route)
111
+ - **Tester**: Generates tests, improves coverage
112
+ - **Reviewer**: Code review, quality auditing
113
+ - **Optimizer**: Refactoring and performance
114
+ - **Planner**: Architecture design, task decomposition
275
115
 
276
- | Feature | Cursor | Aider | Cody | Devin | **Coco** |
277
- |---------|--------|-------|------|-------|----------|
278
- | IDE Integration | ✅ | ❌ | ✅ | ❌ | 🔄 (planned Q2) |
279
- | Real Quality Metrics | ❌ | ❌ | ❌ | ✅ | ✅ (58% real) |
280
- | Root Cause Analysis | ❌ | ❌ | ❌ | ✅ | ✅ |
281
- | Multi-Agent | ❌ | ❌ | ❌ | ✅ | ✅ |
282
- | AST Validation | ❌ | ❌ | ❌ | ✅ | ✅ |
283
- | Error Recovery | ❌ | ❌ | ❌ | ✅ | ✅ |
284
- | Checkpoint/Resume | ❌ | ❌ | ❌ | ✅ | ✅ |
285
- | Open Source | ❌ | ✅ | ❌ | ❌ | ✅ |
286
- | Price | $20/mo | Free | $9/mo | $500/mo | **Free** |
116
+ Coco picks the right agent for each task automatically. When confidence is low, it defaults to the coder; no guessing games.
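+
+ A toy version of that routing fits in a few lines. The keyword weights and threshold below are invented; Coco's real scoring is richer:
+
+ ```typescript
+ // Toy weighted-scoring router: pick the agent whose keywords best match the task.
+ const agentKeywords: Record<string, Record<string, number>> = {
+   researcher: { explore: 2, analyze: 2, find: 1 },
+   tester: { test: 3, coverage: 2 },
+   reviewer: { review: 3, audit: 2 },
+   optimizer: { refactor: 2, optimize: 3, performance: 2 },
+   planner: { architecture: 3, design: 2, plan: 2 },
+ };
+
+ function routeAgent(task: string, threshold = 2): string {
+   const words = task.toLowerCase().split(/\W+/);
+   let bestAgent = "coder";
+   let bestScore = 0;
+   for (const [agent, keywords] of Object.entries(agentKeywords)) {
+     const score = words.reduce((sum, w) => sum + (keywords[w] ?? 0), 0);
+     if (score > bestScore) {
+       bestAgent = agent;
+       bestScore = score;
+     }
+   }
+   // Low confidence falls back to the coder, as described above.
+   return bestScore >= threshold ? bestAgent : "coder";
+ }
+ ```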
287
117
 
288
- **Verdict**: Coco offers Devin-level autonomy at Aider's price (free).
118
+ ### Interactive REPL
289
119
 
290
- ---
120
+ A terminal-first experience with:
291
121
 
292
- ## Current Limitations
122
+ - **Ghost-text completion**: Tab to accept inline suggestions
123
+ - **Slash commands**: `/coco`, `/plan`, `/build`, `/diff`, `/commit`, `/help`
124
+ - **Image paste**: `Ctrl+V` to paste screenshots for visual context
125
+ - **Intent recognition**: Natural language mapped to commands (see the sketch below)
126
+ - **Context management**: Automatic compaction when context grows large
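+
+ The intent-recognition idea can be sketched as a pattern table mapping free-form input to a command. The patterns below are invented for illustration; only the slash commands themselves come from the list above:
+
+ ```typescript
+ // Toy intent mapping: free-form input to a slash command (patterns invented).
+ const intents: Array<[RegExp, string]> = [
+   [/\b(plan|roadmap|steps)\b/i, "/plan"],
+   [/\b(diff|what changed)\b/i, "/diff"],
+   [/\b(commit|save my work)\b/i, "/commit"],
+   [/\b(build|compile)\b/i, "/build"],
+ ];
+
+ function recognizeIntent(input: string): string | null {
+   for (const [pattern, command] of intents) {
+     if (pattern.test(input)) return command;
+   }
+   return null; // no match: fall through to the normal agent flow
+ }
+ ```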
293
127
 
294
- We believe in honesty:
128
+ ### Production Hardening
295
129
 
296
- - **Languages**: Best with TypeScript/JavaScript. Python/Go/Rust support is experimental.
297
- - **Metrics**: 58.3% real, 41.7% use safe defaults (improving to 100% real by Week 4)
298
- - **IDE Integration**: CLI-first. VS Code extension coming Q2 2026.
299
- - **Learning Curve**: More complex than Copilot. Power tool, not autocomplete.
300
- - **Cost**: Uses your LLM API keys. ~$2-5 per project with Claude.
301
- - **Speed**: Iteration takes time. Not for quick edits (use Cursor for that).
302
- - **Multi-Agent**: Implemented but not yet battle-tested at scale.
130
+ - **Error recovery**: Typed error strategies with exponential backoff (see the sketch below)
131
+ - **Checkpoint/Resume**: `Ctrl+C` saves state, `coco resume` picks up where you left off
132
+ - **AST validation**: Syntax-checks generated code before saving
133
+ - **Convergence analysis**: Detects oscillation, diminishing returns, and stuck patterns
134
+ - **Path sandboxing**: Tools can only access files within the project
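+
+ The retry-with-backoff pattern behind error recovery, as a minimal sketch (parameter names and defaults are illustrative; Coco's typed strategies are richer):
+
+ ```typescript
+ // Retry a flaky async operation with exponential backoff.
+ async function withBackoff<T>(
+   fn: () => Promise<T>,
+   maxAttempts = 5,
+   baseDelayMs = 500,
+ ): Promise<T> {
+   for (let attempt = 1; ; attempt++) {
+     try {
+       return await fn();
+     } catch (err) {
+       if (attempt >= maxAttempts) throw err; // out of retries
+       const delay = baseDelayMs * 2 ** (attempt - 1); // 500ms, 1s, 2s, 4s, ...
+       await new Promise((resolve) => setTimeout(resolve, delay));
+     }
+   }
+ }
+ ```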
303
135
 
304
136
  ---
305
137
 
306
- ## Technical Details
307
-
308
- ### Stack
309
-
310
- - **Language**: TypeScript (ESM, strict mode)
311
- - **Runtime**: Node.js 22+
312
- - **Package Manager**: pnpm
313
- - **Testing**: Vitest (3,909 tests)
314
- - **Linting**: oxlint (fast, minimal config)
315
- - **Formatting**: oxfmt
316
- - **Build**: tsup (fast ESM bundler)
138
+ ## COCO Methodology
317
139
 
318
- ### Project Structure
140
+ Four phases, each with a dedicated executor:
319
141
 
320
142
  ```
321
- corbat-coco/
322
- ├── src/
323
- │   ├── agents/           # Multi-agent coordination
324
- │   ├── cli/              # CLI commands
325
- │   ├── orchestrator/     # Central coordinator
326
- │   ├── phases/           # COCO phases (4 phases)
327
- │   ├── quality/          # Quality analyzers
328
- │   │   └── analyzers/    # Coverage, security, complexity, etc.
329
- │   ├── providers/        # LLM providers (Anthropic, OpenAI, Google)
330
- │   ├── tools/            # Tool implementations
331
- │   └── types/            # Type definitions
332
- ├── test/
333
- │   ├── e2e/              # End-to-end tests
334
- │   └── benchmarks/       # Performance benchmarks
335
- └── docs/                 # Documentation
143
+    CONVERGE          ORCHESTRATE           COMPLETE           OUTPUT
144
+ ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐
145
+ │  Gather  │    │    Design    │    │ Execute with │    │ Generate │
146
+ │   reqs   │ ──►│ architecture │──► │   quality    │──► │  CI/CD,  │
147
+ │  + spec  │    │  + backlog   │    │ convergence  │    │   docs   │
148
+ └──────────┘    └──────────────┘    └──────────────┘    └──────────┘
149
+                                         ↑        ↓
150
+                                      ┌─────────────┐
151
+                                      │ Convergence │
152
+                                      │    Loop     │
153
+                                      └─────────────┘
336
154
  ```
337
155
 
338
- ### Quality Thresholds
339
-
340
- - **Minimum Score**: 85/100 (senior-level)
341
- - **Target Score**: 95/100 (excellent)
342
- - **Test Coverage**: 80%+ required
343
- - **Security**: 100/100 (zero tolerance)
344
- - **Max Iterations**: 10 per task
345
- - **Convergence**: Delta < 2 between iterations
156
+ 1. **Converge**: Understand what needs to be built. Gather requirements, produce a spec.
157
+ 2. **Orchestrate**: Design the architecture, decompose into a task backlog.
158
+ 3. **Complete**: Execute each task with the quality convergence loop.
159
+ 4. **Output**: Generate CI/CD pipelines, documentation, and deployment config.
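+
+ Type-wise, the four phases form a straight pipeline in which each phase consumes the previous phase's artifact. A minimal sketch, with all names invented for the example:
+
+ ```typescript
+ // Illustrative phase pipeline: each phase consumes the previous one's artifact.
+ interface Spec { requirements: string[] }
+ interface Backlog { tasks: string[] }
+
+ async function runPipeline(
+   goal: string,
+   converge: (goal: string) => Promise<Spec>,
+   orchestrate: (spec: Spec) => Promise<Backlog>,
+   complete: (backlog: Backlog) => Promise<string[]>, // quality-converged code
+   output: (artifacts: string[]) => Promise<void>,    // CI/CD, docs, deploy config
+ ): Promise<void> {
+   const spec = await converge(goal);
+   const backlog = await orchestrate(spec);
+   const artifacts = await complete(backlog);
+   await output(artifacts);
+ }
+ ```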
346
160
 
347
161
  ---
348
162
 
349
- ## Contributing
163
+ ## Use Cases
350
164
 
351
- Coco is open source (MIT). We welcome:
352
- - Bug reports
353
- - Feature requests
354
- - Pull requests
355
- - Documentation improvements
356
- - Real-world usage feedback
165
+ Coco is designed for developers who want AI assistance with **accountability**:
166
+
167
+ - **Feature development**: Describe what you want, get tested and reviewed code
168
+ - **Vibe coding**: Explore ideas interactively; Coco handles the quality checks
169
+ - **Refactoring**: Point at code and say "make this better"; Coco iterates until metrics improve
170
+ - **Test generation**: Improve coverage with meaningful tests, not boilerplate
171
+ - **Code review**: Get multi-dimensional quality feedback on existing code
172
+ - **Learning**: See how code quality improves across iterations
357
173
 
358
- See [CONTRIBUTING.md](./CONTRIBUTING.md).
174
+ ---
359
175
 
360
- ### Development
176
+ ## Development
361
177
 
362
178
  ```bash
363
- # Clone repo
364
179
  git clone https://github.com/corbat/corbat-coco
365
180
  cd corbat-coco
366
-
367
- # Install dependencies
368
181
  pnpm install
369
-
370
- # Run in dev mode
371
- pnpm dev
372
-
373
- # Run tests
374
- pnpm test
375
-
376
- # Run quality benchmark
377
- pnpm benchmark
378
-
379
- # Full check (typecheck + lint + test)
380
- pnpm check
182
+ pnpm dev    # Run in dev mode (tsx)
183
+ pnpm test   # 4,350+ tests via Vitest
184
+ pnpm check  # typecheck + lint + test
185
+ pnpm build  # Production build (tsup)
381
186
  ```
382
187
 
383
- ---
384
-
385
- ## FAQ
386
-
387
- ### Q: Is Coco production-ready?
388
-
389
- **A**: Partially. The quality scoring system (Week 1) is production-ready and thoroughly tested. Multi-agent coordination (Week 5-8) is implemented but needs more real-world validation. Use for internal projects first.
390
-
391
- ### Q: How does Coco compare to Devin?
392
-
393
- **A**: Similar approach (autonomous iteration, quality metrics, multi-agent), but Coco is:
394
- - **Open source** (vs closed)
395
- - **Bring your own API keys** (vs $500/mo subscription)
396
- - **More transparent** (you can inspect every metric)
397
- - **Earlier stage** (Devin has 2+ years of production usage)
398
-
399
- ### Q: Why are 41.7% of metrics still hardcoded?
400
-
401
- **A**: These are **safe defaults**, not fake metrics:
402
- - `style: 100` when no linter is configured (legitimate default)
403
- - `correctness`, `completeness`, `robustness`, `testQuality`, `documentation` are pending Week 2-4 implementations
404
-
405
- We're committed to reaching **0% hardcoded** by end of Phase 1 (Week 4).
406
-
407
- ### Q: Can I use this with my company's code?
408
-
409
- **A**: Yes, but:
410
- - Code stays on your machine (not sent to third parties)
411
- - LLM calls go to your chosen provider (Anthropic/OpenAI/Google)
412
- - Review generated code before committing
413
- - Start with non-critical projects
414
-
415
- ### Q: Does Coco replace human developers?
188
+ ### Project Structure
416
189
 
417
- **A**: No. Coco is a **force multiplier**, not a replacement:
418
- - Best for boilerplate, CRUD APIs, repetitive tasks
419
- - Requires human review and validation
420
- - Struggles with novel algorithms and complex business logic
421
- - Think "junior developer with infinite patience"
190
+ ```
191
+ src/
192
+ ├── agents/        # Multi-agent coordination + weighted routing
193
+ ├── cli/           # REPL, commands, input handling, output rendering
194
+ ├── orchestrator/  # Phase coordinator + state recovery
195
+ ├── phases/        # COCO phases (converge/orchestrate/complete/output)
196
+ ├── quality/       # 12 quality analyzers + convergence engine
197
+ ├── providers/     # 6 LLM providers + OAuth flows
198
+ ├── tools/         # 20+ tool implementations
199
+ ├── hooks/         # Lifecycle hooks (safety, lint, format, audit)
200
+ ├── mcp/           # MCP server for external integration
201
+ └── config/        # Zod-validated configuration system
202
+ ```
422
203
 
423
- ### Q: What's the roadmap to 9.0/10?
204
+ ### Technology Stack
424
205
 
425
- **A**: See [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md) for the complete 12-week plan.
206
+ | Component | Technology |
207
+ |-----------|-----------|
208
+ | Language | TypeScript (ESM, strict mode) |
209
+ | Runtime | Node.js 22+ |
210
+ | Testing | Vitest (4,350+ tests) |
211
+ | Linting | oxlint |
212
+ | Formatting | oxfmt |
213
+ | Build | tsup |
214
+ | Schema validation | Zod |
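+
+ The Zod entry above is the pattern used for config validation. A minimal example of the approach; the schema fields and the `coco.config.json` file name are illustrative, not Coco's real schema:
+
+ ```typescript
+ import { readFile } from "node:fs/promises";
+ import { z } from "zod";
+
+ // Illustrative schema in the spirit of a Zod-validated config.
+ const ConfigSchema = z.object({
+   provider: z.enum(["anthropic", "openai", "google", "ollama", "lmstudio", "moonshot"]),
+   model: z.string().min(1),
+   quality: z.object({
+     targetScore: z.number().min(0).max(100).default(85),
+     maxIterations: z.number().int().positive().default(10),
+   }),
+ });
+ type Config = z.infer<typeof ConfigSchema>;
+
+ // parse() throws a descriptive error when the file does not match the schema.
+ const raw = JSON.parse(await readFile("coco.config.json", "utf8")); // hypothetical file name
+ const config: Config = ConfigSchema.parse(raw);
+ ```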
426
215
 
427
216
  ---
428
217
 
429
- ## License
218
+ ## Known Limitations
430
219
 
431
- MIT License - see [LICENSE](./LICENSE).
220
+ We'd rather you know upfront:
432
221
 
433
- ---
222
+ - **TypeScript/JavaScript first**: Other languages have basic support but fewer analyzers
223
+ - **CLI-only**: No IDE extension yet (VS Code integration is planned)
224
+ - **Iteration takes time**: The convergence loop adds 2-5 minutes per task. For quick one-line fixes, a simpler tool may be faster
225
+ - **Heuristic analyzers**: 5 of 12 quality dimensions use pattern-based heuristics, not deep semantic analysis
226
+ - **LLM-dependent**: Output quality depends on the model you connect. Larger models produce better results
227
+ - **Early stage**: Actively developed. Not yet battle-tested at large enterprise scale
434
228
 
435
- ## Credits
229
+ ---
436
230
 
437
- **Built with**:
438
- - TypeScript + Node.js
439
- - Anthropic Claude, OpenAI GPT-4, Google Gemini
440
- - Vitest, oxc, tree-sitter, c8
231
+ ## Contributing
441
232
 
442
- **Made with 🥥 by developers who are tired of debugging AI code.**
233
+ We welcome contributions of all kinds:
443
234
 
444
- ---
445
-
446
- ## Links
235
+ - Bug reports and feature requests
236
+ - New quality analyzers
237
+ - Additional LLM provider integrations
238
+ - Documentation and examples
239
+ - Real-world usage feedback
447
240
 
448
- - **GitHub**: [github.com/corbat/corbat-coco](https://github.com/corbat/corbat-coco)
449
- - **Documentation**: [docs.corbat.dev](https://docs.corbat.dev)
450
- - **Roadmap**: [IMPROVEMENT_ROADMAP_2026.md](./IMPROVEMENT_ROADMAP_2026.md)
451
- - **Week 1 Report**: [WEEK_1_COMPLETE.md](./WEEK_1_COMPLETE.md)
452
- - **Discord**: [discord.gg/corbat](https://discord.gg/corbat) (coming soon)
241
+ See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
453
242
 
454
243
  ---
455
244
 
456
- **Status**: 🚧 Week 1 Complete, Weeks 2-12 In Progress
245
+ ## About
457
246
 
458
- **Next Milestone**: Phase 1 Complete (Week 4) - Target Score 7.5/10
247
+ Corbat-Coco is built by [Corbat](https://corbat.tech), a technology consultancy that believes AI coding tools should be transparent, measurable, and open source.
459
248
 
460
- **Current Score**: ~7.0/10 (honest, verifiable)
249
+ <p align="center">
250
+ <a href="https://github.com/corbat/corbat-coco">GitHub</a> · <a href="https://corbat.tech">corbat.tech</a>
251
+ </p>
461
252
 
462
- **Honest motto**: "We're not #1 yet, but we're getting there. One real metric at a time." 🥥
253
+ <p align="center"><strong>MIT License</strong> · Made by developers who measure before they ship. 🥥</p>