@autobe/estimate 0.30.0-dev.20260315 → 0.30.2
- package/LICENSE +661 -661
- package/README.md +201 -261
- package/package.json +14 -5
package/README.md
CHANGED
@@ -1,261 +1,201 @@
-#
[old lines 2-201: removed; their content is not captured in this view]
-section Design Integrity
-DB Coverage Agent : done, 2026-01-15, 28d
-API Endpoint Coverage Agent : done, 2026-01-22, 28d
-Schema Relation Agent : done, 2026-02-01, 28d
-Schema Structure Agent : done, 2026-02-01, 28d
-Schema Content Agent : done, 2026-03-01, 28d
-
-section Multi-lingual Support
-Java Compiler PoC : done, 2026-01-01, 30d
-Java Database : done, 2026-01-01, 14d
-Java Interface : done, 2026-01-15, 21d
-Java Test : active, 2026-02-05, 28d
-Java Realize : active, 2026-03-01, 31d
-
-section Human Modification Support
-Database Schema Parser : active, 2026-02-15, 28d
-Interface Schema Parser : active, 2026-02-22, 28d
-Requirements Sync Agent : planned, 2026-03-08, 24d
-
-section Miscellaneous
-System Prompt Simplification : done, 2026-02-01, 28d
-Estimation Agent : done, 2026-02-01, 28d
-Playground Service Enhancement : active, 2026-02-15, 28d
-PR Articles Writing : active, 2026-02-15, 30d
-```
-
-AutoBE has successfully completed Alpha, Beta, and Gamma development phases, establishing a solid foundation with **100% compilation success rate**. The current **Delta Release** focuses on transitioning from horizontal expansion to vertical deepening.
-
-**Strategic Shift**: In Gamma, we rapidly implemented features like RAG, Modularization, and Complementation under a "just ship it" philosophy. Delta fills the stability gaps that remained by systematically discovering and fixing hidden defects through Local LLM benchmarks.
-
-**Key Focus Areas**:
-
-- **Local LLM Benchmark**: Using open-source models like Qwen3 as a touchstone to discover hidden defects that commercial models mask, ensuring more robust operation across all model types
-- **Validation Logic Enhancement**: Strengthening schemas and validation logic through dynamic function calling schemas, JSON Schema validators, and progressive validation pipelines
-- **RAG Optimization**: Completing the Hybrid Search system (Vector + BM25) with dynamic K retrieval and comprehensive benchmark tuning
-- **Design Integrity**: Building mechanisms to verify and ensure design consistency between Database and Interface phases through coverage and schema review agents
-- **Multi-lingual Support**: Launching Java/Spring code generation alongside TypeScript/NestJS, with language-neutral AST structures enabling future language additions
-- **Human Modification Support**: Enabling maintenance continuity by parsing user-modified code back into AutoBE's internal AST representation, ensuring AutoBE remains useful beyond initial generation
-
-This roadmap prioritizes stability and depth over feature breadth, informed by real-world production experience from Gamma.
-
-## Current Limitations
-
-While AutoBE achieves 100% compilation success, please note these current limitations:
-
-**Runtime Behavior**: Generated applications compile successfully, but runtime behavior may require testing and refinement. Unexpected runtime errors can occur during server execution, such as database connection issues, API endpoint failures, or business logic exceptions that weren't caught during compilation. We strongly recommend thorough testing in development environments before deploying to production. Our v1.0 release targets 100% runtime success to address these issues.
-
-**Design Interpretation**: AutoBE's database and API designs may differ from your expectations. We recommend thoroughly reviewing generated specifications before proceeding with implementation, especially before production deployment.
-
-**Token Consumption**: AutoBE requires significant AI token usage for complex projects. Based on our testing, projects typically consume 30M-250M+ tokens depending on complexity (simple todo apps use ~4M tokens, while complex e-commerce platforms may require 250M+ tokens). We are working on RAG optimization to reduce this overhead in future releases.
-
-**Maintenance**: AutoBE focuses on initial generation and does not provide ongoing maintenance capabilities. Once your backend is generated, you'll need to handle bug fixes, feature additions, performance optimizations, and security updates manually. We recommend establishing a development workflow that combines the generated codebase with AI coding assistants like Claude Code for efficient ongoing development and maintenance tasks.
-
-
-
-## License
-
-AutoBE is licensed under the [GNU Affero General Public License v3.0 (AGPL-3.0)](LICENSE). If you modify AutoBE itself or offer it as a network service, you must make your source code available under the same license.
-
-However, backend applications generated by AutoBE can be relicensed under any license you choose, such as MIT. This means you can freely use AutoBE-generated code in commercial projects without open source obligations, similar to how other code generation tools work.
+# @autobe/estimate
+
+A CLI tool that evaluates code quality for AutoBE-generated projects.
+
+## Quick Start
+
+```bash
+# 1. Build (only needed once)
+cd packages/estimate
+npx tsc --build
+
+# 2. Set up environment
+cp .env.example .env
+# Fill in OPENROUTER_API_KEY if you want AI agent evaluation
+
+# 3. Run a single evaluation
+npx tsx dist/bin/estimate.js -i /path/to/project -o ./reports
+
+# 4. Run the full benchmark suite
+./run-benchmark.sh
+```
+
+## CLI Usage
+
+```bash
+npx tsx dist/bin/estimate.js -i <path> -o <path> [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `-i, --input <path>` | Path to the project to evaluate (required) |
+| `-o, --output <path>` | Directory to save reports (required) |
+| `-v, --verbose` | Show detailed logs |
+| `--continue-on-gate-failure` | Continue evaluation even if gate fails |
+| `--use-agent` | Enable AI agent evaluation (30% of score) |
+| `--provider <provider>` | LLM provider: `claude`, `openai`, `openrouter` |
+| `--api-key <key>` | API key (or set `OPENROUTER_API_KEY` env var) |
+| `--auto-fix` | Auto-fix simple issues (TS1161, TS7006) |
+| `--run-tests` | Start Docker server and run e2e tests |
+| `--golden` | Run Golden Set evaluation |
+| `--project <project>` | Project type for Golden Set: `todo`, `bbs`, `reddit`, `shopping` |
+
+## Scoring System
+
+### Gate Check (pass/fail)
+
+If the code doesn't compile, you get a 0.
+
+- **Source file check**: No TypeScript files in `src/` means instant failure (GATE001)
+- **TypeScript compilation**: Uses `AutoBeTypeScriptCompiler` (in-memory)
+- **Prisma schema validation**: Uses `AutoBeDatabaseCompiler` (in-memory)
+
+### Scoring Phases (70% of total)
+
+| Phase | Weight | What we check |
+|-------|--------|---------------|
+| Document Quality | 10% | Presence of `docs/analysis/`, README |
+| Requirements Coverage | 25% | Controllers, providers, DTOs coverage |
+| Test Coverage | 30% | Test count, assertion quality, stub detection |
+| Logic Completeness | 25% | TODOs, FIXMEs, empty methods, stub returns |
+| API Completeness | 10% | Endpoint implementation, provider delegation |
+
+### Penalties
+
+| Penalty | Trigger | Max Deduction |
+|---------|---------|---------------|
+| Warning | Warning ratio > 50% | -10 |
+| Duplication | > 50 duplicate blocks | -5 |
+| JSDoc | > 10% missing | -5 |
+| Schema Sync (SYNC001) | > 5 empty types in DTOs | -5 |
+| Schema Sync (SYNC002) | >= 3 Prisma-Structure mismatches | -5 |
+| Mapping ratio (REQ006) | < 50% controller-provider coverage | -40 |
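
Reviewer note: as a reading aid for the Penalties table added above, the sketch below lists which penalties would trigger for a given set of metrics, together with the worst-case deduction. The `PenaltyMetrics` shape, its field names, and both helper functions are assumptions made for illustration and are not taken from the package. The README gives only a maximum per penalty, so the sketch does not model how the actual deduction scales.

```typescript
// Hypothetical metric shape; field names are illustrative, not @autobe/estimate's API.
interface PenaltyMetrics {
  warningRatio: number;               // 0..1
  duplicateBlocks: number;
  jsdocMissingRatio: number;          // 0..1
  emptyDtoTypes: number;              // SYNC001
  prismaStructureMismatches: number;  // SYNC002
  controllerProviderCoverage: number; // 0..1, REQ006
}

interface PenaltyRule {
  name: string;
  maxDeduction: number;
  triggered: (m: PenaltyMetrics) => boolean;
}

// Triggers and maxima copied from the Penalties table.
const PENALTY_RULES: PenaltyRule[] = [
  { name: "Warning", maxDeduction: 10, triggered: (m) => m.warningRatio > 0.5 },
  { name: "Duplication", maxDeduction: 5, triggered: (m) => m.duplicateBlocks > 50 },
  { name: "JSDoc", maxDeduction: 5, triggered: (m) => m.jsdocMissingRatio > 0.1 },
  { name: "SYNC001", maxDeduction: 5, triggered: (m) => m.emptyDtoTypes > 5 },
  { name: "SYNC002", maxDeduction: 5, triggered: (m) => m.prismaStructureMismatches >= 3 },
  { name: "REQ006", maxDeduction: 40, triggered: (m) => m.controllerProviderCoverage < 0.5 },
];

// Rules whose trigger condition holds for the given metrics.
function triggeredPenalties(m: PenaltyMetrics): PenaltyRule[] {
  return PENALTY_RULES.filter((rule) => rule.triggered(m));
}

// Upper bound on the deduction: sum of the maxima of all triggered rules.
function worstCaseDeduction(m: PenaltyMetrics): number {
  return triggeredPenalties(m).reduce((sum, rule) => sum + rule.maxDeduction, 0);
}
```

Summing the six maxima in the table gives 70, whereas the new README's Scoring Formula mentions a maximum of roughly 65. The sketch does not try to reconcile the two.
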
+
+### Reference Info (no score impact)
+
+- **Complexity**: Functions with cyclomatic complexity > 15
+- **Duplication**: Blocks of 10+ identical lines
+- **Naming**: PascalCase violations
+- **JSDoc**: Missing documentation comments
+- **Schema Sync**: Empty interfaces (SYNC001) + Prisma-Structure property mismatches (SYNC002)
+
+### AI Agent Evaluation (30% of total)
+
+Enable with `--use-agent`:
+
+- **SecurityAgent**: OWASP Top 10 security analysis
+- **LLMQualityAgent**: Detects hallucinations, incomplete implementations, logic errors
+
+### Scoring Formula
+
+```
+Raw Phase Score = sum(phase_score * phase_weight)
+Penalties = warning + duplication + jsdoc + schemaSync + mapping (max ~65)
+Adjusted Phase = Raw Phase - Penalties
+
+Without agents: Final Score = Adjusted Phase (100%)
+With agents: Final Score = (Adjusted Phase * 70%) + (Agent Average * 30%)
+```
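
Reviewer note: the Scoring Formula added above can be written out as a short TypeScript sketch. The names `PhaseScore`, `rawPhaseScore`, and `finalScore` are assumptions for illustration and not the package's API. Weights are kept as integer percentages to avoid floating-point drift. The README does not say whether the result is rounded or clamped, so the sketch does neither.

```typescript
// Hypothetical types and helpers; only the arithmetic comes from the README formula.
interface PhaseScore {
  name: string;
  score: number;         // 0..100
  weightPercent: number; // weights sum to 100
}

// Raw Phase Score = sum(phase_score * phase_weight)
function rawPhaseScore(phases: PhaseScore[]): number {
  const weighted = phases.reduce((sum, p) => sum + p.score * p.weightPercent, 0);
  return weighted / 100;
}

// Adjusted Phase = Raw Phase - Penalties.
// Without agents the adjusted score is final; with agents it is blended 70/30.
function finalScore(phases: PhaseScore[], penalties: number, agentAverage?: number): number {
  const adjusted = rawPhaseScore(phases) - penalties;
  if (agentAverage === undefined) {
    return adjusted;
  }
  return (adjusted * 70 + agentAverage * 30) / 100;
}

// Phase scores taken from the README's Sample Output section.
const samplePhases: PhaseScore[] = [
  { name: "Document Quality", score: 100, weightPercent: 10 },
  { name: "Requirements Coverage", score: 90, weightPercent: 25 },
  { name: "Test Coverage", score: 61, weightPercent: 30 },
  { name: "Logic Completeness", score: 100, weightPercent: 25 },
  { name: "API Completeness", score: 100, weightPercent: 10 },
];
```

For the sample phase scores, `rawPhaseScore(samplePhases)` gives 85.8, while the Sample Output reports a final score of 85. The README does not spell out which rounding or penalties account for the difference.
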
+
+## Grading
+
+| Grade | Score | Meaning |
+|-------|-------|---------|
+| A | 90-100 | Production ready |
+| B | 80-89 | Minor improvements needed |
+| C | 70-79 | Several issues to address |
+| D | 60-69 | Significant problems |
+| F | 0-59 | Major issues or gate failure |
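
Reviewer note: the Grading table added above maps to a simple threshold lookup. This is a minimal sketch under one assumption: a fractional score is graded by the lower bound of its band, so 89.5 would count as B. The README does not state this. The `Grade` type and `gradeFor` function are illustrative names only.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Lower bounds copied from the Grading table; anything below 60 is an F.
function gradeFor(score: number): Grade {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

With the Sample Output's final score of 85, `gradeFor(85)` yields B, matching the grade shown there.
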
+
+## Benchmarking
+
+Run evaluations across all models and projects:
+
+```bash
+cd packages/estimate
+
+# Scoring only (no LLM calls, fast)
+./run-benchmark.sh
+
+# With AI agents (requires OPENROUTER_API_KEY)
+./run-benchmark.sh agent
+
+# Full mode (agents + runtime tests + golden set)
+./run-benchmark.sh full
+```
+
+Results are saved to `reports/benchmark/<model>/<project>/`.
+
+### Compare
+
+Compare multiple projects side by side:
+
+```bash
+npx tsx dist/bin/estimate.js compare \
+-p "model-a:/path/to/a" "model-b:/path/to/b" \
+-o ./reports/comparison
+```
+
+## Environment Variables
+
+Create a `.env` file in `packages/estimate/`:
+
+```bash
+OPENROUTER_API_KEY=sk-or-...
+
+# Optional: Langfuse telemetry
+LANGFUSE_PUBLIC_KEY=pk-lf-...
+LANGFUSE_SECRET_KEY=sk-lf-...
+LANGFUSE_HOST=https://cloud.langfuse.com
+```
+
+## Output
+
+Each evaluation produces two files:
+
+- `estimate-report.md` — Human-readable summary with score breakdown
+- `estimate-report.json` — Machine-readable for CI/CD integration
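
Reviewer note: the new README mentions CI/CD integration for `estimate-report.json` but does not document the file's schema. The sketch below shows one way a pipeline could gate on a numeric field. The function name is hypothetical, and the field name is passed in as a parameter because the actual key is not given in the README.

```typescript
// Hypothetical helper; the report's real field names are not documented in the README.
function meetsMinimumScore(reportJson: string, scoreField: string, minimum: number): boolean {
  const report: unknown = JSON.parse(reportJson);
  if (typeof report !== "object" || report === null) {
    throw new Error("report is not a JSON object");
  }
  const value = (report as Record<string, unknown>)[scoreField];
  if (typeof value !== "number") {
    throw new Error(`field "${scoreField}" is missing or not a number`);
  }
  return value >= minimum;
}
```

A CI step could read the report file, call this function with whichever field holds the final score, and exit non-zero when it returns `false`.
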
+
+## Sample Output
+
+```
+Scoring Phases (70% of total score):
+-------------------------------------
+Gate: Pass
+Document Quality 100/100
+Requirements Coverage 90/100
+Test Coverage 61/100
+Logic Completeness 100/100
+API Completeness 100/100
+-------------------------------------
+
+Reference Info (no score impact):
+-------------------------------------
+Complexity: 2 complex functions (max: 22)
+Duplication: 102 duplicate blocks
+Naming: 0 issues
+JSDoc: 36 missing
+Schema Sync: 0/35 empty types, 0 mismatched
+-------------------------------------
+
+Final Score: 85/100 (Grade: B)
+```
+
+## Troubleshooting
+
+**Gate keeps failing**
+- Use `--continue-on-gate-failure` to see all issues
+- Gate uses in-memory compilers — unresolved external modules like `@nestjs/common` are expected
+
+**AI agent errors**
+- Check your API key
+- OpenRouter model IDs use `provider/model-name` format
+- Rate limits are retried automatically
+
+**Build not working**
+- Run `npx tsc --build` first
+- Make sure `dist/` directory exists
+
+## License
+
+AGPL-3.0
package/package.json
CHANGED
@@ -1,7 +1,16 @@
 {
 "name": "@autobe/estimate",
-"version": "0.30.
+"version": "0.30.2",
 "description": "Code quality evaluation system for AutoBE generated code",
+"author": "Wrtn Technologies",
+"license": "AGPL-3.0",
+"repository": {
+"type": "git",
+"url": "https://github.com/wrtnlabs/autobe"
+},
+"bugs": {
+"url": "https://github.com/wrtnlabs/autobe/issues"
+},
 "main": "dist/index.js",
 "types": "dist/index.d.ts",
 "bin": {
@@ -25,10 +34,10 @@
 "langfuse": "^3.38.6",
 "openai": "^6.15.0",
 "typescript": "~5.9.3",
-"@autobe/
-"@autobe/
-"@autobe/utils": "0.30.
-"@autobe/filesystem": "0.30.
+"@autobe/interface": "0.30.2",
+"@autobe/compiler": "0.30.2",
+"@autobe/utils": "0.30.2",
+"@autobe/filesystem": "0.30.2"
 },
 "devDependencies": {
 "@types/node": "^20.10.0",