carto-md 1.1.3 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CONTRIBUTING.md CHANGED
@@ -1,6 +1,6 @@
  # Contributing to Carto
 
- Carto is free, open source, and community-maintained. The core team owns the merger logic, MCP server, graph clustering, and CLI. The community owns language and framework extractors.
+ Carto is free, open source, and community-maintained. The core team owns the SQLite store, MCP server, graph clustering, and CLI. The community owns language and framework extractors.
 
  ---
 
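Graph clustering, listed above as core-owned, is described further down in 2.0 as Leiden+CPM clustering in `src/agents/leiden.js`, but the text never states what that clustering optimizes. Here is a minimal sketch of the Constant Potts Model (CPM) quality score that a Leiden+CPM run tries to maximize: each cluster contributes its internal edge weight minus a resolution penalty of `gamma * n * (n - 1) / 2`. It is illustrative only and is not Carto's implementation. The edge-list format, the `cpmQuality` name, the sample import graph, and `gamma = 0.5` are all assumptions.

```js
'use strict';

// Sketch of the Constant Potts Model (CPM) quality function that Leiden+CPM
// clustering maximizes. Illustrative only, not taken from src/agents/leiden.js.
//   H = sum over clusters c of (e_c - gamma * n_c * (n_c - 1) / 2)
// e_c: total weight of edges with both ends in c; n_c: number of files in c.
function cpmQuality(edges, membership, gamma) {
  const size = new Map();     // cluster id -> n_c
  const internal = new Map(); // cluster id -> e_c
  for (const c of membership.values()) {
    size.set(c, (size.get(c) || 0) + 1);
  }
  // edges: undirected import edges [fileA, fileB, weight?]; weight defaults to 1.
  for (const [a, b, w = 1] of edges) {
    const c = membership.get(a);
    if (c !== undefined && c === membership.get(b)) {
      internal.set(c, (internal.get(c) || 0) + w);
    }
  }
  let h = 0;
  for (const [c, n] of size) {
    h += (internal.get(c) || 0) - (gamma * n * (n - 1)) / 2;
  }
  return h;
}

// Hypothetical import graph: two tightly connected groups plus one cross edge.
const edges = [
  ['auth/login.js', 'auth/session.js'],
  ['auth/session.js', 'auth/token.js'],
  ['auth/login.js', 'auth/token.js'],
  ['billing/invoice.js', 'billing/stripe.js'],
  ['auth/session.js', 'billing/invoice.js'],
];
const files = [...new Set(edges.flatMap(([a, b]) => [a, b]))];
const split = new Map(files.map((f) => [f, f.split('/')[0]])); // 'auth' or 'billing'
const oneCluster = new Map(files.map((f) => [f, 'all']));

console.log(cpmQuality(edges, split, 0.5));      // 2
console.log(cpmQuality(edges, oneCluster, 0.5)); // 0
```

On this toy graph, splitting the files into `auth` and `billing` scores higher than putting them all in one cluster. That matches the "files that import each other heavily cluster together" behavior the 2.0 text describes. Raising `gamma` penalizes large clusters more heavily and pushes the result toward smaller, denser ones.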
@@ -10,85 +10,190 @@ Carto is free, open source, and community-maintained. The core team owns the mer
 
  New language support lives in `src/extractors/languages/`. Each language is an isolated module.
 
- Currently supported: JavaScript/TypeScript, Python, Go, R.
+ **Currently supported:** JavaScript/TypeScript, Python, Go, Rust, Ruby, Java, C/C++, C#, R, Prisma, HTML
 
- Wanted: Rust, Ruby, Java, PHP, C#, Swift, Kotlin.
+ **Wanted:** PHP, Swift, Kotlin, Elixir, Scala, Haskell, Zig
 
  ### Tier 2 — Framework extractors (safe to add, easy to review)
 
- Framework-specific route and model extraction lives in `src/extractors/`. Each framework is an isolated module.
+ Framework-specific route and model extraction lives inside the language plugins.
 
- Currently supported:
- - **JS/TS**: Express, Next.js (App + Pages Router), tRPC, Drizzle, Zod
- - **Python**: FastAPI, Pydantic, SQLAlchemy, Django (models + URLs)
- - **Go**: Gin, Echo, Chi, net/http
+ **Currently supported:**
+ - **JS/TS**: Express, Next.js (App + Pages Router), tRPC, React Router, Drizzle, Zod, TypeScript interfaces
+ - **Python**: FastAPI, Flask, Pydantic, SQLAlchemy, Django (models + URLs)
+ - **Go**: Gin, Echo, Chi, net/http — routes, structs, import graph
+ - **Rust**: Actix-web, Axum, Rocket — routes, structs
+ - **Java**: Spring MVC/Boot, JAX-RS — routes, JPA entities, records
+ - **C#**: ASP.NET Core (attribute routing + minimal API), EF Core classes, records
+ - **Ruby**: Rails routes.rb, Sinatra, ActiveRecord models
  - **Schema**: Prisma
  - **Frontend**: HTML fetch()
  - **R**: Plumber, Shiny, R6, S7
 
- Wanted: Rails, Laravel, NestJS, Hono, Spring, Flask, Fastify.
+ **Wanted:** NestJS, Hono, Fastify, Laravel, Django REST Framework, Ktor, Vapor
 
  ### Tier 3 — Core (review carefully before merging)
 
- - `src/agents/merger.js` — merger logic. One bad merge = developer loses manual notes = project dies.
- - `src/agents/domains.js` — graph-based domain clustering. Wrong clusters = wrong context files.
- - `src/engine/carto.js` — programmatic module API. Breaking changes affect tools that import Carto.
- - `src/mcp/server.js` — MCP server tools. Breaking changes affect Kiro/Cursor/Claude integration.
- - `src/engine/incremental.js` — incremental graph update engine. Bugs here cause stale graphs.
- - `src/cache/`: file hash + graph cache. Bugs here cause wrong re-index behavior.
- - `src/detector/`: framework detection logic.
- - `src/cli/` — CLI commands.
+ - `src/agents/merger.js` — merger logic. One bad merge = developer loses manual notes.
+ - `src/agents/leiden.js` — Leiden+CPM graph clustering. Wrong clusters = wrong domain context.
+ - `src/store/sqlite-store.js` — SQLite persistence layer.
+ - `src/mcp/server-v2.js` — MCP server tools. Breaking changes affect Kiro/Cursor/Claude.
+ - `src/store/sync-v2.js` — full sync pipeline.
+ - `src/cli/watch.js`: incremental update pipeline.
+ - `src/extractors/imports.js`: import resolution for all languages.
 
  ---
 
- ## How to add a language
+ ## How to add a language (V2 pattern — tree-sitter based)
 
- 1. Create `src/extractors/languages/yourlanguage.js`
- 2. Export a plugin object:
+ V2 uses tree-sitter for import and symbol extraction. Babel is only used for deep JS/TS route/model extraction on API handler files.
+
+ ### Step 1: Install the grammar
+
+ ```bash
+ npm install tree-sitter-yourlanguage --save-exact
+ ```
+
+ ### Step 2: Add grammar definition to `src/extractors/tree-sitter-parser.js`
+
+ Add an entry to the `GRAMMAR_DEFS` array:
+
+ ```js
+ {
+ name: 'yourlanguage',
+ extensions: ['.ext'],
+ loadGrammar: () => require('tree-sitter-yourlanguage'),
+ importQuery: `
+ (import_statement source: (string) @src)
+ `,
+ symbolQuery: `
+ (function_declaration name: (identifier) @name)
+ (class_declaration name: (identifier) @name)
+ `,
+ },
+ ```
+
+ The queries use tree-sitter S-expression syntax. Run `node -e "const P = require('tree-sitter'); const L = require('tree-sitter-yourlanguage'); const p = new P(); p.setLanguage(L); console.log(p.parse('your code').rootNode.toString())"` to see the node types.
+
+ ### Step 3: Create `src/extractors/languages/yourlanguage.js`
 
  ```js
+ 'use strict';
+
+ const tsParser = require('../tree-sitter-parser');
+
  module.exports = {
  name: 'yourlanguage',
  extensions: ['.ext'],
- extract(content, relPath) {
+ extract(content, filename) {
+ // Fast path: tree-sitter for imports + symbols (runs on ALL files)
+ const { imports: tsImports, symbols: tsSymbols } = tsParser.isAvailable()
+ ? tsParser.extractAll(content, '.ext')
+ : { imports: [], symbols: [] };
+
  return {
- routes: [{ method, path, functionName }],
- models: [{ className, fields: [{ name, type }], kind: 'yourlanguage' }],
- functions: [{ name, params, returnType }],
- envVars: ['VAR_NAME'],
- dbTables: [{ tableName, modelName }],
+ routes: extractRoutes(content), // framework-specific, regex
+ models: extractModels(content), // ORM/schema models, regex
+ functions: tsSymbols
+ .filter(s => s.kind === 'function')
+ .map(s => ({ name: s.name, params: '—', returnType: '—' })),
+ envVars: extractEnvVars(content), // env var references
+ dbTables: [],
  fetches: [],
  storageKeys: [],
- events: [{ type: 'listener'|'emitter', event: 'event.name' }],
- jobs: [{ type: 'cron'|'queue'|'interval', expression?: '* * * * *', name?: 'job-name' }],
+ _tsImports: tsImports, // raw import paths (for import graph)
+ _tsSymbols: tsSymbols, // all symbols (for get_file_summary)
  };
  }
  };
+
+ function extractRoutes(content) { return []; }
+ function extractModels(content) { return []; }
+ function extractEnvVars(content) { return []; }
+ ```
+
+ ### Step 4: Add import resolution to `src/extractors/imports.js`
+
+ If your language has resolvable local imports (not just package names), add a case in `extractImports()`:
+
+ ```js
+ } else if (ext === '.ext') {
+ return extractYourLanguageImports(content, filePath, projectRoot);
+ }
  ```
 
- 3. The loader auto-discovers it; no changes to `loader.js` needed
- 4. Test on at least 3 real open-source projects
- 5. Open a PR with before/after AGENTS.md examples
+ Then implement `extractYourLanguageImports()` at the bottom of the file. It should return an array of relative file paths (from project root) that actually exist on disk.
+
+ ### Step 5: Add to `CODE_EXTS` in `src/store/sync-v2.js`
+
+ ```js
+ const CODE_EXTS = new Set([
+ // ... existing ...
+ '.ext',
+ ]);
+ ```
+
+ ### Step 6: Add to `detectLanguage()` in `src/store/sync-v2.js`
+
+ ```js
+ '.ext': 'yourlanguage',
+ ```
+
+ ### Step 7: Test
+
+ ```bash
+ # Test extraction on a real file
+ node -e "
+ const { loadLanguagePlugins, getPluginForFile } = require('./src/extractors/loader');
+ const plugins = loadLanguagePlugins();
+ const plugin = getPluginForFile(plugins, 'test.ext');
+ const result = plugin.extract('your code here', 'test.ext');
+ console.log(JSON.stringify(result, null, 2));
+ "
+
+ # Run correctness tests
+ node test/correctness.js
+
+ # Run full test suite
+ npm test
+ ```
 
  ---
 
  ## How to add a framework extractor
 
- 1. Add detection to `src/detector/framework.js`
- 2. Add route/model patterns to the relevant language plugin or create a new extractor in `src/extractors/`
- 3. Test on at least 2 real projects using that framework
- 4. Open a PR with before/after AGENTS.md examples
+ Framework-specific extraction (routes, models) lives inside the language plugin. Add regex patterns to the relevant `extractRoutes()` or `extractModels()` function.
+
+ Example adding Hono routes to the JS plugin:
+
+ ```js
+ // In src/extractors/languages/javascript.js, inside extractExpressRoutes():
+
+ // Hono: app.get('/path', handler) — same pattern as Express, already covered
+ // Hono with chaining: app.route('/api', apiRouter) — add if needed
+ ```
+
+ Test on at least 2 real open-source projects using the framework.
 
  ---
 
- ## How to add a domain keyword
+ ## How domain clustering works (V2)
 
- Domain clustering lives in `src/agents/domains.js`. The `DOMAIN_MAP` array maps keywords to domain names. If your framework creates a new domain category, add it:
+ Domain detection uses **Leiden+CPM graph clustering** (`src/agents/leiden.js`). Files that import each other heavily cluster together. Domain names are inferred from path tokens, with keyword hints for well-known patterns.
 
- ```js
- { keywords: ['graphql', 'resolver', 'mutation'], domain: 'GRAPHQL' },
+ For non-SaaS repos, users can define custom domains in `carto.config.json`:
+
+ ```json
+ {
+ "domains": {
+ "EDITOR": ["editor", "monaco", "text"],
+ "PLATFORM": ["platform", "service", "registry"]
+ }
+ }
  ```
 
+ The keyword seeds in `src/store/sync-v2.js` (the `keywordSeeds` object) can be extended for new domain types.
+
  ---
 
  ## Ground rules
@@ -98,6 +203,7 @@ Domain clustering lives in `src/agents/domains.js`. The `DOMAIN_MAP` array maps
  - **Test on unknown repos.** Don't just test on projects you wrote. Find a real open-source repo using the framework and verify the output is correct.
  - **No cloud, no telemetry, no tracking.** Carto is local only. Forever. Don't add any network calls except the existing npm update check.
  - **No paid features.** Free forever. MIT. Don't propose monetization.
+ - **tree-sitter first.** For new languages, always use tree-sitter for imports and symbols. Only use regex for framework-specific patterns (routes, models) that tree-sitter queries can't easily express.
 
  ---
 
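The new "tree-sitter first" ground rule reserves regex for framework-specific patterns such as routes. The framework section of this diff also notes that Hono's `app.get('/path', handler)` follows the same pattern as Express. Below is a minimal sketch of what a regex-based `extractRoutes(content)` for that call style might look like. It makes several assumptions that are not Carto's actual `extractExpressRoutes()`: the regex itself, the `app`/`router` receiver names, and the `{ method, path, functionName }` element shape, which is borrowed from the 1.x template removed in this diff.

```js
'use strict';

// Hypothetical regex route extractor for Express/Hono-style calls such as
//   app.get('/users/:id', getUser)
//   router.post("/login", handlers.login)
// Groups: 1 = HTTP method, 2 = quote char, 3 = path, 4 = first identifier
// after the path (skipped when it is the `async` keyword of an arrow handler).
const ROUTE_RE =
  /\b(?:app|router)\.(get|post|put|patch|delete|options|head|all)\s*\(\s*(['"`])([^'"`]+)\2\s*,\s*((?!async\b)[A-Za-z_$][\w$.]*)?/g;

function extractRoutes(content) {
  const routes = [];
  for (const m of content.matchAll(ROUTE_RE)) {
    routes.push({
      method: m[1].toUpperCase(),
      path: m[3],
      // Inline arrow or function handlers have no name to capture.
      functionName: m[4] || '<anonymous>',
    });
  }
  return routes;
}

const sample = `
app.get('/users/:id', getUser);
router.post("/login", handlers.login);
app.delete('/items/:id', async (req, res) => res.end());
`;
console.log(extractRoutes(sample));
```

Because the handler capture takes the first identifier after the path, a call such as `app.get('/x', auth, handler)` would report the middleware `auth` as the handler. A real extractor would need to decide which argument counts as the route handler.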
@@ -110,6 +216,8 @@ npm install
  node src/cli/index.js init # test in any project
  node src/cli/index.js serve # test MCP server
  npm test # run test suite (30 tests)
+ node test/correctness.js # run correctness tests (31 tests)
+ node test/benchmark.js # run benchmarks against real repos
  ```
 
  ---
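The test commands added here run Carto's own suites. A contributor may also want a faster local check that a new plugin returns every field listed in the 2.0 template (Step 3 earlier in this diff), since the PR checklist calls out `_tsImports` and `_tsSymbols` specifically. The helper below is hypothetical: `REQUIRED_FIELDS` is copied by hand from that template rather than read from Carto, and `missingOrInvalidFields` is not part of Carto's test suite.

```js
'use strict';

// Hypothetical helper, not part of Carto. REQUIRED_FIELDS is copied by hand
// from the 2.0 plugin template in Step 3; update it if the template changes.
const REQUIRED_FIELDS = [
  'routes', 'models', 'functions', 'envVars', 'dbTables',
  'fetches', 'storageKeys', '_tsImports', '_tsSymbols',
];

// Returns the names of required fields that are missing or not arrays.
// An empty result means the plugin output has the expected shape.
function missingOrInvalidFields(result) {
  return REQUIRED_FIELDS.filter((key) => !Array.isArray(result && result[key]));
}

// Example, run from the repository root (loader calls as used in Step 7):
// const { loadLanguagePlugins, getPluginForFile } = require('./src/extractors/loader');
// const plugin = getPluginForFile(loadLanguagePlugins(), 'test.ext');
// const problems = missingOrInvalidFields(plugin.extract('your code here', 'test.ext'));
// if (problems.length) throw new Error('Bad plugin output: ' + problems.join(', '));
```

The helper only checks that each field is an array and not what the arrays contain. The correctness tests remain the place to verify extraction results.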
@@ -117,20 +225,23 @@ npm test # run test suite (30 tests)
  ## PR checklist
 
  - [ ] Tested on at least 2-3 real open-source projects
- - [ ] Before/after AGENTS.md included in PR description
- - [ ] Plugin returns all fields including `events` and `jobs` (can be empty arrays)
+ - [ ] Before/after AGENTS.md or `get_architecture` output included in PR description
+ - [ ] Plugin uses tree-sitter for imports/symbols (not Babel or regex for the hot path)
+ - [ ] Plugin returns all fields including `_tsImports` and `_tsSymbols`
+ - [ ] Import resolution added to `src/extractors/imports.js` if language has local imports
+ - [ ] Extension added to `CODE_EXTS` and `detectLanguage()` in `sync-v2.js`
  - [ ] No changes to merger logic (unless explicitly fixing a merger bug)
  - [ ] No network calls added
- - [ ] `carto --version` still works
- - [ ] `npm test` passes
+ - [ ] `npm test` passes (30/30)
+ - [ ] `node test/correctness.js` passes (31/31)
 
  ---
 
  ## Issues
 
- - **Bug**: Open an issue with the project type, command run, and what AGENTS.md or domain files produced vs what you expected.
+ - **Bug**: Open an issue with the project type, command run, and what output was produced vs expected.
  - **Language request**: Open an issue titled "Language: [name]"
  - **Framework request**: Open an issue titled "Framework: [name]"
- - **Domain keyword**: Open an issue titled "Domain: [name]" if your codebase doesn't cluster correctly
+ - **Domain clustering issue**: Open an issue titled "Domains: [repo name]" with the repo URL and what domains were detected vs what you expected.
 
  All issues acknowledged within 24 hours.
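Step 4 of the new "How to add a language" section, and the matching PR checklist item, ask for an `extractYourLanguageImports()` that returns project-root-relative paths of imported files that actually exist on disk. Below is a minimal sketch for a made-up language whose local imports look like `import "./models/user"`. Only the function name and its `(content, filePath, projectRoot)` arguments come from the diff. The import syntax, the regex, and the `.ext` suffix fallback are assumptions. Package-style imports without a leading `./` or `../` are ignored, because they do not point at local files.

```js
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical import syntax for a made-up language: import "./relative/path"
// Only relative specifiers (./ or ../) are considered local.
const IMPORT_RE = /^\s*import\s+"(\.{1,2}\/[^"]+)"/gm;

function extractYourLanguageImports(content, filePath, projectRoot) {
  // path.resolve handles filePath given either absolute or relative to projectRoot.
  const fromDir = path.dirname(path.resolve(projectRoot, filePath));
  const resolved = new Set();
  for (const m of content.matchAll(IMPORT_RE)) {
    const base = path.resolve(fromDir, m[1]);
    // Try the path as written, then with the (placeholder) '.ext' suffix appended.
    for (const candidate of [base, base + '.ext']) {
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        // Store project-root-relative paths with forward slashes on every platform.
        resolved.add(path.relative(projectRoot, candidate).split(path.sep).join('/'));
        break;
      }
    }
  }
  return [...resolved];
}
```

Returning only de-duplicated paths of files that exist keeps external packages and broken imports out of the import graph, which is what Step 4 asks for.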