carto-md 1.1.4 → 2.0.1

package/CONTRIBUTING.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Contributing to Carto
2
2
 
3
- Carto is free, open source, and community-maintained. The core team owns the merger logic, MCP server, graph clustering, and CLI. The community owns language and framework extractors.
3
+ Carto is free, open source, and community-maintained. The core team owns the SQLite store, MCP server, graph clustering, and CLI. The community owns language and framework extractors.
4
4
 
5
5
  ---
6
6
 
@@ -10,88 +10,178 @@ Carto is free, open source, and community-maintained. The core team owns the mer
10
10
 
11
11
  New language support lives in `src/extractors/languages/`. Each language is an isolated module.
12
12
 
13
- Currently supported: JavaScript/TypeScript, Python, Go, R.
13
+ **Currently supported:** JavaScript/TypeScript, Python, Go, Rust, Ruby, Java, C/C++, C#, R, Prisma, HTML
14
14
 
15
- Wanted: Rust, Ruby, Java, PHP, C#, Swift, Kotlin.
15
+ **Wanted:** PHP, Swift, Kotlin, Elixir, Scala, Haskell, Zig
16
16
 
17
17
  ### Tier 2 — Framework extractors (safe to add, easy to review)
18
18
 
19
- Framework-specific route and model extraction lives in `src/extractors/`. Each framework is an isolated module.
19
+ Framework-specific route and model extraction lives inside the language plugins.
20
20
 
21
- Currently supported:
22
- - **JS/TS**: Express, Next.js (App + Pages Router), tRPC, Drizzle, Zod
21
+ **Currently supported:**
22
+ - **JS/TS**: Express, Next.js (App + Pages Router), tRPC, React Router, Drizzle, Zod, TypeScript interfaces
23
23
  - **Python**: FastAPI, Flask, Pydantic, SQLAlchemy, Django (models + URLs)
24
- - **Go**: Gin, Echo, Chi, Fiber, net/http — routes, structs, import graph
24
+ - **Go**: Gin, Echo, Chi, net/http — routes, structs, import graph
25
+ - **Rust**: Actix-web, Axum, Rocket — routes, structs
26
+ - **Java**: Spring MVC/Boot, JAX-RS — routes, JPA entities, records
27
+ - **C#**: ASP.NET Core (attribute routing + minimal API), EF Core classes, records
28
+ - **Ruby**: Rails routes.rb, Sinatra, ActiveRecord models
25
29
  - **Schema**: Prisma
26
30
  - **Frontend**: HTML fetch()
27
31
  - **R**: Plumber, Shiny, R6, S7
28
32
 
29
- Wanted: Rails, Laravel, NestJS, Hono, Spring, Fastify.
33
+ **Wanted:** NestJS, Hono, Fastify, Laravel, Django REST Framework, Ktor, Vapor
30
34
 
31
35
  ### Tier 3 — Core (review carefully before merging)
32
36
 
33
- - `src/agents/merger.js` — merger logic. One bad merge = developer loses manual notes = project dies.
34
- - `src/agents/domains.js` — graph-based domain clustering. Wrong clusters = wrong context files.
35
- - `src/engine/carto.js` — programmatic module API. Breaking changes affect tools that import Carto.
36
- - `src/mcp/server.js` — MCP server tools. Breaking changes affect Kiro/Cursor/Claude integration.
37
- - `src/engine/incremental.js` — incremental graph update engine. Bugs here cause stale graphs.
38
- - `src/cache/` — file hash + graph cache. Bugs here cause wrong re-index behavior.
39
- - `src/detector/` — framework detection logic.
40
- - `src/cli/` — CLI commands.
37
+ - `src/agents/merger.js` — merger logic. One bad merge = developer loses manual notes.
38
+ - `src/agents/leiden.js` — Leiden+CPM graph clustering. Wrong clusters = wrong domain context.
39
+ - `src/store/sqlite-store.js` — SQLite persistence layer.
40
+ - `src/mcp/server-v2.js` — MCP server tools. Breaking changes affect Kiro/Cursor/Claude.
41
+ - `src/store/sync-v2.js` — full sync pipeline.
42
+ - `src/cli/watch.js` — incremental update pipeline.
43
+ - `src/extractors/imports.js` — import resolution for all languages.
41
44
 
42
45
  ---
43
46
 
44
- ## How to add a language
47
+ ## How to add a language (V2 pattern — tree-sitter based)
45
48
 
46
- 1. Create `src/extractors/languages/yourlanguage.js`
47
- 2. Export a plugin object:
49
+ V2 uses tree-sitter for import and symbol extraction. Babel is only used for deep JS/TS route/model extraction on API handler files.
50
+
51
+ ### Step 1: Install the grammar
52
+
53
+ ```bash
54
+ npm install tree-sitter-yourlanguage --save-exact
55
+ ```
56
+
57
+ ### Step 2: Add grammar definition to `src/extractors/tree-sitter-parser.js`
58
+
59
+ Add an entry to the `GRAMMAR_DEFS` array:
48
60
 
49
61
  ```js
62
+ {
63
+ name: 'yourlanguage',
64
+ extensions: ['.ext'],
65
+ loadGrammar: () => require('tree-sitter-yourlanguage'),
66
+ importQuery: `
67
+ (import_statement source: (string) @src)
68
+ `,
69
+ symbolQuery: `
70
+ (function_declaration name: (identifier) @name)
71
+ (class_declaration name: (identifier) @name)
72
+ `,
73
+ },
74
+ ```
75
+
76
+ The queries use tree-sitter S-expression syntax. Run `node -e "const P = require('tree-sitter'); const L = require('tree-sitter-yourlanguage'); const p = new P(); p.setLanguage(L); console.log(p.parse('your code').rootNode.toString())"` to see the node types.
77
+
78
+ ### Step 3: Create `src/extractors/languages/yourlanguage.js`
79
+
80
+ ```js
81
+ 'use strict';
82
+
83
+ const tsParser = require('../tree-sitter-parser');
84
+
50
85
  module.exports = {
51
86
  name: 'yourlanguage',
52
87
  extensions: ['.ext'],
53
- extract(content, relPath) {
88
+ extract(content, filename) {
89
+ // Fast path: tree-sitter for imports + symbols (runs on ALL files)
90
+ const { imports: tsImports, symbols: tsSymbols } = tsParser.isAvailable()
91
+ ? tsParser.extractAll(content, '.ext')
92
+ : { imports: [], symbols: [] };
93
+
54
94
  return {
55
- routes: [{ method, path, functionName }],
56
- models: [{ className, fields: [{ name, type }], kind: 'yourlanguage' }],
57
- functions: [{ name, params, returnType }],
58
- envVars: ['VAR_NAME'],
59
- dbTables: [{ tableName, modelName }],
95
+ routes: extractRoutes(content), // framework-specific, regex
96
+ models: extractModels(content), // ORM/schema models, regex
97
+ functions: tsSymbols
98
+ .filter(s => s.kind === 'function')
99
+ .map(s => ({ name: s.name, params: '—', returnType: '—' })),
100
+ envVars: extractEnvVars(content), // env var references
101
+ dbTables: [],
60
102
  fetches: [],
61
103
  storageKeys: [],
62
- events: [{ type: 'listener'|'emitter', event: 'event.name' }],
63
- jobs: [{ type: 'cron'|'queue'|'interval', expression?: '* * * * *', name?: 'job-name' }],
104
+ _tsImports: tsImports, // raw import paths (for import graph)
105
+ _tsSymbols: tsSymbols, // all symbols (for get_file_summary)
64
106
  };
65
107
  }
66
108
  };
109
+
110
+ function extractRoutes(content) { return []; }
111
+ function extractModels(content) { return []; }
112
+ function extractEnvVars(content) { return []; }
67
113
  ```
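The `extractEnvVars()` stub in the template above returns an empty array. As a minimal sketch of what a regex-based version might look like, assuming a language that reads variables through `process.env.NAME` or `getenv("NAME")` (both patterns are illustrative assumptions, not rules taken from Carto), it could collect unique names into the `envVars: ['VAR_NAME']` shape:

```js
'use strict';

// Hypothetical patterns: replace with however your language reads env vars.
const ENV_PATTERNS = [
  /process\.env\.([A-Z_][A-Z0-9_]*)/g,           // process.env.NAME
  /getenv\(\s*["']([A-Z_][A-Z0-9_]*)["']\s*\)/g, // getenv("NAME")
];

function extractEnvVars(content) {
  const found = new Set();
  for (const pattern of ENV_PATTERNS) {
    // matchAll works on a copy of the regex, so the shared patterns stay reusable.
    for (const match of content.matchAll(pattern)) {
      found.add(match[1]);
    }
  }
  // Sorted output keeps results stable between runs.
  return [...found].sort();
}
```

Deduplicating and sorting is a design choice for this sketch: it keeps generated output from changing when a file merely reorders its reads.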
68
114
 
69
- 3. The loader auto-discovers it; no changes to `loader.js` needed
70
- 4. Test on at least 3 real open-source projects
71
- 5. Open a PR with before/after AGENTS.md examples
115
+ ### Step 4: Add import resolution to `src/extractors/imports.js`
72
116
 
73
- ---
117
+ If your language has resolvable local imports (not just package names), add a case in `extractImports()`:
74
118
 
75
- ## How to add a framework extractor
119
+ ```js
120
+ } else if (ext === '.ext') {
121
+ return extractYourLanguageImports(content, filePath, projectRoot);
122
+ }
123
+ ```
124
+
125
+ Then implement `extractYourLanguageImports()` at the bottom of the file. It should return an array of relative file paths (from project root) that actually exist on disk.
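A minimal sketch of such a resolver follows. It assumes a hypothetical import syntax (`import "./relative/path"`), assumes `filePath` may be either absolute or relative to `projectRoot`, and assumes the extension may be omitted in the import string. None of these details come from Carto itself, so adapt them to your language:

```js
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical syntax: import "./relative/path" (local imports only).
const IMPORT_RE = /^\s*import\s+["'](\.{1,2}\/[^"']+)["']/gm;

function extractYourLanguageImports(content, filePath, projectRoot) {
  const fileDir = path.dirname(path.resolve(projectRoot, filePath));
  const resolved = new Set();

  for (const match of content.matchAll(IMPORT_RE)) {
    // Try the path as written, then with the language's extension appended.
    const base = path.resolve(fileDir, match[1]);
    const candidates = [base, `${base}.ext`];
    const hit = candidates.find(c => fs.existsSync(c) && fs.statSync(c).isFile());
    if (hit) {
      // Report paths relative to the project root, with forward slashes.
      resolved.add(path.relative(projectRoot, hit).split(path.sep).join('/'));
    }
  }
  return [...resolved];
}
```

Imports that do not resolve to a file on disk (package names, missing files) are dropped, which matches the requirement that only existing files are returned.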
126
+
127
+ ### Step 5: Add to `CODE_EXTS` in `src/store/sync-v2.js`
128
+
129
+ ```js
130
+ const CODE_EXTS = new Set([
131
+ // ... existing ...
132
+ '.ext',
133
+ ]);
134
+ ```
135
+
136
+ ### Step 6: Add to `detectLanguage()` in `src/store/sync-v2.js`
137
+
138
+ ```js
139
+ '.ext': 'yourlanguage',
140
+ ```
76
141
 
77
- 1. Add detection to `src/detector/framework.js`
78
- 2. Add route/model patterns to the relevant language plugin or create a new extractor in `src/extractors/`
79
- 3. Test on at least 2 real projects using that framework
80
- 4. Open a PR with before/after AGENTS.md examples
142
+ ### Step 7: Test
143
+
144
+ ```bash
145
+ # Test extraction on a real file
146
+ node -e "
147
+ const { loadLanguagePlugins, getPluginForFile } = require('./src/extractors/loader');
148
+ const plugins = loadLanguagePlugins();
149
+ const plugin = getPluginForFile(plugins, 'test.ext');
150
+ const result = plugin.extract('your code here', 'test.ext');
151
+ console.log(JSON.stringify(result, null, 2));
152
+ "
153
+
154
+ # Run correctness tests
155
+ node test/correctness.js
156
+
157
+ # Run full test suite
158
+ npm test
159
+ ```
81
160
 
82
161
  ---
83
162
 
84
- ## How to add a domain keyword
163
+ ## How to add a framework extractor
164
+
165
+ Framework-specific extraction (routes, models) lives inside the language plugin. Add regex patterns to the relevant `extractRoutes()` or `extractModels()` function.
85
166
 
86
- Domain clustering lives in `src/agents/domains.js`. The `DEFAULT_DOMAIN_MAP` array maps keywords to domain names. If your framework creates a new domain category, add it:
167
+ Example adding Hono routes to the JS plugin:
87
168
 
88
169
  ```js
89
- { keywords: ['graphql', 'resolver', 'mutation'], domain: 'GRAPHQL' },
170
+ // In src/extractors/languages/javascript.js, inside extractExpressRoutes():
171
+
172
+ // Hono: app.get('/path', handler) — same pattern as Express, already covered
173
+ // Hono with chaining: app.route('/api', apiRouter) — add if needed
90
174
  ```
91
175
 
92
- ### Project-level custom domains
176
+ Test on at least 2 real open-source projects using the framework.
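As a rough illustration of the regex approach described in this section, here is a sketch of a route matcher for `app.get('/path', handler)`-style calls. The receiver names (`app`, `router`), the method list, and the `functionName: null` placeholder are assumptions made for this example, not the patterns used in `javascript.js`:

```js
'use strict';

// Matches calls such as app.get('/users', handler) or router.post("/login", ...).
const ROUTE_RE = /\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"`]([^'"`]+)['"`]/g;

function extractRoutes(content) {
  const routes = [];
  for (const match of content.matchAll(ROUTE_RE)) {
    routes.push({
      method: match[1].toUpperCase(),
      path: match[2],
      functionName: null, // handler name resolution is out of scope for this sketch
    });
  }
  return routes;
}
```

A pattern like this also matches commented-out code and string literals, which is one reason to verify the output on real projects before opening a PR.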
177
+
178
+ ---
179
+
180
+ ## How domain clustering works (V2)
181
+
182
+ Domain detection uses **Leiden+CPM graph clustering** (`src/agents/leiden.js`). Files that import each other heavily cluster together. Domain names are inferred from path tokens, with keyword hints for well-known patterns.
93
183
 
94
- For non-web repos (CLIs, desktop apps, compilers), users can define their own domains in `carto.config.json` at the project root without touching `domains.js`:
184
+ For non-SaaS repos, users can define custom domains in `carto.config.json`:
95
185
 
96
186
  ```json
97
187
  {
@@ -102,7 +192,7 @@ For non-web repos (CLIs, desktop apps, compilers), users can define their own do
102
192
  }
103
193
  ```
104
194
 
105
- Custom config overrides the default domain map entirely for that project.
195
+ The keyword seeds in `src/store/sync-v2.js` (the `keywordSeeds` object) can be extended for new domain types.
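To make the idea of keyword hints concrete, the sketch below names a cluster of files by counting path tokens against a seed table. The shape of `keywordSeeds` shown here (domain name mapped to a keyword list), the function name `inferDomainName`, and the scoring rule are all assumptions for illustration. Check `src/store/sync-v2.js` for the real structure before extending it:

```js
'use strict';

// Hypothetical shape: domain name -> keywords that hint at it.
const keywordSeeds = {
  AUTH: ['auth', 'login', 'session'],
  BILLING: ['billing', 'invoice', 'payment'],
};

function inferDomainName(filePaths, seeds = keywordSeeds) {
  // Count every path token across the files in one cluster.
  const counts = new Map();
  for (const filePath of filePaths) {
    const tokens = filePath.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
    for (const token of tokens) {
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }

  // Pick the seeded domain whose keywords occur most often.
  let best = null;
  let bestScore = 0;
  for (const [domain, keywords] of Object.entries(seeds)) {
    const score = keywords.reduce((sum, k) => sum + (counts.get(k) || 0), 0);
    if (score > bestScore) {
      best = domain;
      bestScore = score;
    }
  }
  return best; // null when no keyword matches any path token
}
```

Under these assumptions, adding a new domain type would amount to adding one more key with its keyword list.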
106
196
 
107
197
  ---
108
198
 
@@ -113,6 +203,7 @@ Custom config overrides the default domain map entirely for that project.
113
203
  - **Test on unknown repos.** Don't just test on projects you wrote. Find a real open-source repo using the framework and verify the output is correct.
114
204
  - **No cloud, no telemetry, no tracking.** Carto is local only. Forever. Don't add any network calls except the existing npm update check.
115
205
  - **No paid features.** Free forever. MIT. Don't propose monetization.
206
+ - **tree-sitter first.** For new languages, always use tree-sitter for imports and symbols. Only use regex for framework-specific patterns (routes, models) that tree-sitter queries can't easily express.
116
207
 
117
208
  ---
118
209
 
@@ -125,6 +216,8 @@ npm install
125
216
  node src/cli/index.js init # test in any project
126
217
  node src/cli/index.js serve # test MCP server
127
218
  npm test # run test suite (30 tests)
219
+ node test/correctness.js # run correctness tests (31 tests)
220
+ node test/benchmark.js # run benchmarks against real repos
128
221
  ```
129
222
 
130
223
  ---
@@ -132,20 +225,23 @@ npm test # run test suite (30 tests)
132
225
  ## PR checklist
133
226
 
134
227
  - [ ] Tested on at least 2-3 real open-source projects
135
- - [ ] Before/after AGENTS.md included in PR description
136
- - [ ] Plugin returns all fields including `events` and `jobs` (can be empty arrays)
228
+ - [ ] Before/after AGENTS.md or `get_architecture` output included in PR description
229
+ - [ ] Plugin uses tree-sitter for imports/symbols (not Babel or regex for the hot path)
230
+ - [ ] Plugin returns all fields including `_tsImports` and `_tsSymbols`
231
+ - [ ] Import resolution added to `src/extractors/imports.js` if language has local imports
232
+ - [ ] Extension added to `CODE_EXTS` and `detectLanguage()` in `sync-v2.js`
137
233
  - [ ] No changes to merger logic (unless explicitly fixing a merger bug)
138
234
  - [ ] No network calls added
139
- - [ ] `carto --version` still works
140
- - [ ] `npm test` passes
235
+ - [ ] `npm test` passes (30/30)
236
+ - [ ] `node test/correctness.js` passes (31/31)
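The "returns all fields" item can be checked mechanically before opening a PR. The helper below is a hypothetical convenience and not part of Carto's test suite. Its field list is copied from the plugin template in this guide, and it assumes every field should be an array:

```js
'use strict';

// Field names taken from the plugin template in "How to add a language".
const REQUIRED_FIELDS = [
  'routes', 'models', 'functions', 'envVars', 'dbTables',
  'fetches', 'storageKeys', '_tsImports', '_tsSymbols',
];

// Returns the names of fields that are missing or are not arrays.
function missingPluginFields(result) {
  return REQUIRED_FIELDS.filter(field => !Array.isArray(result[field]));
}
```

Calling `missingPluginFields(plugin.extract(source, 'test.ext'))` should return an empty array for a plugin that follows the template.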
141
237
 
142
238
  ---
143
239
 
144
240
  ## Issues
145
241
 
146
- - **Bug**: Open an issue with the project type, command run, and what AGENTS.md or domain files produced vs what you expected.
242
+ - **Bug**: Open an issue with the project type, command run, and what output was produced vs expected.
147
243
  - **Language request**: Open an issue titled "Language: [name]"
148
244
  - **Framework request**: Open an issue titled "Framework: [name]"
149
- - **Domain keyword**: Open an issue titled "Domain: [name]" if your codebase doesn't cluster correctly
245
+ - **Domain clustering issue**: Open an issue titled "Domains: [repo name]" with the repo URL and what domains were detected vs what you expected.
150
246
 
151
247
  All issues acknowledged within 24 hours.