@optave/codegraph 2.1.0 → 2.1.1-dev.00f091c

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/README.md +21 -20
package/package.json +5 -5
package/src/builder.js +238 -33
package/src/cli.js +20 -0
package/src/db.js +4 -0
package/src/extractors/csharp.js +6 -1
package/src/extractors/go.js +6 -1
package/src/extractors/java.js +4 -1
package/src/extractors/javascript.js +23 -5
package/src/extractors/php.js +8 -2
package/src/extractors/python.js +8 -1
package/src/extractors/ruby.js +4 -1
package/src/extractors/rust.js +12 -2
package/src/index.js +1 -0
package/src/journal.js +109 -0
package/src/mcp.js +45 -3
package/src/parser.js +1 -0
package/src/queries.js +396 -0
package/src/watcher.js +25 -0

package/README.md CHANGED

@@ -50,14 +50,13 @@ Most tools in this space can't do that:
 | **Heavy infrastructure that's slow to restart** | code-graph-rag (Memgraph), axon (KuzuDB), badger-graph (Dgraph) | External databases add latency to every write. Bulk-inserting a full graph into Memgraph is not a sub-second operation |
 | **No persistence between runs** | pyan, cflow | Re-parse from scratch every time. No database, no delta, no incremental anything |
-**Codegraph solves this with incremental builds:**
+**Codegraph solves this with three-tier incremental change detection:**
-1. Every file gets an MD5 hash stored in SQLite
-2. On rebuild, only files whose hash changed get re-parsed
-3. Stale nodes and edges for changed files are cleaned, then re-inserted
-4. Everything else is untouched
+1. **Tier 0 — Journal (O(changed)):** If `codegraph watch` was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files — zero filesystem scanning
+2. **Tier 1 — mtime+size (O(n) stats, O(changed) reads):** No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte — 10-100x cheaper than hashing
+3. **Tier 2 — Hash (O(changed) reads):** Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted
-**Result:** change one file in a 3,000-file project → rebuild completes in **under a second**. Put it in a commit hook, a file watcher, or let your AI agent trigger it. The graph is always current.
+**Result:** change one file in a 3,000-file project → rebuild completes in **under a second**. With watch mode active, rebuilds are near-instant — the journal makes the build proportional to the number of changed files, not the size of the codebase. Put it in a commit hook, a file watcher, or let your AI agent trigger it. The graph is always current.
 And because the core pipeline is pure local computation (tree-sitter + SQLite), there are no API calls, no network latency, and no cost. LLM-powered features (semantic search, richer embeddings) are a separate optional layer — they enhance the graph but never block it from being current.
@@ -80,7 +79,7 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
 | Git diff impact | **Yes** | — | — | — | — | **Yes** | — | **Yes** |
 | Watch mode | **Yes** | — | **Yes** | — | — | — | — | — |
 | Cycle detection | **Yes** | — | **Yes** | — | — | — | — | **Yes** |
-| Incremental rebuilds | **Yes** | — | **Yes** | — | — | — | — | — |
+| Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — | — | — |
 | Zero config | **Yes** | — | **Yes** | — | — | — | — | — |
 | Embeddable JS library (`npm install`) | **Yes** | — | — | — | — | — | — | — |
 | LLM-optional (works without API keys) | **Yes** | **Yes** | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
@@ -91,7 +90,7 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
 | | Differentiator | In practice |
 |---|---|---|
-| **⚡** | **Always-fresh graph** | Sub-second incremental rebuilds via file-hash tracking. Run on every commit, every save, in watch mode — the graph is never stale. Competitors re-index everything from scratch |
+| **⚡** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds even on large codebases. Competitors re-index everything from scratch; Merkle-tree approaches still require O(n) filesystem scanning |
 | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider for richer embeddings and AI-powered search — your code only goes to the provider you already chose |
 | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
 | **🤖** | **Built for AI agents** | 13-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default, your code doesn't leak to other projects |
@@ -101,12 +100,12 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
 ### How other tools compare
-The key question is: **can you rebuild your graph on every commit in a large codebase without it costing money or taking minutes?** Most tools in this space either re-index everything from scratch (slow), require cloud API calls for core features (costly), or both. Codegraph's incremental builds keep the graph current in milliseconds — and the core pipeline needs no API keys at all. LLM-powered features are opt-in, using whichever provider you already work with.
+The key question is: **can you rebuild your graph on every commit in a large codebase without it costing money or taking minutes?** Most tools in this space either re-index everything from scratch (slow), require cloud API calls for core features (costly), or both. Codegraph's three-tier incremental detection achieves true O(changed) in the best case — when the watcher is running, rebuilds are proportional only to the number of files that changed, not the size of the codebase. The core pipeline needs no API keys at all. LLM-powered features are opt-in, using whichever provider you already work with.
 | Tool | What it does well | The tradeoff |
 |---|---|---|
 | [joern](https://github.com/joernio/joern) | Full CPG (AST + CFG + PDG) for vulnerability discovery, Scala query DSL, 14 languages, daily releases | No incremental builds — full re-parse on every change. Requires JDK 21, no built-in MCP, no watch mode |
-| [narsil-mcp](https://github.com/postrv/narsil-mcp) | 90 MCP tools, 32 languages, taint analysis, SBOM, dead code, neural search, Merkle-tree incremental indexing, single ~30MB binary | Primarily MCP-only — no standalone CLI query interface. Neural search requires API key or ONNX source build |
+| [narsil-mcp](https://github.com/postrv/narsil-mcp) | 90 MCP tools, 32 languages, taint analysis, SBOM, dead code, neural search, Merkle-tree incremental indexing, single ~30MB binary | Merkle trees still require O(n) filesystem scanning on every rebuild. Primarily MCP-only — no standalone CLI query interface. Neural search requires API key or ONNX source build |
 | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | Graph RAG with Memgraph, multi-provider AI, semantic search, code editing via AST | No incremental rebuilds — full re-index + re-embed through cloud APIs on every change. Requires Docker |
 | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | Formal Code Property Graph (AST + CFG + PDG + DFG), ~10 languages, MCP module, LLVM IR support, academic specifications | No incremental builds. Requires JVM + Gradle, no zero config, no watch mode |
 | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | Knowledge graph with precomputed structural intelligence, 7 MCP tools, hybrid search (BM25 + semantic + RRF), clustering, process tracing | Full 6-phase pipeline re-run on changes. KuzuDB graph DB, browser mode limited to ~5,000 files. **PolyForm NC — no commercial use** |
@@ -137,10 +136,11 @@ Here is a cold, analytical breakdown to help you decide which tool fits your wor
 | **Language Support** | 11 languages | 32 languages |
 | **Primary Interface** | CLI-first with MCP integration | MCP-first (CLI is secondary) |
 | **Supply Chain Risk** | Low (minimal dependency tree) | Higher (requires massive dependency graph for embedded ML/scanners) |
-| **Graph Updates** | Sub-second incremental (file-hash) | Parallel re-indexing / Merkle trees |
+| **Graph Updates** | **Three-tier O(changed)** — journal → mtime+size → hash. With watch mode, only changed files are touched | Merkle trees — O(n) filesystem scan on every rebuild to recompute tree hashes |
 #### Choose Codegraph if:
+* **You need the fastest possible incremental rebuilds.** Codegraph’s three-tier change detection (journal → mtime+size → hash) achieves true O(changed) when the watcher is running — only touched files are processed. Narsil’s Merkle trees still require O(n) filesystem scanning to recompute hashes on every rebuild, even when nothing changed. On a 3,000-file project, this is the difference between near-instant and noticeable.
 * **You want to optimize AI agent reasoning.** Large Language Models degrade in performance and hallucinate when overwhelmed with choices. Codegraph’s tight 13-tool surface area ensures agents quickly understand their capabilities without wasting context window tokens.
 * **You are concerned about supply chain attacks.** To support 90 tools, SBOMs, and neural embeddings, a tool must pull in a massive dependency tree. Codegraph keeps its dependencies minimal, dramatically reducing the risk of malicious code sneaking onto your machine.
 * **You want deterministic blast-radius checks.** Features like `diff-impact` are built specifically to tell you exactly how a changed function cascades through your codebase before you merge a PR.
@@ -265,10 +265,10 @@ A single trailing semicolon is ignored (falls back to single-query mode). The `-
 | Flag | Model | Dimensions | Size | License | Notes |
 |---|---|---|---|---|---|
-| `minilm` (default) | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
+| `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
 | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
 | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
-| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
+| `jina-code` (default) | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
 | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
 | `nomic-v1.5` | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Improved nomic, Matryoshka dimensions |
 | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
@@ -376,15 +376,16 @@ Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]
 ## 📊 Performance
-Benchmarked on a ~3,200-file TypeScript project:
+Self-measured on every release via CI ([full history](generated/BENCHMARKS.md)):
-| Metric | Value |
+| Metric | Latest |
 |---|---|
-| Build time | ~30s |
-| Nodes | 19,000+ |
-| Edges | 120,000+ |
-| Query time | <100ms |
-| DB size | ~5 MB |
+| Build speed (native) | **2.5 ms/file** |
+| Build speed (WASM) | **5 ms/file** |
+| Query time | **1ms** |
+| ~50,000 files (est.) | **~125.0s build** |
+Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.
 ## 🤖 AI Agent Integration
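
The Tier 1 and Tier 2 decisions described in the README changes above can be sketched as one pure function. This is a minimal illustration, not the package's code: `classifyFile`, its `action` labels, and the `computeHash` callback are hypothetical names, and the stored record is assumed to carry `hash`, `mtime`, and `size` fields.

```javascript
// Hypothetical helper: decide what to do with one file, given its stored
// record (or undefined for a new file) and a fresh stat result.
// computeHash is a callback so the file is only read when actually needed.
function classifyFile(record, stat, computeHash) {
  // No stored record: the file is new and must be parsed.
  if (!record) return { action: 'parse', hash: computeHash() };

  const storedMtime = record.mtime || 0;
  const storedSize = record.size || 0;

  // Tier 1: matching mtime and size means the file is skipped unread.
  // Rows with a stored size of 0 never take this shortcut.
  if (
    storedSize > 0 &&
    Math.floor(stat.mtimeMs) === storedMtime &&
    stat.size === storedSize
  ) {
    return { action: 'skip' };
  }

  // Tier 2: hash only the files that failed the cheap check.
  const hash = computeHash();
  if (hash !== record.hash) return { action: 'parse', hash };

  // Same content but stale metadata: refresh mtime/size without re-parsing.
  return { action: 'heal', hash };
}

const record = { hash: 'abc', mtime: 1000, size: 10 };
console.log(classifyFile(record, { mtimeMs: 1000.7, size: 10 }, () => 'abc').action);
console.log(classifyFile(record, { mtimeMs: 2000, size: 10 }, () => 'abc').action);
console.log(classifyFile(record, { mtimeMs: 2000, size: 12 }, () => 'xyz').action);
```

Passing the hash as a callback keeps the sketch free of filesystem access and makes the point of Tier 1 visible: a file that matches on mtime and size is never read.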

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@optave/codegraph",
-  "version": "2.1.0",
+  "version": "2.1.1-dev.00f091c",
   "description": "Local code graph CLI — parse codebases with tree-sitter, build dependency graphs, query them",
   "type": "module",
   "main": "src/index.js",
@@ -61,10 +61,10 @@
   "optionalDependencies": {
     "@huggingface/transformers": "^3.8.1",
     "@modelcontextprotocol/sdk": "^1.0.0",
-    "@optave/codegraph-darwin-arm64": "2.1.0",
-    "@optave/codegraph-darwin-x64": "2.1.0",
-    "@optave/codegraph-linux-x64-gnu": "2.1.0",
-    "@optave/codegraph-win32-x64-msvc": "2.1.0"
+    "@optave/codegraph-darwin-arm64": "2.1.1-dev.00f091c",
+    "@optave/codegraph-darwin-x64": "2.1.1-dev.00f091c",
+    "@optave/codegraph-linux-x64-gnu": "2.1.1-dev.00f091c",
+    "@optave/codegraph-win32-x64-msvc": "2.1.1-dev.00f091c"
   },
   "devDependencies": {
     "@biomejs/biome": "^2.4.4",

package/src/builder.js CHANGED

@@ -5,12 +5,44 @@ import path from 'node:path';
 import { loadConfig } from './config.js';
 import { EXTENSIONS, IGNORE_DIRS, normalizePath } from './constants.js';
 import { initSchema, openDb } from './db.js';
+import { readJournal, writeJournalHeader } from './journal.js';
 import { debug, warn } from './logger.js';
 import { getActiveEngine, parseFilesAuto } from './parser.js';
 import { computeConfidence, resolveImportPath, resolveImportsBatch } from './resolve.js';
 export { resolveImportPath } from './resolve.js';
+const BUILTIN_RECEIVERS = new Set([
+  'console',
+  'Math',
+  'JSON',
+  'Object',
+  'Array',
+  'String',
+  'Number',
+  'Boolean',
+  'Date',
+  'RegExp',
+  'Map',
+  'Set',
+  'WeakMap',
+  'WeakSet',
+  'Promise',
+  'Symbol',
+  'Error',
+  'TypeError',
+  'RangeError',
+  'Proxy',
+  'Reflect',
+  'Intl',
+  'globalThis',
+  'window',
+  'document',
+  'process',
+  'Buffer',
+  'require',
+]);
 export function collectFiles(dir, files = [], config = {}, directories = null) {
   const trackDirs = directories !== null;
   let entries;
@@ -81,8 +113,24 @@ function fileHash(content) {
   return createHash('md5').update(content).digest('hex');
 }
+/**
+ * Stat a file, returning { mtimeMs, size } or null on error.
+ */
+function fileStat(filePath) {
+  try {
+    const s = fs.statSync(filePath);
+    return { mtimeMs: s.mtimeMs, size: s.size };
+  } catch {
+    return null;
+  }
+}
 /**
  * Determine which files have changed since last build.
+ * Three-tier cascade:
+ *   Tier 0 — Journal: O(changed) when watcher was running
+ *   Tier 1 — mtime+size: O(n) stats, O(changed) reads
+ *   Tier 2 — Hash comparison: O(changed) reads (fallback from Tier 1)
  */
 function getChangedFiles(db, allFiles, rootDir) {
   // Check if file_hashes table exists
@@ -95,7 +143,6 @@ function getChangedFiles(db, allFiles, rootDir) {
   }
   if (!hasTable) {
-    // No hash table = first build, everything is new
     return {
       changed: allFiles.map((f) => ({ file: f })),
       removed: [],
@@ -105,36 +152,140 @@ function getChangedFiles(db, allFiles, rootDir) {
   const existing = new Map(
     db
-      .prepare('SELECT file, hash FROM file_hashes')
+      .prepare('SELECT file, hash, mtime, size FROM file_hashes')
       .all()
-      .map((r) => [r.file, r.hash]),
+      .map((r) => [r.file, r]),
   );
-  const changed = [];
+  // Build set of current files for removal detection
   const currentFiles = new Set();
+  for (const file of allFiles) {
+    currentFiles.add(normalizePath(path.relative(rootDir, file)));
+  }
+  const removed = [];
+  for (const existingFile of existing.keys()) {
+    if (!currentFiles.has(existingFile)) {
+      removed.push(existingFile);
+    }
+  }
+  // ── Tier 0: Journal ──────────────────────────────────────────────
+  const journal = readJournal(rootDir);
+  if (journal.valid) {
+    // Validate journal timestamp against DB — journal should be from after the last build
+    const dbMtimes = db.prepare('SELECT MAX(mtime) as latest FROM file_hashes').get();
+    const latestDbMtime = dbMtimes?.latest || 0;
+    // Empty journal = no watcher was running, fall to Tier 1 for safety
+    const hasJournalEntries = journal.changed.length > 0 || journal.removed.length > 0;
+    if (hasJournalEntries && journal.timestamp >= latestDbMtime) {
+      debug(
+        `Tier 0: journal valid, ${journal.changed.length} changed, ${journal.removed.length} removed`,
+      );
+      const changed = [];
+      for (const relPath of journal.changed) {
+        const absPath = path.join(rootDir, relPath);
+        const stat = fileStat(absPath);
+        if (!stat) continue;
+        let content;
+        try {
+          content = fs.readFileSync(absPath, 'utf-8');
+        } catch {
+          continue;
+        }
+        const hash = fileHash(content);
+        const record = existing.get(relPath);
+        if (!record || record.hash !== hash) {
+          changed.push({ file: absPath, content, hash, relPath, stat });
+        }
+      }
+      // Merge journal removals with filesystem removals (dedup)
+      const removedSet = new Set(removed);
+      for (const relPath of journal.removed) {
+        if (existing.has(relPath)) removedSet.add(relPath);
+      }
+      return { changed, removed: [...removedSet], isFullBuild: false };
+    }
+    debug(
+      `Tier 0: skipped (${hasJournalEntries ? 'timestamp stale' : 'no entries'}), falling to Tier 1`,
+    );
+  }
+  // ── Tier 1: mtime+size fast-path ─────────────────────────────────
+  const needsHash = []; // Files that failed mtime+size check
+  const skipped = []; // Files that passed mtime+size check
   for (const file of allFiles) {
     const relPath = normalizePath(path.relative(rootDir, file));
-    currentFiles.add(relPath);
+    const record = existing.get(relPath);
+    if (!record) {
+      // New file — needs full read+hash
+      needsHash.push({ file, relPath });
+      continue;
+    }
+    const stat = fileStat(file);
+    if (!stat) continue;
+    const storedMtime = record.mtime || 0;
+    const storedSize = record.size || 0;
+    // size > 0 guard: pre-v4 rows have size=0, always fall through to hash
+    if (storedSize > 0 && Math.floor(stat.mtimeMs) === storedMtime && stat.size === storedSize) {
+      skipped.push(relPath);
+      continue;
+    }
+    needsHash.push({ file, relPath, stat });
+  }
+  if (needsHash.length > 0) {
+    debug(`Tier 1: ${skipped.length} skipped by mtime+size, ${needsHash.length} need hash check`);
+  }
+  // ── Tier 2: Hash comparison ──────────────────────────────────────
+  const changed = [];
+  for (const item of needsHash) {
     let content;
     try {
-      content = fs.readFileSync(file, 'utf-8');
+      content = fs.readFileSync(item.file, 'utf-8');
     } catch {
       continue;
     }
     const hash = fileHash(content);
-    if (existing.get(relPath) !== hash) {
-      changed.push({ file, content, hash, relPath });
+    const stat = item.stat || fileStat(item.file);
+    const record = existing.get(item.relPath);
+    if (!record || record.hash !== hash) {
+      changed.push({ file: item.file, content, hash, relPath: item.relPath, stat });
+    } else if (stat) {
+      // Hash matches but mtime/size was stale — self-heal by updating stored metadata
+      changed.push({
+        file: item.file,
+        content,
+        hash,
+        relPath: item.relPath,
+        stat,
+        metadataOnly: true,
+      });
     }
   }
-  const removed = [];
-  for (const existingFile of existing.keys()) {
-    if (!currentFiles.has(existingFile)) {
-      removed.push(existingFile);
-    }
+  // Filter out metadata-only updates from the "changed" list for parsing,
+  // but keep them so the caller can update file_hashes
+  const parseChanged = changed.filter((c) => !c.metadataOnly);
+  if (needsHash.length > 0) {
+    debug(
+      `Tier 2: ${parseChanged.length} actually changed, ${changed.length - parseChanged.length} metadata-only`,
+    );
   }
   return { changed, removed, isFullBuild: false };
@@ -180,9 +331,33 @@ export async function buildGraph(rootDir, opts = {}) {
     ? getChangedFiles(db, files, rootDir)
     : { changed: files.map((f) => ({ file: f })), removed: [], isFullBuild: true };
-  if (!isFullBuild && changed.length === 0 && removed.length === 0) {
+  // Separate metadata-only updates (mtime/size self-heal) from real changes
+  const parseChanges = changed.filter((c) => !c.metadataOnly);
+  const metadataUpdates = changed.filter((c) => c.metadataOnly);
+  if (!isFullBuild && parseChanges.length === 0 && removed.length === 0) {
+    // Still update metadata for self-healing even when no real changes
+    if (metadataUpdates.length > 0) {
+      try {
+        const healHash = db.prepare(
+          'INSERT OR REPLACE INTO file_hashes (file, hash, mtime, size) VALUES (?, ?, ?, ?)',
+        );
+        const healTx = db.transaction(() => {
+          for (const item of metadataUpdates) {
+            const mtime = item.stat ? Math.floor(item.stat.mtimeMs) : 0;
+            const size = item.stat ? item.stat.size : 0;
+            healHash.run(item.relPath, item.hash, mtime, size);
+          }
+        });
+        healTx();
+        debug(`Self-healed mtime/size for ${metadataUpdates.length} files`);
+      } catch {
+        /* ignore heal errors */
+      }
+    }
     console.log('No changes detected. Graph is up to date.');
     db.close();
+    writeJournalHeader(rootDir, Date.now());
     return;
   }
@@ -191,7 +366,7 @@ export async function buildGraph(rootDir, opts = {}) {
       'PRAGMA foreign_keys = OFF; DELETE FROM node_metrics; DELETE FROM edges; DELETE FROM nodes; PRAGMA foreign_keys = ON;',
     );
   } else {
-    console.log(`Incremental: ${changed.length} changed, ${removed.length} removed`);
+    console.log(`Incremental: ${parseChanges.length} changed, ${removed.length} removed`);
     // Remove metrics/edges/nodes for changed and removed files
     const deleteNodesForFile = db.prepare('DELETE FROM nodes WHERE file = ?');
     const deleteEdgesForFile = db.prepare(`
@@ -206,7 +381,7 @@ export async function buildGraph(rootDir, opts = {}) {
       deleteMetricsForFile.run(relPath);
       deleteNodesForFile.run(relPath);
     }
-    for (const item of changed) {
+    for (const item of parseChanges) {
       const relPath = item.relPath || normalizePath(path.relative(rootDir, item.file));
       deleteEdgesForFile.run({ f: relPath });
       deleteMetricsForFile.run(relPath);
@@ -224,11 +399,11 @@ export async function buildGraph(rootDir, opts = {}) {
     'INSERT INTO edges (source_id, target_id, kind, confidence, dynamic) VALUES (?, ?, ?, ?, ?)',
   );
-  // Prepare hash upsert
+  // Prepare hash upsert (with size column from migration v4)
   let upsertHash;
   try {
     upsertHash = db.prepare(
-      'INSERT OR REPLACE INTO file_hashes (file, hash, mtime) VALUES (?, ?, ?)',
+      'INSERT OR REPLACE INTO file_hashes (file, hash, mtime, size) VALUES (?, ?, ?, ?)',
     );
   } catch {
     upsertHash = null;
@@ -246,17 +421,17 @@ export async function buildGraph(rootDir, opts = {}) {
     // We'll fill these in during the parse pass + edge pass
   }
-  const filesToParse = isFullBuild ? files.map((f) => ({ file: f })) : changed;
+  const filesToParse = isFullBuild ? files.map((f) => ({ file: f })) : parseChanges;
   // ── Unified parse via parseFilesAuto ───────────────────────────────
   const filePaths = filesToParse.map((item) => item.file);
   const allSymbols = await parseFilesAuto(filePaths, rootDir, engineOpts);
-  // Build a hash lookup from incremental data (changed items may carry pre-computed hashes)
-  const precomputedHashes = new Map();
+  // Build a lookup from incremental data (changed items may carry pre-computed hashes + stats)
+  const precomputedData = new Map();
   for (const item of filesToParse) {
-    if (item.hash && item.relPath) {
-      precomputedHashes.set(item.relPath, item.hash);
+    if (item.relPath) {
+      precomputedData.set(item.relPath, item);
     }
   }
@@ -272,11 +447,14 @@ export async function buildGraph(rootDir, opts = {}) {
         insertNode.run(exp.name, exp.kind, relPath, exp.line, null);
       }
-      // Update file hash for incremental builds
+      // Update file hash with real mtime+size for incremental builds
       if (upsertHash) {
-        const existingHash = precomputedHashes.get(relPath);
-        if (existingHash) {
-          upsertHash.run(relPath, existingHash, Date.now());
+        const precomputed = precomputedData.get(relPath);
+        if (precomputed?.hash) {
+          const stat = precomputed.stat || fileStat(path.join(rootDir, relPath));
+          const mtime = stat ? Math.floor(stat.mtimeMs) : 0;
+          const size = stat ? stat.size : 0;
+          upsertHash.run(relPath, precomputed.hash, mtime, size);
         } else {
           const absPath = path.join(rootDir, relPath);
           let code;
@@ -286,11 +464,23 @@ export async function buildGraph(rootDir, opts = {}) {
             code = null;
           }
           if (code !== null) {
-            upsertHash.run(relPath, fileHash(code), Date.now());
+            const stat = fileStat(absPath);
+            const mtime = stat ? Math.floor(stat.mtimeMs) : 0;
+            const size = stat ? stat.size : 0;
+            upsertHash.run(relPath, fileHash(code), mtime, size);
           }
         }
       }
     }
+    // Also update metadata-only entries (self-heal mtime/size without re-parse)
+    if (upsertHash) {
+      for (const item of metadataUpdates) {
+        const mtime = item.stat ? Math.floor(item.stat.mtimeMs) : 0;
+        const size = item.stat ? item.stat.size : 0;
+        upsertHash.run(item.relPath, item.hash, mtime, size);
+      }
+    }
   });
   insertAll();
@@ -458,7 +648,9 @@ export async function buildGraph(rootDir, opts = {}) {
       }
       // Call edges with confidence scoring — using pre-loaded lookup maps (N+1 fix)
+      const seenCallEdges = new Set();
       for (const call of symbols.calls) {
+        if (call.receiver && BUILTIN_RECEIVERS.has(call.receiver)) continue;
         let caller = null;
         for (const def of symbols.definitions) {
           if (def.line <= call.line) {
@@ -493,10 +685,18 @@ export async function buildGraph(rootDir, opts = {}) {
             );
             if (methodCandidates.length > 0) {
               targets = methodCandidates;
-            } else {
-              // Global fallback
-              targets = nodesByName.get(call.name) || [];
+            } else if (
+              !call.receiver ||
+              call.receiver === 'this' ||
+              call.receiver === 'self' ||
+              call.receiver === 'super'
+            ) {
+              // Scoped fallback — same-dir or parent-dir only, not global
+              targets = (nodesByName.get(call.name) || []).filter(
+                (n) => computeConfidence(relPath, n.file, null) >= 0.5,
+              );
             }
+            // else: method call on a receiver — skip global fallback entirely
           }
         }
@@ -509,7 +709,9 @@ export async function buildGraph(rootDir, opts = {}) {
         }
         for (const t of targets) {
-          if (t.id !== caller.id) {
+          const edgeKey = `${caller.id}|${t.id}`;
+          if (t.id !== caller.id && !seenCallEdges.has(edgeKey)) {
+            seenCallEdges.add(edgeKey);
             const confidence = computeConfidence(relPath, t.file, importedFrom);
             insertEdge.run(caller.id, t.id, 'calls', confidence, isDynamic);
             edgeCount++;
@@ -582,6 +784,9 @@ export async function buildGraph(rootDir, opts = {}) {
   console.log(`Stored in ${dbPath}`);
   db.close();
+  // Write journal header after successful build
+  writeJournalHeader(rootDir, Date.now());
   if (!opts.skipRegistry) {
     const tmpDir = path.resolve(os.tmpdir());
     const resolvedRoot = path.resolve(rootDir);
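
The call-edge changes in this file (skipping calls on built-in receivers and de-duplicating caller/target pairs) can be approximated in isolation. This is a simplified sketch under stated assumptions: `filterCallEdges` and the pre-resolved `{ callerId, targetId, receiver }` records are hypothetical, and the receiver set is abbreviated from the one in the diff.

```javascript
// Abbreviated from the BUILTIN_RECEIVERS set added in the diff.
const BUILTIN_RECEIVERS = new Set(['console', 'Math', 'JSON', 'Object', 'Promise']);

// Hypothetical helper: takes calls that were already resolved to node ids
// and returns the edges that would be inserted.
function filterCallEdges(resolvedCalls) {
  const seen = new Set();
  const edges = [];
  for (const { callerId, targetId, receiver } of resolvedCalls) {
    // Calls such as console.log() or JSON.parse() never become edges.
    if (receiver && BUILTIN_RECEIVERS.has(receiver)) continue;
    // Skip self-calls and repeated caller/target pairs.
    const edgeKey = `${callerId}|${targetId}`;
    if (callerId === targetId || seen.has(edgeKey)) continue;
    seen.add(edgeKey);
    edges.push({ source: callerId, target: targetId, kind: 'calls' });
  }
  return edges;
}

const edges = filterCallEdges([
  { callerId: 1, targetId: 2 },
  { callerId: 1, targetId: 2 }, // duplicate pair
  { callerId: 1, targetId: 1 }, // self-call
  { callerId: 1, targetId: 3, receiver: 'console' }, // built-in receiver
  { callerId: 2, targetId: 3, receiver: 'db' },
]);
console.log(edges);
```

The sketch leaves out target resolution and confidence scoring, which the diff handles before an edge reaches the de-duplication step.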

package/src/cli.js CHANGED

@@ -11,6 +11,7 @@ import { buildEmbeddings, MODELS, search } from './embedder.js';
 import { exportDOT, exportJSON, exportMermaid } from './export.js';
 import { setVerbose } from './logger.js';
 import {
+  context,
   diffImpact,
   fileDeps,
   fnDeps,
@@ -130,6 +131,25 @@ program
     });
   });
+program
+  .command('context <name>')
+  .description('Full context for a function: source, deps, callers, tests, signature')
+  .option('-d, --db <path>', 'Path to graph.db')
+  .option('--depth <n>', 'Include callee source up to N levels deep', '0')
+  .option('--no-source', 'Metadata only (skip source extraction)')
+  .option('--include-tests', 'Include test source code')
+  .option('-T, --no-tests', 'Exclude test files from callers')
+  .option('-j, --json', 'Output as JSON')
+  .action((name, opts) => {
+    context(name, opts.db, {
+      depth: parseInt(opts.depth, 10),
+      noSource: !opts.source,
+      noTests: !opts.tests,
+      includeTests: opts.includeTests,
+      json: opts.json,
+    });
+  });
 program
   .command('diff-impact [ref]')
   .description('Show impact of git changes (unstaged, staged, or vs a ref)')
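
How the new `context` command's flags reach `context()` can be sketched without commander itself. The two option objects below are assumptions about what the parser produces (negatable `--no-*` flags are expected to default to `true`), and `toContextOptions` is a hypothetical name for the mapping done in the action handler.

```javascript
// Hypothetical stand-alone version of the mapping in the action handler.
function toContextOptions(opts) {
  return {
    depth: parseInt(opts.depth, 10), // option values arrive as strings
    noSource: !opts.source, // --no-source flips opts.source to false
    noTests: !opts.tests, // -T / --no-tests flips opts.tests to false
    includeTests: opts.includeTests,
    json: opts.json,
  };
}

// Assumed parse result for: codegraph context handleAuth
const defaults = { depth: '0', source: true, tests: true };

// Assumed parse result for:
//   codegraph context handleAuth --depth 2 --no-source -T --json
const flagged = { depth: '2', source: false, tests: false, json: true };

console.log(toContextOptions(defaults));
console.log(toContextOptions(flagged));
```

The inversion (`noSource: !opts.source`) is the part worth noticing: the flag is declared in its negative form, so the parsed value is positive and the handler negates it.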

package/src/db.js CHANGED

@@ -67,6 +67,10 @@ export const MIGRATIONS = [
       );
     `,
   },
+  {
+    version: 4,
+    up: `ALTER TABLE file_hashes ADD COLUMN size INTEGER DEFAULT 0;`,
+  },
 ];
 export function openDb(dbPath) {

package/src/extractors/csharp.js CHANGED

@@ -186,7 +186,12 @@ export function extractCSharpSymbols(tree, _filePath) {
             calls.push({ name: fn.text, line: node.startPosition.row + 1 });
           } else if (fn.type === 'member_access_expression') {
             const name = fn.childForFieldName('name');
-            if (name) calls.push({ name: name.text, line: node.startPosition.row + 1 });
+            if (name) {
+              const expr = fn.childForFieldName('expression');
+              const call = { name: name.text, line: node.startPosition.row + 1 };
+              if (expr) call.receiver = expr.text;
+              calls.push(call);
+            }
           } else if (fn.type === 'generic_name' || fn.type === 'member_binding_expression') {
             const name = fn.childForFieldName('name') || fn.child(0);
             if (name) calls.push({ name: name.text, line: node.startPosition.row + 1 });

package/src/extractors/go.js CHANGED

@@ -152,7 +152,12 @@ export function extractGoSymbols(tree, _filePath) {
             calls.push({ name: fn.text, line: node.startPosition.row + 1 });
           } else if (fn.type === 'selector_expression') {
             const field = fn.childForFieldName('field');
-            if (field) calls.push({ name: field.text, line: node.startPosition.row + 1 });
+            if (field) {
+              const operand = fn.childForFieldName('operand');
+              const call = { name: field.text, line: node.startPosition.row + 1 };
+              if (operand) call.receiver = operand.text;
+              calls.push(call);
+            }
           }
         }
         break;

package/src/extractors/java.js CHANGED

@@ -203,7 +203,10 @@ export function extractJavaSymbols(tree, _filePath) {
       case 'method_invocation': {
         const nameNode = node.childForFieldName('name');
         if (nameNode) {
-          calls.push({ name: nameNode.text, line: node.startPosition.row + 1 });
+          const obj = node.childForFieldName('object');
+          const call = { name: nameNode.text, line: node.startPosition.row + 1 };
+          if (obj) call.receiver = obj.text;
+          calls.push(call);
         }
         break;
       }
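
The extractor hunks above all follow the same pattern: build the call record, then attach a `receiver` only when the syntax node has one. A minimal sketch of the Java variant, using a hand-written stand-in for a tree-sitter node; `fakeNode` and `extractMethodInvocation` are hypothetical names, and the stub assumes `childForFieldName` returns `null` for a missing field.

```javascript
// Minimal stand-in for a tree-sitter node; only the members used below exist.
function fakeNode(text, fields = {}, row = 0) {
  return {
    text,
    startPosition: { row },
    childForFieldName: (name) => fields[name] ?? null,
  };
}

// Mirrors the Java hunk: record the call name and, when present, its receiver.
function extractMethodInvocation(node) {
  const nameNode = node.childForFieldName('name');
  if (!nameNode) return null;
  const call = { name: nameNode.text, line: node.startPosition.row + 1 };
  const obj = node.childForFieldName('object');
  if (obj) call.receiver = obj.text;
  return call;
}

// A qualified call (logger.info(...)) and an unqualified one (save()).
const qualified = fakeNode(
  'logger.info("x")',
  { name: fakeNode('info'), object: fakeNode('logger') },
  41,
);
const bare = fakeNode('save()', { name: fakeNode('save') }, 9);

console.log(extractMethodInvocation(qualified));
console.log(extractMethodInvocation(bare));
```

Leaving `receiver` off unqualified calls, rather than setting it to `null`, matches the hunks above and lets downstream code test for it with a simple truthiness check.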