coderay 1.1.0.tar.gz → 1.2.0.tar.gz

Files changed (146)

{coderay-1.1.0/src/coderay.egg-info → coderay-1.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderay
-Version: 1.1.0
+Version: 1.2.0
 Summary: X-ray your codebase — semantic search, code graphs, file skeletons, and MCP server
 Author-email: Bogdan Copocean <bogdancopocean@gmail.com>
 License-Expression: MIT
@@ -41,6 +41,7 @@ Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.0; extra == "dev"
 Requires-Dist: ruff>=0.8.0; extra == "dev"
 Requires-Dist: mypy>=1.0.0; extra == "dev"
+Requires-Dist: tiktoken>=0.5.0; extra == "dev"
 Provides-Extra: maintain
 Requires-Dist: pylance>=0.15.0; extra == "maintain"
 Provides-Extra: mlx
@@ -56,38 +57,71 @@ Dynamic: license-file
 [![License](https://img.shields.io/github/license/bogdan-copocean/coderay)](LICENSE)
 [![CI](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml/badge.svg)](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml)
-**CodeRay** ships a **local code index** with **semantic search**, **file skeletons** (signatures and docstrings, no bodies), and **blast radius** (callers, imports, inheritance) — plus an **MCP stdio server** so agents can use the same tools. Ask *by meaning*, skim **API shape**, trace **who calls what**, then read implementation when it matters: fewer tokens, less noise, answers anchored to the right files.
+**CodeRay** builds a **local code index** that gives AI agents a smarter way to explore a codebase — reading only what they need, not whole files.
-**No LLM inside CodeRay, no network, no API key – it runs on your machine.**
+**Runs locally. No LLM. No network. No API key.**
+## The problem
-## Tools
+AI agents exploring a codebase default to reading whole files – even when one function is all that's needed. Every unnecessary line **burns tokens** and **floods the context window**: driving up API costs and noise with every read.
+The root cause is simple: agents know the **file paths** but no finer location. Without knowing *where* in a file something lives, they have no choice but to read everything.
+**CodeRay fixes this.** Every tool returns **file paths with exact line ranges** — so agents locate first, then read only the lines that matter.
+## How it works
+CodeRay exposes three primitives, each returning **paths + line ranges**:
+| Tool | Question it answers | What agents get |
+|------|---------------------|-----------------|
+| **search** | *Where is the code that does X?* | Relevant chunks with file paths and line ranges |
+| **skeleton** | *What's the shape of this file?* | Signatures + docstrings only, each tagged with its line range |
+| **impact** | *What breaks if I change this?* | Callers, imports, and inheritors — located by line range |
+### The two-phase flow
-**CodeRay sits next to ripgrep, not instead of it.** Ripgrep when you know the string or symbol; search, skeleton, and impact when you care about *intent*, *structure*, or *dependencies*—then open the file when you need real implementation detail.
+1. **Locate** — run `search`, `skeleton`, or `impact` to find what's needed. Every result includes a file path and a symbol-level line range.
+2. **Read precisely** — use those line ranges to load only the relevant snippet. Skip the rest.
-Semantic search is retrieval, not proof: hits can miss or rank oddly. Treat them as candidates, confirm with a skeleton or read, and keep the index fresh with `coderay watch` or `coderay build` when things drift.
+This keeps context windows lean and agent reasoning focused. CodeRay is not a replacement for `grep` — it fills the gap when exact names are unknown or a map is needed before reading.
+### Token savings (tiktoken, `cl100k_base`)
+| File | Lines | Full read | Skeleton | Savings | % reduction |
+|------|-------|-----------|----------|---------|-------------|
+| `src/coderay/graph/impact.py` | 249 | 2,333 | 693 | **3.4×** | **70%** |
+| `src/coderay/cli/commands.py` | 584 | 4,327 | 1,906 | **2.3×** | **56%** |
+| `src/coderay/pipeline/indexer.py` | 408 | 3,065 | 1,433 | **2.1×** | **53%** |
+| Query | Search hit tokens | vs full `indexer.py` read |
+|-------|-------------------|---------------------------|
+| "how are files re-indexed on change" | 479 | **~6x cheaper** |
+## Tools
-Skeleton shows API shape and docstrings, not every branch. Use **search** and **impact** to narrow where to look, then read the file (or spans) when you need control flow or line-accurate edits. CodeRay trims noise on those round trips; it does not forbid them.
+### Semantic search
-**Semantic search** — “How/where” by meaning.
+Agents search by **meaning**, not by name — useful when the exact function or class is unknown. Results return **file paths with line ranges** pointing at relevant chunks. Treat them as candidates: confirm with `skeleton` or a ranged read before acting. Keep the index fresh with `coderay watch` or `coderay build` when the tree drifts.
 <img src="assets/coderay-search.gif" alt="coderay search demo" width="100%" />
 ### Blast radius
-Callers and dependents (calls, imports, inheritance).
+Shows **callers, imports, and inheritance** for a symbol before it changes. Each result is tied to a file path and line range — combine with `skeleton` or ranged reads on those locations when bodies are needed.
 <img src="assets/coderay-impact.gif" alt="coderay impact demo" width="100%" />
 ### Skeleton
-Signatures and docstrings only; API surface without bodies.
+Returns **signatures and docstrings only** — no function bodies. Every block is tagged with its path and line range so subsequent reads can be scoped to exactly those lines. A full file read should happen only when the skeleton isn't enough.
 <img src="assets/coderay-skeleton.gif" alt="coderay skeleton demo" width="100%" />
 ### Full read
-Same file as skeleton: raw source costs more tokens.
+**Same file, raw source — for comparison:**
 <img src="assets/coderay-fullread.gif" alt="same file, raw source head" width="100%" />
@@ -102,7 +136,7 @@ Same file as skeleton: raw source costs more tokens.
 ## MCP
-Same tools as above, exposed to the agent so it can search, sketch structure, and trace impact instead of vacuuming whole files by default. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For choosing tools versus a plain read, see [AGENTS.md](AGENTS.md).
+Same three tools over MCP: search, skeleton (paths and line ranges), and impact—so **AI agents** can **narrow context** before full-file reads. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For tool choice versus a plain read, see [AGENTS.md](AGENTS.md).
 ```bash
 which coderay-mcp
@@ -123,32 +157,6 @@ which coderay-mcp
 `CODERAY_REPO_ROOT` must be the directory that contains `.coderay.toml`. More detail: [`mcp_server/README.md`](src/coderay/mcp_server/README.md).
-## Why this matters
-Noisy context windows make models confident about the wrong code. CodeRay front-loads **intent** (search), **shape** (skeleton), and **dependencies** (impact) so the expensive read happens after you have a map—not instead of ever reading implementation when control flow matters.
-### Token savings (tiktoken, `cl100k_base`)
-Measured on this repo after a full index.
-| File                               | Lines | Full read | Skeleton | Savings  |
-| ---------------------------------- | ----- | --------- | -------- | -------- |
-| `src/coderay/pipeline/indexer.py`  | 400   | 3,024     | 757      | **4.0x** |
-| `src/coderay/graph/code_graph.py`  | 500   | 4,261     | 1,022    | **4.2x** |
-| `src/coderay/mcp_server/server.py` | 316   | 2,268     | 1,313    | **1.7x** |
-| Query                                | Search hit tokens | vs full `indexer.py` read |
-| ------------------------------------ | ----------------- | ------------------------- |
-| "how are files re-indexed on change" | 479               | **~6x cheaper**           |
-*Not guarantees — model, chunks, and files affect counts.*
----
 ## Features
 - **Languages** — Python, JavaScript, and TypeScript — [`parsing/README.md`](src/coderay/parsing/README.md)
@@ -157,7 +165,6 @@ Measured on this repo after a full index.
 - **Embeddings** — fastembed (CPU) or MLX on Apple Silicon — [`embedding/README.md`](src/coderay/embedding/README.md)
 - **Watch** — incremental re-index; `.coderay.toml` is the source of truth for what’s indexed
----
 ## Install

{coderay-1.1.0 → coderay-1.2.0}/README.md RENAMED

@@ -4,38 +4,71 @@
 [![License](https://img.shields.io/github/license/bogdan-copocean/coderay)](LICENSE)
 [![CI](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml/badge.svg)](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml)
-**CodeRay** ships a **local code index** with **semantic search**, **file skeletons** (signatures and docstrings, no bodies), and **blast radius** (callers, imports, inheritance) — plus an **MCP stdio server** so agents can use the same tools. Ask *by meaning*, skim **API shape**, trace **who calls what**, then read implementation when it matters: fewer tokens, less noise, answers anchored to the right files.
+**CodeRay** builds a **local code index** that gives AI agents a smarter way to explore a codebase — reading only what they need, not whole files.
-**No LLM inside CodeRay, no network, no API key – it runs on your machine.**
+**Runs locally. No LLM. No network. No API key.**
+## The problem
-## Tools
+AI agents exploring a codebase default to reading whole files – even when one function is all that's needed. Every unnecessary line **burns tokens** and **floods the context window**: driving up API costs and noise with every read.
+The root cause is simple: agents know the **file paths** but no finer location. Without knowing *where* in a file something lives, they have no choice but to read everything.
+**CodeRay fixes this.** Every tool returns **file paths with exact line ranges** — so agents locate first, then read only the lines that matter.
+## How it works
+CodeRay exposes three primitives, each returning **paths + line ranges**:
+| Tool | Question it answers | What agents get |
+|------|---------------------|-----------------|
+| **search** | *Where is the code that does X?* | Relevant chunks with file paths and line ranges |
+| **skeleton** | *What's the shape of this file?* | Signatures + docstrings only, each tagged with its line range |
+| **impact** | *What breaks if I change this?* | Callers, imports, and inheritors — located by line range |
+### The two-phase flow
-**CodeRay sits next to ripgrep, not instead of it.** Ripgrep when you know the string or symbol; search, skeleton, and impact when you care about *intent*, *structure*, or *dependencies*—then open the file when you need real implementation detail.
+1. **Locate** — run `search`, `skeleton`, or `impact` to find what's needed. Every result includes a file path and a symbol-level line range.
+2. **Read precisely** — use those line ranges to load only the relevant snippet. Skip the rest.
-Semantic search is retrieval, not proof: hits can miss or rank oddly. Treat them as candidates, confirm with a skeleton or read, and keep the index fresh with `coderay watch` or `coderay build` when things drift.
+This keeps context windows lean and agent reasoning focused. CodeRay is not a replacement for `grep` — it fills the gap when exact names are unknown or a map is needed before reading.
+### Token savings (tiktoken, `cl100k_base`)
+| File | Lines | Full read | Skeleton | Savings | % reduction |
+|------|-------|-----------|----------|---------|-------------|
+| `src/coderay/graph/impact.py` | 249 | 2,333 | 693 | **3.4×** | **70%** |
+| `src/coderay/cli/commands.py` | 584 | 4,327 | 1,906 | **2.3×** | **56%** |
+| `src/coderay/pipeline/indexer.py` | 408 | 3,065 | 1,433 | **2.1×** | **53%** |
+| Query | Search hit tokens | vs full `indexer.py` read |
+|-------|-------------------|---------------------------|
+| "how are files re-indexed on change" | 479 | **~6x cheaper** |
+## Tools
-Skeleton shows API shape and docstrings, not every branch. Use **search** and **impact** to narrow where to look, then read the file (or spans) when you need control flow or line-accurate edits. CodeRay trims noise on those round trips; it does not forbid them.
+### Semantic search
-**Semantic search** — “How/where” by meaning.
+Agents search by **meaning**, not by name — useful when the exact function or class is unknown. Results return **file paths with line ranges** pointing at relevant chunks. Treat them as candidates: confirm with `skeleton` or a ranged read before acting. Keep the index fresh with `coderay watch` or `coderay build` when the tree drifts.
 <img src="assets/coderay-search.gif" alt="coderay search demo" width="100%" />
 ### Blast radius
-Callers and dependents (calls, imports, inheritance).
+Shows **callers, imports, and inheritance** for a symbol before it changes. Each result is tied to a file path and line range — combine with `skeleton` or ranged reads on those locations when bodies are needed.
 <img src="assets/coderay-impact.gif" alt="coderay impact demo" width="100%" />
 ### Skeleton
-Signatures and docstrings only; API surface without bodies.
+Returns **signatures and docstrings only** — no function bodies. Every block is tagged with its path and line range so subsequent reads can be scoped to exactly those lines. A full file read should happen only when the skeleton isn't enough.
 <img src="assets/coderay-skeleton.gif" alt="coderay skeleton demo" width="100%" />
 ### Full read
-Same file as skeleton: raw source costs more tokens.
+**Same file, raw source — for comparison:**
 <img src="assets/coderay-fullread.gif" alt="same file, raw source head" width="100%" />
@@ -50,7 +83,7 @@ Same file as skeleton: raw source costs more tokens.
 ## MCP
-Same tools as above, exposed to the agent so it can search, sketch structure, and trace impact instead of vacuuming whole files by default. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For choosing tools versus a plain read, see [AGENTS.md](AGENTS.md).
+Same three tools over MCP: search, skeleton (paths and line ranges), and impact—so **AI agents** can **narrow context** before full-file reads. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For tool choice versus a plain read, see [AGENTS.md](AGENTS.md).
 ```bash
 which coderay-mcp
@@ -71,32 +104,6 @@ which coderay-mcp
 `CODERAY_REPO_ROOT` must be the directory that contains `.coderay.toml`. More detail: [`mcp_server/README.md`](src/coderay/mcp_server/README.md).
-## Why this matters
-Noisy context windows make models confident about the wrong code. CodeRay front-loads **intent** (search), **shape** (skeleton), and **dependencies** (impact) so the expensive read happens after you have a map—not instead of ever reading implementation when control flow matters.
-### Token savings (tiktoken, `cl100k_base`)
-Measured on this repo after a full index.
-| File                               | Lines | Full read | Skeleton | Savings  |
-| ---------------------------------- | ----- | --------- | -------- | -------- |
-| `src/coderay/pipeline/indexer.py`  | 400   | 3,024     | 757      | **4.0x** |
-| `src/coderay/graph/code_graph.py`  | 500   | 4,261     | 1,022    | **4.2x** |
-| `src/coderay/mcp_server/server.py` | 316   | 2,268     | 1,313    | **1.7x** |
-| Query                                | Search hit tokens | vs full `indexer.py` read |
-| ------------------------------------ | ----------------- | ------------------------- |
-| "how are files re-indexed on change" | 479               | **~6x cheaper**           |
-*Not guarantees — model, chunks, and files affect counts.*
----
 ## Features
 - **Languages** — Python, JavaScript, and TypeScript — [`parsing/README.md`](src/coderay/parsing/README.md)
@@ -105,7 +112,6 @@ Measured on this repo after a full index.
 - **Embeddings** — fastembed (CPU) or MLX on Apple Silicon — [`embedding/README.md`](src/coderay/embedding/README.md)
 - **Watch** — incremental re-index; `.coderay.toml` is the source of truth for what’s indexed
----
 ## Install
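
The Savings and % reduction columns in the 1.2.0 token table follow directly from the two token counts in each row. A minimal sketch of that arithmetic, using the figures from the table; the helper name `savings` is hypothetical and not part of coderay:

```python
def savings(full_tokens: int, skeleton_tokens: int) -> tuple[float, int]:
    """Return (ratio, percent_reduction) for a full read vs. a skeleton."""
    ratio = full_tokens / skeleton_tokens
    percent = round(100 * (1 - skeleton_tokens / full_tokens))
    return round(ratio, 1), percent


# (full read tokens, skeleton tokens) as listed in the 1.2.0 README table.
rows = {
    "src/coderay/graph/impact.py": (2333, 693),
    "src/coderay/cli/commands.py": (4327, 1906),
    "src/coderay/pipeline/indexer.py": (3065, 1433),
}

for path, (full, skel) in rows.items():
    ratio, pct = savings(full, skel)
    print(f"{path}: {ratio}x, {pct}% reduction")
```

The computed values agree with the table (3.4× / 70%, 2.3× / 56%, 2.1× / 53%).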

{coderay-1.1.0 → coderay-1.2.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "coderay"
-version = "1.1.0"
+version = "1.2.0"
 description = "X-ray your codebase — semantic search, code graphs, file skeletons, and MCP server"
 readme = "README.md"
 license = "MIT"
@@ -53,6 +53,7 @@ dev = [
     "pytest-cov>=4.0",
     "ruff>=0.8.0",
     "mypy>=1.0.0",
+    "tiktoken>=0.5.0",
 ]
 maintain = [
     "pylance>=0.15.0",

coderay-1.2.0/src/coderay/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = "1.2.0"

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/cli/commands.py RENAMED

@@ -4,6 +4,7 @@ import logging
 import os
 import sys
 import time
+import warnings
 from pathlib import Path
 import click
@@ -51,6 +52,12 @@ def _setup_logging(verbose: bool = False) -> None:
     ):
         logging.getLogger(name).setLevel(logging.WARNING)
     os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
+    warnings.filterwarnings(
+        "ignore",
+        message="Cannot enable progress bars: environment variable",
+        category=UserWarning,
+        module="huggingface_hub.utils.tqdm",
+    )
 def _set_repo_root(repo_root: Path) -> None:
@@ -333,7 +340,7 @@ def maintain(ctx: click.Context) -> None:
 @cli.command()
-@click.argument("file_path", type=click.Path(exists=True, path_type=Path))
+@click.argument("file_path", type=str)
 @click.option(
     "--include-imports",
     is_flag=True,
@@ -346,17 +353,53 @@ def maintain(ctx: click.Context) -> None:
     default=None,
     help="Filter to a specific class or top-level function by name.",
 )
+@click.option(
+    "--lines",
+    "line_range",
+    default=None,
+    metavar="START-END",
+    help=(
+        "File line range (1-based inclusive); keep only symbols fully within this span."
+        " Do not combine with a :START-END suffix on FILE_PATH (same meaning)."
+    ),
+)
 def skeleton(
-    file_path: Path,
+    file_path: str,
     include_imports: bool,
     symbol: str | None,
+    line_range: str | None,
 ) -> None:
     """Print signatures without bodies (cheaper than reading the full file)."""
     from coderay.skeleton.extractor import extract_skeleton
+    from coderay.skeleton.path_range import (
+        parse_file_line_range,
+        parse_skeleton_file_arg,
+    )
-    content = file_path.read_text(encoding="utf-8", errors="replace")
+    try:
+        path_str, rng_from_path = parse_skeleton_file_arg(file_path, parse_suffix=True)
+    except ValueError as e:
+        raise click.BadParameter(str(e)) from e
+    file_line_range = rng_from_path
+    if line_range:
+        if file_line_range is not None:
+            raise click.UsageError(
+                "Use either a path ending with :START-END or --lines, not both."
+            )
+        try:
+            file_line_range = parse_file_line_range(line_range)
+        except ValueError as e:
+            raise click.BadParameter(str(e), param_hint="--lines") from e
+    resolved = Path(path_str)
+    if not resolved.is_file():
+        raise click.BadParameter(f"not a file: {path_str}", param_hint="file_path")
+    content = resolved.read_text(encoding="utf-8", errors="replace")
     out = extract_skeleton(
-        file_path, content, include_imports=include_imports, symbol=symbol
+        resolved,
+        content,
+        include_imports=include_imports,
+        symbol=symbol,
+        line_range=file_line_range,
     )
     click.echo(out)
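
The `--lines` option and the `:START-END` path suffix both describe a 1-based inclusive line span. The helpers `parse_file_line_range` and `parse_skeleton_file_arg` are imported in this hunk, but their bodies are not part of the diff shown here. The following is only a sketch of how such parsing could work; `parse_range` and `split_path_and_range` are hypothetical stand-ins, and the real implementation may differ.

```python
from __future__ import annotations

import re

# Hypothetical patterns: "START-END" alone, and "path:START-END" as a suffix.
_RANGE_RE = re.compile(r"^(\d+)-(\d+)$")
_SUFFIX_RE = re.compile(r"^(?P<path>.+):(?P<range>\d+-\d+)$")


def parse_range(text: str) -> tuple[int, int]:
    """Parse 'START-END' into a 1-based inclusive (start, end) pair."""
    m = _RANGE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {text!r}")
    return start, end


def split_path_and_range(arg: str) -> tuple[str, tuple[int, int] | None]:
    """Split an optional ':START-END' suffix off a file argument."""
    m = _SUFFIX_RE.match(arg)
    if m is None:
        return arg, None  # no suffix: the whole argument is the path
    return m.group("path"), parse_range(m.group("range"))


print(split_path_and_range("src/coderay/cli/commands.py:340-405"))
print(split_path_and_range("src/coderay/cli/commands.py"))
```

Raising `ValueError` from the parsers matches how the command above converts parse failures into `click.BadParameter`.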

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/core/timing.py RENAMED

@@ -30,7 +30,7 @@ def timed(phase: str) -> Callable[[F], F]:
 class TimedPhase:
-    """Context manager: measure block execution time."""
+    """Context manager: measure block execution time; log completion at DEBUG."""
     def __init__(self, phase: str, *, log: bool = True) -> None:
         self.phase = phase
@@ -42,10 +42,15 @@ class TimedPhase:
         self.t0 = time.perf_counter()
         return self
+    def elapsed_so_far(self) -> float:
+        """Return seconds since __enter__ (before __exit__)."""
+        return time.perf_counter() - self.t0
     def __exit__(self, *args: object) -> None:
         self.elapsed = time.perf_counter() - self.t0
         if self.log:
-            logger.info("%s: %.3fs", self.phase, self.elapsed)
+            logger.debug("%s: %.3fs", self.phase, self.elapsed)
 timed_phase = TimedPhase  # Convenience alias for context manager usage
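
The hunks above show only part of `TimedPhase`. The sketch below reconstructs a self-contained version from the visible lines to illustrate how `elapsed_so_far()` can be read inside the block while `elapsed` is only set on exit. The `__enter__` signature and the initial values of `t0` and `elapsed` are assumptions filled in from context.

```python
import logging
import time

logger = logging.getLogger(__name__)


class TimedPhase:
    """Context manager: measure block execution time; log completion at DEBUG."""

    def __init__(self, phase: str, *, log: bool = True) -> None:
        self.phase = phase
        self.log = log
        self.t0 = 0.0       # assumed default; set for real in __enter__
        self.elapsed = 0.0  # assumed default; set for real in __exit__

    def __enter__(self) -> "TimedPhase":
        self.t0 = time.perf_counter()
        return self

    def elapsed_so_far(self) -> float:
        """Return seconds since __enter__ (before __exit__)."""
        return time.perf_counter() - self.t0

    def __exit__(self, *args: object) -> None:
        self.elapsed = time.perf_counter() - self.t0
        if self.log:
            logger.debug("%s: %.3fs", self.phase, self.elapsed)


with TimedPhase("demo", log=False) as tp:
    time.sleep(0.01)
    midway = tp.elapsed_so_far()  # readable while the block is still running

print(f"midway={midway:.3f}s total={tp.elapsed:.3f}s")
```

Passing `log=False` mirrors how the embedders in this release use the timer: they suppress the built-in DEBUG line and log their own summary from `tp.elapsed`.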

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/embedding/base.py RENAMED

@@ -48,13 +48,13 @@ def load_embedder_from_config() -> Embedder:
     config = get_config()
     ed = config.embedder
     backend = resolved_embedder_backend(ed.backend)
-    if (ed.backend or "auto").strip().lower() == "auto":
-        logger.info("embedder.backend=auto -> %s", backend)
     if backend == "mlx" and not mlx_optional_installed():
         raise RuntimeError(
             "embedder.backend is 'mlx' but MLX is not installed. "
             "On Apple Silicon: pip install 'coderay[mlx]'"
         )
+    model_name = ed.mlx.model_name if backend == "mlx" else ed.fastembed.model_name
+    logger.info("embedder.backend=%s model=%s", backend, model_name)
     if backend == "mlx":
         mx = ed.mlx
         return MLXEmbedder(

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/embedding/local.py RENAMED

@@ -5,6 +5,7 @@ from typing import Any
 from onnxruntime.capi.onnxruntime_pybind11_state import NoSuchFile
+from coderay.core.timing import timed_phase
 from coderay.embedding.base import Embedder, EmbedTask
 from coderay.embedding.prefixes import SEARCH_PREFIXES, requires_prefix
@@ -38,15 +39,14 @@ class LocalEmbedder(Embedder):
             return TextEmbedding(model_name=name, local_files_only=local_only)
         try:
-            logger.info("Loading model %s from cache...", self._model_name)
             self._model = _open(name=self._model_name, local_only=True)
-            logger.info("Model %s loaded from cache.", self._model_name)
         except (NoSuchFile, ValueError) as e:
             if isinstance(e, ValueError) and "Could not load model" not in str(e):
                 raise
-            logger.info("Downloading model %s (one-time)...", self._model_name)
             self._model = _open(name=self._model_name, local_only=False)
-            logger.info("Model %s downloaded and ready.", self._model_name)
+            logger.info("Model %s ready (downloaded).", self._model_name)
+        else:
+            logger.info("Model %s ready (cache).", self._model_name)
     def _apply_prefix(self, texts: list[str], task: EmbedTask) -> list[str]:
         if not requires_prefix(self._model_name):
@@ -66,9 +66,21 @@ class LocalEmbedder(Embedder):
             self._load_model()
         prefixed = self._apply_prefix(texts, task)
-        logger.info("Embedding %d chunks (task=%s)...", len(prefixed), task.value)
-        embeddings = list(self._model.embed(prefixed, batch_size=self._batch_size))
+        n = len(prefixed)
+        logger.info("Embedding %d chunks (task=%s)...", n, task.value)
+        raw: list[Any] = []
+        bs = self._batch_size
+        with timed_phase("embedding", log=False) as tp:
+            for i in range(0, n, bs):
+                sub = prefixed[i : i + bs]
+                part = list(self._model.embed(sub, batch_size=self._batch_size))
+                raw.extend(part)
+                logger.info("Embedded %d/%d chunks", min(i + len(sub), n), n)
+        logger.info(
+            "Embedding complete: %d chunks in %.2fs",
+            n,
+            tp.elapsed,
+        )
         if self._matryoshka_dimensions is not None:
-            return [e.tolist()[: self._matryoshka_dimensions] for e in embeddings]
-        return [e.tolist() for e in embeddings]
+            return [e.tolist()[: self._matryoshka_dimensions] for e in raw]
+        return [e.tolist() for e in raw]
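
The new loop in `LocalEmbedder` slices the input into batches so that progress can be logged after each one. A minimal sketch of that pattern, independent of fastembed: `embed_in_batches` and `fake_embed` are hypothetical names, and `fake_embed` merely stands in for a real model call.

```python
from __future__ import annotations

import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int,
) -> list[list[float]]:
    """Embed texts batch by batch, logging cumulative progress."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    out: list[list[float]] = []
    n = len(texts)
    for i in range(0, n, batch_size):
        sub = texts[i : i + batch_size]
        out.extend(embed_fn(sub))
        # i + len(sub) never exceeds n, so no clamping is needed here.
        logger.info("Embedded %d/%d chunks", i + len(sub), n)
    return out


def fake_embed(batch: Sequence[str]) -> list[list[float]]:
    """Stand-in for a model: one-dimensional 'embedding' = text length."""
    return [[float(len(t))] for t in batch]


vectors = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
print(vectors)
```

Output order follows input order because each batch is appended before the next slice is taken.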

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/embedding/mlx_backend.py RENAMED

@@ -1,5 +1,6 @@
 import logging
+from coderay.core.timing import timed_phase
 from coderay.embedding.base import Embedder, EmbedTask
 from coderay.embedding.prefixes import SEARCH_PREFIXES, requires_prefix
@@ -44,6 +45,7 @@ class MLXEmbedder(Embedder):
             prefix = SEARCH_PREFIXES.get(task, "")
             texts = [prefix + t for t in texts] if prefix else texts
+        logger.info("Embedding %d chunks (task=%s)...", len(texts), task.value)
         return self._embed_batched(texts)
     def _ensure_loaded(self) -> None:
@@ -53,20 +55,13 @@ class MLXEmbedder(Embedder):
         from mlx_embeddings import load
         cached = self._is_cached()
-        if cached:
-            logger.info(
-                "Loading model %s from cache (%s)...",
-                self._model_name,
-                mx.default_device(),
-            )
-        else:
-            logger.info(
-                "Downloading model %s (one-time, %s)...",
-                self._model_name,
-                mx.default_device(),
-            )
         self._model, self._tokenizer = load(self._model_name)
-        logger.info("Model %s ready.", self._model_name)
+        logger.info(
+            "Model %s ready (%s, %s).",
+            self._model_name,
+            "cache" if cached else "downloaded",
+            mx.default_device(),
+        )
     def _is_cached(self) -> bool:
         """Check if model exists in huggingface cache."""
@@ -83,12 +78,18 @@ class MLXEmbedder(Embedder):
         n = len(texts)
         bs = self._batch_size
-        for i in range(0, n, bs):
-            batch = texts[i : i + bs]
-            arr = self._embed_single_batch(batch)
-            out.extend(arr.tolist())
-            logger.info("MLX embedded %d/%d", min(i + bs, n), n)
+        with timed_phase("embedding", log=False) as tp:
+            for i in range(0, n, bs):
+                batch = texts[i : i + bs]
+                arr = self._embed_single_batch(batch)
+                out.extend(arr.tolist())
+                logger.info("Embedded %d/%d chunks", min(i + bs, n), n)
+        logger.info(
+            "Embedding complete: %d chunks in %.2fs",
+            n,
+            tp.elapsed,
+        )
         return out
     def _embed_single_batch(self, batch: list[str]):

coderay-1.2.0/src/coderay/graph/README.md ADDED

@@ -0,0 +1,45 @@
+# graph
+Directed **calls**, **imports**, and **inheritance** over indexed source. The implementation is laid out as extractors, lowering, merge, and post-merge passes in this package; this file describes **behavior**, not file names.
+## Pipeline (conceptual)
+Per file: CST → **facts** (definitions, calls, imports, inherits) → **materialise** into `GraphNode` / `GraphEdge`. Multi-file **merge** builds one `CodeGraph`. **Post-merge** runs language passes and global rewrites (e.g. resolving bare-name call targets when unambiguous repo-wide).
+Cross-file lowering uses a **module index** (dotted name → file path) so imports and qualified names can become `file_path::symbol` targets. Edges may point at **phantom** strings (unresolved callee) until passes or later tooling refine them.
+## Targets and phantoms
+Call/import/inherit **targets are strings**: resolved node ids (`file::qual`), module-style refs (`pkg.mod.sym`), or **phantoms** (short names, unknowns). Heuristics classify targets for filtering and UX; **materialise** can emit edges whose endpoints are not yet graph nodes.
+**`include_external`** (config) drops edges whose targets are not considered “in repo” for the current index.
+## Symbol resolution (`CodeGraph`)
+Indexes back **short names** and **qualified names** to node ids. **Unique** short name → one id; **ambiguous** → `resolve_symbol` returns `None` (callers must use full id or disambiguate).
+## Impact radius (`impact.py`)
+**Reverse** traversal from a symbol: who **calls**, **imports**, or **inherits** toward it, up to a **depth** limit. Not every edge kind is impact-relevant; module nodes are filtered when the same file is already represented by concrete symbols.
+**Resolution layers:** exact id → optional **fuzzy** match by trailing name within a file → hints when ambiguous or empty results. **Seeds** for a method can include the **parent class’s** same-named method when inheritance is present, so callers of the base implementation count toward impact on overrides. **Phantom aliases** (same symbol under different string ids) are considered so edges from re-exports or legacy shapes are not missed.
+**Limitations:** static graph only—dynamic dispatch, reflection, and cross-repo callers are not modeled; hints may suggest grep when imports exist but call edges could not be resolved.
+## Callee lowering (`CalleeResolver`)
+Raw callee text from the tree (e.g. `self.m`, `super().x`, `a.b`) is combined with **per-file bindings** (imports, instance typing, scopes) to produce target strings. Order matters: **super** / **self** handling runs before generic **simple** and **dotted-chain** resolution. Behavior is shared across languages where configs align (`self`/`super` prefixes); edge cases differ by language grammar and binding richness.
+## Known limitations (general)
+- **Soundness:** graph is **heuristic**, not a type system; wrong or missing edges are expected under metaprogramming, conditional imports, and incomplete index scope.
+- **Staleness:** graph reflects last build; **watch** / rebuild needed after large refactors.
+- **Language coverage:** depth varies by language (Python/JS/TS today); new languages plug in via the same fact/materialise/merge shape but need their own extractors and tests.
+## Tests
+[`tests/unit/graph/`](../../../tests/unit/graph/) (invariants, extractors, resolver), [`tests/regression/graph/`](../../../tests/regression/graph/) (multi-file fixtures).
+## Persistence
+`graph.json` under the index directory; **`schema_version`** supports loading older serialised shapes when bumped.
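
The symbol-resolution and impact-radius behaviour described in this README can be illustrated with a toy graph. This is a sketch under simplifying assumptions: `TinyGraph`, its fields, and the sample node ids are hypothetical, all edge kinds are treated alike, and fuzzy matching, phantom aliases, inheritance seeds, and module filtering are not reproduced.

```python
from __future__ import annotations

from collections import defaultdict, deque


class TinyGraph:
    """Toy stand-in for a code graph with short-name lookup and reverse edges."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.reverse: dict[str, set[str]] = defaultdict(set)   # target -> sources
        self.by_short: dict[str, set[str]] = defaultdict(set)  # short name -> ids

    def add_node(self, node_id: str) -> None:
        self.nodes.add(node_id)
        short = node_id.split("::")[-1].split(".")[-1]
        self.by_short[short].add(node_id)

    def add_edge(self, source: str, target: str) -> None:
        self.reverse[target].add(source)

    def resolve_symbol(self, name: str) -> str | None:
        """Exact id wins; a unique short name resolves; ambiguity gives None."""
        if name in self.nodes:
            return name
        matches = self.by_short.get(name, set())
        return next(iter(matches)) if len(matches) == 1 else None

    def impact(self, node_id: str, depth: int) -> dict[str, int]:
        """Reverse breadth-first traversal, bounded by depth; maps id -> hops."""
        seen = {node_id: 0}
        queue = deque([node_id])
        while queue:
            current = queue.popleft()
            if seen[current] >= depth:
                continue  # depth limit reached on this branch
            for source in self.reverse.get(current, ()):
                if source not in seen:
                    seen[source] = seen[current] + 1
                    queue.append(source)
        del seen[node_id]  # the seed itself is not part of its own impact
        return seen


g = TinyGraph()
for node in ["a.py::load", "b.py::run", "c.py::main", "d.py::load"]:
    g.add_node(node)
g.add_edge("b.py::run", "a.py::load")   # run calls load
g.add_edge("c.py::main", "b.py::run")   # main calls run

print(g.resolve_symbol("run"))    # unique short name
print(g.resolve_symbol("load"))   # ambiguous: defined in a.py and d.py
print(g.impact("a.py::load", depth=2))
```

Returning `None` for an ambiguous short name forces the caller to pass a full `file::symbol` id, which is the contract the README states for `resolve_symbol`.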

{coderay-1.1.0 → coderay-1.2.0}/src/coderay/graph/__init__.py RENAMED

@@ -6,14 +6,15 @@ from coderay.graph.builder import (
     save_graph,
 )
 from coderay.graph.code_graph import CodeGraph
-from coderay.graph.extractor import extract_graph_from_file
+from coderay.graph.graph_builder import GraphBuilder, build_project_index
 __all__ = [
     "GRAPH_FILENAME",
     "CodeGraph",
+    "GraphBuilder",
+    "build_project_index",
     "build_and_save_graph",
     "build_graph",
-    "extract_graph_from_file",
     "load_graph",
     "save_graph",
 ]
