PyPI - coderay - Versions diffs - 1.1.1__tar.gz → 1.2.1__tar.gz - Mend

coderay 1.1.1__tar.gz → 1.2.1__tar.gz


Files changed (121)

{coderay-1.1.1/src/coderay.egg-info → coderay-1.2.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderay
-Version: 1.1.1
+Version: 1.2.1
 Summary: X-ray your codebase — semantic search, code graphs, file skeletons, and MCP server
 Author-email: Bogdan Copocean <bogdancopocean@gmail.com>
 License-Expression: MIT
@@ -41,6 +41,7 @@ Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.0; extra == "dev"
 Requires-Dist: ruff>=0.8.0; extra == "dev"
 Requires-Dist: mypy>=1.0.0; extra == "dev"
+Requires-Dist: tiktoken>=0.5.0; extra == "dev"
 Provides-Extra: maintain
 Requires-Dist: pylance>=0.15.0; extra == "maintain"
 Provides-Extra: mlx
@@ -56,38 +57,71 @@ Dynamic: license-file
 [![License](https://img.shields.io/github/license/bogdan-copocean/coderay)](LICENSE)
 [![CI](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml/badge.svg)](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml)
-**CodeRay** ships a **local code index** with **semantic search**, **file skeletons** (signatures and docstrings, no bodies), and **blast radius** (callers, imports, inheritance) — plus an **MCP stdio server** so agents can use the same tools. Ask *by meaning*, skim **API shape**, trace **who calls what**, then read implementation when it matters: fewer tokens, less noise, answers anchored to the right files.
+**CodeRay** builds a **local code index** that gives AI agents a smarter way to explore a codebase — reading only what they need, not whole files.
-**No LLM inside CodeRay, no network, no API key – it runs locally and offline on your machine.**
+**Runs locally. No LLM. No network. No API key.**
+## The problem
-## Tools
+AI agents exploring a codebase default to reading whole files – even when one function is all that's needed. Every unnecessary line **burns tokens** and **floods the context window**: driving up API costs and noise with every read.
+The root cause is simple: agents know the **file paths** but no finer location. Without knowing *where* in a file something lives, they have no choice but to read everything.
+**CodeRay fixes this.** Every tool returns **file paths with exact line ranges** — so agents locate first, then read only the lines that matter.
+## How it works
+CodeRay exposes three primitives, each returning **paths + line ranges**:
+| Tool | Question it answers | What agents get |
+|------|---------------------|-----------------|
+| **search** | *Where is the code that does X?* | Relevant chunks with file paths and line ranges |
+| **skeleton** | *What's the shape of this file?* | Signatures + docstrings only, each tagged with its line range |
+| **impact** | *What breaks if I change this?* | Callers, imports, and inheritors — located by line range |
+### The two-phase flow
-**CodeRay sits next to ripgrep, not instead of it.** Ripgrep when you know the string or symbol; search, skeleton, and impact when you care about *intent*, *structure*, or *dependencies*—then open the file when you need real implementation detail.
+1. **Locate** — run `search`, `skeleton`, or `impact` to find what's needed. Every result includes a file path and a symbol-level line range.
+2. **Read precisely** — use those line ranges to load only the relevant snippet. Skip the rest.
-Semantic search is retrieval, not proof: hits can miss or rank oddly. Treat them as candidates, confirm with a skeleton or read, and keep the index fresh with `coderay watch` or `coderay build` when things drift.
+This keeps context windows lean and agent reasoning focused. CodeRay is not a replacement for `grep` — it fills the gap when exact names are unknown or a map is needed before reading.
+### Token savings (tiktoken, `cl100k_base`)
+| File | Lines | Full read | Skeleton | Savings | % reduction |
+|------|-------|-----------|----------|---------|-------------|
+| `src/coderay/graph/impact.py` | 249 | 2,333 | 693 | **3.4×** | **70%** |
+| `src/coderay/cli/commands.py` | 584 | 4,327 | 1,906 | **2.3×** | **56%** |
+| `src/coderay/pipeline/indexer.py` | 408 | 3,065 | 1,433 | **2.1×** | **53%** |
+| Query | Search hit tokens | vs full `indexer.py` read |
+|-------|-------------------|---------------------------|
+| "how are files re-indexed on change" | 479 | **~6x cheaper** |
+## Tools
-Skeleton shows API shape and docstrings, not every branch. Use **search** and **impact** to narrow where to look, then read the file (or spans) when you need control flow or line-accurate edits. CodeRay trims noise on those round trips; it does not forbid them.
+### Semantic search
-**Semantic search** — “How/where” by meaning.
+Agents search by **meaning**, not by name — useful when the exact function or class is unknown. Results return **file paths with line ranges** pointing at relevant chunks. Treat them as candidates: confirm with `skeleton` or a ranged read before acting. Keep the index fresh with `coderay watch` or `coderay build` when the tree drifts.
 <img src="assets/coderay-search.gif" alt="coderay search demo" width="100%" />
 ### Blast radius
-Callers and dependents (calls, imports, inheritance).
+Shows **callers, imports, and inheritance** for a symbol before it changes. Each result is tied to a file path and line range — combine with `skeleton` or ranged reads on those locations when bodies are needed.
 <img src="assets/coderay-impact.gif" alt="coderay impact demo" width="100%" />
 ### Skeleton
-Signatures and docstrings only; API surface without bodies.
+Returns **signatures and docstrings only** — no function bodies. Every block is tagged with its path and line range so subsequent reads can be scoped to exactly those lines. A full file read should happen only when the skeleton isn't enough.
 <img src="assets/coderay-skeleton.gif" alt="coderay skeleton demo" width="100%" />
 ### Full read
-Same file as skeleton: raw source costs more tokens.
+**Same file, raw source — for comparison:**
 <img src="assets/coderay-fullread.gif" alt="same file, raw source head" width="100%" />
@@ -102,7 +136,7 @@ Same file as skeleton: raw source costs more tokens.
 ## MCP
-Same tools as above, exposed to the agent so it can search, sketch structure, and trace impact instead of vacuuming whole files by default. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For choosing tools versus a plain read, see [AGENTS.md](AGENTS.md).
+Same three tools over MCP: search, skeleton (paths and line ranges), and impact—so **AI agents** can **narrow context** before full-file reads. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For tool choice versus a plain read, see [AGENTS.md](AGENTS.md).
 ```bash
 which coderay-mcp
@@ -123,37 +157,12 @@ which coderay-mcp
 `CODERAY_REPO_ROOT` must be the directory that contains `.coderay.toml`. More detail: [`mcp_server/README.md`](src/coderay/mcp_server/README.md).
-## Why this matters
-Noisy context windows make models confident about the wrong code. CodeRay front-loads **intent** (search), **shape** (skeleton), and **dependencies** (impact) so the expensive read happens after you have a map—not instead of ever reading implementation when control flow matters.
-### Token savings (tiktoken, `cl100k_base`)
-Measured on this repo after a full index.
-| File                               | Lines | Full read | Skeleton | Savings  |
-| ---------------------------------- | ----- | --------- | -------- | -------- |
-| `src/coderay/pipeline/indexer.py`  | 400   | 3,024     | 757      | **4.0x** |
-| `src/coderay/graph/code_graph.py`  | 500   | 4,261     | 1,022    | **4.2x** |
-| `src/coderay/mcp_server/server.py` | 316   | 2,268     | 1,313    | **1.7x** |
-| Query                                | Search hit tokens | vs full `indexer.py` read |
-| ------------------------------------ | ----------------- | ------------------------- |
-| "how are files re-indexed on change" | 479               | **~6x cheaper**           |
-*Not guarantees — model, chunks, and files affect counts.*
 ## Features
 - **Languages** — Python, JavaScript, and TypeScript — [`parsing/README.md`](src/coderay/parsing/README.md)
 - **Multi-repo / monorepo** — roots, aliases, optional `include` subtrees — [`core/README.md`](src/coderay/core/README.md)
 - **Hybrid search** — vector + BM25 (RRF), optional boosting — [`retrieval/README.md`](src/coderay/retrieval/README.md)
-- **Embeddings** — fastembed (CPU) or MLX on Apple Silicon — [`embedding/README.md`](src/coderay/embedding/README.md)
+- **Embeddings** — fastembed (CPU) or MLX on Apple Silicon; defaults to MiniLM L6 for speed — configure BGE in `.coderay.toml` for stronger (heavier) vectors — [`embedding/README.md`](src/coderay/embedding/README.md)
 - **Watch** — incremental re-index; `.coderay.toml` is the source of truth for what’s indexed
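
The "Savings" and "% reduction" columns in the new token table can be recomputed from the "Full read" and "Skeleton" counts. The sketch below does only that arithmetic on the numbers printed in the 1.2.1 README; it does not call tiktoken, and the helper name `savings` is hypothetical rather than anything shipped in the package.

```python
# Token counts copied from the 1.2.1 README table (tiktoken, cl100k_base).
ROWS = {
    "src/coderay/graph/impact.py": (2333, 693),
    "src/coderay/cli/commands.py": (4327, 1906),
    "src/coderay/pipeline/indexer.py": (3065, 1433),
}


def savings(full: int, skeleton: int) -> tuple[float, int]:
    """Return (full/skeleton ratio to one decimal, percent reduction as an integer)."""
    ratio = round(full / skeleton, 1)
    reduction = round(100 * (1 - skeleton / full))
    return ratio, reduction


for path, (full, skel) in ROWS.items():
    ratio, pct = savings(full, skel)
    print(f"{path}: {ratio}x smaller, {pct}% fewer tokens")

# Search hit (479 tokens) versus a full read of indexer.py (3,065 tokens).
print(f"search vs full read: {round(3065 / 479, 1)}x")
```

The same ratio applied to the search row gives roughly 6.4, which matches the README's "~6x cheaper" figure.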

{coderay-1.1.1 → coderay-1.2.1}/README.md RENAMED

@@ -4,38 +4,71 @@
 [![License](https://img.shields.io/github/license/bogdan-copocean/coderay)](LICENSE)
 [![CI](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml/badge.svg)](https://github.com/bogdan-copocean/coderay/actions/workflows/ci.yml)
-**CodeRay** ships a **local code index** with **semantic search**, **file skeletons** (signatures and docstrings, no bodies), and **blast radius** (callers, imports, inheritance) — plus an **MCP stdio server** so agents can use the same tools. Ask *by meaning*, skim **API shape**, trace **who calls what**, then read implementation when it matters: fewer tokens, less noise, answers anchored to the right files.
+**CodeRay** builds a **local code index** that gives AI agents a smarter way to explore a codebase — reading only what they need, not whole files.
-**No LLM inside CodeRay, no network, no API key – it runs locally and offline on your machine.**
+**Runs locally. No LLM. No network. No API key.**
+## The problem
-## Tools
+AI agents exploring a codebase default to reading whole files – even when one function is all that's needed. Every unnecessary line **burns tokens** and **floods the context window**: driving up API costs and noise with every read.
+The root cause is simple: agents know the **file paths** but no finer location. Without knowing *where* in a file something lives, they have no choice but to read everything.
+**CodeRay fixes this.** Every tool returns **file paths with exact line ranges** — so agents locate first, then read only the lines that matter.
+## How it works
+CodeRay exposes three primitives, each returning **paths + line ranges**:
+| Tool | Question it answers | What agents get |
+|------|---------------------|-----------------|
+| **search** | *Where is the code that does X?* | Relevant chunks with file paths and line ranges |
+| **skeleton** | *What's the shape of this file?* | Signatures + docstrings only, each tagged with its line range |
+| **impact** | *What breaks if I change this?* | Callers, imports, and inheritors — located by line range |
+### The two-phase flow
-**CodeRay sits next to ripgrep, not instead of it.** Ripgrep when you know the string or symbol; search, skeleton, and impact when you care about *intent*, *structure*, or *dependencies*—then open the file when you need real implementation detail.
+1. **Locate** — run `search`, `skeleton`, or `impact` to find what's needed. Every result includes a file path and a symbol-level line range.
+2. **Read precisely** — use those line ranges to load only the relevant snippet. Skip the rest.
-Semantic search is retrieval, not proof: hits can miss or rank oddly. Treat them as candidates, confirm with a skeleton or read, and keep the index fresh with `coderay watch` or `coderay build` when things drift.
+This keeps context windows lean and agent reasoning focused. CodeRay is not a replacement for `grep` — it fills the gap when exact names are unknown or a map is needed before reading.
+### Token savings (tiktoken, `cl100k_base`)
+| File | Lines | Full read | Skeleton | Savings | % reduction |
+|------|-------|-----------|----------|---------|-------------|
+| `src/coderay/graph/impact.py` | 249 | 2,333 | 693 | **3.4×** | **70%** |
+| `src/coderay/cli/commands.py` | 584 | 4,327 | 1,906 | **2.3×** | **56%** |
+| `src/coderay/pipeline/indexer.py` | 408 | 3,065 | 1,433 | **2.1×** | **53%** |
+| Query | Search hit tokens | vs full `indexer.py` read |
+|-------|-------------------|---------------------------|
+| "how are files re-indexed on change" | 479 | **~6x cheaper** |
+## Tools
-Skeleton shows API shape and docstrings, not every branch. Use **search** and **impact** to narrow where to look, then read the file (or spans) when you need control flow or line-accurate edits. CodeRay trims noise on those round trips; it does not forbid them.
+### Semantic search
-**Semantic search** — “How/where” by meaning.
+Agents search by **meaning**, not by name — useful when the exact function or class is unknown. Results return **file paths with line ranges** pointing at relevant chunks. Treat them as candidates: confirm with `skeleton` or a ranged read before acting. Keep the index fresh with `coderay watch` or `coderay build` when the tree drifts.
 <img src="assets/coderay-search.gif" alt="coderay search demo" width="100%" />
 ### Blast radius
-Callers and dependents (calls, imports, inheritance).
+Shows **callers, imports, and inheritance** for a symbol before it changes. Each result is tied to a file path and line range — combine with `skeleton` or ranged reads on those locations when bodies are needed.
 <img src="assets/coderay-impact.gif" alt="coderay impact demo" width="100%" />
 ### Skeleton
-Signatures and docstrings only; API surface without bodies.
+Returns **signatures and docstrings only** — no function bodies. Every block is tagged with its path and line range so subsequent reads can be scoped to exactly those lines. A full file read should happen only when the skeleton isn't enough.
 <img src="assets/coderay-skeleton.gif" alt="coderay skeleton demo" width="100%" />
 ### Full read
-Same file as skeleton: raw source costs more tokens.
+**Same file, raw source — for comparison:**
 <img src="assets/coderay-fullread.gif" alt="same file, raw source head" width="100%" />
@@ -50,7 +83,7 @@ Same file as skeleton: raw source costs more tokens.
 ## MCP
-Same tools as above, exposed to the agent so it can search, sketch structure, and trace impact instead of vacuuming whole files by default. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For choosing tools versus a plain read, see [AGENTS.md](AGENTS.md).
+Same three tools over MCP: search, skeleton (paths and line ranges), and impact—so **AI agents** can **narrow context** before full-file reads. Point the server at a checkout whose root contains `.coderay.toml` (`CODERAY_REPO_ROOT` below). For tool choice versus a plain read, see [AGENTS.md](AGENTS.md).
 ```bash
 which coderay-mcp
@@ -71,37 +104,12 @@ which coderay-mcp
 `CODERAY_REPO_ROOT` must be the directory that contains `.coderay.toml`. More detail: [`mcp_server/README.md`](src/coderay/mcp_server/README.md).
-## Why this matters
-Noisy context windows make models confident about the wrong code. CodeRay front-loads **intent** (search), **shape** (skeleton), and **dependencies** (impact) so the expensive read happens after you have a map—not instead of ever reading implementation when control flow matters.
-### Token savings (tiktoken, `cl100k_base`)
-Measured on this repo after a full index.
-| File                               | Lines | Full read | Skeleton | Savings  |
-| ---------------------------------- | ----- | --------- | -------- | -------- |
-| `src/coderay/pipeline/indexer.py`  | 400   | 3,024     | 757      | **4.0x** |
-| `src/coderay/graph/code_graph.py`  | 500   | 4,261     | 1,022    | **4.2x** |
-| `src/coderay/mcp_server/server.py` | 316   | 2,268     | 1,313    | **1.7x** |
-| Query                                | Search hit tokens | vs full `indexer.py` read |
-| ------------------------------------ | ----------------- | ------------------------- |
-| "how are files re-indexed on change" | 479               | **~6x cheaper**           |
-*Not guarantees — model, chunks, and files affect counts.*
 ## Features
 - **Languages** — Python, JavaScript, and TypeScript — [`parsing/README.md`](src/coderay/parsing/README.md)
 - **Multi-repo / monorepo** — roots, aliases, optional `include` subtrees — [`core/README.md`](src/coderay/core/README.md)
 - **Hybrid search** — vector + BM25 (RRF), optional boosting — [`retrieval/README.md`](src/coderay/retrieval/README.md)
-- **Embeddings** — fastembed (CPU) or MLX on Apple Silicon — [`embedding/README.md`](src/coderay/embedding/README.md)
+- **Embeddings** — fastembed (CPU) or MLX on Apple Silicon; defaults to MiniLM L6 for speed — configure BGE in `.coderay.toml` for stronger (heavier) vectors — [`embedding/README.md`](src/coderay/embedding/README.md)
 - **Watch** — incremental re-index; `.coderay.toml` is the source of truth for what’s indexed

{coderay-1.1.1 → coderay-1.2.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "coderay"
-version = "1.1.1"
+version = "1.2.1"
 description = "X-ray your codebase — semantic search, code graphs, file skeletons, and MCP server"
 readme = "README.md"
 license = "MIT"
@@ -53,6 +53,7 @@ dev = [
     "pytest-cov>=4.0",
     "ruff>=0.8.0",
     "mypy>=1.0.0",
+    "tiktoken>=0.5.0",
 ]
 maintain = [
     "pylance>=0.15.0",

coderay-1.2.1/src/coderay/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = "1.2.1"

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/cli/commands.py RENAMED

@@ -340,7 +340,7 @@ def maintain(ctx: click.Context) -> None:
 @cli.command()
-@click.argument("file_path", type=click.Path(exists=True, path_type=Path))
+@click.argument("file_path", type=str)
 @click.option(
     "--include-imports",
     is_flag=True,
@@ -353,17 +353,53 @@ def maintain(ctx: click.Context) -> None:
     default=None,
     help="Filter to a specific class or top-level function by name.",
 )
+@click.option(
+    "--lines",
+    "line_range",
+    default=None,
+    metavar="START-END",
+    help=(
+        "File line range (1-based inclusive); keep only symbols fully within this span."
+        " Do not combine with a :START-END suffix on FILE_PATH (same meaning)."
+    ),
+)
 def skeleton(
-    file_path: Path,
+    file_path: str,
     include_imports: bool,
     symbol: str | None,
+    line_range: str | None,
 ) -> None:
     """Print signatures without bodies (cheaper than reading the full file)."""
     from coderay.skeleton.extractor import extract_skeleton
+    from coderay.skeleton.path_range import (
+        parse_file_line_range,
+        parse_skeleton_file_arg,
+    )
-    content = file_path.read_text(encoding="utf-8", errors="replace")
+    try:
+        path_str, rng_from_path = parse_skeleton_file_arg(file_path, parse_suffix=True)
+    except ValueError as e:
+        raise click.BadParameter(str(e)) from e
+    file_line_range = rng_from_path
+    if line_range:
+        if file_line_range is not None:
+            raise click.UsageError(
+                "Use either a path ending with :START-END or --lines, not both."
+            )
+        try:
+            file_line_range = parse_file_line_range(line_range)
+        except ValueError as e:
+            raise click.BadParameter(str(e), param_hint="--lines") from e
+    resolved = Path(path_str)
+    if not resolved.is_file():
+        raise click.BadParameter(f"not a file: {path_str}", param_hint="file_path")
+    content = resolved.read_text(encoding="utf-8", errors="replace")
     out = extract_skeleton(
-        file_path, content, include_imports=include_imports, symbol=symbol
+        resolved,
+        content,
+        include_imports=include_imports,
+        symbol=symbol,
+        line_range=file_line_range,
     )
     click.echo(out)
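
The hunk above accepts a line range either as a `:START-END` suffix on `FILE_PATH` or through `--lines`, and keeps only symbols fully within that span. The real helpers (`parse_skeleton_file_arg`, `parse_file_line_range`) live in `coderay.skeleton.path_range`, which is not shown in this diff, so the following is only a minimal sketch of how such parsing might look. The names `parse_range`, `split_path_suffix`, and `fully_within` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical stand-ins for illustration; not the coderay.skeleton.path_range code.
_RANGE_RE = re.compile(r"(\d+)-(\d+)")


def parse_range(text: str) -> tuple[int, int]:
    """Parse 'START-END' into a 1-based inclusive (start, end) tuple."""
    m = _RANGE_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {text!r}")
    return start, end


def split_path_suffix(arg: str) -> tuple[str, tuple[int, int] | None]:
    """Split 'path:START-END' into (path, range); range is None without a suffix."""
    head, sep, tail = arg.rpartition(":")
    if sep and _RANGE_RE.fullmatch(tail):
        return head, parse_range(tail)
    return arg, None


def fully_within(symbol: tuple[int, int], span: tuple[int, int]) -> bool:
    """True when the symbol's line range lies entirely inside the requested span."""
    return span[0] <= symbol[0] and symbol[1] <= span[1]


print(split_path_suffix("src/coderay/cli/commands.py:340-410"))
print(split_path_suffix("src/coderay/cli/commands.py"))
print(fully_within((355, 405), (340, 410)))
```

Splitting on the last colon only when the tail looks like `START-END` leaves paths that contain other colons (for example a Windows drive letter) untouched.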

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/core/defaults/default.coderay.toml RENAMED

@@ -38,13 +38,13 @@ backend = "auto"
 [embedder.fastembed]
 # Default embedder. Runs on CPU.
-model_name = "BAAI/bge-small-en-v1.5"
+model_name = "sentence-transformers/all-MiniLM-L6-v2"
 dimensions = 384
 batch_size = 64
 [embedder.mlx]
 # Apple Silicon embedder (MLX/Metal; device depends on runtime).
-model_name = "mlx-community/bge-small-en-v1.5-bf16"
+model_name = "mlx-community/all-MiniLM-L6-v2-4bit"
 dimensions = 384
 batch_size = 256

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/embedding/README.md RENAMED

@@ -27,25 +27,33 @@ Maps code chunks to dense vectors for storage and query.
 Run `coderay build --full` after any change to `[embedder]` config. Vectors
 from different models are not compatible.
-## If indexing is slow
+## Defaults and trade-offs
-The default model (BGE Small, ~67MB via fastembed / ~25MB via MLX bf16) is a
-good balance of speed and retrieval quality. If your repo is large and the first
-build takes too long, consider a lighter model:
+The default is **MiniLM L6** (`sentence-transformers/all-MiniLM-L6-v2` on CPU,
+`mlx-community/all-MiniLM-L6-v2-bf16` on MLX): fast indexing and good enough
+semantic search for most workflows. For **stronger embeddings** (often better
+retrieval on code), switch to **BGE Small** — expect a heavier download and more
+compute than MiniLM.
-| Model | Backend | Size | Dimensions | Trade-off |
-|-------|---------|------|------------|-----------|
-| `BAAI/bge-small-en-v1.5` | fastembed | ~67MB | 384 | **Default.** Best retrieval quality in this size class. |
-| `sentence-transformers/all-MiniLM-L6-v2` | fastembed | ~90MB | 384 | Widely used, slightly lower code retrieval quality than BGE Small. Larger download. |
-| `mlx-community/bge-small-en-v1.5-4bit` | mlx | ~19MB | 384 | 4-bit quantised BGE Small. Fast on Apple Silicon, minimal download. Small quality delta vs bf16 — untested on code retrieval specifically. |
-| `mlx-community/all-MiniLM-L6-v2-4bit` | mlx | ~13MB | 384 | Smallest option. Fastest cold start. Noticeably lower retrieval quality for code; best suited for quick experimentation. |
+| Model | Backend | Size (approx.) | Dimensions | Notes |
+|-------|---------|----------------|------------|-------|
+| `sentence-transformers/all-MiniLM-L6-v2` | fastembed | ~90MB | 384 | **Default.** Fast; widely used. |
+| `BAAI/bge-small-en-v1.5` | fastembed | ~67MB | 384 | Heavier quality focus; strong retrieval in this size class. |
+| `mlx-community/all-MiniLM-L6-v2-bf16` | mlx | ~45MB | 384 | **Default** on Apple Silicon with `coderay[mlx]`. |
+| `mlx-community/bge-small-en-v1.5-bf16` | mlx | ~25MB | 384 | BGE on MLX; better embeddings than MiniLM, more work per batch. |
+| `mlx-community/bge-small-en-v1.5-4bit` | mlx | ~19MB | 384 | 4-bit BGE; smaller download, small quality delta vs bf16. |
+| `mlx-community/all-MiniLM-L6-v2-4bit` | mlx | ~13MB | 384 | Smallest; fastest cold start; lower retrieval quality for code. |
-To switch, update `.coderay.toml` and run `coderay build --full`:
+To use BGE instead of the defaults, edit `.coderay.toml` and run `coderay build --full`:
 ```toml
-# Example: lighter MLX model on Apple Silicon
+[embedder.fastembed]
+model_name = "BAAI/bge-small-en-v1.5"
+dimensions = 384
+batch_size = 64
 [embedder.mlx]
-model_name = "mlx-community/bge-small-en-v1.5-4bit"
+model_name = "mlx-community/bge-small-en-v1.5-bf16"
 dimensions = 384
 batch_size = 256
 ```

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/mcp_server/server.py RENAMED

@@ -37,8 +37,10 @@ mcp = FastMCP(
         "\n"
         "- semantic_search: search code by meaning. Best for "
         "'how/where' questions. Use grep for exact symbol lookup.\n"
-        "- get_file_skeleton: signatures and docstrings only, no bodies. "
-        "Check a file's API before reading full source. "
+        "- get_file_skeleton: signatures and docstrings only, no bodies; "
+        "absolute path line per symbol (with optional symbol line range suffix) "
+        "for filepath:START-END style refs. "
+        "Optional file line range narrows output. "
         "Works without the index.\n"
         "- get_impact_radius: reverse dependency traversal from the code "
         "graph. Shows callers/dependents of a function or class. "
@@ -177,10 +179,11 @@ async def semantic_search(
 @mcp.tool(
     description=(
         "Extracts class/function signatures and docstrings from a file — "
-        "no bodies. Significantly fewer tokens than reading the full source "
-        "(a 500-line file typically compresses to ~100 lines of skeleton). "
-        "Use this before deciding whether to read a file in full. "
-        "Does not require the index."
+        "no bodies. Each symbol is preceded by the absolute file path and "
+        "symbol line range suffix (1-based inclusive) for filepath:START-END refs. "
+        "Optional file line range via path suffix :START-END or file_line_range "
+        "(same meaning; do not pass both). Narrows to declarations fully within that"
+        "range. Does not require the index."
     ),
     annotations=READ_ONLY_ANNOTATIONS,
     tags={"analysis"},
@@ -188,7 +191,12 @@ async def semantic_search(
 async def get_file_skeleton(
     file_path: Annotated[
         str,
-        Field(description="Absolute or relative path to the file"),
+        Field(
+            description=(
+                "Path to the file. Optional :START-END suffix (same as file_line_range)"
+                "; do not combine with file_line_range."
+            ),
+        ),
     ],
     include_imports: Annotated[
         bool,
@@ -206,18 +214,46 @@ async def get_file_skeleton(
             ),
         ),
     ] = None,
+    file_line_range: Annotated[
+        str | None,
+        Field(
+            description=(
+                "Optional file line range as START-END (1-based inclusive). "
+                "Do not combine with a :START-END suffix on file_path."
+            ),
+        ),
+    ] = None,
 ) -> str:
     """Get file API surface (signatures, no bodies)."""
     from coderay.skeleton.extractor import extract_skeleton
+    from coderay.skeleton.path_range import (
+        parse_file_line_range,
+        parse_skeleton_file_arg,
+    )
+    try:
+        path_str, rng_suffix = parse_skeleton_file_arg(file_path, parse_suffix=True)
+    except ValueError as e:
+        raise ValueError(str(e)) from e
+    line_range: tuple[int, int] | None = rng_suffix
+    if file_line_range:
+        if line_range is not None:
+            raise ValueError(
+                "Use either file_path :START-END suffix or file_line_range, not both."
+            )
+        try:
+            line_range = parse_file_line_range(file_line_range)
+        except ValueError as e:
+            raise ValueError(str(e)) from e
     workspace_root = _resolve_index_dir().parent.resolve()
-    candidate = (workspace_root / file_path).resolve()
+    candidate = (workspace_root / path_str).resolve()
     try:
         candidate.relative_to(workspace_root)
     except ValueError:
         raise FileNotFoundError(f"File not found: {file_path}")
     if not candidate.is_file():
-        raise FileNotFoundError(f"File not found: {file_path}")
+        raise FileNotFoundError(f"File not found: {path_str}")
     content = await asyncio.to_thread(
         candidate.read_text, encoding="utf-8", errors="replace"
     )
@@ -227,6 +263,7 @@ async def get_file_skeleton(
         content,
         include_imports=include_imports,
         symbol=symbol,
+        line_range=line_range,
     )
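
`get_file_skeleton` resolves the requested path against the workspace root and rejects anything that lands outside it, using `Path.resolve()` followed by `relative_to()`. The sketch below isolates that containment check in a standalone function. The name `resolve_inside` and the temporary directory layout are assumptions for illustration, not part of the server.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that escapes root."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise FileNotFoundError(f"File not found: {user_path}") from None
    if not candidate.is_file():
        raise FileNotFoundError(f"File not found: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "mod.py").write_text("x = 1\n", encoding="utf-8")

    print(resolve_inside(root, "pkg/mod.py").name)
    try:
        resolve_inside(root, "../outside.py")
    except FileNotFoundError as exc:
        print("rejected:", exc)
```

Resolving before the check matters: `..` segments and symlinks are collapsed first, so a path that merely starts with the root string but points elsewhere is still rejected. Reporting the same "File not found" error for both cases avoids revealing whether a path outside the workspace exists.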

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/parsing/languages.py RENAMED

@@ -37,8 +37,11 @@ class GraphConfig:
 @dataclass
 class SkeletonConfig:
-    """Skeleton-only: docstrings and pass-through at module scope."""
+    """Skeleton: declaration types (chunker-style), docstrings, module pass-through."""
+    # Node types that emit as declarations. JS/TS omits export_statement (unwrap) and
+    # lexical_declaration (top_level_expr_types). See chunk_types in this file.
+    symbol_types: tuple[str, ...]
     docstring_expr_type: str = "expression_statement"
     top_level_expr_types: tuple[str, ...] = ("expression_statement",)
     body_block_types: tuple[str, ...] = ("block", "statement_block")
@@ -102,8 +105,16 @@ _PYTHON_CST_DISPATCH = CstDispatchConfig(
 )
+_PY_CHUNK_TYPES: tuple[str, ...] = (
+    "function_definition",
+    "class_definition",
+    "decorated_definition",
+)
 def _python_skeleton() -> SkeletonConfig:
     return SkeletonConfig(
+        symbol_types=_PY_CHUNK_TYPES,
         docstring_expr_type="expression_statement",
         top_level_expr_types=("expression_statement",),
         body_block_types=("block",),
@@ -111,13 +122,7 @@ def _python_skeleton() -> SkeletonConfig:
 def _python_chunker() -> ChunkerConfig:
-    return ChunkerConfig(
-        chunk_types=(
-            "function_definition",
-            "class_definition",
-            "decorated_definition",
-        ),
-    )
+    return ChunkerConfig(chunk_types=_PY_CHUNK_TYPES)
 _PYTHON_GRAPH = GraphConfig(
@@ -172,8 +177,33 @@ _JS_TS_GRAPH = GraphConfig(
 )
+# Chunker includes export_statement and lexical_declaration; skeleton unwraps exports
+# and treats top-level lexical_declaration via top_level_expr_types.
+_JS_TS_CHUNK_TYPES: tuple[str, ...] = (
+    "function_declaration",
+    "class_declaration",
+    "method_definition",
+    "arrow_function",
+    "export_statement",
+    "lexical_declaration",
+    "interface_declaration",
+    "type_alias_declaration",
+)
+_JS_TS_SKELETON_SYMBOL_TYPES: tuple[str, ...] = (
+    "function_declaration",
+    "class_declaration",
+    "method_definition",
+    "arrow_function",
+    "interface_declaration",
+    "type_alias_declaration",
+    "type_declaration",
+)
 def _js_ts_skeleton() -> SkeletonConfig:
     return SkeletonConfig(
+        symbol_types=_JS_TS_SKELETON_SYMBOL_TYPES,
         docstring_expr_type="expression_statement",
         top_level_expr_types=("expression_statement", "lexical_declaration"),
         body_block_types=("statement_block",),
@@ -181,18 +211,7 @@ def _js_ts_skeleton() -> SkeletonConfig:
 def _js_ts_chunker() -> ChunkerConfig:
-    return ChunkerConfig(
-        chunk_types=(
-            "function_declaration",
-            "class_declaration",
-            "method_definition",
-            "arrow_function",
-            "export_statement",
-            "lexical_declaration",
-            "interface_declaration",
-            "type_alias_declaration",
-        ),
-    )
+    return ChunkerConfig(chunk_types=_JS_TS_CHUNK_TYPES)
 @dataclass

{coderay-1.1.1 → coderay-1.2.1}/src/coderay/skeleton/README.md RENAMED

@@ -6,8 +6,9 @@ demand (not stored in the index). Works without a built index.
 ## How it works
-`extractor.py` uses tree-sitter to parse the file, then walks the CST using
-`classify_node` from `parsing/cst_kind.py` to identify structural boundaries.
+`extractor.py` uses tree-sitter to parse the file, then walks the CST; declaration
+nodes come from `LanguageConfig.skeleton.symbol_types` (per language, like chunk
+types in `parsing/languages.py`), with shape from `cst` function/class/decorator sets.
 Function and method bodies are replaced with `...`. Class headers are kept as
 context even when filtering to a specific symbol.
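
To make the skeleton idea concrete, here is a rough Python-only sketch that swaps tree-sitter for the standard-library `ast` module. It is not the `extractor.py` implementation: it assumes single-line signatures, skips decorators and imports, and the function name `skeleton` and the `example.py` label are hypothetical. It does show the three behaviours described above: bodies replaced with `...`, class headers kept as context, and each symbol tagged with a `path:START-END` range.

```python
import ast


def skeleton(source: str, path: str = "example.py") -> str:
    """Emit 'path:START-END' tags, signatures, and docstrings; bodies become '...'."""
    lines = source.splitlines()
    out: list[str] = []

    def visit(node: ast.AST) -> None:
        for child in ast.iter_child_nodes(node):
            if not isinstance(
                child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            ):
                continue
            pad = " " * child.col_offset
            # 1-based inclusive line range of the whole symbol.
            out.append(f"{pad}# {path}:{child.lineno}-{child.end_lineno}")
            # Assumption: the signature fits on the def/class line.
            out.append(lines[child.lineno - 1].rstrip())
            doc = ast.get_docstring(child)
            if doc:
                out.append(f'{pad}    """{doc.splitlines()[0]}"""')
            if isinstance(child, ast.ClassDef):
                visit(child)  # keep the class header, descend into its methods
            else:
                out.append(f"{pad}    ...")

    visit(ast.parse(source))
    return "\n".join(out)


SAMPLE = '''\
class Greeter:
    """Say hello."""

    def greet(self, name: str) -> str:
        """Return a greeting."""
        return f"hello {name}"


def main() -> None:
    print(Greeter().greet("world"))
'''

print(skeleton(SAMPLE))
```

An agent could then request only the tagged range it needs (for instance the range printed above `greet`) instead of reading the whole file.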
