PyPI - codexlr8 - Versions diffs - 0.0.1__tar.gz → 0.0.3__tar.gz - Mend

codexlr8 0.0.1tar.gz → 0.0.3tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

{codexlr8-0.0.1 → codexlr8-0.0.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codexlr8
-Version: 0.0.1
+Version: 0.0.3
 Summary: A codebase search engine for LLM coding agents
 Author-email: Sadig Akhund <sadigaxund@gmail.com>
 License: Apache-2.0
@@ -25,6 +25,8 @@ Requires-Dist: mcp>=1.0
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.0; extra == "dev"
+Provides-Extra: embeddings
+Requires-Dist: sentence-transformers>=3.0; extra == "embeddings"
 Dynamic: license-file
 # CodeXLR8
@@ -64,11 +66,56 @@ CodeXLR8 indexes your codebase into an SQLite FTS5 database alongside optional `
 | Layer | Source | Boost |
 |---|---|---|
-| 1 | Raw file content (function names, variables, comments, docstrings) | FTS5 BM25 base |
-| 2 | `.meta.yaml` `summary` + `tags` | 0.6× – 0.8× |
+| 1 | Raw file content | 0.3× per token |
+| 2a | File path (filename, directory) | 0.5× – 0.8× |
+| 2b | `.meta.yaml` `summary` + `tags` | 0.6× – 0.8× |
 | 3 | `.meta.yaml` `public_api` | 1.0× (strongest) |
-Search uses AND semantics (like Google): all query tokens must match. If no results, falls back to OR with a ≥50% token threshold.
+Search uses OR semantics with token-coverage scoring: more matching tokens = higher score. A ≥50% post-filter eliminates single-token noise for multi-word queries. Path weighting (Layer 2a) provides differentiation even without metadata — a file whose name IS the query token ranks above one that merely mentions it.
+### Scoped search and clustering
+```bash
+# Narrow to a specific directory (like grep -rn "pattern" dir/)
+codexlr8 search . "get_visible" --scope lib/mpl_toolkits/
+# Cluster results by directory to see where matches concentrate
+codexlr8 search . "get_visible" --grouped
+# 12 results in 3 directories (8 files) across project:
+#   lib/mpl_toolkits/mplot3d/  (5 files)
+#     ─ axes3d.py:388  [score: 0.90]
+#     ...
+# Diagnose your query — see which terms hit, which don't
+codexlr8 search . "axes not hiding" --explain
+# Query analysis:
+#   "axes"    212 matches  — broad term (212/212 results)
+#   "not"     77 matches
+#   "hiding"  0 matches    — consider dropping or replacing
+#   Top score: 1.20 (strong match)
+# Combine both — group, then scope to drill down
+```
+### Search Quality & Fine-Tuning
+```bash
+# Measure search accuracy against known queries
+codexlr8 eval . --queries queries.json
+# Precision@1: 67%, MRR: 0.83, Recall@5: 67%
+# Typos are auto-corrected (fuzzy fallback on zero results)
+codexlr8 search . "funtion"  # → corrects to "function"
+# Opt-in embeddings: hybrid BM25 + semantic search
+# pip install codexlr8[embeddings]
+# set embeddings.enabled: true in .codexlr8.yaml
+# Fine-tune a model on YOUR codebase vocabulary
+codexlr8 recommend-model .   # picks the right model for your size
+codexlr8 train .              # TSDAE training, 5-45min on CPU
+codexlr8 eval .               # measure improvement
+```
 ## .meta.yaml Sidecars
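
The `eval` example in the hunk above reports Precision@1, MRR, and Recall@5. As a rough illustration of how these metrics are conventionally defined, here is a minimal sketch that computes them from the 1-based rank at which each query's expected file appeared (`None` when it was not returned). The function names and the `ranks` input are hypothetical; the package's own `eval` module is not shown in this diff and may compute them differently.

```python
from typing import Optional


def precision_at_1(ranks: list[Optional[int]]) -> float:
    """Fraction of queries whose expected file is the top result."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks: list[Optional[int]]) -> float:
    """Average of 1/rank; a query whose file was not found contributes 0."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)


def recall_at_k(ranks: list[Optional[int]], k: int = 5) -> float:
    """Fraction of queries whose expected file appears within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical ranks for four queries: two top hits, one at rank 2, one miss.
ranks = [1, 2, None, 1]
summary = {
    "precision_at_1": precision_at_1(ranks),
    "mrr": mean_reciprocal_rank(ranks),
    "recall_at_5": recall_at_k(ranks, k=5),
}
```

MRR rewards near misses (a rank-2 hit still earns 0.5), which is why it can sit above Precision@1 for the same run.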

{codexlr8-0.0.1 → codexlr8-0.0.3}/README.md RENAMED

@@ -35,11 +35,56 @@ CodeXLR8 indexes your codebase into an SQLite FTS5 database alongside optional `
 | Layer | Source | Boost |
 |---|---|---|
-| 1 | Raw file content (function names, variables, comments, docstrings) | FTS5 BM25 base |
-| 2 | `.meta.yaml` `summary` + `tags` | 0.6× – 0.8× |
+| 1 | Raw file content | 0.3× per token |
+| 2a | File path (filename, directory) | 0.5× – 0.8× |
+| 2b | `.meta.yaml` `summary` + `tags` | 0.6× – 0.8× |
 | 3 | `.meta.yaml` `public_api` | 1.0× (strongest) |
-Search uses AND semantics (like Google): all query tokens must match. If no results, falls back to OR with a ≥50% token threshold.
+Search uses OR semantics with token-coverage scoring: more matching tokens = higher score. A ≥50% post-filter eliminates single-token noise for multi-word queries. Path weighting (Layer 2a) provides differentiation even without metadata — a file whose name IS the query token ranks above one that merely mentions it.
+### Scoped search and clustering
+```bash
+# Narrow to a specific directory (like grep -rn "pattern" dir/)
+codexlr8 search . "get_visible" --scope lib/mpl_toolkits/
+# Cluster results by directory to see where matches concentrate
+codexlr8 search . "get_visible" --grouped
+# 12 results in 3 directories (8 files) across project:
+#   lib/mpl_toolkits/mplot3d/  (5 files)
+#     ─ axes3d.py:388  [score: 0.90]
+#     ...
+# Diagnose your query — see which terms hit, which don't
+codexlr8 search . "axes not hiding" --explain
+# Query analysis:
+#   "axes"    212 matches  — broad term (212/212 results)
+#   "not"     77 matches
+#   "hiding"  0 matches    — consider dropping or replacing
+#   Top score: 1.20 (strong match)
+# Combine both — group, then scope to drill down
+```
+### Search Quality & Fine-Tuning
+```bash
+# Measure search accuracy against known queries
+codexlr8 eval . --queries queries.json
+# Precision@1: 67%, MRR: 0.83, Recall@5: 67%
+# Typos are auto-corrected (fuzzy fallback on zero results)
+codexlr8 search . "funtion"  # → corrects to "function"
+# Opt-in embeddings: hybrid BM25 + semantic search
+# pip install codexlr8[embeddings]
+# set embeddings.enabled: true in .codexlr8.yaml
+# Fine-tune a model on YOUR codebase vocabulary
+codexlr8 recommend-model .   # picks the right model for your size
+codexlr8 train .              # TSDAE training, 5-45min on CPU
+codexlr8 eval .               # measure improvement
+```
 ## .meta.yaml Sidecars
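
The README change above replaces AND matching with OR semantics, token-coverage scoring, a 50% post-filter, and path weighting. The sketch below shows one way those pieces could fit together. The boost constants are taken from the table in the hunk, but how they are combined, the tokenizer, and every function name here are assumptions made for illustration; the package's actual scoring code is not part of this diff.

```python
import re
from pathlib import PurePosixPath

CONTENT_BOOST = 0.3    # Layer 1 figure from the README table
FILENAME_BOOST = 0.8   # assumed: upper end of the Layer 2a range
DIRECTORY_BOOST = 0.5  # assumed: lower end of the Layer 2a range


def tokenize(text: str) -> set[str]:
    """Lowercase and split on anything that is not a letter, digit, or underscore."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score(query: str, path: str, content: str) -> tuple[float, list[str]]:
    """Return (score, matched_tokens) for one file under OR semantics."""
    query_tokens = sorted(tokenize(query))
    file_path = PurePosixPath(path)
    stem_tokens = tokenize(file_path.stem)
    dir_tokens = tokenize(str(file_path.parent))
    content_tokens = tokenize(content)

    total = 0.0
    matched: list[str] = []
    for token in query_tokens:
        hit = 0.0
        if token in content_tokens:
            hit += CONTENT_BOOST
        if token in stem_tokens:
            hit += FILENAME_BOOST
        elif token in dir_tokens:
            hit += DIRECTORY_BOOST
        if hit:
            matched.append(token)
            total += hit

    # Post-filter: multi-word queries must match at least half of their tokens.
    if len(query_tokens) > 1 and len(matched) * 2 < len(query_tokens):
        return 0.0, []
    return total, matched


named = score("axes3d", "lib/mplot3d/axes3d.py", "class Axes3D: pass")
mention = score("axes3d", "lib/other/util.py", "import axes3d")
```

Under these assumptions `named` outscores `mention`, mirroring the README's claim that a file whose name is the query token ranks above one that merely mentions it.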

{codexlr8-0.0.1 → codexlr8-0.0.3}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "codexlr8"
-version = "0.0.1"
+version = "0.0.3"
 description = "A codebase search engine for LLM coding agents"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -37,6 +37,9 @@ dev = [
     "pytest>=7.0",
     "pytest-cov>=4.0",
 ]
+embeddings = [
+    "sentence-transformers>=3.0",
+]
 [project.scripts]
 codexlr8 = "codexlr8.cli:main"
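
The new `embeddings` extra makes `sentence-transformers` an opt-in dependency. A common way to handle such an extra is to check for the module before using it and to fail with an install hint otherwise. The sketch below uses the standard library's `importlib.util.find_spec`; the helper names are hypothetical and are not taken from the package, which (per the `cli.py` diff) catches `ImportError` instead.

```python
from importlib.util import find_spec


def embeddings_available() -> bool:
    """True when sentence-transformers (the 'embeddings' extra) is importable."""
    return find_spec("sentence_transformers") is not None


def require_embeddings() -> None:
    """Raise a readable error instead of a bare ImportError deep in a call stack."""
    if not embeddings_available():
        raise RuntimeError(
            "Embeddings are optional. Install them with: "
            "pip install 'codexlr8[embeddings]'"
        )
```

Checking with `find_spec` avoids importing the heavy dependency just to learn whether it is installed. Quoting the requirement in the hint keeps shells such as zsh from treating the square brackets as a glob pattern.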

{codexlr8-0.0.1 → codexlr8-0.0.3}/src/codexlr8/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """CodeXLR8 — A codebase search engine for LLM coding agents."""
-__version__ = "0.0.1"
+__version__ = "0.0.3"

{codexlr8-0.0.1 → codexlr8-0.0.3}/src/codexlr8/cli.py RENAMED

@@ -1,12 +1,13 @@
 """CodeXLR8 CLI — search-first codebase navigation for agents."""
 import asyncio
+import os
 import click
 from .config import load_config
 from .scanner import scan_project
 from .meta import generate_missing_sidecars
-from .search import SearchEngine
+from .search import SearchEngine, _group_results, _explain_query, _tokenize
 EXCLUDE_HELP = (
@@ -62,10 +63,19 @@ def scan(project_path: str, output: str | None):
 @click.argument("query")
 @click.option("--exclude", "-x", "exclude_patterns", multiple=True,
               callback=_parse_excludes, help=EXCLUDE_HELP)
+@click.option("--scope", "-s", default=None,
+              help="Restrict search to files under a path prefix (e.g. src/ or lib/mpl_toolkits/)")
+@click.option("--grouped", "-g", is_flag=True, default=False,
+              help="Cluster results by directory before listing files")
+@click.option("--explain", "-e", is_flag=True, default=False,
+              help="Show token breakdown and query diagnostics")
+@click.option("--group-depth", default=3,
+              help="Max directory depth for grouping (default: 3)")
 @click.option("--format", "-f", "output_format",
               type=click.Choice(["text", "json"]), default="text")
 @click.option("--limit", "-n", default=10, help="Maximum number of results")
 def search(project_path: str, query: str, exclude_patterns: list[str],
+           scope: str | None, grouped: bool, explain: bool, group_depth: int,
            output_format: str, limit: int):
     """Search the codebase for code matching QUERY.
@@ -74,19 +84,52 @@ def search(project_path: str, query: str, exclude_patterns: list[str],
     \b
     Examples:
       codexlr8 search . "login auth"
+      codexlr8 search . "login auth" --grouped
+      codexlr8 search . "login auth" --explain
       codexlr8 search . "login auth" --exclude "tests/*"
       codexlr8 search . "login auth" -x "tests/*" -x "vendor/*"
+      codexlr8 search . "get_visible" --scope lib/mpl_toolkits/
     """
     engine = SearchEngine(project_path)
-    results = engine.search(query, limit=limit, exclude=exclude_patterns)
+    results = engine.search(query, limit=limit, exclude=exclude_patterns, scope=scope)
     if output_format == "json":
         import json
-        click.echo(json.dumps(results, indent=2))
+        output = {"results": results}
+        if explain:
+            output["explain"] = _explain_query(query, _tokenize(query), results)
+        if grouped:
+            groups_data = _group_results(results, group_depth)
+            output["grouped"] = True
+            output["groups"] = groups_data["groups"]
+            output["summary"] = {
+                "total_results": groups_data["total_results"],
+                "total_files": groups_data["total_files"],
+                "total_groups": len(groups_data["groups"]),
+            }
+        click.echo(json.dumps(output, indent=2))
         return
     if not results:
         click.echo("No results found.")
+        if explain:
+            tokens = _tokenize(query)
+            click.echo()
+            click.echo("Query analysis:")
+            for t in tokens:
+                click.echo(f"  \"{t}\"  \u2717 no matches")
+            click.echo()
+            click.echo("0 tokens matched. All terms are absent from the codebase.")
+        return
+    if explain:
+        tokens = _tokenize(query)
+        explain_data = _explain_query(query, tokens, results)
+        _print_explain(explain_data)
+        click.echo()
+    if grouped:
+        _print_grouped(results, group_depth, scope)
         return
     for i, r in enumerate(results, 1):
@@ -96,6 +139,8 @@ def search(project_path: str, query: str, exclude_patterns: list[str],
             click.echo(f"   meta:   {r['summary']}")
         if r.get("tags"):
             click.echo(f"   tags:   {', '.join(r['tags'])}")
+        if r.get("matched_tokens"):
+            click.echo(f"   matched: {', '.join(r['matched_tokens'])}")
         if r.get("preview"):
             click.echo("   preview: |")
             for line in r["preview"].strip().splitlines()[:6]:
@@ -151,6 +196,144 @@ def status(project_path: str):
     click.echo(f"Files without .meta.yaml: {state['files_without_meta']}")
     click.echo(f"Total lines indexed: {state['total_lines']}")
     click.echo(f"Index age: {state.get('index_age', 'N/A')}")
+    click.echo(f"Coverage: {state.get('coverage_pct', 0)}%")
+    if state.get("warning"):
+        click.echo()
+        click.secho(f"  Warning: {state['warning']}", fg="yellow")
+@main.command()
+@click.argument("project_path", type=click.Path(exists=True, file_okay=False))
+@click.option("--queries", "-q", required=True,
+              type=click.Path(exists=True, dir_okay=False),
+              help="Path to JSON file with query definitions")
+@click.option("--limit", "-n", default=10,
+              help="Max results per query (default: 10)")
+def eval_cmd(project_path: str, queries: str, limit: int):
+    """Evaluate search quality against a query set.
+    QUERIES is a JSON file with an array of query objects:
+    [{"query": "...", "expected": "path/to/file.py", "min_rank": 1}]
+    Outputs a per-query pass/fail table and aggregate metrics:
+    Precision@1, Mean Reciprocal Rank (MRR), Recall@5.
+    """
+    from .eval import load_queries, run_eval
+    import json
+    try:
+        query_defs = load_queries(queries)
+    except (json.JSONDecodeError, ValueError) as e:
+        raise click.ClickException(f"Invalid queries file: {e}")
+    if not query_defs:
+        raise click.ClickException("Queries file contains no queries.")
+    metrics = run_eval(project_path, query_defs, limit=limit)
+    # Per-query table
+    click.secho("  Query                             Expected            Mode   Lines    Rank  Score   Status", fg="cyan", bold=True)
+    click.secho("  " + "─" * 105, fg="cyan")
+    for r in metrics["results"]:
+        query_str = f'"{r["query"]}"'.ljust(34)
+        expected_str = r["expected"].ljust(20)
+        mode_str = r.get("assert", "file").ljust(7)
+        lines_str = ""
+        if r.get("line_start"):
+            lines_str = f"{r['line_start']}-{r['line_end']}".ljust(8)
+        else:
+            lines_str = "—".ljust(8)
+        rank_str = str(r["rank"]).ljust(6) if r["rank"] else "—     "
+        score_str = f'{r["score"]:.2f}'.ljust(8) if r["score"] else "—       "
+        status = r["status"]
+        if status.startswith("pass"):
+            status_style = {"fg": "green"}
+        elif "found" in status:
+            status_style = {"fg": "yellow"}
+        else:
+            status_style = {"fg": "red"}
+        click.echo(f"  {query_str} {expected_str} {mode_str} {lines_str} {rank_str} {score_str} {click.style(status, **status_style)}")
+    # Aggregate metrics
+    click.echo()
+    click.echo(click.style("  " + "─" * 40, fg="cyan"))
+    click.secho(f"  Precision@1:  {metrics['precision_at_1']:.2%}  "
+                f"({metrics['passed']}/{metrics['num_queries']} passed)", fg="green")
+    click.secho(f"  MRR:          {metrics['mrr']:.4f}", fg="green")
+    click.secho(f"  Recall@5:     {metrics['recall_at_5']:.2%}", fg="green")
+@main.command()
+@click.argument("project_path", type=click.Path(exists=True, file_okay=False), default=".")
+@click.option("--model", "-m", default="all-MiniLM-L6-v2",
+              help="Embedding model to fine-tune")
+@click.option("--epochs", "-e", default=3,
+              help="Training epochs (default: 3)")
+@click.option("--incremental", "-i", is_flag=True, default=False,
+              help="Fine-tune only on changed files")
+def train(project_path: str, model: str, epochs: int, incremental: bool):
+    """Fine-tune an embedding model on this codebase for better search accuracy.
+    Uses TSDAE (denoising auto-encoder) to adapt a pretrained model to
+    your codebase's vocabulary. The fine-tuned model is saved to
+    .codexlr8_model/ and referenced in .codexlr8.yaml.
+    Requirements: pip install codexlr8[embeddings]
+    """
+    try:
+        from .train import train_model
+    except ImportError as e:
+        raise click.ClickException(
+            "Training requires 'pip install codexlr8[embeddings]'"
+        ) from e
+    click.echo()
+    click.secho("  Training embedding model on this codebase...", fg="cyan", bold=True)
+    click.echo(f"  Model:  {model}")
+    click.echo(f"  Epochs: {epochs}")
+    click.echo()
+    try:
+        result = train_model(project_path, model_name=model,
+                             epochs=epochs, incremental=incremental)
+    except ValueError as e:
+        raise click.ClickException(str(e))
+    dur = result["duration_sec"]
+    dur_str = f"{dur}s" if dur < 60 else f"{dur // 60}m{dur % 60}s"
+    click.echo()
+    click.secho(f"  Trained on {result['num_examples']} files in {dur_str}", fg="green")
+    click.secho(f"  Model saved to {result['model_path']}", fg="green")
+    click.secho(f"  Embeddings enabled in .codexlr8.yaml", fg="green")
+    click.echo()
+    click.secho("  Run 'codexlr8 eval .' to measure improvement.", dim=True)
+@main.command()
+@click.argument("project_path", type=click.Path(exists=True, file_okay=False), default=".")
+def recommend_model_cmd(project_path: str):
+    """Suggest the best embedding model for this codebase size."""
+    try:
+        from .train import recommend_model
+    except ImportError as e:
+        raise click.ClickException(
+            "Requires 'pip install codexlr8[embeddings]'"
+        ) from e
+    rec = recommend_model(project_path)
+    click.echo()
+    click.secho(f"  Codebase: {rec['num_files']} files, ~{rec['est_tokens']:,} tokens", fg="cyan")
+    click.echo()
+    click.secho(f"  Recommended: {rec['model']} ({rec['param_count']})", fg="green", bold=True)
+    click.echo(f"  Est. training time: {rec['est_training_time']}")
+    click.echo(f"  Expected quality gain: {rec.get('quality_gain', '+5-12% MRR')}")
+    click.echo()
+    click.secho("  Run 'codexlr8 train .' to start training.", dim=True)
 @main.command()
@@ -240,7 +423,8 @@ def setup(project_path: str):
     include = [p.strip() for p in custom_include.split(",") if p.strip()]
     click.echo()
-    defaults = ["tests/*", "test/*", "spec/*", "__tests__/*", "test_*", "*_test.*"]
+    defaults = ["tests/*", "test/*", "spec/*", "__tests__/*", "test_*", "*_test.*",
+                "examples/*", "docs/*", "tutorials/*", "benchmarks/*"]
     custom_exclude = click.prompt(
         click.style("    Exclude (comma-separated)", fg="bright_white"),
         default=", ".join(defaults),
@@ -306,6 +490,102 @@ def setup(project_path: str):
     click.secho("  Run 'codexlr8 index .' to build your first search index.", dim=True)
+def _print_explain(data: dict):
+    """Print query diagnostic breakdown."""
+    click.secho("Query analysis:", fg="cyan", bold=True)
+    click.echo(f"  Original:  \"{data['query']}\"")
+    click.echo(f"  Tokens:    {', '.join(data['tokens'])}")
+    click.echo()
+    for token in data["tokens"]:
+        hits = data["token_hits"].get(token, 0)
+        total = data["total_results"]
+        if hits == 0:
+            status = click.style(f"{hits} matches", fg="red")
+            hint = " — consider dropping or replacing"
+        elif hits <= 3:
+            status = click.style(f"{hits} matches", fg="yellow")
+            hint = " — very specific"
+        elif hits <= total * 0.1:
+            status = click.style(f"{hits} matches", fg="green")
+            hint = ""
+        else:
+            status = click.style(f"{hits} matches", fg="yellow")
+            hint = f" — broad term ({hits}/{total} results)"
+        click.echo(f"  \"{token}\"  {status}{hint}")
+    for fw in data["filtered"]:
+        click.echo(f"  \"{fw}\"  {click.style('filtered', fg='yellow')} — single letter, ignored")
+    click.echo()
+    top = data["top_score"]
+    if top < 0.60:
+        quality = click.style("weak", fg="red")
+    elif top < 1.20:
+        quality = click.style("moderate", fg="yellow")
+    else:
+        quality = click.style("strong", fg="green")
+    click.echo(f"  Top score: {top} ({quality} match)")
+    if data["filtered"]:
+        click.echo(click.style("  Tip:", dim=True) + " single-letter words are ignored. Use full terms.")
+    zero_match = [t for t in data["tokens"] if data["token_hits"].get(t, 0) == 0]
+    if zero_match:
+        click.echo(click.style("  Tip:", dim=True) + f" \"{zero_match[0]}\" doesn't exist — try a synonym or drop it.")
+def _print_grouped(results: list[dict], group_depth: int, scope: str | None):
+    """Print search results clustered by directory."""
+    groups_data = _group_results(results, group_depth)
+    groups = groups_data["groups"]
+    total = groups_data["total_results"]
+    files = groups_data["total_files"]
+    scope_label = f"in {scope}" if scope else "across project"
+    click.echo(f"{total} results in {len(groups)} directories ({files} files) {scope_label}:")
+    click.echo()
+    top_groups = groups[:5]
+    for g in top_groups:
+        # Directory header with match count
+        label = g["prefix"].rstrip(os.sep)
+        click.echo(f"{label}/  ({g['count']} files)")
+        for f in g["files"]:
+            line_info = f"{f['path']}:{f['line_start']}-{f['line_end']}"
+            score_info = f"{f['score']:.2f}"
+            click.echo(f"  {click.style(line_info, fg='cyan')}  "
+                       f"[score: {score_info}]")
+            # Summary line from preview or metadata
+            if f.get("summary"):
+                click.echo(f"    {f['summary']}")
+            elif f.get("preview"):
+                first_line = f["preview"].strip().splitlines()[0].strip() if f["preview"].strip() else ""
+                if first_line:
+                    click.echo(f"    {first_line[:100]}")
+        if g["has_more"]:
+            click.echo(f"  ... and {g['remaining']} more files")
+        click.echo()
+    if len(groups) > 5:
+        click.echo(f"... and {len(groups) - 5} more directories")
+    # Scope hint
+    click.echo()
+    if scope:
+        click.echo(click.style("Already scoped. Remove --scope to broaden.", dim=True))
+    else:
+        click.echo(
+            click.style(
+                f"Use --scope <dir> to narrow results (e.g. --scope {top_groups[0]['prefix']})",
+                dim=True
+            )
+        )
 def _inject_mcp_config(config_path: str, mcp_json: str) -> None:
     """Inject the CodeXLR8 MCP config into an existing client config file.
@@ -406,7 +686,51 @@ codebase_search(query="stripe charge customer refund")
 codebase_search(query="shopping cart checkout payment")
 ```
-Describe what you're looking for in natural language. The engine uses AND semantics — more terms increase precision, not noise.
+### Query strategy
+Describe what you're looking for in natural language. The engine uses OR semantics with a scoring layer — more terms increase precision through token-coverage ranking, not a hard AND requirement.
+**Good queries use distinct, discriminating terms:**
+| Task | Good query | Why |
+|---|---|---|
+| Fix login bug | `"login auth session token"` | Covers auth module, session, tokens — distinct terms, not synonyms |
+| Payment refund | `"stripe refund charge customer"` | Each term narrows to a different aspect of the feature |
+| 3D plot visibility | `"axes3d draw visible renderer"` | Domain term + method + symptom — different dimensions of the bug |
+| Checkout flow | `"checkout cart payment order"` | Covers all stages of the flow |
+**What to avoid:**
+- Single-word queries (`"login"`) — too broad, returns everything mentioning login
+- Synonyms (`"login authenticate signin"`) — redundant, wastes tokens without improving coverage
+- Full sentences (`"I need to find where user login happens"`) — stop words like `"I"`, `"need"`, `"to"` are filtered out
+### Using scope and grouping
+When you know which directory the code lives in, scope the search:
+```
+codebase_search(query="get_visible", scope="lib/mpl_toolkits/")
+```
+When you don't know, run a shell command to see where results cluster:
+```bash
+codexlr8 search . "get_visible" --grouped
+```
+This prints directories ranked by their highest-scoring file, with a `--scope` hint to copy into your next MCP call.
+### When results don't look right
+Check the `matched` field on each result. If a file you expected isn't showing, the missing token tells you what to adjust. If all results only match 1 of 4 tokens, your terms are too scattered — try removing one.
+For deeper diagnostics, run:
+```bash
+codexlr8 search . "your query" --explain
+```
+This shows per-token hit counts and flags zero-match terms so you can refine before calling `codebase_search` again.
 ## Interpreting results
@@ -418,9 +742,10 @@ Results include:
 | `score` | Relevance (higher = better) |
 | `summary` | Human-written description of the file's purpose |
 | `tags` | Curated keywords (auth, payment, cart, etc.) |
+| `matched` | Which query tokens the file matched — use this to debug failed searches |
 | `preview` | First ~10 lines around the best match |
-**Ranking:** Files with curated `.meta.yaml` (summary + tags) rank highest. Raw content matches rank lower. `__init__.py` re-exports are penalized.
+**Ranking:** Files with curated `.meta.yaml` (summary + tags) rank highest, followed by filename matches, then path directory matches. Raw content matches rank lowest. `__init__.py` re-exports are penalized.
 ## Maintaining the index
@@ -508,6 +833,11 @@ Exclude patterns are globs that match file paths. Use `*` for wildcards.
 | Task | Tool call |
 |---|---|
 | Find code for a feature | `codebase_search(query="...")` |
+| Search within a directory | `codebase_search(query="...", scope="src/")` |
+| Cluster results by directory | Shell: `codexlr8 search . "query" --grouped` |
+| Diagnose query terms | Shell: `codexlr8 search . "query" --explain` |
+| Measure search accuracy | Shell: `codexlr8 eval . --queries q.json` |
+| Fine-tune embeddings | Shell: `codexlr8 train .` (needs `[embeddings]` extra) |
 | Build/update index | `codebase_index(incremental=true)` |
 | Check metadata coverage | Shell: `codexlr8 status .` |
 | Bootstrap missing sidecars | Shell: `codexlr8 init .` |
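
The `--grouped` path in this file calls `_group_results`, whose body lives in `search.py` and is not shown in this diff. To make the shape of its return value easier to follow, here is a hypothetical stand-in that produces the keys `_print_grouped` reads (`prefix`, `count`, `files`, `has_more`, `remaining`, plus the totals). The bucketing rule, the `per_group` cutoff, and the function name are assumptions; ranking groups by their best file follows the description in the embedded guide text.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(results: list[dict], depth: int = 3, per_group: int = 3) -> dict:
    """Cluster results by their directory prefix, truncated to `depth` levels."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for result in results:
        parts = PurePosixPath(result["path"]).parent.parts[:depth]
        prefix = "/".join(parts) + "/" if parts else "./"
        buckets[prefix].append(result)

    groups = []
    for prefix, items in buckets.items():
        items.sort(key=lambda r: r["score"], reverse=True)
        groups.append({
            "prefix": prefix,
            "count": len(items),
            "files": items[:per_group],
            "has_more": len(items) > per_group,
            "remaining": max(0, len(items) - per_group),
        })

    # Rank directories by their highest-scoring file.
    groups.sort(key=lambda g: g["files"][0]["score"], reverse=True)
    return {
        "groups": groups,
        "total_results": len(results),
        "total_files": len({r["path"] for r in results}),
    }


sample = [
    {"path": "lib/mpl_toolkits/mplot3d/axes3d.py", "score": 0.9},
    {"path": "lib/mpl_toolkits/mplot3d/art3d.py", "score": 0.4},
    {"path": "lib/matplotlib/axes/_base.py", "score": 0.7},
    {"path": "setup.py", "score": 0.1},
]
grouped = group_by_directory(sample)
```

Files at the project root have no parent directory, so this sketch files them under `./` rather than an empty prefix.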

{codexlr8-0.0.1 → codexlr8-0.0.3}/src/codexlr8/config.py RENAMED

@@ -24,6 +24,12 @@ def load_config(project_path: str) -> dict:
 def _defaults() -> dict:
     return {
         "root": ".",
+        "fuzzy": True,
+        "embeddings": {
+            "enabled": False,
+            "model": "all-MiniLM-L6-v2",
+            "bm25_weight": 0.6,
+        },
         "include": [],
         "exclude": [
             "tests/*",
@@ -32,6 +38,10 @@ def _defaults() -> dict:
             "__tests__/*",
             "test_*",
             "*_test.*",
+            "examples/*",
+            "docs/*",
+            "tutorials/*",
+            "benchmarks/*",
         ],
         "extensions": [
             ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs", ".rb",

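The new `embeddings.bm25_weight` default of `0.6` suggests a weighted blend of lexical and semantic scores, which the README calls hybrid BM25 + semantic search. The diff does not show how the two are combined, so the sketch below is only one plausible reading: a linear interpolation between a BM25 score already normalized to [0, 1] and a cosine similarity. All function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(bm25_norm: float, semantic: float, bm25_weight: float = 0.6) -> float:
    """Blend a normalized lexical score with a semantic one (assumed formula)."""
    return bm25_weight * bm25_norm + (1.0 - bm25_weight) * semantic


# Hypothetical file: moderate keyword match, strong embedding similarity.
blended = hybrid_score(bm25_norm=0.5, semantic=cosine([1.0, 0.0], [1.0, 0.0]))
```

With a weight of `0.6` the lexical signal dominates, so an exact identifier match is unlikely to be outranked by a file that is only semantically similar.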