PyPI - rlmgrep - Versions diffs - 0.1.10__tar.gz → 0.1.12__tar.gz - Mend

rlmgrep-0.1.10.tar.gz → rlmgrep-0.1.12.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

{rlmgrep-0.1.10/rlmgrep.egg-info → rlmgrep-0.1.12}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: rlmgrep
-Version: 0.1.10
+Version: 0.1.12
 Summary: Grep-shaped CLI search powered by DSPy RLM
 Author: rlmgrep
 License: MIT
@@ -8,6 +8,7 @@ Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 Requires-Dist: dspy>=3.1.1
 Requires-Dist: markitdown[all]>=0.1.4
+Requires-Dist: pathspec>=0.12.1
 Requires-Dist: pypdf>=4.0.0
 # rlmgrep
@@ -22,10 +23,20 @@ uv tool install rlmgrep
 # uv tool install git+https://github.com/halfprice06/rlmgrep.git
 export OPENAI_API_KEY=...  # or set keys in ~/.rlmgrep
+```
+```sh
 rlmgrep --answer "What does this repo do and where are the entry points?" .
+```
+![Quickstart answer mode](docs/images/quickstart-answer.png)
+```sh
 rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
 ```
+![Quickstart context mode](docs/images/quickstart-context.png)
 ## Requirements
 - Python 3.11+
@@ -39,8 +50,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
 How it works:
 - **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
 - **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
-- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
-- **Audio** transcription is supported through OpenAI when enabled.
+- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
+- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
 Sidecar caching:
 - For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
@@ -48,7 +59,7 @@ Sidecar caching:
 ## Install Deno
-DSPy requires the Deno runtime. Install it with the official scripts:
+DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
 macOS/Linux:
@@ -83,6 +94,8 @@ Common options:
 - `-m N` max matching lines per file
 - `-g GLOB` include files matching glob (repeatable, comma-separated)
 - `--type T` include file types (repeatable, comma-separated)
+- `--hidden` include hidden files and directories
+- `--no-ignore` do not respect `.gitignore`
 - `--no-recursive` do not recurse directories
 - `-a`, `--text` treat binary files as text
 - `-y`, `--yes` skip file count confirmation
@@ -115,6 +128,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
 ## Input selection
 - Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
+- Hidden files and `.gitignore` rules are respected by default. Use `--hidden` or `--no-ignore` to include them.
 - `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
 - `-g/--glob` matches path globs against normalized paths (forward slashes).
 - Paths are printed relative to the current working directory when possible.

rlmgrep-0.1.10/PKG-INFO → rlmgrep-0.1.12/README.md RENAMED

@@ -1,15 +1,3 @@
-Metadata-Version: 2.4
-Name: rlmgrep
-Version: 0.1.10
-Summary: Grep-shaped CLI search powered by DSPy RLM
-Author: rlmgrep
-License: MIT
-Requires-Python: >=3.11
-Description-Content-Type: text/markdown
-Requires-Dist: dspy>=3.1.1
-Requires-Dist: markitdown[all]>=0.1.4
-Requires-Dist: pypdf>=4.0.0
 # rlmgrep
 Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format.
@@ -22,10 +10,20 @@ uv tool install rlmgrep
 # uv tool install git+https://github.com/halfprice06/rlmgrep.git
 export OPENAI_API_KEY=...  # or set keys in ~/.rlmgrep
+```
+```sh
 rlmgrep --answer "What does this repo do and where are the entry points?" .
+```
+![Quickstart answer mode](docs/images/quickstart-answer.png)
+```sh
 rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
 ```
+![Quickstart context mode](docs/images/quickstart-context.png)
 ## Requirements
 - Python 3.11+
@@ -39,8 +37,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
 How it works:
 - **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
 - **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
-- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
-- **Audio** transcription is supported through OpenAI when enabled.
+- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
+- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
 Sidecar caching:
 - For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
@@ -48,7 +46,7 @@ Sidecar caching:
 ## Install Deno
-DSPy requires the Deno runtime. Install it with the official scripts:
+DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
 macOS/Linux:
@@ -83,6 +81,8 @@ Common options:
 - `-m N` max matching lines per file
 - `-g GLOB` include files matching glob (repeatable, comma-separated)
 - `--type T` include file types (repeatable, comma-separated)
+- `--hidden` include hidden files and directories
+- `--no-ignore` do not respect `.gitignore`
 - `--no-recursive` do not recurse directories
 - `-a`, `--text` treat binary files as text
 - `-y`, `--yes` skip file count confirmation
@@ -115,6 +115,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
 ## Input selection
 - Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
+- Hidden files and `.gitignore` rules are respected by default. Use `--hidden` or `--no-ignore` to include them.
 - `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
 - `-g/--glob` matches path globs against normalized paths (forward slashes).
 - Paths are printed relative to the current working directory when possible.
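
The hidden-file rule described in the list above can be sketched in a few lines. This is a minimal illustration, not the package's code: `is_hidden_path` mirrors the `_is_hidden_path` check that appears in the `ingest.py` diff, while `filter_hidden` and the sample paths are hypothetical. The sketch compares pure paths, whereas the package compares resolved filesystem paths.

```python
from pathlib import PurePosixPath


def is_hidden_path(path: PurePosixPath) -> bool:
    # A path counts as hidden when any component starts with a dot.
    return any(part.startswith(".") for part in path.parts if part)


def filter_hidden(paths, include_hidden=False, explicit=()):
    # Hypothetical helper: explicitly named files bypass the hidden filter.
    explicit_set = {PurePosixPath(p) for p in explicit}
    kept = []
    for raw in paths:
        p = PurePosixPath(raw)
        if not include_hidden and p not in explicit_set and is_hidden_path(p):
            continue
        kept.append(raw)
    return kept


paths = ["src/app.py", ".github/workflows/ci.yml", "src/.env", "README.md"]
default_view = filter_hidden(paths)                       # hidden entries skipped
with_hidden = filter_hidden(paths, include_hidden=True)   # like --hidden
with_explicit = filter_hidden(paths, explicit=["src/.env"])
```

Because the check looks at every component, a relative path such as `../sibling/app.py` also counts as hidden in this sketch, since `..` starts with a dot.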

{rlmgrep-0.1.10 → rlmgrep-0.1.12}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "rlmgrep"
-version = "0.1.10"
+version = "0.1.12"
 description = "Grep-shaped CLI search powered by DSPy RLM"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -9,6 +9,7 @@ license = { text = "MIT" }
 dependencies = [
   "dspy>=3.1.1",
   "markitdown[all]>=0.1.4",
+  "pathspec>=0.12.1",
   "pypdf>=4.0.0",
 ]

{rlmgrep-0.1.10 → rlmgrep-0.1.12}/rlmgrep/__init__.py RENAMED

@@ -1,2 +1,2 @@
 __all__ = ["__version__"]
-__version__ = "0.1.10"
+__version__ = "0.1.12"

{rlmgrep-0.1.10 → rlmgrep-0.1.12}/rlmgrep/cli.py RENAMED

@@ -9,7 +9,13 @@ import dspy
 from . import __version__
 from .config import ensure_default_config, load_config
 from .file_map import build_file_map
-from .ingest import FileRecord, collect_candidates, load_files, resolve_type_exts
+from .ingest import (
+    FileRecord,
+    build_gitignore_spec,
+    collect_candidates,
+    load_files,
+    resolve_type_exts,
+)
 from .rlm import Match, build_lm, run_rlm
 from .render import render_matches
@@ -81,6 +87,8 @@ def _parse_args(argv: list[str]) -> argparse.Namespace:
     parser.add_argument("-B", dest="before", type=int, default=None, help="Context lines before")
     parser.add_argument("-m", dest="max_count", type=int, default=None, help="Max matching lines per file")
     parser.add_argument("-a", "--text", dest="binary_as_text", action="store_true", help="Search binary files as text")
+    parser.add_argument("--hidden", action="store_true", help="Include hidden files and directories")
+    parser.add_argument("--no-ignore", dest="no_ignore", action="store_true", help="Do not respect .gitignore")
     parser.add_argument("--answer", action="store_true", help="Print a narrative answer before grep output")
     parser.add_argument("-y", "--yes", action="store_true", help="Skip file count confirmation")
     parser.add_argument(
@@ -139,6 +147,13 @@ def _pick(cli_value, config: dict, key: str, default=None):
     return default
+def _find_git_root(start: Path) -> Path | None:
+    for p in [start, *start.parents]:
+        if (p / ".git").is_dir():
+            return p
+    return None
 def _env_value(name: str) -> str | None:
     val = os.getenv(name)
     if val is None:
@@ -424,12 +439,21 @@ def main(argv: list[str] | None = None) -> int:
         if hard_max is not None and hard_max <= 0:
             hard_max = None
+        ignore_spec = None
+        ignore_root = None
+        if not args.no_ignore:
+            ignore_root = _find_git_root(cwd) or cwd
+            ignore_spec = build_gitignore_spec(ignore_root)
         candidates = collect_candidates(
             input_paths,
             cwd=cwd,
             recursive=args.recursive,
             include_globs=globs,
             type_exts=type_exts,
+            include_hidden=args.hidden,
+            ignore_spec=ignore_spec,
+            ignore_root=ignore_root,
         )
         candidate_count = len(candidates)
         if hard_max is not None and candidate_count > hard_max:

{rlmgrep-0.1.10 → rlmgrep-0.1.12}/rlmgrep/ingest.py RENAMED

@@ -2,8 +2,14 @@ from __future__ import annotations
 from dataclasses import dataclass
 from fnmatch import fnmatch
+import os
 from pathlib import Path, PurePosixPath
-from typing import Iterable, Any, Callable
+from typing import Any, Callable, Iterable
+try:
+    import pathspec
+except Exception:  # pragma: no cover - optional at import time
+    pathspec = None
 from pypdf import PdfReader
@@ -161,6 +167,64 @@ def collect_files(paths: Iterable[str], recursive: bool = True) -> list[Path]:
     return files
+def build_gitignore_spec(root: Path) -> "pathspec.PathSpec | None":
+    if pathspec is None:
+        return None
+    root = root.resolve()
+    gitignore_paths: list[Path] = []
+    for dirpath, dirnames, filenames in os.walk(root):
+        if ".git" in dirnames:
+            dirnames.remove(".git")
+        if ".gitignore" in filenames:
+            gitignore_paths.append(Path(dirpath) / ".gitignore")
+    if not gitignore_paths:
+        return None
+    def _sort_key(p: Path) -> tuple[int, str]:
+        try:
+            rel = p.parent.relative_to(root)
+            depth = len(rel.parts)
+            return depth, rel.as_posix()
+        except ValueError:
+            return 0, p.as_posix()
+    gitignore_paths.sort(key=_sort_key)
+    patterns: list[str] = []
+    for gi in gitignore_paths:
+        try:
+            rel_dir = gi.parent.relative_to(root).as_posix()
+        except ValueError:
+            rel_dir = ""
+        try:
+            raw_lines = gi.read_text(encoding="utf-8", errors="ignore").splitlines()
+        except Exception:
+            continue
+        for raw in raw_lines:
+            line = raw.rstrip("\n")
+            if not line:
+                continue
+            if line.startswith("\\#") or line.startswith("\\!"):
+                line = line[1:]
+            elif line.startswith("#"):
+                continue
+            negated = line.startswith("!")
+            if negated:
+                line = line[1:]
+            if line.startswith("/"):
+                line = line[1:]
+            if rel_dir:
+                line = f"{rel_dir}/{line}"
+            if negated:
+                line = "!" + line
+            patterns.append(line)
+    if not patterns:
+        return None
+    return pathspec.PathSpec.from_lines("gitwildmatch", patterns)
 TYPE_EXTS = {
     "bash": {".bash"},
     "c": {".c", ".h"},
@@ -237,21 +301,46 @@ def _matches_globs(path: str, globs: list[str]) -> bool:
     return False
+def _is_hidden_path(path: Path) -> bool:
+    return any(part.startswith(".") for part in path.parts if part)
 def collect_candidates(
     paths: Iterable[str],
     cwd: Path,
     recursive: bool = True,
     include_globs: list[str] | None = None,
     type_exts: set[str] | None = None,
+    include_hidden: bool = False,
+    ignore_spec: "pathspec.PathSpec | None" = None,
+    ignore_root: Path | None = None,
 ) -> list[Path]:
     files = collect_files(paths, recursive=recursive)
+    explicit_files: set[Path] = set()
+    for raw in paths:
+        p = Path(raw)
+        if p.exists() and p.is_file():
+            explicit_files.add(p.resolve())
     candidates: list[Path] = []
     for fp in files:
+        fp_resolved = fp.resolve()
+        is_explicit = fp_resolved in explicit_files
+        if not include_hidden and not is_explicit and _is_hidden_path(fp):
+            continue
         try:
             key = fp.relative_to(cwd).as_posix()
         except ValueError:
             key = fp.as_posix()
+        if ignore_spec is not None and ignore_root is not None and not is_explicit:
+            try:
+                rel = fp.relative_to(ignore_root).as_posix()
+            except ValueError:
+                rel = None
+            if rel and ignore_spec.match_file(rel):
+                continue
         if include_globs and not _matches_globs(key, include_globs):
             continue
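
The per-line rewriting inside `build_gitignore_spec` is easier to follow when pulled out on its own. The sketch below mirrors that loop body under stated assumptions: `rebase_gitignore_line` is a hypothetical name, it uses only the standard library, and it stops short of building a `pathspec.PathSpec`, so it shows only how a pattern from a nested `.gitignore` is prefixed with its directory relative to the ignore root.

```python
from typing import Optional


def rebase_gitignore_line(rel_dir: str, raw: str) -> Optional[str]:
    # Returns the pattern rewritten relative to the ignore root,
    # or None for blank lines and comments.
    line = raw.rstrip("\n")
    if not line:
        return None
    if line.startswith("\\#") or line.startswith("\\!"):
        line = line[1:]  # drop the escaping backslash
    elif line.startswith("#"):
        return None
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    if line.startswith("/"):
        line = line[1:]  # anchored patterns become relative to rel_dir
    if rel_dir:
        line = f"{rel_dir}/{line}"
    return "!" + line if negated else line


examples = [
    rebase_gitignore_line("", "*.log"),          # "*.log"
    rebase_gitignore_line("docs", "/build"),     # "docs/build"
    rebase_gitignore_line("docs", "!keep.txt"),  # "!docs/keep.txt"
    rebase_gitignore_line("docs", "# note"),     # None
]
```

In the diff, the surviving patterns from all `.gitignore` files are collected in depth order and passed together to `pathspec.PathSpec.from_lines("gitwildmatch", patterns)`.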

rlmgrep-0.1.10/README.md → rlmgrep-0.1.12/rlmgrep.egg-info/PKG-INFO RENAMED

@@ -1,3 +1,16 @@
+Metadata-Version: 2.4
+Name: rlmgrep
+Version: 0.1.12
+Summary: Grep-shaped CLI search powered by DSPy RLM
+Author: rlmgrep
+License: MIT
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: dspy>=3.1.1
+Requires-Dist: markitdown[all]>=0.1.4
+Requires-Dist: pathspec>=0.12.1
+Requires-Dist: pypdf>=4.0.0
 # rlmgrep
 Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format.
@@ -10,10 +23,20 @@ uv tool install rlmgrep
 # uv tool install git+https://github.com/halfprice06/rlmgrep.git
 export OPENAI_API_KEY=...  # or set keys in ~/.rlmgrep
+```
+```sh
 rlmgrep --answer "What does this repo do and where are the entry points?" .
+```
+![Quickstart answer mode](docs/images/quickstart-answer.png)
+```sh
 rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
 ```
+![Quickstart context mode](docs/images/quickstart-context.png)
 ## Requirements
 - Python 3.11+
@@ -27,8 +50,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
 How it works:
 - **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
 - **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
-- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
-- **Audio** transcription is supported through OpenAI when enabled.
+- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
+- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
 Sidecar caching:
 - For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
@@ -36,7 +59,7 @@ Sidecar caching:
 ## Install Deno
-DSPy requires the Deno runtime. Install it with the official scripts:
+DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
 macOS/Linux:
@@ -71,6 +94,8 @@ Common options:
 - `-m N` max matching lines per file
 - `-g GLOB` include files matching glob (repeatable, comma-separated)
 - `--type T` include file types (repeatable, comma-separated)
+- `--hidden` include hidden files and directories
+- `--no-ignore` do not respect `.gitignore`
 - `--no-recursive` do not recurse directories
 - `-a`, `--text` treat binary files as text
 - `-y`, `--yes` skip file count confirmation
@@ -103,6 +128,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
 ## Input selection
 - Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
+- Hidden files and `.gitignore` rules are respected by default. Use `--hidden` or `--no-ignore` to include them.
 - `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
 - `-g/--glob` matches path globs against normalized paths (forward slashes).
 - Paths are printed relative to the current working directory when possible.