rlmgrep 0.1.8__tar.gz → 0.1.17__tar.gz
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/PKG-INFO +25 -9
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/README.md +23 -8
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/pyproject.toml +2 -1
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/__init__.py +1 -1
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/cli.py +92 -4
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/ingest.py +120 -1
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/render.py +0 -6
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/PKG-INFO +25 -9
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/requires.txt +1 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/__main__.py +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/config.py +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/file_map.py +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/interpreter.py +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep/rlm.py +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/SOURCES.txt +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/dependency_links.txt +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/entry_points.txt +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/rlmgrep.egg-info/top_level.txt +0 -0
- {rlmgrep-0.1.8 → rlmgrep-0.1.17}/setup.cfg +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: rlmgrep
|
|
3
|
-
Version: 0.1.8
|
|
3
|
+
Version: 0.1.17
|
|
4
4
|
Summary: Grep-shaped CLI search powered by DSPy RLM
|
|
5
5
|
Author: rlmgrep
|
|
6
6
|
License: MIT
|
|
@@ -8,11 +8,12 @@ Requires-Python: >=3.11
|
|
|
8
8
|
Description-Content-Type: text/markdown
|
|
9
9
|
Requires-Dist: dspy>=3.1.1
|
|
10
10
|
Requires-Dist: markitdown[all]>=0.1.4
|
|
11
|
+
Requires-Dist: pathspec>=0.12.1
|
|
11
12
|
Requires-Dist: pypdf>=4.0.0
|
|
12
13
|
|
|
13
14
|
# rlmgrep
|
|
14
15
|
|
|
15
|
-
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format.
|
|
16
|
+
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format. Use `--answer` to get a narrative response grounded in the selected files/directories.
|
|
16
17
|
|
|
17
18
|
## Quickstart
|
|
18
19
|
|
|
@@ -22,9 +23,20 @@ uv tool install rlmgrep
|
|
|
22
23
|
# uv tool install git+https://github.com/halfprice06/rlmgrep.git
|
|
23
24
|
|
|
24
25
|
export OPENAI_API_KEY=... # or set keys in ~/.rlmgrep
|
|
25
|
-
rlmgrep "where are API keys read" rlmgrep/
|
|
26
26
|
```
|
|
27
27
|
|
|
28
|
+
```sh
|
|
29
|
+
rlmgrep --answer "What does this repo do and where are the entry points?" .
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+

|
|
33
|
+
|
|
34
|
+
```sh
|
|
35
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+

|
|
39
|
+
|
|
28
40
|
## Requirements
|
|
29
41
|
|
|
30
42
|
- Python 3.11+
|
|
@@ -38,8 +50,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
|
|
|
38
50
|
How it works:
|
|
39
51
|
- **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
|
|
40
52
|
- **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
|
|
41
|
-
- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
|
|
42
|
-
- **Audio** transcription is supported through OpenAI when enabled.
|
|
53
|
+
- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
|
|
54
|
+
- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
|
|
43
55
|
|
|
44
56
|
Sidecar caching:
|
|
45
57
|
- For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
|
|
@@ -47,7 +59,7 @@ Sidecar caching:
|
|
|
47
59
|
|
|
48
60
|
## Install Deno
|
|
49
61
|
|
|
50
|
-
DSPy requires the Deno runtime. Install it with the official scripts:
|
|
62
|
+
DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
|
|
51
63
|
|
|
52
64
|
macOS/Linux:
|
|
53
65
|
|
|
@@ -75,12 +87,15 @@ rlmgrep [options] "query" [paths...]
|
|
|
75
87
|
|
|
76
88
|
Common options:
|
|
77
89
|
|
|
90
|
+
- `--answer` return a narrative answer before the grep output
|
|
78
91
|
- `-C N` context lines before/after (grep-style)
|
|
79
92
|
- `-A N` context lines after
|
|
80
93
|
- `-B N` context lines before
|
|
81
94
|
- `-m N` max matching lines per file
|
|
82
95
|
- `-g GLOB` include files matching glob (repeatable, comma-separated)
|
|
83
96
|
- `--type T` include file types (repeatable, comma-separated)
|
|
97
|
+
- `--hidden` include hidden files and directories
|
|
98
|
+
- `--no-ignore` do not respect `.gitignore`
|
|
84
99
|
- `--no-recursive` do not recurse directories
|
|
85
100
|
- `-a`, `--text` treat binary files as text
|
|
86
101
|
- `-y`, `--yes` skip file count confirmation
|
|
@@ -95,7 +110,7 @@ Examples:
|
|
|
95
110
|
|
|
96
111
|
```sh
|
|
97
112
|
# Natural-language query over a repo
|
|
98
|
-
rlmgrep -
|
|
113
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
99
114
|
|
|
100
115
|
# Restrict to Python files
|
|
101
116
|
rlmgrep "Where do we parse JWTs and enforce expiration?" --type py .
|
|
@@ -113,6 +128,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
113
128
|
## Input selection
|
|
114
129
|
|
|
115
130
|
- Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
|
|
131
|
+
- Hidden files and ignore files (`.gitignore`, `.ignore`, `.rgignore`) are respected by default. Use `--hidden` or `--no-ignore` to include them.
|
|
116
132
|
- `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
|
|
117
133
|
- `-g/--glob` matches path globs against normalized paths (forward slashes).
|
|
118
134
|
- Paths are printed relative to the current working directory when possible.
|
|
@@ -125,7 +141,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
125
141
|
- Output uses rg-style headings by default:
|
|
126
142
|
- A file header line like `./path/to/file`
|
|
127
143
|
- Then `line:\ttext` for matches, `line-\ttext` for context lines
|
|
128
|
-
- Line numbers are 1-based.
|
|
144
|
+
- Line numbers are always included and are 1-based.
|
|
129
145
|
- When context ranges are disjoint, a `--` line separates groups.
|
|
130
146
|
- Exit codes:
|
|
131
147
|
- `0` = at least one match
|
|
@@ -140,7 +156,7 @@ rlmgrep can interpret traditional regex-style patterns inside a natural-language
|
|
|
140
156
|
Example (best-effort regex semantics + extra context):
|
|
141
157
|
|
|
142
158
|
```sh
|
|
143
|
-
rlmgrep
|
|
159
|
+
rlmgrep "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
|
|
144
160
|
```
|
|
145
161
|
|
|
146
162
|
If you need strict, deterministic regex behavior, use `rg`/`grep`.
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# rlmgrep
|
|
2
2
|
|
|
3
|
-
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format.
|
|
3
|
+
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format. Use `--answer` to get a narrative response grounded in the selected files/directories.
|
|
4
4
|
|
|
5
5
|
## Quickstart
|
|
6
6
|
|
|
@@ -10,9 +10,20 @@ uv tool install rlmgrep
|
|
|
10
10
|
# uv tool install git+https://github.com/halfprice06/rlmgrep.git
|
|
11
11
|
|
|
12
12
|
export OPENAI_API_KEY=... # or set keys in ~/.rlmgrep
|
|
13
|
-
rlmgrep "where are API keys read" rlmgrep/
|
|
14
13
|
```
|
|
15
14
|
|
|
15
|
+
```sh
|
|
16
|
+
rlmgrep --answer "What does this repo do and where are the entry points?" .
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+

|
|
20
|
+
|
|
21
|
+
```sh
|
|
22
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+

|
|
26
|
+
|
|
16
27
|
## Requirements
|
|
17
28
|
|
|
18
29
|
- Python 3.11+
|
|
@@ -26,8 +37,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
|
|
|
26
37
|
How it works:
|
|
27
38
|
- **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
|
|
28
39
|
- **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
|
|
29
|
-
- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
|
|
30
|
-
- **Audio** transcription is supported through OpenAI when enabled.
|
|
40
|
+
- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
|
|
41
|
+
- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
|
|
31
42
|
|
|
32
43
|
Sidecar caching:
|
|
33
44
|
- For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
|
|
@@ -35,7 +46,7 @@ Sidecar caching:
|
|
|
35
46
|
|
|
36
47
|
## Install Deno
|
|
37
48
|
|
|
38
|
-
DSPy requires the Deno runtime. Install it with the official scripts:
|
|
49
|
+
DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
|
|
39
50
|
|
|
40
51
|
macOS/Linux:
|
|
41
52
|
|
|
@@ -63,12 +74,15 @@ rlmgrep [options] "query" [paths...]
|
|
|
63
74
|
|
|
64
75
|
Common options:
|
|
65
76
|
|
|
77
|
+
- `--answer` return a narrative answer before the grep output
|
|
66
78
|
- `-C N` context lines before/after (grep-style)
|
|
67
79
|
- `-A N` context lines after
|
|
68
80
|
- `-B N` context lines before
|
|
69
81
|
- `-m N` max matching lines per file
|
|
70
82
|
- `-g GLOB` include files matching glob (repeatable, comma-separated)
|
|
71
83
|
- `--type T` include file types (repeatable, comma-separated)
|
|
84
|
+
- `--hidden` include hidden files and directories
|
|
85
|
+
- `--no-ignore` do not respect `.gitignore`
|
|
72
86
|
- `--no-recursive` do not recurse directories
|
|
73
87
|
- `-a`, `--text` treat binary files as text
|
|
74
88
|
- `-y`, `--yes` skip file count confirmation
|
|
@@ -83,7 +97,7 @@ Examples:
|
|
|
83
97
|
|
|
84
98
|
```sh
|
|
85
99
|
# Natural-language query over a repo
|
|
86
|
-
rlmgrep -
|
|
100
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
87
101
|
|
|
88
102
|
# Restrict to Python files
|
|
89
103
|
rlmgrep "Where do we parse JWTs and enforce expiration?" --type py .
|
|
@@ -101,6 +115,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
101
115
|
## Input selection
|
|
102
116
|
|
|
103
117
|
- Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
|
|
118
|
+
- Hidden files and ignore files (`.gitignore`, `.ignore`, `.rgignore`) are respected by default. Use `--hidden` or `--no-ignore` to include them.
|
|
104
119
|
- `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
|
|
105
120
|
- `-g/--glob` matches path globs against normalized paths (forward slashes).
|
|
106
121
|
- Paths are printed relative to the current working directory when possible.
|
|
@@ -113,7 +128,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
113
128
|
- Output uses rg-style headings by default:
|
|
114
129
|
- A file header line like `./path/to/file`
|
|
115
130
|
- Then `line:\ttext` for matches, `line-\ttext` for context lines
|
|
116
|
-
- Line numbers are 1-based.
|
|
131
|
+
- Line numbers are always included and are 1-based.
|
|
117
132
|
- When context ranges are disjoint, a `--` line separates groups.
|
|
118
133
|
- Exit codes:
|
|
119
134
|
- `0` = at least one match
|
|
@@ -128,7 +143,7 @@ rlmgrep can interpret traditional regex-style patterns inside a natural-language
|
|
|
128
143
|
Example (best-effort regex semantics + extra context):
|
|
129
144
|
|
|
130
145
|
```sh
|
|
131
|
-
rlmgrep
|
|
146
|
+
rlmgrep "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
|
|
132
147
|
```
|
|
133
148
|
|
|
134
149
|
If you need strict, deterministic regex behavior, use `rg`/`grep`.
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
[project]
|
|
2
2
|
name = "rlmgrep"
|
|
3
|
-
version = "0.1.8"
|
|
3
|
+
version = "0.1.17"
|
|
4
4
|
description = "Grep-shaped CLI search powered by DSPy RLM"
|
|
5
5
|
readme = "README.md"
|
|
6
6
|
requires-python = ">=3.11"
|
|
@@ -9,6 +9,7 @@ license = { text = "MIT" }
|
|
|
9
9
|
dependencies = [
|
|
10
10
|
"dspy>=3.1.1",
|
|
11
11
|
"markitdown[all]>=0.1.4",
|
|
12
|
+
"pathspec>=0.12.1",
|
|
12
13
|
"pypdf>=4.0.0",
|
|
13
14
|
]
|
|
14
15
|
|
|
@@ -1,2 +1,2 @@
|
|
|
1
1
|
__all__ = ["__version__"]
|
|
2
|
-
__version__ = "0.1.8"
|
|
2
|
+
__version__ = "0.1.17"
|
|
@@ -3,13 +3,21 @@ from __future__ import annotations
|
|
|
3
3
|
import argparse
|
|
4
4
|
import os
|
|
5
5
|
import sys
|
|
6
|
+
import shutil
|
|
7
|
+
import subprocess
|
|
6
8
|
from pathlib import Path
|
|
7
9
|
|
|
8
10
|
import dspy
|
|
9
11
|
from . import __version__
|
|
10
12
|
from .config import ensure_default_config, load_config
|
|
11
13
|
from .file_map import build_file_map
|
|
12
|
-
from .ingest import
|
|
14
|
+
from .ingest import (
|
|
15
|
+
FileRecord,
|
|
16
|
+
build_ignore_spec,
|
|
17
|
+
collect_candidates,
|
|
18
|
+
load_files,
|
|
19
|
+
resolve_type_exts,
|
|
20
|
+
)
|
|
13
21
|
from .rlm import Match, build_lm, run_rlm
|
|
14
22
|
from .render import render_matches
|
|
15
23
|
|
|
@@ -72,16 +80,22 @@ def _parse_args(argv: list[str]) -> argparse.Namespace:
|
|
|
72
80
|
parser.add_argument("pattern", nargs="?", help="Query string (interpreted by RLM)")
|
|
73
81
|
parser.add_argument("paths", nargs="*", help="Files or directories")
|
|
74
82
|
|
|
75
|
-
parser.add_argument("-n", dest="line_numbers", action="store_true", help="Show line numbers (default)")
|
|
76
83
|
parser.add_argument("-r", dest="recursive", action="store_true", help="Recursive (directories are searched recursively by default)")
|
|
77
84
|
parser.add_argument("--no-recursive", dest="recursive", action="store_false", help="Do not recurse directories")
|
|
78
|
-
parser.set_defaults(recursive=True
|
|
85
|
+
parser.set_defaults(recursive=True)
|
|
79
86
|
|
|
80
87
|
parser.add_argument("-C", dest="context", type=int, default=0, help="Context lines before/after")
|
|
81
88
|
parser.add_argument("-A", dest="after", type=int, default=None, help="Context lines after")
|
|
82
89
|
parser.add_argument("-B", dest="before", type=int, default=None, help="Context lines before")
|
|
83
90
|
parser.add_argument("-m", dest="max_count", type=int, default=None, help="Max matching lines per file")
|
|
84
91
|
parser.add_argument("-a", "--text", dest="binary_as_text", action="store_true", help="Search binary files as text")
|
|
92
|
+
parser.add_argument("--hidden", action="store_true", help="Include hidden files and directories")
|
|
93
|
+
parser.add_argument(
|
|
94
|
+
"--no-ignore",
|
|
95
|
+
dest="no_ignore",
|
|
96
|
+
action="store_true",
|
|
97
|
+
help="Do not respect ignore files (.gitignore/.ignore/.rgignore)",
|
|
98
|
+
)
|
|
85
99
|
parser.add_argument("--answer", action="store_true", help="Print a narrative answer before grep output")
|
|
86
100
|
parser.add_argument("-y", "--yes", action="store_true", help="Skip file count confirmation")
|
|
87
101
|
parser.add_argument(
|
|
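The two selection flags added in this hunk are plain boolean switches. A minimal sketch of how they parse, assuming a reduced parser that keeps only the arguments relevant here (`build_parser` is an illustrative helper, not the package's `_parse_args`):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser: only the arguments touched by this hunk.
    parser = argparse.ArgumentParser(prog="rlmgrep")
    parser.add_argument("pattern", nargs="?")
    parser.add_argument("paths", nargs="*")
    parser.add_argument("--hidden", action="store_true")
    parser.add_argument("--no-ignore", dest="no_ignore", action="store_true")
    return parser


args = build_parser().parse_args(["--no-ignore", "where is retry configured", "."])
# Both flags default to False, so hidden files stay excluded unless --hidden is given.
print(args.hidden, args.no_ignore, args.paths)  # False True ['.']
```

Because both flags use `store_true`, omitting them keeps the default behavior: hidden paths are skipped and ignore files are honored.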
@@ -140,6 +154,67 @@ def _pick(cli_value, config: dict, key: str, default=None):
|
|
|
140
154
|
return default
|
|
141
155
|
|
|
142
156
|
|
|
157
|
+
def _find_git_root(start: Path) -> tuple[Path | None, Path | None]:
|
|
158
|
+
for p in [start, *start.parents]:
|
|
159
|
+
git_path = p / ".git"
|
|
160
|
+
if git_path.is_dir():
|
|
161
|
+
return p, git_path
|
|
162
|
+
if git_path.is_file():
|
|
163
|
+
try:
|
|
164
|
+
raw = git_path.read_text(encoding="utf-8", errors="ignore").strip()
|
|
165
|
+
except Exception:
|
|
166
|
+
raw = ""
|
|
167
|
+
if raw.startswith("gitdir:"):
|
|
168
|
+
git_dir = raw.split(":", 1)[1].strip()
|
|
169
|
+
git_dir_path = Path(git_dir)
|
|
170
|
+
if not git_dir_path.is_absolute():
|
|
171
|
+
git_dir_path = (p / git_dir_path).resolve()
|
|
172
|
+
return p, git_dir_path
|
|
173
|
+
return p, None
|
|
174
|
+
return None, None
|
|
175
|
+
|
|
176
|
+
|
|
177
|
+
def _global_ignore_paths(cwd: Path | None = None) -> list[Path]:
|
|
178
|
+
paths: list[Path] = []
|
|
179
|
+
cwd = cwd or Path.cwd()
|
|
180
|
+
if shutil.which("git"):
|
|
181
|
+
try:
|
|
182
|
+
result = subprocess.run(
|
|
183
|
+
["git", "config", "--get", "--path", "core.excludesfile"],
|
|
184
|
+
cwd=cwd,
|
|
185
|
+
capture_output=True,
|
|
186
|
+
text=True,
|
|
187
|
+
check=False,
|
|
188
|
+
)
|
|
189
|
+
value = (result.stdout or "").strip()
|
|
190
|
+
except Exception:
|
|
191
|
+
value = ""
|
|
192
|
+
if value:
|
|
193
|
+
candidate = Path(value).expanduser()
|
|
194
|
+
if candidate.exists():
|
|
195
|
+
paths.append(candidate)
|
|
196
|
+
|
|
197
|
+
xdg_config = os.getenv("XDG_CONFIG_HOME")
|
|
198
|
+
if xdg_config:
|
|
199
|
+
default_path = Path(xdg_config) / "git" / "ignore"
|
|
200
|
+
else:
|
|
201
|
+
default_path = Path.home() / ".config" / "git" / "ignore"
|
|
202
|
+
if default_path.exists():
|
|
203
|
+
paths.append(default_path)
|
|
204
|
+
|
|
205
|
+
legacy = Path.home() / ".gitignore_global"
|
|
206
|
+
if legacy.exists():
|
|
207
|
+
paths.append(legacy)
|
|
208
|
+
|
|
209
|
+
seen: set[Path] = set()
|
|
210
|
+
unique: list[Path] = []
|
|
211
|
+
for p in paths:
|
|
212
|
+
if p not in seen:
|
|
213
|
+
seen.add(p)
|
|
214
|
+
unique.append(p)
|
|
215
|
+
return unique
|
|
216
|
+
|
|
217
|
+
|
|
143
218
|
def _env_value(name: str) -> str | None:
|
|
144
219
|
val = os.getenv(name)
|
|
145
220
|
if val is None:
|
|
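The `_find_git_root` helper added above walks up from a starting directory and handles both a regular `.git` directory and a `.git` file containing a `gitdir:` pointer (as used by worktrees and submodules). The following is a condensed restatement for experimentation, exercised against a throwaway directory tree; the name `find_git_root` and the temporary layout are illustrative only, and the error handling around `read_text` is left out.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_git_root(start: Path) -> tuple[Path | None, Path | None]:
    """Return (worktree root, git dir) for the nearest enclosing repository."""
    for p in [start, *start.parents]:
        git_path = p / ".git"
        if git_path.is_dir():
            return p, git_path
        if git_path.is_file():
            # A .git file holds a line such as "gitdir: <path>".
            raw = git_path.read_text(encoding="utf-8", errors="ignore").strip()
            if raw.startswith("gitdir:"):
                git_dir = Path(raw.split(":", 1)[1].strip())
                if not git_dir.is_absolute():
                    git_dir = (p / git_dir).resolve()
                return p, git_dir
            return p, None
    return None, None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()

    # Case 1: ordinary checkout with a .git directory, queried from a nested folder.
    repo = base / "repo"
    (repo / ".git").mkdir(parents=True)
    nested = repo / "src" / "pkg"
    nested.mkdir(parents=True)
    print(find_git_root(nested) == (repo, repo / ".git"))

    # Case 2: a .git *file* pointing at a git dir through a relative path.
    real_git_dir = base / "real_git_dir"
    real_git_dir.mkdir()
    linked = base / "linked"
    linked.mkdir()
    (linked / ".git").write_text("gitdir: ../real_git_dir\n", encoding="utf-8")
    print(find_git_root(linked) == (linked, real_git_dir))
```

The second element of the tuple matters later in `main`, where `info/exclude` is looked up under the git dir rather than under the worktree root.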
@@ -425,12 +500,26 @@ def main(argv: list[str] | None = None) -> int:
|
|
|
425
500
|
if hard_max is not None and hard_max <= 0:
|
|
426
501
|
hard_max = None
|
|
427
502
|
|
|
503
|
+
ignore_spec = None
|
|
504
|
+
ignore_root = None
|
|
505
|
+
if not args.no_ignore:
|
|
506
|
+
git_root, git_dir = _find_git_root(cwd)
|
|
507
|
+
ignore_root = git_root or cwd
|
|
508
|
+
extra_ignores: list[Path] = []
|
|
509
|
+
if git_dir is not None:
|
|
510
|
+
extra_ignores.append(git_dir / "info" / "exclude")
|
|
511
|
+
extra_ignores.extend(_global_ignore_paths(ignore_root))
|
|
512
|
+
ignore_spec = build_ignore_spec(ignore_root, extra_paths=extra_ignores)
|
|
513
|
+
|
|
428
514
|
candidates = collect_candidates(
|
|
429
515
|
input_paths,
|
|
430
516
|
cwd=cwd,
|
|
431
517
|
recursive=args.recursive,
|
|
432
518
|
include_globs=globs,
|
|
433
519
|
type_exts=type_exts,
|
|
520
|
+
include_hidden=args.hidden,
|
|
521
|
+
ignore_spec=ignore_spec,
|
|
522
|
+
ignore_root=ignore_root,
|
|
434
523
|
)
|
|
435
524
|
candidate_count = len(candidates)
|
|
436
525
|
if hard_max is not None and candidate_count > hard_max:
|
|
@@ -565,7 +654,6 @@ def main(argv: list[str] | None = None) -> int:
|
|
|
565
654
|
output_lines = render_matches(
|
|
566
655
|
files=files,
|
|
567
656
|
matches=verified,
|
|
568
|
-
show_line_numbers=args.line_numbers,
|
|
569
657
|
before=before,
|
|
570
658
|
after=after,
|
|
571
659
|
use_color=use_color,
|
|
@@ -2,8 +2,11 @@ from __future__ import annotations
|
|
|
2
2
|
|
|
3
3
|
from dataclasses import dataclass
|
|
4
4
|
from fnmatch import fnmatch
|
|
5
|
+
import os
|
|
5
6
|
from pathlib import Path, PurePosixPath
|
|
6
|
-
from typing import
|
|
7
|
+
from typing import Any, Callable, Iterable
|
|
8
|
+
|
|
9
|
+
import pathspec
|
|
7
10
|
|
|
8
11
|
from pypdf import PdfReader
|
|
9
12
|
|
|
@@ -161,6 +164,97 @@ def collect_files(paths: Iterable[str], recursive: bool = True) -> list[Path]:
|
|
|
161
164
|
return files
|
|
162
165
|
|
|
163
166
|
|
|
167
|
+
IGNORE_FILENAMES = {".gitignore", ".ignore", ".rgignore"}
|
|
168
|
+
|
|
169
|
+
|
|
170
|
+
def build_ignore_spec(
|
|
171
|
+
root: Path, extra_paths: Iterable[Path] | None = None
|
|
172
|
+
) -> "pathspec.PathSpec | None":
|
|
173
|
+
root = root.resolve()
|
|
174
|
+
ignore_paths: list[Path] = []
|
|
175
|
+
extra_paths = list(extra_paths or [])
|
|
176
|
+
|
|
177
|
+
for dirpath, dirnames, filenames in os.walk(root):
|
|
178
|
+
if ".git" in dirnames:
|
|
179
|
+
dirnames.remove(".git")
|
|
180
|
+
for name in filenames:
|
|
181
|
+
if name in IGNORE_FILENAMES:
|
|
182
|
+
ignore_paths.append(Path(dirpath) / name)
|
|
183
|
+
|
|
184
|
+
for extra in extra_paths:
|
|
185
|
+
if extra.exists():
|
|
186
|
+
ignore_paths.append(extra)
|
|
187
|
+
|
|
188
|
+
if not ignore_paths:
|
|
189
|
+
return None
|
|
190
|
+
|
|
191
|
+
def _sort_key(p: Path) -> tuple[int, str]:
|
|
192
|
+
try:
|
|
193
|
+
rel = p.parent.relative_to(root)
|
|
194
|
+
depth = len(rel.parts)
|
|
195
|
+
return depth, rel.as_posix()
|
|
196
|
+
except ValueError:
|
|
197
|
+
return 0, p.as_posix()
|
|
198
|
+
|
|
199
|
+
ignore_paths.sort(key=_sort_key)
|
|
200
|
+
|
|
201
|
+
patterns: list[str] = []
|
|
202
|
+
for gi in ignore_paths:
|
|
203
|
+
try:
|
|
204
|
+
rel_dir = gi.parent.relative_to(root).as_posix()
|
|
205
|
+
except ValueError:
|
|
206
|
+
rel_dir = ""
|
|
207
|
+
if rel_dir in {".", ""}:
|
|
208
|
+
rel_dir = ""
|
|
209
|
+
try:
|
|
210
|
+
raw_lines = gi.read_text(encoding="utf-8", errors="ignore").splitlines()
|
|
211
|
+
except Exception:
|
|
212
|
+
continue
|
|
213
|
+
for raw in raw_lines:
|
|
214
|
+
line = raw.rstrip("\n")
|
|
215
|
+
if not line:
|
|
216
|
+
continue
|
|
217
|
+
escaped = False
|
|
218
|
+
if line.startswith("\\#") or line.startswith("\\!"):
|
|
219
|
+
line = line[1:]
|
|
220
|
+
escaped = True
|
|
221
|
+
if not escaped and line.startswith("#"):
|
|
222
|
+
continue
|
|
223
|
+
|
|
224
|
+
negated = False
|
|
225
|
+
if not escaped and line.startswith("!"):
|
|
226
|
+
negated = True
|
|
227
|
+
line = line[1:]
|
|
228
|
+
if not line:
|
|
229
|
+
continue
|
|
230
|
+
|
|
231
|
+
anchored = False
|
|
232
|
+
if line.startswith("/"):
|
|
233
|
+
anchored = True
|
|
234
|
+
line = line[1:]
|
|
235
|
+
if not line:
|
|
236
|
+
continue
|
|
237
|
+
|
|
238
|
+
if rel_dir:
|
|
239
|
+
if anchored:
|
|
240
|
+
line = f"{rel_dir}/{line}"
|
|
241
|
+
elif "/" in line:
|
|
242
|
+
line = f"{rel_dir}/{line}"
|
|
243
|
+
else:
|
|
244
|
+
line = f"{rel_dir}/**/{line}"
|
|
245
|
+
else:
|
|
246
|
+
if anchored:
|
|
247
|
+
line = f"/{line}"
|
|
248
|
+
|
|
249
|
+
if negated:
|
|
250
|
+
line = "!" + line
|
|
251
|
+
patterns.append(line)
|
|
252
|
+
|
|
253
|
+
if not patterns:
|
|
254
|
+
return None
|
|
255
|
+
return pathspec.PathSpec.from_lines("gitwildmatch", patterns)
|
|
256
|
+
|
|
257
|
+
|
|
164
258
|
TYPE_EXTS = {
|
|
165
259
|
"bash": {".bash"},
|
|
166
260
|
"c": {".c", ".h"},
|
|
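The core of `build_ignore_spec` above is the per-line rewrite that scopes patterns from a nested ignore file to that file's directory before everything is handed to a single `PathSpec`. The sketch below isolates that rule as a pure function. The name `rewrite_pattern` is hypothetical, and the handling of `\#` / `\!` escapes is deliberately omitted to keep the example short.

```python
from __future__ import annotations


def rewrite_pattern(rel_dir: str, raw: str) -> str | None:
    """Rewrite one ignore-file line so it applies relative to the search root.

    rel_dir is the ignore file's directory relative to the root ("" for the root).
    Returns None for blank lines and comments.
    """
    line = raw
    if not line or line.startswith("#"):
        return None

    negated = line.startswith("!")
    if negated:
        line = line[1:]
    anchored = line.startswith("/")
    if anchored:
        line = line[1:]
    if not line:
        return None

    if rel_dir:
        # Nested ignore file: scope the pattern to its own directory.
        if anchored or "/" in line:
            line = f"{rel_dir}/{line}"
        else:
            line = f"{rel_dir}/**/{line}"
    elif anchored:
        # Root-level anchored patterns keep their leading slash.
        line = f"/{line}"
    return "!" + line if negated else line


for rel_dir, raw in [
    ("", "*.log"),
    ("", "/dist"),
    ("docs", "*.tmp"),
    ("docs", "/build"),
    ("docs", "!keep.tmp"),
    ("docs", "build/"),
]:
    print(repr(rel_dir), repr(raw), "->", rewrite_pattern(rel_dir, raw))
```

One consequence worth noting: because any `/` in the line triggers the plain prefix branch, a directory pattern such as `build/` in `docs/.gitignore` becomes `docs/build/`. Git itself treats a pattern whose only slash is trailing as matching at any depth below the ignore file, so this rewrite appears to be somewhat narrower for that case.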
@@ -237,21 +331,46 @@ def _matches_globs(path: str, globs: list[str]) -> bool:
|
|
|
237
331
|
return False
|
|
238
332
|
|
|
239
333
|
|
|
334
|
+
def _is_hidden_path(path: Path) -> bool:
|
|
335
|
+
return any(part.startswith(".") for part in path.parts if part)
|
|
336
|
+
|
|
337
|
+
|
|
240
338
|
def collect_candidates(
|
|
241
339
|
paths: Iterable[str],
|
|
242
340
|
cwd: Path,
|
|
243
341
|
recursive: bool = True,
|
|
244
342
|
include_globs: list[str] | None = None,
|
|
245
343
|
type_exts: set[str] | None = None,
|
|
344
|
+
include_hidden: bool = False,
|
|
345
|
+
ignore_spec: "pathspec.PathSpec | None" = None,
|
|
346
|
+
ignore_root: Path | None = None,
|
|
246
347
|
) -> list[Path]:
|
|
247
348
|
files = collect_files(paths, recursive=recursive)
|
|
349
|
+
explicit_files: set[Path] = set()
|
|
350
|
+
for raw in paths:
|
|
351
|
+
p = Path(raw)
|
|
352
|
+
if p.exists() and p.is_file():
|
|
353
|
+
explicit_files.add(p.resolve())
|
|
248
354
|
candidates: list[Path] = []
|
|
249
355
|
for fp in files:
|
|
356
|
+
fp_resolved = fp.resolve()
|
|
357
|
+
is_explicit = fp_resolved in explicit_files
|
|
358
|
+
if not include_hidden and not is_explicit and _is_hidden_path(fp):
|
|
359
|
+
continue
|
|
360
|
+
|
|
250
361
|
try:
|
|
251
362
|
key = fp.relative_to(cwd).as_posix()
|
|
252
363
|
except ValueError:
|
|
253
364
|
key = fp.as_posix()
|
|
254
365
|
|
|
366
|
+
if ignore_spec is not None and ignore_root is not None and not is_explicit:
|
|
367
|
+
try:
|
|
368
|
+
rel = fp.relative_to(ignore_root).as_posix()
|
|
369
|
+
except ValueError:
|
|
370
|
+
rel = None
|
|
371
|
+
if rel and ignore_spec.match_file(rel):
|
|
372
|
+
continue
|
|
373
|
+
|
|
255
374
|
if include_globs and not _matches_globs(key, include_globs):
|
|
256
375
|
continue
|
|
257
376
|
|
|
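The hidden-file filter in `collect_candidates` relies on `_is_hidden_path`, which flags a path if any of its components starts with a dot. A small sketch of that predicate, using `PurePosixPath` so the results do not depend on the host platform (the sample paths are made up):

```python
from pathlib import PurePosixPath


def is_hidden_path(path: PurePosixPath) -> bool:
    # Same predicate as `_is_hidden_path` in the hunk above.
    return any(part.startswith(".") for part in path.parts if part)


samples = [
    "src/app.py",                # no dot-prefixed component
    ".github/workflows/ci.yml",  # hidden directory
    "src/.env",                  # hidden file
    "./src/app.py",              # a leading "./" is dropped by pathlib
    "../sibling/app.py",         # ".." also starts with a dot
]
for raw in samples:
    print(f"{raw!r}: {is_hidden_path(PurePosixPath(raw))}")
```

Since every component is tested, a `..` segment or a dot-prefixed ancestor directory also marks the path as hidden. Whether that affects real runs depends on how `collect_files` constructs its paths, which this diff does not show. Files named explicitly on the command line bypass the check through the `explicit_files` set.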
@@ -23,13 +23,10 @@ def _format_line(
|
|
|
23
23
|
line_no: int,
|
|
24
24
|
text: str,
|
|
25
25
|
is_match: bool,
|
|
26
|
-
show_line_numbers: bool,
|
|
27
26
|
use_color: bool,
|
|
28
27
|
heading: bool,
|
|
29
28
|
) -> str:
|
|
30
29
|
delim = ":" if is_match else "-"
|
|
31
|
-
if not show_line_numbers:
|
|
32
|
-
return text
|
|
33
30
|
prefix = _colorize(str(line_no), COLOR_LINE_NO, use_color)
|
|
34
31
|
sep = "\t" if heading else ""
|
|
35
32
|
return f"{prefix}{delim}{sep}{text}"
|
|
@@ -52,7 +49,6 @@ def _merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
|
|
|
52
49
|
def render_matches(
|
|
53
50
|
files: dict[str, FileRecord],
|
|
54
51
|
matches: dict[str, list[int]],
|
|
55
|
-
show_line_numbers: bool,
|
|
56
52
|
before: int,
|
|
57
53
|
after: int,
|
|
58
54
|
use_color: bool = False,
|
|
@@ -86,7 +82,6 @@ def render_matches(
|
|
|
86
82
|
line_no,
|
|
87
83
|
text,
|
|
88
84
|
True,
|
|
89
|
-
show_line_numbers,
|
|
90
85
|
use_color,
|
|
91
86
|
heading,
|
|
92
87
|
)
|
|
@@ -111,7 +106,6 @@ def render_matches(
|
|
|
111
106
|
line_no,
|
|
112
107
|
text,
|
|
113
108
|
is_match,
|
|
114
|
-
show_line_numbers,
|
|
115
109
|
use_color,
|
|
116
110
|
heading,
|
|
117
111
|
)
|
|
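With `show_line_numbers` removed from `render.py`, every rendered line now carries its line number. A minimal sketch of the resulting formatting rule, with color handling left out (the real `_format_line` also takes `use_color` and wraps the number via `_colorize`; `format_line` here is an illustrative stand-in):

```python
def format_line(line_no: int, text: str, is_match: bool, heading: bool) -> str:
    # ":" marks a matching line, "-" marks a context line (grep convention).
    delim = ":" if is_match else "-"
    # Under rg-style headings a tab separates the number from the text.
    sep = "\t" if heading else ""
    return f"{line_no}{delim}{sep}{text}"


print(repr(format_line(12, "retries = 3", True, heading=True)))   # '12:\tretries = 3'
print(repr(format_line(11, "# defaults", False, heading=True)))   # '11-\t# defaults'
print(repr(format_line(12, "retries = 3", True, heading=False)))  # '12:retries = 3'
```

This matches the README wording changed in the same release, which now states that line numbers are always included.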
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: rlmgrep
|
|
3
|
-
Version: 0.1.
|
|
3
|
+
Version: 0.1.17
|
|
4
4
|
Summary: Grep-shaped CLI search powered by DSPy RLM
|
|
5
5
|
Author: rlmgrep
|
|
6
6
|
License: MIT
|
|
@@ -8,11 +8,12 @@ Requires-Python: >=3.11
|
|
|
8
8
|
Description-Content-Type: text/markdown
|
|
9
9
|
Requires-Dist: dspy>=3.1.1
|
|
10
10
|
Requires-Dist: markitdown[all]>=0.1.4
|
|
11
|
+
Requires-Dist: pathspec>=0.12.1
|
|
11
12
|
Requires-Dist: pypdf>=4.0.0
|
|
12
13
|
|
|
13
14
|
# rlmgrep
|
|
14
15
|
|
|
15
|
-
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format.
|
|
16
|
+
Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, scans the files you point at, and prints matching lines in a grep-like format. Use `--answer` to get a narrative response grounded in the selected files/directories.
|
|
16
17
|
|
|
17
18
|
## Quickstart
|
|
18
19
|
|
|
@@ -22,9 +23,20 @@ uv tool install rlmgrep
|
|
|
22
23
|
# uv tool install git+https://github.com/halfprice06/rlmgrep.git
|
|
23
24
|
|
|
24
25
|
export OPENAI_API_KEY=... # or set keys in ~/.rlmgrep
|
|
25
|
-
rlmgrep "where are API keys read" rlmgrep/
|
|
26
26
|
```
|
|
27
27
|
|
|
28
|
+
```sh
|
|
29
|
+
rlmgrep --answer "What does this repo do and where are the entry points?" .
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+

|
|
33
|
+
|
|
34
|
+
```sh
|
|
35
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+

|
|
39
|
+
|
|
28
40
|
## Requirements
|
|
29
41
|
|
|
30
42
|
- Python 3.11+
|
|
@@ -38,8 +50,8 @@ One of rlmgrep’s most useful features is that it can “grep” **PDFs and Off
|
|
|
38
50
|
How it works:
|
|
39
51
|
- **PDFs** are parsed with `pypdf`. Each page gets a marker line like `===== Page N =====`, and output lines include a `page=N` suffix. Line numbers refer to the extracted text (not PDF coordinates).
|
|
40
52
|
- **Office & binary docs** (`.docx`, `.pptx`, `.xlsx`, `.html`, `.zip`, etc.) are converted to Markdown via **MarkItDown**. This happens during ingestion, so rlmgrep can search them like any other text file.
|
|
41
|
-
- **Images** can be described by a vision model through MarkItDown (OpenAI/Anthropic/Gemini).
|
|
42
|
-
- **Audio** transcription is supported through OpenAI when enabled.
|
|
53
|
+
- **Images** can be described by a vision model and then searched through MarkItDown (OpenAI/Anthropic/Gemini), enable and configure in config.toml.
|
|
54
|
+
- **Audio** transcription is supported through OpenAI when enabled, configure in config.toml.
|
|
43
55
|
|
|
44
56
|
Sidecar caching:
|
|
45
57
|
- For images/audio, converted text is cached next to the original file as `<original>.<ext>.md` and reused on later runs.
|
|
@@ -47,7 +59,7 @@ Sidecar caching:
|
|
|
47
59
|
|
|
48
60
|
## Install Deno
|
|
49
61
|
|
|
50
|
-
DSPy requires the Deno runtime. Install it with the official scripts:
|
|
62
|
+
DSPy's default implementation of RLM requires the Deno runtime. Install it with the official scripts:
|
|
51
63
|
|
|
52
64
|
macOS/Linux:
|
|
53
65
|
|
|
@@ -75,12 +87,15 @@ rlmgrep [options] "query" [paths...]
|
|
|
75
87
|
|
|
76
88
|
Common options:
|
|
77
89
|
|
|
90
|
+
- `--answer` return a narrative answer before the grep output
|
|
78
91
|
- `-C N` context lines before/after (grep-style)
|
|
79
92
|
- `-A N` context lines after
|
|
80
93
|
- `-B N` context lines before
|
|
81
94
|
- `-m N` max matching lines per file
|
|
82
95
|
- `-g GLOB` include files matching glob (repeatable, comma-separated)
|
|
83
96
|
- `--type T` include file types (repeatable, comma-separated)
|
|
97
|
+
- `--hidden` include hidden files and directories
|
|
98
|
+
- `--no-ignore` do not respect `.gitignore`
|
|
84
99
|
- `--no-recursive` do not recurse directories
|
|
85
100
|
- `-a`, `--text` treat binary files as text
|
|
86
101
|
- `-y`, `--yes` skip file count confirmation
|
|
@@ -95,7 +110,7 @@ Examples:
|
|
|
95
110
|
|
|
96
111
|
```sh
|
|
97
112
|
# Natural-language query over a repo
|
|
98
|
-
rlmgrep -
|
|
113
|
+
rlmgrep -C 2 "Where is retry/backoff configured and what are the defaults?" .
|
|
99
114
|
|
|
100
115
|
# Restrict to Python files
|
|
101
116
|
rlmgrep "Where do we parse JWTs and enforce expiration?" --type py .
|
|
@@ -113,6 +128,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
113
128
|
## Input selection
|
|
114
129
|
|
|
115
130
|
- Directories are searched recursively by default. Use `--no-recursive` to stop recursion.
|
|
131
|
+
- Hidden files and ignore files (`.gitignore`, `.ignore`, `.rgignore`) are respected by default. Use `--hidden` or `--no-ignore` to include them.
|
|
116
132
|
- `--type` uses built-in type mappings (e.g., `py`, `js`, `md`); unknown values are treated as file extensions.
|
|
117
133
|
- `-g/--glob` matches path globs against normalized paths (forward slashes).
|
|
118
134
|
- Paths are printed relative to the current working directory when possible.
|
|
@@ -125,7 +141,7 @@ rg -l "token" . | rlmgrep --files-from-stdin --answer "What does this token cont
|
|
|
125
141
|
- Output uses rg-style headings by default:
|
|
126
142
|
- A file header line like `./path/to/file`
|
|
127
143
|
- Then `line:\ttext` for matches, `line-\ttext` for context lines
|
|
128
|
-
- Line numbers are 1-based.
|
|
144
|
+
- Line numbers are always included and are 1-based.
|
|
129
145
|
- When context ranges are disjoint, a `--` line separates groups.
|
|
130
146
|
- Exit codes:
|
|
131
147
|
- `0` = at least one match
|
|
@@ -140,7 +156,7 @@ rlmgrep can interpret traditional regex-style patterns inside a natural-language
|
|
|
140
156
|
Example (best-effort regex semantics + extra context):
|
|
141
157
|
|
|
142
158
|
```sh
|
|
143
|
-
rlmgrep
|
|
159
|
+
rlmgrep "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
|
|
144
160
|
```
|
|
145
161
|
|
|
146
162
|
If you need strict, deterministic regex behavior, use `rg`/`grep`.
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|