rlmgrep 0.1.0__tar.gz → 0.1.2__tar.gz

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: rlmgrep
3
- Version: 0.1.0
3
+ Version: 0.1.2
4
4
  Summary: Grep-shaped CLI search powered by DSPy RLM
5
5
  Author: rlmgrep
6
6
  License: MIT
@@ -17,7 +17,7 @@ Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, sca
17
17
  ## Quickstart
18
18
 
19
19
  ```sh
20
- uv tool install --python 3.11 .
20
+ uv tool install --python 3.11 rlmgrep
21
21
  # or from GitHub:
22
22
  # uv tool install --python 3.11 git+https://github.com/halfprice06/rlmgrep.git
23
23
 
@@ -71,6 +71,8 @@ Common options:
71
71
  - `--type T` include file types (repeatable, comma-separated)
72
72
  - `--no-recursive` do not recurse directories
73
73
  - `-a`, `--text` treat binary files as text
74
+ - `-y`, `--yes` skip file count confirmation
75
+ - `--stdin-files` treat stdin as newline-delimited file paths
74
76
  - `--model`, `--sub-model` override model names
75
77
  - `--api-key`, `--api-base`, `--model-type` override provider settings
76
78
  - `--max-iterations`, `--max-llm-calls` cap RLM search effort
@@ -90,6 +92,9 @@ rlmgrep "error handling" -g "**/*.py" -g "**/*.md" .
90
92
 
91
93
  # Read from stdin (only when no paths are provided)
92
94
  cat README.md | rlmgrep "install"
95
+
96
+ # Use rg/grep to find candidate files, then rlmgrep over that list
97
+ rg -l "token" . | rlmgrep --stdin-files --answer "what does this token control?"
93
98
  ```
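The `--stdin-files` mode shown above treats stdin as a list of paths rather than as content to search. A minimal sketch of that parsing step, assuming blank lines and surrounding whitespace are dropped; the helper name `paths_from_stdin_text` is hypothetical and not part of rlmgrep's public interface:

```python
def paths_from_stdin_text(raw: str) -> list[str]:
    """Split newline-delimited text into non-empty, stripped path strings."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Example input shaped like the output of `rg -l "token" .`
sample = "src/auth.py\n\n  docs/tokens.md  \n"
print(paths_from_stdin_text(sample))
```

An empty result is worth treating as an error in a caller, since there would be nothing to search.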
94
99
 
95
100
  ## Input selection
@@ -99,6 +104,7 @@ cat README.md | rlmgrep "install"
99
104
  - `-g/--glob` matches path globs against normalized paths (forward slashes).
100
105
  - Paths are printed relative to the current working directory when possible.
101
106
  - If no paths are provided, rlmgrep reads from stdin and uses the synthetic path `<stdin>`; if stdin is empty, it exits with code 2.
107
+ - rlmgrep asks for confirmation when more than 200 files would be loaded (use `-y/--yes` to skip), and aborts when more than 1000 files would be loaded.
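The two limits in the last bullet can be read as a small decision function. The following is a sketch under stated assumptions, not rlmgrep's actual code: the name `file_count_action` and its string return values are hypothetical, and the defaults of 200 and 1000 come from the bullet above. It assumes the hard limit is checked first, so `-y/--yes` skips the prompt but does not lift the abort.

```python
def file_count_action(
    count: int,
    warn_threshold: int = 200,
    hard_max: int = 1000,
    assume_yes: bool = False,
) -> str:
    """Return "abort", "confirm", or "load" for a candidate file count."""
    # The hard limit is checked first and is not affected by --yes.
    if count > hard_max:
        return "abort"
    # Between the two limits, ask unless the user passed -y/--yes.
    if count > warn_threshold and not assume_yes:
        return "confirm"
    return "load"


for n, yes in [(150, False), (500, False), (500, True), (1500, True)]:
    print(n, yes, file_count_action(n, assume_yes=yes))
```

Both comparisons are strict, so exactly 200 files load without a prompt and exactly 1000 files are still allowed.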
102
108
 
103
109
  ## Output contract (stable for agents)
104
110
 
@@ -116,6 +122,18 @@ cat README.md | rlmgrep "install"
116
122
 
117
123
  Agent tip: use `-n -H` and no context for parse-friendly output, then key off exit codes.
118
124
 
125
+ ## Regex-style queries (best effort)
126
+
127
+ rlmgrep can interpret traditional regex-style patterns inside a natural-language prompt. The RLM may use Python (including `re`) in its internal REPL to approximate regex logic, but it is **not guaranteed** to behave exactly like `grep`/`rg`.
128
+
129
+ Example (best-effort regex semantics + extra context):
130
+
131
+ ```sh
132
+ rlmgrep -n "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
133
+ ```
134
+
135
+ If you need strict, deterministic regex behavior, use `rg`/`grep`.
136
+
119
137
  ## Configuration
120
138
 
121
139
  rlmgrep creates a default config automatically if missing. The config path is:
@@ -133,6 +151,8 @@ temperature = 1.0
133
151
  max_tokens = 64000
134
152
  max_iterations = 10
135
153
  max_llm_calls = 20
154
+ file_warn_threshold = 200
155
+ file_hard_max = 1000
136
156
  # markitdown_enable_images = false
137
157
  # markitdown_image_llm_model = "gpt-5-mini"
138
158
  # markitdown_image_llm_provider = "openai"
@@ -168,10 +188,8 @@ If more than one provider key is set and the model does not make the provider ob
168
188
 
169
189
  - Prefer narrow corpora (globs/types) to reduce token usage.
170
190
  - Use `--max-llm-calls` to cap costs; combine with small `--max-iterations` for safety.
171
- - Always read stderr for warnings (skipped files, config issues, ambiguous API keys).
172
191
  - For reproducible parsing, use `-n -H` and avoid context (`-C/-A/-B`).
173
- - RLM results are verified against real file lines; invalid or duplicate matches are dropped and reported.
174
-
192
+
175
193
  ## Development
176
194
 
177
195
  - Install locally: `pip install -e .` or `uv tool install .`
@@ -5,7 +5,7 @@ Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, sca
5
5
  ## Quickstart
6
6
 
7
7
  ```sh
8
- uv tool install --python 3.11 .
8
+ uv tool install --python 3.11 rlmgrep
9
9
  # or from GitHub:
10
10
  # uv tool install --python 3.11 git+https://github.com/halfprice06/rlmgrep.git
11
11
 
@@ -59,6 +59,8 @@ Common options:
59
59
  - `--type T` include file types (repeatable, comma-separated)
60
60
  - `--no-recursive` do not recurse directories
61
61
  - `-a`, `--text` treat binary files as text
62
+ - `-y`, `--yes` skip file count confirmation
63
+ - `--stdin-files` treat stdin as newline-delimited file paths
62
64
  - `--model`, `--sub-model` override model names
63
65
  - `--api-key`, `--api-base`, `--model-type` override provider settings
64
66
  - `--max-iterations`, `--max-llm-calls` cap RLM search effort
@@ -78,6 +80,9 @@ rlmgrep "error handling" -g "**/*.py" -g "**/*.md" .
78
80
 
79
81
  # Read from stdin (only when no paths are provided)
80
82
  cat README.md | rlmgrep "install"
83
+
84
+ # Use rg/grep to find candidate files, then rlmgrep over that list
85
+ rg -l "token" . | rlmgrep --stdin-files --answer "what does this token control?"
81
86
  ```
82
87
 
83
88
  ## Input selection
@@ -87,6 +92,7 @@ cat README.md | rlmgrep "install"
87
92
  - `-g/--glob` matches path globs against normalized paths (forward slashes).
88
93
  - Paths are printed relative to the current working directory when possible.
89
94
  - If no paths are provided, rlmgrep reads from stdin and uses the synthetic path `<stdin>`; if stdin is empty, it exits with code 2.
95
+ - rlmgrep asks for confirmation when more than 200 files would be loaded (use `-y/--yes` to skip), and aborts when more than 1000 files would be loaded.
90
96
 
91
97
  ## Output contract (stable for agents)
92
98
 
@@ -104,6 +110,18 @@ cat README.md | rlmgrep "install"
104
110
 
105
111
  Agent tip: use `-n -H` and no context for parse-friendly output, then key off exit codes.
106
112
 
113
+ ## Regex-style queries (best effort)
114
+
115
+ rlmgrep can interpret traditional regex-style patterns inside a natural-language prompt. The RLM may use Python (including `re`) in its internal REPL to approximate regex logic, but it is **not guaranteed** to behave exactly like `grep`/`rg`.
116
+
117
+ Example (best-effort regex semantics + extra context):
118
+
119
+ ```sh
120
+ rlmgrep -n "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
121
+ ```
122
+
123
+ If you need strict, deterministic regex behavior, use `rg`/`grep`.
124
+
107
125
  ## Configuration
108
126
 
109
127
  rlmgrep creates a default config automatically if missing. The config path is:
@@ -121,6 +139,8 @@ temperature = 1.0
121
139
  max_tokens = 64000
122
140
  max_iterations = 10
123
141
  max_llm_calls = 20
142
+ file_warn_threshold = 200
143
+ file_hard_max = 1000
124
144
  # markitdown_enable_images = false
125
145
  # markitdown_image_llm_model = "gpt-5-mini"
126
146
  # markitdown_image_llm_provider = "openai"
@@ -156,10 +176,8 @@ If more than one provider key is set and the model does not make the provider ob
156
176
 
157
177
  - Prefer narrow corpora (globs/types) to reduce token usage.
158
178
  - Use `--max-llm-calls` to cap costs; combine with small `--max-iterations` for safety.
159
- - Always read stderr for warnings (skipped files, config issues, ambiguous API keys).
160
179
  - For reproducible parsing, use `-n -H` and avoid context (`-C/-A/-B`).
161
- - RLM results are verified against real file lines; invalid or duplicate matches are dropped and reported.
162
-
180
+
163
181
  ## Development
164
182
 
165
183
  - Install locally: `pip install -e .` or `uv tool install .`
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "rlmgrep"
3
- version = "0.1.0"
3
+ version = "0.1.2"
4
4
  description = "Grep-shaped CLI search powered by DSPy RLM"
5
5
  readme = "README.md"
6
6
  requires-python = ">=3.11"
@@ -8,7 +8,7 @@ from pathlib import Path
8
8
  import dspy
9
9
  from .config import ensure_default_config, load_config
10
10
  from .file_map import build_file_map
11
- from .ingest import FileRecord, load_files, resolve_type_exts
11
+ from .ingest import FileRecord, collect_candidates, load_files, resolve_type_exts
12
12
  from .rlm import Match, build_lm, run_rlm
13
13
  from .render import render_matches
14
14
 
@@ -17,6 +17,23 @@ def _warn(msg: str) -> None:
17
17
      print(f"rlmgrep: {msg}", file=sys.stderr)
18
18
  
19
19
  
20
+ def _confirm_over_limit(count: int, threshold: int) -> bool:
21
+     prompt = (
22
+         f"rlmgrep: {count} files to load (over {threshold}). Continue? [y/N] "
23
+     )
24
+     try:
25
+         with open("/dev/tty", "r+") as tty:
26
+             print(prompt, file=tty, end="", flush=True)
27
+             response = tty.readline()
28
+     except Exception:
29
+         if not sys.stdin.isatty():
30
+             _warn("refusing to prompt for confirmation; use --yes to proceed")
31
+             return False
32
+         print(prompt, file=sys.stderr, end="", flush=True)
33
+         response = sys.stdin.readline()
34
+     return response.strip().lower() in {"y", "yes"}
35
+
36
+
20
37
  def verify_matches(
21
38
      matches: list[Match],
22
39
      files: dict[str, FileRecord],
@@ -65,6 +82,12 @@ def _parse_args(argv: list[str]) -> argparse.Namespace:
65
82
      parser.add_argument("-m", dest="max_count", type=int, default=None, help="Max matching lines per file")
66
83
      parser.add_argument("-a", "--text", dest="binary_as_text", action="store_true", help="Search binary files as text")
67
84
      parser.add_argument("--answer", action="store_true", help="Print a narrative answer before grep output")
85
+     parser.add_argument("-y", "--yes", action="store_true", help="Skip file count confirmation")
86
+     parser.add_argument(
87
+         "--stdin-files",
88
+         action="store_true",
89
+         help="Treat stdin as newline-delimited file paths",
90
+     )
68
91
  
69
92
      parser.add_argument("-g", "--glob", dest="globs", action="append", default=[], help="Include files matching glob (may repeat)")
70
93
      parser.add_argument("--type", dest="types", action="append", default=[], help="Include file types (py, js, md, etc.). May repeat")
@@ -318,22 +341,65 @@ def main(argv: list[str] | None = None) -> int:
318
341
      for w in md_warnings:
319
342
          _warn(w)
320
343
  
321
-     if not args.paths:
344
+     input_paths: list[str] | None = None
345
+     stdin_text: str | None = None
346
+     if args.paths:
347
+         input_paths = list(args.paths)
348
+     elif args.stdin_files:
349
+         if sys.stdin.isatty():
350
+             _warn("no input paths and stdin is empty")
351
+             return 2
352
+         raw = sys.stdin.read()
353
+         input_paths = [line.strip() for line in raw.splitlines() if line.strip()]
354
+         if not input_paths:
355
+             _warn("stdin contained no file paths")
356
+             return 2
357
+     else:
322
358
          if sys.stdin.isatty():
323
359
              _warn("no input paths and stdin is empty")
324
360
              return 2
325
-         text = sys.stdin.read()
361
+         stdin_text = sys.stdin.read()
362
+
363
+     if input_paths is None:
364
+         text = stdin_text or ""
326
365
          files = {
327
366
              "<stdin>": FileRecord(path="<stdin>", text=text, lines=text.split("\n"))
328
367
          }
329
368
          warnings: list[str] = []
330
369
      else:
331
-         files, warnings = load_files(
332
-             args.paths,
370
+         warn_threshold = _parse_num(
371
+             _pick(None, config, "file_warn_threshold", 200), int
372
+         )
373
+         hard_max = _parse_num(_pick(None, config, "file_hard_max", 1000), int)
374
+         if warn_threshold is not None and warn_threshold <= 0:
375
+             warn_threshold = None
376
+         if hard_max is not None and hard_max <= 0:
377
+             hard_max = None
378
+
379
+         candidates = collect_candidates(
380
+             input_paths,
333
381
              cwd=cwd,
334
382
              recursive=args.recursive,
335
383
              include_globs=globs,
336
384
              type_exts=type_exts,
385
+         )
386
+         candidate_count = len(candidates)
387
+         if hard_max is not None and candidate_count > hard_max:
388
+             _warn(
389
+                 f"{candidate_count} files to load (over {hard_max}); aborting"
390
+             )
391
+             return 2
392
+         if (
393
+             warn_threshold is not None
394
+             and candidate_count > warn_threshold
395
+             and not args.yes
396
+         ):
397
+             if not _confirm_over_limit(candidate_count, warn_threshold):
398
+                 return 2
399
+
400
+         files, warnings = load_files(
401
+             candidates,
402
+             cwd=cwd,
337
403
              markitdown=markitdown,
338
404
              enable_images=md_enable_images,
339
405
              enable_audio=md_enable_audio,
@@ -3,10 +3,7 @@ from __future__ import annotations
3
3
  from pathlib import Path
4
4
  from typing import Any
5
5
 
6
- try:  # Python 3.11+
7
-     import tomllib as _tomllib  # type: ignore
8
- except Exception:  # pragma: no cover - fallback
9
-     import tomli as _tomllib  # type: ignore
6
+ import tomllib
10
7
 
11
8
 
12
9
  DEFAULT_CONFIG_TEXT = "\n".join(
@@ -19,6 +16,8 @@ DEFAULT_CONFIG_TEXT = "\n".join(
19
16
  "max_tokens = 64000",
20
17
  "max_iterations = 10",
21
18
  "max_llm_calls = 20",
19
+ "file_warn_threshold = 200",
20
+ "file_hard_max = 1000",
22
21
  "# markitdown_enable_images = false",
23
22
  "# markitdown_image_llm_model = \"gpt-5-mini\"",
24
23
  "# markitdown_image_llm_provider = \"openai\"",
@@ -65,7 +64,7 @@ def load_config(path: Path | None = None) -> tuple[dict[str, Any], list[str]]:
65
64
          return {}, [f"config path is not a file: {config_path}"]
66
65
  
67
66
      try:
68
-         data = _tomllib.loads(config_path.read_text())
67
+         data = tomllib.loads(config_path.read_text())
69
68
      except Exception as exc:  # pragma: no cover - defensive
70
69
          return {}, [f"failed to read config {config_path}: {exc}"]
71
70
 
@@ -237,12 +237,34 @@ def _matches_globs(path: str, globs: list[str]) -> bool:
237
237
      return False
238
238
  
239
239
  
240
- def load_files(
240
+ def collect_candidates(
241
241
      paths: Iterable[str],
242
242
      cwd: Path,
243
243
      recursive: bool = True,
244
244
      include_globs: list[str] | None = None,
245
245
      type_exts: set[str] | None = None,
246
+ ) -> list[Path]:
247
+     files = collect_files(paths, recursive=recursive)
248
+     candidates: list[Path] = []
249
+     for fp in files:
250
+         try:
251
+             key = fp.relative_to(cwd).as_posix()
252
+         except ValueError:
253
+             key = fp.as_posix()
254
+
255
+         if include_globs and not _matches_globs(key, include_globs):
256
+             continue
257
+
258
+         if type_exts and fp.suffix.lower() not in type_exts:
259
+             continue
260
+
261
+         candidates.append(fp)
262
+     return candidates
263
+
264
+
265
+ def load_files(
266
+     candidates: Iterable[Path],
267
+     cwd: Path,
246
268
      markitdown: Any | None = None,
247
269
      enable_images: bool = False,
248
270
      enable_audio: bool = False,
@@ -254,20 +276,12 @@ def load_files(
254
276
      image_convert_count = 0
255
277
      audio_convert_count = 0
256
278
  
257
-     files = collect_files(paths, recursive=recursive)
258
-     for fp in files:
279
+     for fp in candidates:
259
280
          try:
260
281
              key = fp.relative_to(cwd).as_posix()
261
282
          except ValueError:
262
283
              key = fp.as_posix()
263
284
  
264
-         if include_globs and not _matches_globs(key, include_globs):
265
-             continue
266
-
267
-         if type_exts:
268
-             if fp.suffix.lower() not in type_exts:
269
-                 continue
270
-
271
285
          suffix = fp.suffix.lower()
272
286
          if markitdown is not None and not binary_as_text:
273
287
              if enable_images and suffix in IMAGE_EXTS:
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: rlmgrep
3
- Version: 0.1.0
3
+ Version: 0.1.2
4
4
  Summary: Grep-shaped CLI search powered by DSPy RLM
5
5
  Author: rlmgrep
6
6
  License: MIT
@@ -17,7 +17,7 @@ Grep-shaped search powered by DSPy RLM. It accepts a natural-language query, sca
17
17
  ## Quickstart
18
18
 
19
19
  ```sh
20
- uv tool install --python 3.11 .
20
+ uv tool install --python 3.11 rlmgrep
21
21
  # or from GitHub:
22
22
  # uv tool install --python 3.11 git+https://github.com/halfprice06/rlmgrep.git
23
23
 
@@ -71,6 +71,8 @@ Common options:
71
71
  - `--type T` include file types (repeatable, comma-separated)
72
72
  - `--no-recursive` do not recurse directories
73
73
  - `-a`, `--text` treat binary files as text
74
+ - `-y`, `--yes` skip file count confirmation
75
+ - `--stdin-files` treat stdin as newline-delimited file paths
74
76
  - `--model`, `--sub-model` override model names
75
77
  - `--api-key`, `--api-base`, `--model-type` override provider settings
76
78
  - `--max-iterations`, `--max-llm-calls` cap RLM search effort
@@ -90,6 +92,9 @@ rlmgrep "error handling" -g "**/*.py" -g "**/*.md" .
90
92
 
91
93
  # Read from stdin (only when no paths are provided)
92
94
  cat README.md | rlmgrep "install"
95
+
96
+ # Use rg/grep to find candidate files, then rlmgrep over that list
97
+ rg -l "token" . | rlmgrep --stdin-files --answer "what does this token control?"
93
98
  ```
94
99
 
95
100
  ## Input selection
@@ -99,6 +104,7 @@ cat README.md | rlmgrep "install"
99
104
  - `-g/--glob` matches path globs against normalized paths (forward slashes).
100
105
  - Paths are printed relative to the current working directory when possible.
101
106
  - If no paths are provided, rlmgrep reads from stdin and uses the synthetic path `<stdin>`; if stdin is empty, it exits with code 2.
107
+ - rlmgrep asks for confirmation when more than 200 files would be loaded (use `-y/--yes` to skip), and aborts when more than 1000 files would be loaded.
102
108
 
103
109
  ## Output contract (stable for agents)
104
110
 
@@ -116,6 +122,18 @@ cat README.md | rlmgrep "install"
116
122
 
117
123
  Agent tip: use `-n -H` and no context for parse-friendly output, then key off exit codes.
118
124
 
125
+ ## Regex-style queries (best effort)
126
+
127
+ rlmgrep can interpret traditional regex-style patterns inside a natural-language prompt. The RLM may use Python (including `re`) in its internal REPL to approximate regex logic, but it is **not guaranteed** to behave exactly like `grep`/`rg`.
128
+
129
+ Example (best-effort regex semantics + extra context):
130
+
131
+ ```sh
132
+ rlmgrep -n "Find Python functions that look like `def test_\\w+` and are marked as slow or flaky in nearby comments." .
133
+ ```
134
+
135
+ If you need strict, deterministic regex behavior, use `rg`/`grep`.
136
+
119
137
  ## Configuration
120
138
 
121
139
  rlmgrep creates a default config automatically if missing. The config path is:
@@ -133,6 +151,8 @@ temperature = 1.0
133
151
  max_tokens = 64000
134
152
  max_iterations = 10
135
153
  max_llm_calls = 20
154
+ file_warn_threshold = 200
155
+ file_hard_max = 1000
136
156
  # markitdown_enable_images = false
137
157
  # markitdown_image_llm_model = "gpt-5-mini"
138
158
  # markitdown_image_llm_provider = "openai"
@@ -168,10 +188,8 @@ If more than one provider key is set and the model does not make the provider ob
168
188
 
169
189
  - Prefer narrow corpora (globs/types) to reduce token usage.
170
190
  - Use `--max-llm-calls` to cap costs; combine with small `--max-iterations` for safety.
171
- - Always read stderr for warnings (skipped files, config issues, ambiguous API keys).
172
191
  - For reproducible parsing, use `-n -H` and avoid context (`-C/-A/-B`).
173
- - RLM results are verified against real file lines; invalid or duplicate matches are dropped and reported.
174
-
192
+
175
193
  ## Development
176
194
 
177
195
  - Install locally: `pip install -e .` or `uv tool install .`
File without changes (7 files)