llmlint-cli 0.3.4__py3-none-win_amd64.whl

@@ -0,0 +1,789 @@
1
+ Metadata-Version: 2.4
2
+ Name: llmlint-cli
3
+ Version: 0.3.4
4
+ Classifier: Development Status :: 4 - Beta
5
+ Classifier: Environment :: Console
6
+ Classifier: Intended Audience :: Developers
7
+ Classifier: License :: OSI Approved :: MIT License
8
+ Classifier: Operating System :: OS Independent
9
+ Classifier: Programming Language :: Rust
10
+ Classifier: Topic :: Software Development :: Quality Assurance
11
+ License-File: LICENSE
12
+ Summary: LLM-as-judge linter: enforce code-quality checks deterministic linters can't express, by driving real coding harnesses through oneharness.
13
+ Keywords: cli,linter,llm,code-review,oneharness
14
+ Home-Page: https://github.com/nickderobertis/llmlint
15
+ Requires-Python: >=3.8
16
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
17
+ Project-URL: Changelog, https://github.com/nickderobertis/llmlint/blob/main/CHANGELOG.md
18
+ Project-URL: Repository, https://github.com/nickderobertis/llmlint
19
+
20
+ # llmlint
21
+
22
+ ![llmlint's live progress view: rules resolving one by one as their judges return, then clearing to reveal the report](docs/screenshots/demo.gif)
23
+
24
+ **The next generation of linting: an LLM as a judge.** `llmlint` enforces the
25
+ code-quality checks a human reviewer normally makes — adherence to architectural
26
+ patterns, coding-style intent, alignment to organization objectives — that
27
+ deterministic linters can't express. It is **additive** to your existing linters,
28
+ not a replacement: keep using deterministic tools for everything they can already
29
+ check, and reach for llmlint only for the judgment calls.
30
+
31
+ Each check is a **rule**: a statement about your code that is judged `true`
32
+ (holds) or `false` (a violation). A single fast Rust binary, llmlint **batches**
33
+ your rules into as few harness calls as it can, then drives a real coding harness
34
+ (Claude Code, Codex, Cursor, …) through
35
+ [`oneharness`](https://github.com/nickderobertis/oneharness) to read the relevant
36
+ files and decide, and reports the violations — with file and line numbers where
37
+ they can be pinned down. Because the gate is "just a config file," llmlint drops
38
+ into CI next to your other linters.
39
+
40
+ By default llmlint reports the failing rules (with the locations it could pin
41
+ down) and a one-line summary — passing, skipped, and not-relevant rules are just
42
+ counted:
43
+
44
+ ![llmlint's default report: a red FAIL with its pinned violation and a colorized summary line counting passed, failed, skipped, and not-relevant rules](docs/screenshots/lint-default.svg)
45
+
46
+ Add `-v` to itemize *every* rule (passed, skipped, and not relevant too) and to
47
+ print the oneharness debug view — the exact `oneharness run …` command and the
48
+ raw result for each judge — to **stderr**, so the report on stdout stays clean:
49
+
50
+ ![llmlint's verbose report: green PASS, red FAIL, yellow SKIP, and dim N/A not-relevant lines above the summary](docs/screenshots/lint-verbose.svg)
51
+
52
+ <details>
53
+ <summary>The <code>-v</code> debug view (oneharness command + raw result per judge, on stderr)</summary>
54
+
55
+ ![llmlint's -v oneharness debug view: the exact oneharness run command and the raw JSON result for each judge](docs/screenshots/lint-debug.svg)
56
+
57
+ </details>
58
+
59
+ > These are real captures of the CLI, rendered from the actual colorized output
60
+ > by [`just screenshots`](screenshots/AGENTS.md) and gated by
61
+ > [screencomp](https://github.com/nickderobertis/screencomp).
62
+
63
+ The exit code is unaffected by verbosity (`0` all-pass, `1` a violation, `2`
64
+ the run couldn't complete); operational errors are always shown. Use
65
+ `--format json` for the full machine-readable report.
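
A CI wrapper can branch on these exit codes. The following is a minimal Python sketch, not part of llmlint: the helper names `describe_exit` and `run_llmlint` are hypothetical, and only the exit-code meanings and the `--format json` flag come from the text above.

```python
import subprocess

# Exit-code meanings as documented in the README text above.
EXIT_MEANINGS = {
    0: "all rules passed",
    1: "at least one rule was violated",
    2: "the run could not complete",
}


def describe_exit(code: int) -> str:
    """Map an llmlint exit code to a human-readable meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_llmlint(args=("--format", "json")):
    """Run llmlint (assumed to be on PATH) and return (exit code, stdout).

    Hypothetical wrapper: it only shells out and hands back the raw result.
    """
    proc = subprocess.run(["llmlint", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

Because verbosity does not change the exit code, a wrapper like this can gate on the code alone and keep the JSON for later inspection.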
66
+
67
+ The human report is **colorized** — green `PASS`, red `FAIL`/`ERROR` — when
68
+ stdout is a terminal. Coloring follows the [`NO_COLOR`](https://no-color.org)
69
+ convention and a `--color <auto|always|never>` flag: `auto` (the default) colors
70
+ only an interactive terminal, `always` forces it (e.g. through a pager or to
71
+ capture a screenshot), `never` disables it. `--format json` is never colorized.
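
The color decision described above can be sketched as a small pure function. This is an illustration in Python, not llmlint's implementation: the function name `should_color` is hypothetical, and it assumes that an explicit `always` or `never` takes precedence over `NO_COLOR`, which the text above does not spell out.

```python
def should_color(mode: str, stdout_is_tty: bool, env: dict) -> bool:
    """Decide whether to colorize the human report.

    mode: "auto", "always", or "never" (the --color values).
    Assumption: an explicit always/never wins over NO_COLOR, and
    NO_COLOR only counts when it is set to a non-empty value.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode != "auto":
        raise ValueError(f"unknown color mode: {mode!r}")
    if env.get("NO_COLOR"):
        return False
    # auto: color only an interactive terminal
    return stdout_is_tty
```

In a real program the inputs would come from `sys.stdout.isatty()` and `os.environ`; taking them as parameters keeps the decision easy to test.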
72
+
73
+ While the judges run, llmlint draws a **live progress view** on stderr — rules
74
+ resolving one by one as their judges return (the GIF above) — then clears it and
75
+ prints the report. Like the color, it is audience-aware: `--progress
76
+ <auto|always|never>` (default `auto`) shows it only for an interactive human — a
77
+ terminal, not CI, and **not an AI coding agent** (Claude Code, Codex, Cursor,
78
+ detected via their environment variables). Piped, redirected, in CI, or captured
79
+ by an agent, the view is fully suppressed so it never spams a log or an agent's
80
+ context — the report on stdout, and `--format json`, are byte-for-byte the same
81
+ either way.
82
+
83
+ ## How it works
84
+
85
+ 1. You declare **rules** (and optionally **agents** that group them) in a YAML
86
+ config — like any other linter.
87
+ 2. For each agent, llmlint renders a system prompt from a template (the rules +
88
+ the target file paths) and calls `oneharness run` with a generated **JSON
89
+ Schema** for structured output. oneharness constrains and validates the
90
+ harness's answer, so llmlint gets a checked verdict per rule, not prose.
91
+ 3. The harness reads the target files on demand with its own tools to gather
92
+ evidence, then returns `{ "rule_name": { "holds": bool, "violations": [...] } }`.
93
+ 4. llmlint aggregates (majority vote across judges when configured), reports, and
94
+ exits non-zero if any rule was violated.
95
+
96
+ llmlint **shells out to oneharness** — it is a runtime prerequisite (see Install).
97
+
98
+ ## Install
99
+
100
+ `llmlint` needs the `oneharness` binary on your `PATH`.
101
+
102
+ ```console
103
+ # 1) oneharness (the harness driver)
104
+ curl -fsSL https://raw.githubusercontent.com/nickderobertis/oneharness/main/scripts/install.sh | sh
105
+ # (or: cargo install --git https://github.com/nickderobertis/oneharness --locked)
106
+
107
+ # 2) llmlint
108
+ curl -fsSL https://raw.githubusercontent.com/nickderobertis/llmlint/main/scripts/install.sh | sh
109
+ # (or: pip install llmlint-cli — prebuilt binary wheel, no Rust toolchain needed)
110
+ # (or: cargo install llmlint --locked)
111
+ # (or, without a crates.io release: cargo install --git https://github.com/nickderobertis/llmlint --locked)
112
+
113
+ llmlint doctor # confirms oneharness is reachable
114
+ ```
115
+
116
+ The installer honors `LLMLINT_VERSION` / `LLMLINT_INSTALL_DIR` (or the `--version`
117
+ / `--to` flags), works on Linux, macOS, and Windows under a POSIX shell
118
+ (Git Bash / MSYS / WSL), and refuses a binary it cannot verify. Each tagged
119
+ release publishes prebuilt binaries for those platforms, each with a SHA-256
120
+ checksum and a keyless [Sigstore](https://www.sigstore.dev/) build-provenance
121
+ attestation bundle (`.sigstore.json`); on native Windows PowerShell, use
122
+ `cargo install llmlint --locked`.
123
+
124
+ **Via pip.** Each release also ships per-platform wheels wrapping the same
125
+ prebuilt binary, so anywhere Python is present, `pip install llmlint-cli` (or
126
+ `uv tool install llmlint-cli` / `pipx install llmlint-cli`) is a seconds-fast
127
+ binary install with no Rust toolchain — handy in restricted-egress environments
128
+ where package registries are reachable but `github.com` is not. (The PyPI
129
+ package is `llmlint-cli` — PyPI reserves names too similar to existing projects
130
+ — but the installed binary is `llmlint`.) Wheels are published
131
+ with PyPI [Trusted Publishing](https://docs.pypi.org/trusted-publishers/) and
132
+ carry [PEP 740](https://peps.python.org/pep-0740/) attestations — the same
133
+ Sigstore build provenance as the GitHub release assets.
134
+
135
+ **Behind a mirror.** In a network that can reach a release-proxy mirror but not
136
+ `github.com`, point the archive download at it:
137
+
138
+ ```console
139
+ LLMLINT_RELEASE_BASE_URL=https://mirror.example/llmlint \
140
+ curl -fsSL https://raw.githubusercontent.com/nickderobertis/llmlint/main/scripts/install.sh | sh
141
+ ```
142
+
143
+ The archive comes from the mirror, but its integrity is checked against a trust
144
+ root the mirror does not control. If a Sigstore verifier is installed —
145
+ [`cosign`](https://github.com/sigstore/cosign), the official
146
+ [`sigstore`](https://pypi.org/project/sigstore/) Python client, or
147
+ [`gh`](https://cli.github.com/) — the installer downloads the `.sigstore.json`
148
+ bundle from the mirror and verifies it **offline** — no GitHub API — against the
149
+ keyless signature bound to this repo's release workflow; the trusted digest comes
150
+ from the *signed* attestation, so a mirror cannot forge it. With no verifier
151
+ installed it falls back to a `.sha256` fetched from canonical GitHub
152
+ (`LLMLINT_CHECKSUM_BASE_URL` overrides that root) — but it **refuses** a checksum
153
+ that shares the mirror's origin, since a tampered mirror would just serve a
154
+ matching tampered checksum. If nothing independent of the mirror can vouch for
155
+ the archive, the install aborts.
156
+
157
+ A verifier is one install away even where `github.com` itself is unreachable,
158
+ because all three ship through package registries:
159
+
160
+ ```console
161
+ pip install sigstore # PyPI
162
+ npm install -g @sigstore/cli # npm
163
+ go install github.com/sigstore/cosign/v2/cmd/cosign@latest # Go module proxy
164
+ ```
165
+
166
+ (`go install` fetches through `proxy.golang.org` and integrity-checks against the
167
+ `sum.golang.org` transparency log — it never contacts github.com.) So the pattern
168
+ for a restricted-egress host or sandbox image is: provision one verifier from
169
+ whichever registry is reachable, then run the installer with
170
+ `LLMLINT_RELEASE_BASE_URL` pointing at your mirror — every byte comes from the
171
+ mirror or a registry, and the signature still chains to the Sigstore root.
172
+ Elsewhere, `brew install cosign`, a distro sigstore package, or the
173
+ [`sigstore/cosign-installer`](https://github.com/sigstore/cosign-installer)
174
+ GitHub Action in CI all work too.
175
+
176
+ You also need a coding harness installed and authenticated (e.g. Claude Code).
177
+ See `oneharness list` / `oneharness detect --all`.
178
+
179
+ ## Quick start
180
+
181
+ ```console
182
+ llmlint init # write a starter llmlint.yml (config-lint plugin on)
183
+ llmlint init --with-template # ...and embed the prompt template to customize
184
+ $EDITOR llmlint.yml # write your rules
185
+ llmlint # lint the configured files
186
+ llmlint src/api/**/*.rs # ...or lint specific files
187
+ llmlint --format json # machine-readable output
188
+ ```
189
+
190
+ ## Configuration
191
+
192
+ `llmlint.yml` (discovered by walking up from the working directory; override with
193
+ `-c/--config`, repeatable). Discovery is **nested** in both directions. Walking
194
+ **up**, *every* config found (one per directory) is merged, nearest first, so a
195
+ config beside the files being linted, a project config above it, and a user-level
196
+ config higher still layer together — the most-local config wins each top-level
197
+ scalar, every config contributes its rules, and a more distant config fills only
198
+ the gaps (the same nearest-wins precedence as [plugins](#plugins-shared-rule-sets)).
199
+ Walking **down**, a config in a subdirectory governs *its own* part of the project:
200
+ its `files` globs are rooted at that directory (a `frontend/llmlint.yml` with
201
+ `*.txt` matches `frontend/`'s files, never a same-named file elsewhere), so you can
202
+ keep per-area rules next to the code they check. A subtree config scopes *rules*,
203
+ not session-wide settings (model, timeout, template, rationales come from the
204
+ working directory and up); `--config` replaces the whole walk with no cascade.
205
+ `llmlint init` writes it with a leading
206
+ `# yaml-language-server: $schema=…` modeline pointing at llmlint's
207
+ [published JSON Schema](assets/llmlint.schema.json), so editors with the YAML
208
+ language server (e.g. VS Code's [YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml))
209
+ give completion and validation as you write. Add the same line to a hand-written
210
+ config to opt in.
211
+
212
+ ```yaml
213
+ version: 1 # this config's published version (used when it is consumed as a plugin)
214
+
215
+ # Files linted when none are passed on the CLI. Omit the whole block (or leave
216
+ # `include` empty) to lint every file in the tree from the current directory;
217
+ # `exclude` and `.gitignore` still apply.
218
+ files:
219
+ include: ["src/**/*.rs"]
220
+ exclude: ["**/generated/**"]
221
+
222
+ # Require a short `rationale` for every verdict (default true). See Rationales below.
223
+ rationales: true
224
+
225
+ # Default base for `--diff` (when `--diff-base` isn't passed). Any git revision —
226
+ # a branch, tag, commit, or `A..B`/`A...B` range. Set it to your default branch so
227
+ # a CI quality gate reviews whatever the current branch changed; unset = `HEAD`.
228
+ diff_base: main
229
+
230
+ # Pull in shared rule sets / plugins with one line each. An entry is a local
231
+ # path or a URL (`http(s)://`, `file://`); pin a URL to a version with `@`.
232
+ plugins:
233
+ - "https://raw.githubusercontent.com/nickderobertis/llmlint/main/assets/config_lint.yml@1" # bundled: lints this config's own rules
234
+ - "https://example.com/org-rules.yml@1.2.3" # pinned; fetched + cached once
235
+ - "./team-rules.yml"
236
+
237
+ # Agents group rules and add reviewer context + harness/model/batch config.
238
+ # YAML anchors let you share prompt text with zero framework support.
239
+ agents:
240
+ architecture:
241
+ harness: claude-code # any id from `oneharness list`; omit to use oneharness's own default
242
+ model: opus
243
+ batch_size: 15 # rules per judge run (default 20)
244
+ prompt_template: | # appended to the master template before render
245
+ You are a senior software architect reviewing service boundaries.
246
+
247
+ rules:
248
+ - name: handlers_delegate_to_services # unique, terse, descriptive
249
+ description: |
250
+ true when every HTTP handler delegates business logic to a service layer.
251
+ false when a handler performs business logic (DB queries, domain rules)
252
+ inline.
253
+ agent: architecture # optional; omit to use the default agent
254
+ # override: true # optional; extend a same-named plugin rule, inheriting unset fields
255
+ judges: 3 # optional; independent judges, majority wins (default 1)
256
+ rationale: true # optional; override the session-wide `rationales` for this rule
257
+ relevance: true # optional; when to evaluate — see Relevance below (default true)
258
+ files: # optional; override the target files for this rule
259
+ include: ["src/api/**"]
260
+ ```
261
+
262
+ ### Nested & per-directory configs
263
+
264
+ Configs **nest** — discovery walks both up from the working directory and down
265
+ into its subtree, merging every `llmlint.yml` it finds. This lets you layer a
266
+ user-level config, a project config, and per-area configs that live next to the
267
+ code they govern, with no extra wiring.
268
+
269
+ ```
270
+ ~/.llmlint.yml # user-level defaults (model, rationales…)
271
+ my-project/
272
+ ├── llmlint.yml # project rules + settings (run from here)
273
+ ├── backend/
274
+ │ └── llmlint.yml # rules for backend/**, globs rooted at backend/
275
+ │ # files: { include: ["**/*.py"] }
276
+ │ # rules: [{ name: no_print_debugging, … }]
277
+ └── frontend/
278
+ └── llmlint.yml # rules for frontend/**, globs rooted at frontend/
279
+ # files: { include: ["**/*.ts"] }
280
+ # rules: [{ name: no_inline_styles, … }]
281
+ ```
282
+
283
+ Running `llmlint` from `my-project/` evaluates **all** of these together:
284
+
285
+ - `no_print_debugging` runs only on `backend/**/*.py`, and `no_inline_styles`
286
+ only on `frontend/**/*.ts` — each subtree config's `files` globs are **rooted
287
+ at its own directory**, so `**/*.py` under `backend/llmlint.yml` means
288
+ `backend/**/*.py`, never a stray `.py` elsewhere.
289
+ - The project's own rules and settings apply across the whole run; a more-local
290
+ config **wins** each setting and can `override` a rule from a config above it.
291
+ - **Session settings** (model, timeout, prompt template, rationales) come from
292
+ the working directory and up — a subtree config scopes *rules*, it doesn't
293
+ retune the whole run. Run from `my-project/backend/` instead and that config
294
+ becomes the most-local one, layering under the project and user configs.
295
+
296
+ Passing **explicit files** (`llmlint backend/svc.py`) narrows this to just the
297
+ configs that govern those files: a subtree config is consulted only when a passed
298
+ file lives under it, and each rule judges only the passed files inside its own
299
+ directory. So `llmlint backend/svc.py` never loads `frontend/llmlint.yml` (nor
300
+ fetches its plugins, nor collides with a rule it happens to share a name with) —
301
+ you get exactly the rules that apply to what you asked to lint.
302
+
303
+ Use `llmlint config` to see the merged result and `llmlint config --sources`
304
+ (or `llmlint where rules.<name>`) to trace any rule, agent, or setting back to
305
+ the exact file it came from. To bypass discovery entirely, pass explicit
306
+ configs with `-c/--config` (repeatable) — that roots every glob at the working
307
+ directory with no cascade.
308
+
309
+ ### Writing good rules
310
+
311
+ - **Phrase each rule as a positive invariant.** `holds = true` means the code
312
+ complies; `holds = false` is a violation that llmlint reports and fails on.
313
+ - **Make the verdict unambiguous.** Often a plain statement of the property is
314
+ enough — `every public item has a doc comment` already says what passes and,
315
+ by negation, what fails; there's no need to bolt on a "false when…" clause that
316
+ only restates the inverse. Spell out the violating case *only* when it carries
317
+ meaningful detail — concrete examples, easily-confused edge cases — that the
318
+ positive statement leaves unclear. When you do state both, keep them mutually
319
+ exclusive. The bundled config-lint plugin (the `config_lint.yml` URL above)
320
+ lints your config for exactly this, plus descriptive (non-placeholder) names
321
+ that match what each rule checks.
322
+ - **Names** are unique, terse, and descriptive (`^[A-Za-z][A-Za-z0-9_]*$`); they
323
+ become the JSON keys of the structured output.
324
+ - **Scope a rule to the changes it applies to with `relevance`** (see below)
325
+ instead of bolting "…or not applicable" onto the description — that keeps the
326
+ true/false outcome clean and lets llmlint tell "didn't apply" apart from "true".
327
+ - **Keep each `description` and `relevance` concise.** A judge call batches an
328
+ agent's rules into one prompt, so tokens one bloated rule spends dilute every
329
+ other rule in that batch. State the invariant in the fewest words that keep it
330
+ unambiguous; add length only when it buys clarity (concrete examples, tricky
331
+ edge cases). config-lint checks this too.
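
The naming constraint in the list above is easy to check mechanically. Here is a minimal Python sketch; the pattern is the one quoted above, while the helper name `invalid_rule_names` is hypothetical and this is not how llmlint itself validates a config.

```python
import re

# The documented rule-name pattern.
RULE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def invalid_rule_names(names):
    """Return the names that are malformed or duplicated, in input order."""
    seen = set()
    bad = []
    for name in names:
        # fullmatch avoids accepting a name with a trailing newline
        if RULE_NAME.fullmatch(name) is None or name in seen:
            bad.append(name)
        seen.add(name)
    return bad
```

For example, `invalid_rule_names(["no_inline_sql", "2fast", "has-dash", "no_inline_sql"])` flags the name that starts with a digit, the one containing a dash, and the repeated name.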
332
+
333
+ ### The prompt template
334
+
335
+ llmlint renders the judge's system prompt from a
336
+ [minijinja](https://docs.rs/minijinja) (Jinja2-style) template. The bundled
337
+ default lives in [`assets/default_template.md`](assets/default_template.md); embed
338
+ a copy to customize with `llmlint init --with-template`, or set `prompt_template`
339
+ yourself. The top-level `prompt_template` *replaces* the master template; an
340
+ agent's `prompt_template` is **appended** to it before rendering, so reviewer
341
+ context you add per-agent sees the same variables.
342
+
343
+ These variables are in scope when a template renders:
344
+
345
+ | Variable | Type | Description |
346
+ | --- | --- | --- |
347
+ | `files` | list of strings | The target file paths for this run — relative to the working directory, always forward-slashed (so a Windows run reads the same as Linux/macOS). |
348
+ | `rules` | list of objects | The rules in this batch. Each has `.name` (the identifier, also the JSON key in the structured output), `.description` (the invariant to judge), `.rationale` (whether this rule wants a justification), `.relevance` (the relevance condition string, or unset for an always-evaluated rule), `.require_line_attribution` (whether every violation must cite a `file` + `line`), and `.files` (the subset of `files` this rule applies to). |
349
+ | `file_rules` | list of objects | Per-file applicability — one entry per target file, in the same order as `files`. Each has `.file` (the path), `.mode` (`"include"` or `"exclude"`), `.rules` (the rule names to apply or, when `mode == "exclude"`, to skip — whichever list is shorter), and `.diff` (that file's unified diff, present only under `--diff` when the file changed). The default template's "Target files" section is built from this. |
350
+ | `diffs` | list of objects | Per-file changed-line diffs — one entry per *changed* file, present only under `--diff` (empty otherwise). Each has `.file` (matching its entry in `files`) and `.diff` (the unified diff text). The default template inlines these per file via `file_rules`; kept separately for custom templates. |
351
+ | `rationales` | bool | True when any rule in this batch wants a rationale — gate the rationale guidance on it. |
352
+ | `relevance` | bool | True when any rule in this batch carries a relevance condition — gate the relevance guidance on it. |
353
+ | `line_attribution` | bool | True when any rule in this batch requires line attribution — gate the line-attribution guidance on it. |
354
+
355
+ ```jinja
356
+ ## Target files
357
+ {% for f in files %}- {{ f }}
358
+ {% endfor %}
359
+ ## Rules to evaluate
360
+ {% for r in rules %}### {{ r.name }}
361
+
362
+ {{ r.description }}
363
+ {% endfor %}
364
+ ```
365
+
366
+ A run is one `(agent, file set, judge)` batch, so `rules` is that batch's slice
367
+ (see `batch_size`), not necessarily every rule in the config.
368
+
369
+ ### Rationales
370
+
371
+ By default each judge must justify every verdict with a short **rationale**. The
372
+ structured output for each rule is ordered deliberately — the judge echoes the
373
+ rule **name**, writes the **rationale**, then commits to the **result**
374
+ (`holds` + any `violations`):
375
+
376
+ ```jsonc
377
+ {
378
+ "no_inline_sql": {
379
+ "name": "no_inline_sql", // 1. anchor on the rule
380
+ "rationale": "raw SQL built inline in db.rs:42, not via the query layer", // 2. reason
381
+ "holds": false, // 3. conclude
382
+ "violations": [{ "file": "src/db.rs", "line": 42, "message": "inline SQL" }]
383
+ }
384
+ }
385
+ ```
386
+
387
+ Reasoning *before* concluding (and naming the rule first) keeps each verdict
388
+ consistent and targeted — it leans on the model's next-token prediction so the
389
+ `holds` follows from the evidence just written, not the other way round. Beyond
390
+ that, rationales buy you:
391
+
392
+ - **Auditability** — a durable record of *why* each verdict landed, carried in
393
+ `--format json` for every rule (pass or fail).
394
+ - **Debugging** — when a verdict looks wrong, you see the judge's reasoning, not
395
+ just a bare pass/fail.
396
+ - **Reliability** — verdicts are measurably steadier when the judge must commit
397
+ to evidence first.
398
+
399
+ The cost is **extra output tokens on every request**. Turn rationales off to
400
+ save tokens:
401
+
402
+ ```yaml
403
+ rationales: false # session-wide default (CLI --no-rationales overrides it)
404
+
405
+ rules:
406
+ - name: handlers_delegate_to_services
407
+ description: ...
408
+ rationale: true # …but keep them for this high-stakes rule
409
+ ```
410
+
411
+ Precedence, lowest to highest: the session default `rationales` (default `true`)
412
+ → the `--rationales` / `--no-rationales` CLI flags (which set the session default
413
+ for the run) → a per-rule `rationale`, which always wins.
414
+ In the human report, a rule's rationale is shown for every **failure** by
415
+ default, and for **every evaluated rule** at `-v`. The default prompt template
416
+ asks for rationales that are terse and pithy — the fewest tokens that still cite
417
+ the evidence — so the token cost stays small. See
418
+ [Cost vs performance](#cost-vs-performance-token-usage) to trade it for a cheaper run.
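
The rationale precedence can be sketched as a three-level resolution in which a per-rule `rationale` has the final say. This is a Python illustration only; the function name `wants_rationale` and its parameters are hypothetical, with `None` standing for "not set".

```python
from typing import Optional


def wants_rationale(
    config_default: Optional[bool],
    cli_flag: Optional[bool],
    rule_override: Optional[bool],
) -> bool:
    """Resolve whether a rule asks its judge for a rationale.

    config_default: the top-level `rationales` value, or None if unset.
    cli_flag: True for --rationales, False for --no-rationales, None if absent.
    rule_override: the rule's own `rationale` value, or None if unset.
    """
    # Session default is true unless the config says otherwise.
    session = True if config_default is None else config_default
    # A CLI flag replaces the session default for this run.
    if cli_flag is not None:
        session = cli_flag
    # A per-rule setting still wins over the session value.
    return session if rule_override is None else rule_override
```

So a run with `--no-rationales` still collects a rationale for any rule that sets `rationale: true`.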
419
+
420
+ For a **multi-judge** rule (`judges: N`), the report and `--format json` show
421
+ **each judge's** result and rationale, not just one representative — so you can
422
+ see exactly where the judges agreed or split:
423
+
424
+ ![llmlint's multi-judge report: a FAIL headed "1/3 judges held" with each judge's held/violated line and rationale, then the pinned violation](docs/screenshots/multi-judge.svg)
425
+
426
+ ### Relevance
427
+
428
+ Not every rule applies to every change. Rather than make each `description`
429
+ carry its own "…or not applicable" escape hatch — which muddies the true/false
430
+ outcome and hides *why* a rule passed — declare when a rule should be evaluated
431
+ with **`relevance`**:
432
+
433
+ ```yaml
434
+ rules:
435
+ # Always evaluated (the default). The judge may not opt out.
436
+ - name: public_items_are_documented
437
+ description: ...
438
+ # relevance: true # implicit
439
+
440
+ # Never evaluated — disabled deterministically, with no judge call.
441
+ - name: legacy_only_check
442
+ description: ...
443
+ relevance: false
444
+
445
+ # Conditionally evaluated. The judge decides whether the condition holds for
446
+ # the change *before* the verdict; if it doesn't, the rule is "not relevant".
447
+ - name: errors_are_contextualized
448
+ description: |
449
+ TRUE when every returned error adds context about the operation that
450
+ failed. FALSE when an error is propagated with no added context.
451
+ relevance: the change adds or modifies error handling
452
+ ```
453
+
454
+ For a conditional rule the structured output gains a `relevant` boolean, decided
455
+ before the verdict — so a not-applicable rule is distinguishable from a true one:
456
+
457
+ ```jsonc
458
+ // Not relevant: the object ends after `relevant`; the rationale explains why.
459
+ { "errors_are_contextualized": {
460
+ "name": "errors_are_contextualized",
461
+ "rationale": "the change only renames a struct field; no error handling touched",
462
+ "relevant": false } }
463
+
464
+ // Relevant: proceed to the verdict as usual.
465
+ { "errors_are_contextualized": {
466
+ "name": "errors_are_contextualized",
467
+ "rationale": "every `?` propagation wraps with `.context(...)`",
468
+ "relevant": true,
469
+ "holds": true } }
470
+ ```
471
+
472
+ A **not-relevant** rule is neither a pass nor a violation — it never fails the
473
+ build. The human report counts it in a `… not relevant` summary segment and, at
474
+ `-v`, itemizes it as a dim `N/A <rule> (not relevant)` line with the reason;
475
+ `--format json` carries `"outcome": "not_relevant"` and a `not_relevant` summary
476
+ count. For a multi-judge rule, relevance is decided by majority first, then the
477
+ verdict is tallied over the judges that found it relevant.
478
+
479
+ ### Ignore directives
480
+
481
+ Suppress a rule at a specific place with an inline comment in the target file —
482
+ the same idea as `# noqa` / `// eslint-disable`, but **strict**: a directive must
483
+ name the specific rule(s) and give a reason.
484
+
485
+ ```rust
486
+ let q = format!("SELECT * FROM users WHERE id = {id}"); // llmlint: ignore[no_inline_sql] one-off migration, not user-facing
487
+ ```
488
+
489
+ ```python
490
+ # llmlint: ignore-file[public_items_are_documented] generated stubs, documented upstream
491
+ ```
492
+
493
+ ```rust
494
+ // llmlint: ignore-block[no_inline_sql] legacy query layer, migration tracked in JIRA-42
495
+ fn legacy_queries() { /* … */ }
496
+ // llmlint: ignore-end[no_inline_sql]
497
+ ```
498
+
499
+ - `llmlint: ignore[rule, ...] <reason>` is **line-scoped** — it covers the line it
500
+ sits on (a trailing comment) or the line right below it (a comment on its own line).
501
+ - `llmlint: ignore-file[rule, ...] <reason>` is **file-scoped** — it covers the
502
+ whole file.
503
+ - `llmlint: ignore-block[rule, ...] <reason>` … `llmlint: ignore-end[rule, ...]` is
504
+ **block-scoped** — it covers every line between the open and its matching close.
505
+ The closing `ignore-end` names the same rule(s) and needs no reason. Blocks track
506
+ each rule independently, so rules opened together may be closed at different points
507
+ and blocks for different rules may overlap.
508
+
509
+ Use whatever comment syntax the file's language uses (`//`, `#`, `/* … */`, `<!-- … -->`);
510
+ llmlint keys off the reserved `llmlint: ignore` / `llmlint: ignore-file` /
511
+ `llmlint: ignore-block` / `llmlint: ignore-end` prefix.
512
+
513
+ **Two layers, by design.** llmlint deterministically validates each directive's
514
+ *structure* before any judge runs — it must name **specific, configured** rule(s)
515
+ and carry a **reason** (except `ignore-end`, which only closes a block). A directive
516
+ with no brackets, an empty list, an unknown or misspelled rule, or no reason is a
517
+ hard `file:line:` error (exit 2), so a typo fails loudly instead of silently
518
+ suppressing nothing. Block pairing is checked too: an unclosed `ignore-block`, an
519
+ `ignore-end` with no open block, or re-opening a rule already open is a hard error. Actually *honoring* a
520
+ well-formed directive is the judge's job: the default prompt tells it to skip a
521
+ named rule's violation at the directive's location. (A custom `prompt_template`
522
+ should carry the same guidance if you want directives honored.) Because the
523
+ prefix is reserved, a *linted* file that merely documents the feature must use
524
+ real rule names or avoid the literal `llmlint: ignore[…]` form.
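
To make the structural layer concrete, here is a rough Python sketch of a per-line check. It is an approximation under stated assumptions, not llmlint's parser: the regular expression, the function name `check_directive`, and the error wording are all hypothetical, and block pairing (unclosed or re-opened blocks) is not covered.

```python
import re

# Hypothetical grammar: reserved prefix, optional [rule, ...] list, then a reason.
DIRECTIVE = re.compile(
    r"llmlint: (ignore-file|ignore-block|ignore-end|ignore)(?:\[([^\]]*)\])?(.*)"
)


def check_directive(line, known_rules):
    """Return an error message for a malformed directive, else None.

    A line without the reserved prefix is not a directive and returns None.
    """
    match = DIRECTIVE.search(line)
    if match is None:
        return None
    kind, rule_list, rest = match.groups()
    if rule_list is None:
        return "directive has no [rule] brackets"
    rules = [r.strip() for r in rule_list.split(",") if r.strip()]
    if not rules:
        return "directive names no rules"
    unknown = [r for r in rules if r not in known_rules]
    if unknown:
        return "unknown rule(s): " + ", ".join(unknown)
    # Drop a trailing block-comment closer before looking for a reason.
    reason = re.sub(r"(\*/|-->)\s*$", "", rest).strip()
    if kind != "ignore-end" and not reason:
        return "directive has no reason"
    return None
```

A caller would run this over every line of every target file and report the first failure as a `file:line:` error before any judge is invoked.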
525
+
526
+ This structural check is **deterministic and free** — no model call — so it is
527
+ also exposed as its own command, [`llmlint check-ignores`](#commands--exit-codes).
528
+ Run it in your tight, fast linter loop (next to `cargo fmt` / `clippy`, in a
529
+ pre-commit hook, or as a quick CI step), where it catches a typo'd or
530
+ reason-less directive in milliseconds. The full `llmlint` run performs the same
531
+ check as a pre-flight, so the two never disagree — `check-ignores` just gives you
532
+ the fast feedback without waiting on (or paying for) a judge.
533
+
534
+ ### Batching
535
+
536
+ Model calls are the slow, paid part, so llmlint packs rules into as few as it can.
537
+ Rules group **by agent**, then split into batches of at most `batch_size`
538
+ (default 20) — one `oneharness run` per batch, over the union of its files, each
539
+ rule scoped to its own files in the prompt. Multi-judge rules fan out per judge
540
+ (judge `j` runs the rules with `judges >= j`). Fewer, fuller batches mean fewer
541
+ round-trips.
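
The grouping described above can be sketched as follows. This Python illustration assumes that chunking by `batch_size` happens separately for each judge pass, which the text does not state; the function name `plan_batches` and the dict-based rule shape are hypothetical.

```python
from collections import defaultdict


def plan_batches(rules, batch_sizes=None, default_batch_size=20):
    """Plan judge runs as (agent, judge, [rule names]) tuples.

    rules: dicts with "name" and optional "agent" and "judges" keys.
    batch_sizes: optional per-agent batch_size overrides.
    """
    batch_sizes = batch_sizes or {}
    by_agent = defaultdict(list)
    for rule in rules:
        by_agent[rule.get("agent", "default")].append(rule)

    runs = []
    for agent, agent_rules in by_agent.items():
        size = batch_sizes.get(agent, default_batch_size)
        max_judges = max(r.get("judges", 1) for r in agent_rules)
        for judge in range(1, max_judges + 1):
            # Judge j only sees the rules that asked for at least j judges.
            names = [r["name"] for r in agent_rules if r.get("judges", 1) >= judge]
            for start in range(0, len(names), size):
                runs.append((agent, judge, names[start:start + size]))
    return runs
```

Each tuple corresponds to one `oneharness run`, so the length of the returned list is a rough proxy for the number of model calls a config will make.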
542
+
543
+ ### Judges and voting
544
+
545
+ `judges: N` runs a rule through `N` independent judges and takes the **majority**
546
+ verdict. `N` must be **odd** (1, 3, 5, …) so the vote can't tie — an even count is
547
+ a config error. Only rules that opt in pay the extra cost: judge 1 runs all rules,
548
+ judge 2 only the rules with `judges >= 2`, and so on.
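
The vote itself is simple to sketch. The Python function below is illustrative only; the name `majority_holds` is hypothetical, and it covers just the `holds` tally, not the relevance step for conditional rules.

```python
def majority_holds(verdicts):
    """Return the majority verdict from a list of per-judge `holds` booleans.

    The judge count must be odd so the vote cannot tie.
    """
    count = len(verdicts)
    if count == 0 or count % 2 == 0:
        raise ValueError("judges must be an odd number (1, 3, 5, ...)")
    # True counts as 1, so the sum is the number of judges that said "holds".
    return sum(verdicts) > count // 2
```

With three judges, one `true` and two `false` verdicts resolve to a violation, matching a report headed "1/3 judges held".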
549
+
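As a rough sketch of why an odd `N` is required, a majority vote over boolean verdicts might look like the following. The function name and the `True` means "violation" convention are assumptions made for illustration, not llmlint's internals.

```python
def majority_verdict(votes):
    """Return True when most judges reported a violation.

    `votes` holds one boolean per judge (True = violation).
    """
    if not votes or len(votes) % 2 == 0:
        # Mirrors the documented config error: an even count could tie.
        raise ValueError("judges must be an odd number (1, 3, 5, ...)")
    violations = sum(1 for vote in votes if vote)
    return violations > len(votes) // 2


two_of_three = majority_verdict([True, False, True])
one_of_three = majority_verdict([False, False, True])
```

With three judges, two agreeing votes always decide the verdict, so a single noisy judge cannot flip the result.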
550
+ ### Cost vs performance (token usage)
551
+
552
+ Defaults favor **judgment quality** over cost: rationales on, a thorough prompt,
553
+ every file read in full. Trade some back for fewer tokens, roughly by impact:
554
+
555
+ - **`judges`** — each extra judge is a full extra pass ([Judges](#judges-and-voting)).
556
+ Keep it at 1 except for high-stakes rules.
557
+ - **`rationales: false`** — drops the per-verdict justification, output tokens on
558
+ *every* rule ([Rationales](#rationales)). Re-enable per rule with `rationale: true`, or
559
+ pass `--no-rationales` to drop rationales for a single run.
560
+ - **Fewer agents, bigger `batch_size`** — every batch re-sends the prompt and
561
+ re-reads its files ([Batching](#batching)). Merge rules onto one agent; split
562
+ only for a different harness, model, or reviewer context.
563
+ - **Read less** — narrow `files.include`/`exclude`; `--diff` reviews only changed
564
+ lines; `FILES`/`--rule`/`--agent` lint a subset.
565
+ - **`require_line_attribution`** off unless you need pinned locations; turning it on
566
+ can trigger localize re-prompts.
567
+ - **`oneharness.schema_max_retries`** — caps re-asks on a schema-invalid answer.
568
+ - **`model`** — dollars, not tokens: a cheaper model per agent or run.
569
+
570
+ `llmlint check-ignores` spends no tokens at all.
571
+
572
+ ### oneharness passthrough
573
+
574
+ llmlint lets oneharness discover its own `oneharness.toml` by default. To force a
575
+ specific oneharness config, use `--oneharness-config <path>` (or `oneharness.config`
576
+ in the llmlint config); it is forwarded via oneharness's `--config`. Override the
577
+ binary with `--oneharness-bin` or `$LLMLINT_ONEHARNESS_BIN`.
578
+
579
+ ### Plugins (shared rule sets)
580
+
581
+ `plugins` pulls other llmlint configs into this one — their rules and agents are
582
+ merged in. For the **top-level settings** (template, files, oneharness,
583
+ rationales), **the config nearer the root wins**: your config's settings take
584
+ precedence over a plugin's, a plugin's over its own plugins', and an
585
+ earlier-listed plugin over a later sibling. A plugin only *fills in* a setting
586
+ the including config left unset, so a shared plugin can ship sensible defaults
587
+ without overriding what you set locally. The CLI overrides all of them (see
588
+ [Commands](#commands--exit-codes)). Each entry is a config file:
589
+
590
+ - a **local path** (`./team-rules.yml`), resolved relative to the including file;
591
+ - a **URL** — `http(s)://` (fetched over HTTPS) or `file://` (read directly).
592
+
593
+ Resolution is **transitive**: a pulled-in config's own `plugins` are pulled in
594
+ turn, and so on. Diamonds and cycles are de-duplicated (each config loads once),
595
+ and the chain is bounded at a depth of 100 to fail fast on a pathological graph.
596
+
597
+ By default a rule name is **unique** across the whole merged config — declaring
598
+ the same name twice is an error. To **adjust** a rule a plugin gave you without
599
+ restating it, re-declare it with `override: true` and set only the fields you
600
+ want to change; every other field (including the `description`) is inherited from
601
+ the plugin's rule:
602
+
603
+ ```yaml
604
+ plugins:
605
+   - "https://example.com/org-rules.yml@1"   # ships `no_inline_sql`, 1 judge
606
+
607
+ rules:
608
+   # Keep the org rule's text, but vote it across 3 judges and scope it tighter.
609
+   - name: no_inline_sql
610
+     override: true
611
+     judges: 3
612
+     files:
613
+       include: ["src/db/**"]
614
+ ```
615
+
616
+ The override must be set on the **nearer-root** config, and there must be exactly
617
+ one base rule (the same name declared *without* `override`) for it to extend — an
618
+ `override` with nothing to override is an error, so a typo'd name can't silently
619
+ do nothing. When several configs override the same base, the nearest-root
620
+ override wins each field.
621
+
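The field-by-field behavior can be pictured with a small merge function. This is a sketch under stated assumptions, not llmlint's code: `resolve_rule` is a hypothetical name, rules are plain dicts, declarations are listed nearest-root first, and an overridden field replaces the inherited value whole (nested values such as `files` are not deep-merged here).

```python
def resolve_rule(declarations):
    """Merge every declaration of one rule name, listed nearest-root first."""
    bases = [decl for decl in declarations if not decl.get("override")]
    if len(bases) != 1:
        # Zero bases: an override with nothing to extend.
        # Two or more: the same name declared twice without `override`.
        raise ValueError(f"expected exactly one base rule, found {len(bases)}")
    merged = dict(bases[0])
    # Apply the farthest override first so the nearest-root one wins each field.
    for decl in reversed(declarations):
        if decl.get("override"):
            for field, value in decl.items():
                if field != "override":
                    merged[field] = value
    return merged


base = {"name": "no_inline_sql", "description": "No inline SQL.", "judges": 1}
local = {
    "name": "no_inline_sql",
    "override": True,
    "judges": 3,
    "files": {"include": ["src/db/**"]},
}
effective = resolve_rule([local, base])
```

Here `effective` keeps the plugin's `description` while taking `judges` and `files` from the local override, matching the YAML example above.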
622
+ URL fetching is built in (a pure-Rust HTTPS client — no `curl` or other external
623
+ tools, no system OpenSSL) and honors the standard `HTTP(S)_PROXY` / `NO_PROXY`
624
+ env vars. The bundled config-lint plugin ships inside the binary and resolves
625
+ **offline**.
626
+
627
+ A URL may be **pinned to a version** with an `@` suffix matching the plugin
628
+ config's own top-level `version`: `@1` accepts any `1.x`, `@1.2` any `1.2.x`,
629
+ `@1.2.3` exactly that. The pin is both an assertion (a mismatch is a hard error)
630
+ and the **cache key**: a pinned URL is fetched once into the cache and reused on
631
+ later runs without refetching — bump the pin to pull a new version. An *unpinned*
632
+ URL is fetched every run.
633
+
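The pin rules above amount to a prefix match on the dotted version. A minimal sketch, assuming purely numeric `major.minor.patch` versions and hypothetical helper names (`split_pin`, `pin_accepts`); how llmlint actually parses the suffix is not shown in this document.

```python
import re

# A trailing "@" followed by one to three dotted numbers is treated as a pin.
PIN_PATTERN = re.compile(r"^(?P<url>.+)@(?P<pin>\d+(?:\.\d+){0,2})$")


def split_pin(entry):
    """Split a plugin entry into (url, pin); pin is None when unpinned."""
    match = PIN_PATTERN.match(entry)
    if match is None:
        return entry, None
    return match["url"], match["pin"]


def pin_accepts(pin, version):
    """Check that `version` starts with the components named by `pin`."""
    pin_parts = pin.split(".")
    # Compare whole components so that "1.2" does not accept "1.20.0".
    return version.split(".")[:len(pin_parts)] == pin_parts


url, pin = split_pin("https://example.com/org-rules.yml@1")
```

Comparing whole components rather than string prefixes is the design choice that matters here: a plain `startswith` check would wrongly let `@1` accept `10.0.0`.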
634
+ The cache lives under `$XDG_CACHE_HOME/llmlint/plugins` (override with
635
+ `LLMLINT_CACHE_DIR`). Set `LLMLINT_PLUGIN_REFRESH=1` to force a refetch.
636
+
637
+ ### Linting your llmlint configs
638
+
639
+ llmlint ships with a **config-lint** rule set that lints llmlint config files
640
+ themselves — that every rule's `description` yields a clear, unambiguous verdict,
641
+ its `name` is descriptive (non-placeholder) and matches what the description
642
+ checks, and a conditional rule uses `relevance` instead of bolting "…or not
643
+ applicable" onto the description. It's the [Writing good rules](#writing-good-rules)
644
+ guidance, enforced (and each rule is phrased to pass its own checks). Each finding
645
+ cites the config file + the offending rule's line (`require_line_attribution`).
646
+ There are two ways to use it:
647
+
648
+ - **As a plugin** — add the bundled URL to your `plugins` (it's on by default in
649
+ `llmlint init`), and its rules run against your config files on every normal
650
+ `llmlint` run, alongside your own rules:
651
+
652
+ ```yaml
653
+ plugins:
654
+ - "https://raw.githubusercontent.com/nickderobertis/llmlint/main/assets/config_lint.yml@1"
655
+ ```
656
+
657
+ The config behind this URL ships **inside the binary** and resolves offline (no
657
+ network, no cache), so it works disconnected and needs no pin bump to stay current.
659
+
660
+ - **As a subcommand** — `llmlint lint-config` is the `lint` command with that
661
+ plugin included by default, so you don't have to add it to your config. It
662
+ first runs the deterministic ignore-directive (comment) check over the config
663
+ files, then judges each config's rules. Point it at specific files
664
+ (`llmlint lint-config path/to/llmlint.yml`) or let it discover every llmlint
665
+ config in the tree. Handy in CI as a standalone "is my config well-authored?"
666
+ gate.
667
+
668
+ ### Finding where something is defined
669
+
670
+ Once configs merge across files and plugins, a rule, agent, or setting in the
671
+ effective config can come from any of them. Two commands trace an item back to
672
+ the file (or plugin URL) you'd edit to change it.
673
+
674
+ `llmlint where <path>` answers one lookup and prints **just the source**, so it
675
+ composes in scripts. The path mirrors the config structure:
676
+
677
+ ```console
678
+ $ llmlint where oneharness.model # a top-level setting
679
+ ./shared/team.yml
680
+ $ llmlint where agents.security # an agent
681
+ ./shared/team.yml
682
+ $ llmlint where rules.no_inline_sql # where a rule is defined
683
+ https://example.com/org-rules.yml@1
684
+ $ llmlint where rules.no_inline_sql.judges # the file an override set a field in
685
+ ./llmlint.yml
686
+ $ editor "$(llmlint where rules.no_inline_sql.judges)"
687
+ ```
688
+
689
+ Because an `override` resolves **field by field**, a single rule can draw its
690
+ `description` from the plugin that defined it and its `judges` from your config —
691
+ `where rules.<name>.<field>` points at the file that actually set that field (or
692
+ the definition site when no override did). An unknown name lists what's available,
693
+ and a setting left at its built-in default says so, both exiting non-zero.
694
+
695
+ For the whole picture at once, `llmlint config --sources` adds a `sources` block:
696
+
697
+ ```jsonc
698
+ {
699
+   "config_files": ["./llmlint.yml", "./shared/team.yml", "https://example.com/org-rules.yml@1"],
700
+   "sources": {
701
+     "settings": { "version": "./llmlint.yml", "oneharness.model": "./shared/team.yml" },
702
+     "agents": { "security": "./shared/team.yml" },
703
+     "rules": {
704
+       "no_inline_sql": {
705
+         "source": "https://example.com/org-rules.yml@1", // where the rule is defined
706
+         "fields": { "judges": "./llmlint.yml" } // a field an override moved
707
+       }
708
+     }
709
+   },
710
+   "config": { /* … the merged config … */ }
711
+ }
712
+ ```
713
+
714
+ A rule with no cross-file override has no `fields` entry; settings and agents are
715
+ each kept whole from the nearest-root config that set them, so they have a single
716
+ source.
717
+
718
+ ## Commands & exit codes
719
+
720
+ - `llmlint [FILES...]` — lint (the default). `--format human|json`, `--agent`,
721
+ `--rule`, `--max-parallel`, `--timeout`, `--cwd`. Target individual rules with
722
+ `--rule NAME` (repeatable) or a whole group with `--agent NAME`; an unknown
723
+ rule/agent name is an exit-2 error that lists the available names. Every
724
+ top-level setting also has a flag that wins over the config:
725
+ `--rationales`/`--no-rationales`, `--model NAME`, `--schema-max-retries N`,
726
+ `--prompt-template PATH`, plus `--oneharness-bin`/`--oneharness-config`. Pass
727
+ `--diff [<backend>]` to add each changed file's diff to the judge prompt so it
728
+ reviews only the changed lines; bare `--diff` uses the `git` backend (compared
729
+ against `HEAD`). Add `--diff-base <REF>` to compare against a different git
730
+ revision instead of `HEAD` — a branch, tag, commit, or `A..B`/`A...B` range —
731
+ so `--diff --diff-base main` reviews exactly what the current branch changed
732
+ versus `main` (the PR-review case). The base can also be set once in config as
733
+ `diff_base:` (the flag overrides it).
734
+ - `llmlint check-ignores [FILES...]` — validate the *structure* of inline
735
+ `llmlint: ignore` directives in the target files, **deterministically and with
736
+ no model call** (`-c/--config`, `--cwd`; pass `FILES` to scope it, e.g. the
737
+ changed files in a pre-commit hook). This is the same pre-flight `lint` runs,
738
+ split out for the fast static-check loop: exit `0` when every directive is
739
+ well-formed, exit `2` (located `file:line:`) on a typo'd / reason-less /
740
+ unbalanced one.
741
+ - `llmlint lint-config [FILES...]` — lint llmlint config files with the bundled
742
+ [config-lint](#linting-your-llmlint-configs) rules, without adding the plugin to
743
+ your own config. It's the `lint` engine with that plugin forced on: it first
744
+ runs the deterministic comment (ignore-directive) check, then judges each
745
+ config's rules. Shares `lint`'s flags (`--format`, `--model`, `--timeout`,
746
+ `--cwd`, `--diff`, …); the config source is fixed, so `--config`/`--agent`/
747
+ `--rule` aren't taken. Same exit codes as `lint`.
748
+ - `llmlint init` — write a starter config (`--with-template`, `--global`, `--force`).
749
+
750
+ ![llmlint init writing a starter llmlint.yml](docs/screenshots/init.svg)
751
+ - `llmlint config` — print the merged config and the ordered list of sources that
752
+ contributed, as JSON. Add `--sources` to also trace every rule, agent, and
753
+ setting back to the file (or plugin URL) it came from — see
754
+ [Finding where something is defined](#finding-where-something-is-defined).
755
+
756
+ ![llmlint config printing the merged config and its sources as JSON](docs/screenshots/config.svg)
757
+ - `llmlint where <path>` — print the single source of one config item: a setting
758
+ (`oneharness.model`, `version`), `agents.<name>`, `rules.<name>`, or a rule
759
+ field `rules.<name>.<field>`. See
760
+ [Finding where something is defined](#finding-where-something-is-defined).
761
+ - `llmlint doctor` — check that oneharness is installed and reachable.
762
+
763
+ ![llmlint doctor reporting the resolved oneharness version](docs/screenshots/doctor.svg)
764
+
765
+ Exit codes: `0` all rules hold · `1` at least one violation · `2` usage,
766
+ configuration, or harness error (could not complete the lint).
767
+
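For scripting around these exit codes, a wrapper along the following lines may be useful. It is a sketch only: `run_llmlint`, `gate`, and `EXIT_MEANINGS` are hypothetical names, it assumes `llmlint` is on `PATH`, and it uses just the `check-ignores` subcommand and positional `FILES` described above.

```python
import subprocess

# The three documented exit codes and what they signal.
EXIT_MEANINGS = {
    0: "all rules hold",
    1: "at least one violation",
    2: "usage, configuration, or harness error",
}


def run_llmlint(args):
    """Run llmlint with the given arguments and return its exit code."""
    completed = subprocess.run(["llmlint", *args], check=False)
    return completed.returncode


def gate(files, run=run_llmlint):
    """Run the free structural check first, then the full lint if it passes."""
    code = run(["check-ignores", *files])
    if code != 0:
        # A malformed directive fails fast, before any model call is made.
        return code
    return run(list(files))


def describe(code):
    """Translate an exit code into the documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

The `run` parameter exists so the gate can be exercised with a stub in tests; a CI step would call `gate(changed_files)` and treat any non-zero result as blocking.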
768
+ ## Development
769
+
770
+ ```console
771
+ just bootstrap # toolchain components + fetch (from a clean clone)
772
+ just check # full gate: fmt, clippy -D warnings, tests + 95% coverage, docs
773
+ just test-e2e # the e2e binary journeys in isolation
774
+ just deps-check # cargo deny + cargo machete
775
+ just lint-live # opt-in: ad-hoc lint against the REAL oneharness + a real harness
776
+ just live-claude # opt-in: live e2e — built llmlint → real oneharness → real harness
777
+ ```
778
+
779
+ Tests drive the real `llmlint` binary against a hermetic mock-oneharness fixture.
780
+ The live tier (`just live-claude`, and the ad-hoc `just lint-live`) drives the
781
+ whole stack end to end against a real, authenticated harness — the only thing that
782
+ makes real model calls, and out of the `check` gate. It runs on PRs in its own
783
+ workflow across Linux/macOS/Windows, so a missing CLI, auth, or oneharness is a
784
+ hard failure, not a skip. See `AGENTS.md` and `tests/AGENTS.md`.
785
+
786
+ ## License
787
+
788
+ MIT — see [LICENSE](LICENSE).
789
+