@bfirestone45/opencode-slop-review 0.5.0

@@ -0,0 +1,762 @@
1
+ ---
2
+ name: ai-slop-review
3
+ description: >
4
+ Use when reviewing code quality, AI-generated patterns, idiom drift, architecture fit,
5
+ wrong-problem solutions, abstraction boundaries, PR strategy, or reviewer concerns about
6
+ whether an approach should exist. Trigger for "review this for slop", "is this idiomatic?",
7
+ "does this look AI-generated?", "audit code quality", "find AI patterns", or general
8
+ quality reviews where local code hygiene or solution fit is the concern. Supports Go,
9
+ Python, Rust, and Svelte/TypeScript references; universal signals apply to any language.
10
+ compatibility: opencode
11
+ license: MIT
12
+ ---
13
+
14
+ # AI Slop Review
15
+
16
+ Identify low-quality, likely AI-generated code and solution-level slop through a parallel
17
+ review architecture. Specialized agents scan for AI authorship signals, idiom violations,
18
+ code quality issues, and whether the implementation is the right solution to the problem.
19
+ A calibration agent then scores and filters findings. Only findings that survive
20
+ calibration appear in the final report.
21
+
22
+ ## Why multiple lenses matter
23
+
24
+ A single reviewer either blurs concerns together (mixing "is this AI-generated?" with
25
+ "is this good code?" and "is this the right solution?") or anchors too heavily on one
26
+ dimension. The multi-lens architecture separates these concerns so each agent can focus
27
+ deeply:
28
+
29
+ - **Phase 1a (AI Authorship Detection):** Looks for patterns that betray machine
30
+ generation -- contextual blindness, boilerplate residue, aspirational documentation,
31
+ mechanical uniformity. Not a code review; a forensic analysis.
32
+ - **Phase 1b (Idiom Fluency):** Checks whether the code reads like it was written by
33
+ someone fluent in the language and its ecosystem. Compares against the project's own
34
+ idiom baseline, not abstract ideals.
35
+ - **Phase 1c (Code Quality):** Traditional quality review -- dead code, stale docs,
36
+ debug artifacts, test quality, security, DRY violations. Deliberately agnostic about
37
+ whether code is AI-generated.
38
+ - **Phase 1d (Architecture and Solution-Fit):** Asks whether the implementation should
39
+ exist in this shape. Locally clean code can still be slop if it patches a symptom,
40
+ chooses the wrong owner, or ignores an existing tool or framework mechanism.
41
+
42
+ After the parallel scan, a calibration agent scores every finding on a 0-100 scale,
43
+ cross-references across lenses, and produces a filtered, verdict-bearing report.
44
+
45
+ The review must answer two separate questions:
46
+
47
+ - Is the code locally slop?
48
+ - Is the solution itself slop?
49
+
50
+ ## Execution Guidance
51
+
52
+ Use the lightest reasonable pass for Step 0 and stronger reasoning passes for Phase 1a,
53
+ Phase 1b, Phase 1d, and Phase 2 if your runtime exposes model selection.
54
+
55
+ | Step | Agent | Suggested execution |
56
+ |------|-------|---------------------|
57
+ | Step 0 | Scope, problem reconstruction, and context gathering | `explore` subagent or equivalent lightweight scan |
58
+ | Phase 1a | AI Authorship Detection | `general` subagent |
59
+ | Phase 1b | Idiom Fluency | `general` subagent |
60
+ | Phase 1c | Code Quality | `general` subagent |
61
+ | Phase 1d | Architecture and Solution-Fit | `general` subagent for PRs and non-trivial changes |
62
+ | Phase 2 | Calibration | `general` subagent |
63
+ | Phase 3 | Synthesis | inline (no subagent) |
64
+
65
+ If model selection is available, prefer:
66
+
67
+ - a fast, low-cost model for Step 0
68
+ - a stronger reasoning model for Phase 1a, Phase 1b, Phase 1d, and Phase 2
69
+ - a balanced model for Phase 1c
70
+
71
+ ---
72
+
73
+ ## Workflow
74
+
75
+ ### Step 0: Determine scope, reconstruct the problem, gather context, and build idiom baseline
76
+
77
+ Use an `explore` subagent or an equivalent lightweight scan for this step.
78
+
79
+ **Scope:** Determine what to review based on the user's request:
80
+ - If the user specifies files or directories, use those
81
+ - If the user says "review this PR" or "review my changes", use `git diff` to identify changed files
82
+ - If the user says "review the codebase" or similar broad request, scan `src/` or the main
83
+ source directory, excluding vendored code, generated files, and test fixtures
84
+
85
+ **Problem reconstruction** (do this before any review -- it prevents solution-level false negatives):
86
+
87
+ For PRs and non-trivial changes, produce a short problem statement before launching Phase 1:
88
+
89
+ 1. Identify the stated problem from PR title, description, linked issues, commits, and
90
+ human reviewer comments
91
+ 2. Identify the inferred actual failure mode from changed code, tests, logs, commands,
92
+ and reproduction evidence
93
+ 3. Identify existing mechanisms that already own the problem area: framework features,
94
+ package managers, build tools, platform APIs, repo scripts, or established team flows
95
+ 4. Identify the minimal solution that would solve the problem without new abstractions
96
+ 5. Record unanswered questions where the PR does not explain why the chosen approach is necessary
97
+
98
+ For PR reviews, always read human reviewer comments before final grading. Treat comments
99
+ as context signals about requirements, missing evidence, tool mental models, and
100
+ solution-level objections -- not just as line-level code review inputs.
101
+
102
+ When PR comments include phrases like "why", "what problem", "anti-pattern", "wrong
103
+ layer", "should just work", "too much baggage", "AI fix this", or "do we need this",
104
+ route them to Phase 1d. These are usually architecture or solution-fit objections.
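
A minimal sketch of that routing rule, assuming case-insensitive whole-phrase matching is good enough. The phrase list is copied from the paragraph above; the function name `routes_to_phase_1d` is hypothetical and not part of the skill.

```python
import re

# Phrases quoted in the skill text; matching strategy is an assumption.
PHASE_1D_PHRASES = (
    "why",
    "what problem",
    "anti-pattern",
    "wrong layer",
    "should just work",
    "too much baggage",
    "AI fix this",
    "do we need this",
)

# One alternation with word boundaries so "why" does not match inside other words.
_PHASE_1D_PATTERN = re.compile(
    "|".join(rf"\b{re.escape(phrase)}\b" for phrase in PHASE_1D_PHRASES),
    re.IGNORECASE,
)


def routes_to_phase_1d(comment: str) -> bool:
    """Return True when a reviewer comment contains a solution-fit trigger phrase."""
    return _PHASE_1D_PATTERN.search(comment) is not None
```

A phrase match is only a routing hint. The Phase 1d reviewer still has to decide whether the objection holds.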
105
+
106
+ **Context gathering** (do this before any review -- it prevents false positives):
107
+
108
+ 1. Read any project guidance files in the repo root and relevant subdirectories, especially
109
+ `AGENTS.md`, `CLAUDE.md`, `README.md`, and contributor docs that define conventions,
110
+ style rules, or architectural decisions
111
+ 2. Sample 2-3 existing files in the same directory or package as the code under review to
112
+ establish the project's baseline patterns:
113
+ - Error handling style (how does this project handle errors?)
114
+ - Import conventions (aliased? grouped? sorted?)
115
+ - Naming patterns (camelCase? snake_case? abbreviations?)
116
+ - Logging approach (which logger? structured? what level conventions?)
117
+ - Test style (table-driven? fixtures? mocks? what framework?)
118
+ 3. Detect the primary language(s) and load the appropriate reference file(s) from
119
+ `.opencode/skills/ai-slop-review/references/` -- only read reference files for
120
+ languages actually present in the review scope
121
+
122
+ **Idiom baseline** (document this explicitly so Phase 1b has a concrete reference):
123
+
124
+ Produce a structured idiom baseline for each language in scope. This baseline is the
125
+ authority for Phase 1b -- anything matching it is NOT flagged. Include:
126
+
127
+ - **Language version:** e.g., Go 1.22, Python 3.12, Rust 2021 edition
128
+ - **Modern features in use:** e.g., `slog` vs `log`, `itertools` usage, `?` operator patterns
129
+ - **Stdlib preferences:** which standard library packages the project favors over third-party alternatives
130
+ - **Error handling convention:** e.g., sentinel errors vs custom types, `errors.Is`/`errors.As` usage, bare `except` policy
131
+ - **Test framework:** e.g., `testing` + `testify`, `pytest`, `rstest`
132
+ - **Import conventions:** grouping order, aliasing patterns, relative vs absolute
133
+ - **Naming conventions:** abbreviation norms, exported or unexported patterns, file naming
134
+
135
+ **Scope adaptation for PR reviews:**
136
+
137
+ When reviewing a PR, also gather the base branch versions of changed files so that
138
+ Phase 1 agents can distinguish between pre-existing patterns and newly introduced ones.
139
+ Use `git show <base>:<path>` for each changed file.
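
As a rough illustration, the base-branch gathering step could be scripted as below. This is a sketch under assumptions: the three-dot `<base>...HEAD` range and the helper names are choices made here, not something the skill prescribes, and it must run inside a git checkout.

```python
from __future__ import annotations

import subprocess


def changed_files_command(base: str) -> list[str]:
    """Build the `git diff` command listing files changed relative to `base`."""
    # Assumption: compare against the merge base using the three-dot range.
    return ["git", "diff", "--name-only", f"{base}...HEAD"]


def base_version_command(base: str, path: str) -> list[str]:
    """Build the `git show <base>:<path>` command for one file."""
    return ["git", "show", f"{base}:{path}"]


def gather_base_versions(base: str) -> dict[str, str | None]:
    """Map each changed path to its content on `base` (None if absent there)."""
    listing = subprocess.run(
        changed_files_command(base), capture_output=True, text=True, check=True
    )
    versions: dict[str, str | None] = {}
    for path in listing.stdout.splitlines():
        shown = subprocess.run(
            base_version_command(base, path), capture_output=True, text=True
        )
        # `git show` fails for files that are new in the PR; record that as None.
        versions[path] = shown.stdout if shown.returncode == 0 else None
    return versions
```

A `None` entry marks a file that does not exist on the base branch, so every pattern in it is newly introduced.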
140
+
141
+ Also gather the PR title, description, linked issues, commit list, changed-file list, and
142
+ human reviewer comments. If using GitHub, prefer `gh pr view --comments` plus the
143
+ appropriate `gh api` review-comment endpoints when available.
144
+
145
+ Store all gathered context (problem reconstruction, codebase context, idiom baseline,
146
+ base branch files, and reviewer comments) -- all Phase 1 agents and Phase 2 need it.
147
+
148
+ ---
149
+
150
+ ### Phase 1: Parallel multi-lens scan
151
+
152
+ Launch the applicable `general` subagents in parallel. Each receives the files under
153
+ review, the problem reconstruction, codebase context, the idiom baseline, reviewer
154
+ comments, and the relevant language reference files from Step 0.
155
+
156
+ **Important:** Always use general-purpose subagents for this review. Do not use specialized
157
+ review bots or additional repo-specific review prompts that blend their own methodology into
158
+ this one.
159
+
160
+ For large reviews (>10 files), split each lens across multiple parallel subagents by
161
+ directory or module. Phase 1d should stay cross-cutting unless the PR spans genuinely
162
+ independent systems.
163
+
164
+ #### Phase 1a: AI Authorship Detection
165
+
166
+ > You are an AI authorship forensic analyst. Your sole job is to identify code that was
167
+ > likely generated by an AI assistant rather than written by a human developer. You are
168
+ > NOT doing a general code review. Ignore human-style mistakes -- typos, inconsistent
169
+ > spacing, TODO hacks, quick-and-dirty solutions. Those are human signals, not problems
170
+ > for you to flag.
171
+ >
172
+ > Focus exclusively on these AI authorship signals:
173
+ >
174
+ > 1. **Contextual blindness** -- code that is locally coherent but unaware of its
175
+ > surroundings: different error handling than the file it lives in, a utility that
176
+ > duplicates one nearby, an abstraction that ignores established patterns, a different
177
+ > logger, serializer, or HTTP client than everything else uses. This is the strongest signal.
178
+ > 2. **Boilerplate residue** -- scaffolding, placeholder comments, template structure that
179
+ > was never customized. Code that looks like it was accepted from a suggestion without
180
+ > adaptation.
181
+ > 3. **Aspirational documentation** -- docstrings or comments that describe what the code
182
+ > *should* do rather than what it *does*. README sections that describe features not
183
+ > yet implemented. Comments that are more detailed than the code warrants.
184
+ > 4. **Over-engineering** -- abstractions with one implementation, factory patterns used
185
+ > once, configuration for single-use code, defensive checks for impossible conditions.
186
+ > AI models build for generality; humans build for the case at hand.
187
+ > 5. **Uniform mechanical style** -- suspiciously consistent formatting, identical
188
+ > try/catch shapes across unrelated functions, uniform comment density. Human code
189
+ > has texture and variation.
190
+ >
191
+ > For each finding, report:
192
+ > - **File** and **line number(s)**
193
+ > - The specific **code snippet**
194
+ > - **Signal category** (one of the five above)
195
+ > - **Reasoning** -- why this pattern indicates AI generation rather than human authorship
196
+ > - **Confidence** (0-100)
197
+ >
198
+ > At the end, produce a **per-file authorship assessment**:
199
+ > | File | AI Likelihood (0-100) | Primary Signals | Notes |
200
+ > |------|----------------------|-----------------|-------|
201
+ >
202
+ > Tag every finding with `[AI_AUTHORSHIP]`.
203
+
204
+ #### Phase 1b: Idiom Fluency
205
+
206
+ > You are a language idiom expert. Your job is to identify code that is not idiomatic
207
+ > for its language, framework, and project context. You have the project's idiom baseline
208
+ > -- do NOT flag patterns that match the project's idiom baseline. Only flag deviations
209
+ > from established project conventions or from modern language best practices that the
210
+ > project has adopted.
211
+ >
212
+ > Focus on:
213
+ >
214
+ > 1. **Modern language features** -- using old patterns when the project's language version
215
+ > supports better alternatives (e.g., `os.Open` error handling without `errors.Is` in a
216
+ > Go 1.20+ project, manual loops instead of comprehensions in Python 3.10+)
217
+ > 2. **Stdlib usage** -- using third-party libraries for things the stdlib handles well,
218
+ > or using deprecated stdlib APIs when modern replacements exist in the project's version
219
+ > 3. **Error handling** -- patterns that deviate from the project's established convention
220
+ > (not from abstract ideals)
221
+ > 4. **Framework conventions** -- using a framework against its grain (e.g., fighting
222
+ > Dagster's asset model, bypassing Django's ORM patterns when the project uses them)
223
+ > 5. **Naming and structure** -- names that do not follow the project's conventions,
224
+ > file organization that breaks the established module structure
225
+ >
226
+ > For each finding, report:
227
+ > - **File** and **line number(s)**
228
+ > - The specific **code snippet**
229
+ > - **Signal category** (one of the five above)
230
+ > - **Idiomatic alternative** -- what the code should look like
231
+ > - **Reasoning** -- why the current code is non-idiomatic in this project's context
232
+ > - **Confidence** (0-100)
233
+ >
234
+ > Tag every finding with `[IDIOM]`.
235
+
236
+ #### Phase 1c: Code Quality
237
+
238
+ > You are a code quality reviewer. Your job is to find concrete quality issues -- dead
239
+ > code, stale documentation, debug artifacts, test problems, security concerns, and DRY
240
+ > violations. Tag every finding as `[CODE_QUALITY]`. Do not speculate about whether code
241
+ > is AI-generated -- that is another reviewer's job. Focus only on whether the code is
242
+ > correct, maintainable, secure, and well-tested.
243
+ >
244
+ > Focus on:
245
+ >
246
+ > 1. **Dead code** -- unused imports, unreachable branches, commented-out code, unused
247
+ > variables or functions
248
+ > 2. **Stale documentation** -- comments or docstrings that do not match the current code
249
+ > behavior, outdated README sections, wrong parameter descriptions
250
+ > 3. **Debug artifacts** -- leftover print statements, hardcoded test values, disabled
251
+ > tests, temporary workarounds marked TODO with no tracking
252
+ > 4. **Test quality** -- tests that do not test behavior, missing edge case coverage,
253
+ > mocks that mock too much, tests that would pass even if the code were broken
254
+ > 5. **Security** -- SQL injection, path traversal, hardcoded secrets, unsafe
255
+ > deserialization, missing input validation on external boundaries
256
+ > 6. **DRY violations** -- copy-pasted logic that should be extracted, duplicated
257
+ > constants, repeated patterns that indicate missing abstractions
258
+ >
259
+ > For each finding, report:
260
+ > - **File** and **line number(s)**
261
+ > - The specific **code snippet**
262
+ > - **Signal category** (one of the six above)
263
+ > - **Reasoning** -- what the concrete quality issue is
264
+ > - **Confidence** (0-100)
265
+ >
266
+ > Tag every finding with `[CODE_QUALITY]`.
267
+
268
+ #### Phase 1d: Architecture and Solution-Fit Review
269
+
270
+ Required for PRs and non-trivial changes. Optional for tiny single-file edits where the
271
+ user only asks about local code style and no architecture or workflow choice is involved.
272
+
273
+ > You are an adversarial architecture and solution-fit reviewer. Your job is to decide
274
+ > whether the implementation is the right solution to the problem, regardless of whether
275
+ > the changed code is locally correct.
276
+ >
277
+ > Do NOT focus on formatting, style, or small bugs. Focus on whether the PR should exist
278
+ > in this shape.
279
+ >
280
+ > Review these dimensions:
281
+ >
282
+ > 1. **Problem fit** -- Does the PR solve the actual problem, or only a symptom?
283
+ > 2. **Abstraction boundary** -- Is the solution implemented at the right layer, or does
284
+ > it bypass the component, tool, or owner that should own the behavior?
285
+ > 3. **Existing mechanisms** -- Does the repo, framework, platform, package manager, or
286
+ > third-party tool already provide a better solution?
287
+ > 4. **Scope control** -- Does the PR spread one issue across too many files, docs,
288
+ > scripts, configs, workflows, or user surfaces?
289
+ > 5. **Maintenance cost** -- Does the solution create custom code that must track external
290
+ > behavior, file formats, CLI output, or conventions unnecessarily?
291
+ > 6. **Operational behavior** -- Does the solution change user workflows, CI behavior,
292
+ > failure modes, or target semantics in ways not justified by the problem?
293
+ > 7. **Evidence quality** -- Does the PR prove the problem and chosen solution, or does it
294
+ > look like an "AI fix this" response to a guessed root cause?
295
+ > 8. **Education opportunity** -- If the author seems to misunderstand a tool, framework,
296
+ > or architecture boundary, identify the missing mental model factually and
297
+ > non-personally.
298
+ >
299
+ > For each finding, report:
300
+ > - **File(s) or PR area involved**
301
+ > - The **claimed or inferred problem**
302
+ > - Why the solution is **mismatched or over-scoped**
303
+ > - The **existing mechanism or simpler alternative**
304
+ > - **Evidence** from the repo, docs, commands, or reviewer comments
305
+ > - **Confidence** (0-100)
306
+ > - **Severity**: Low, Medium, High
307
+ >
308
+ > At the end, produce:
309
+ >
310
+ > | Dimension | Score (0-100) | Finding | Better Direction |
311
+ > |-----------|--------------:|---------|------------------|
312
+ >
313
+ > Tag every finding with `[SOLUTION_FIT]`.
314
+
315
+ ---
316
+
317
+ ### Phase 2: Calibration review
318
+
319
+ Launch a **separate, independent** `general` subagent. This agent receives ALL findings
320
+ from all Phase 1 lenses, the original files, the problem reconstruction, reviewer
321
+ comments, the codebase context, and the idiom baseline.
322
+
323
+ > You are a senior staff engineer performing calibration review. You are fair, precise,
324
+ > and allergic to false positives. Your job is to take findings from the parallel
325
+ > reviewers (AI Authorship, Idiom Fluency, Code Quality, Architecture and Solution-Fit)
326
+ > and produce a unified,
327
+ > calibrated assessment.
328
+ >
329
+ > **For each finding, you must:**
330
+ >
331
+ > 1. Read the actual code at the referenced file:line
332
+ > 2. Read the surrounding context (the full function, the file's imports, nearby code)
333
+ > 3. Check the codebase context and idiom baseline -- does this project have a convention
334
+ > that makes this OK?
335
+ > 4. Assign a **confidence score (0-100)** using this rubric:
336
+ > - **0-25:** False positive. The finding is wrong or irrelevant.
337
+ > - **26-50:** Nitpick. Technically true but not worth acting on.
338
+ > - **51-70:** Low severity. Real issue but minor impact.
339
+ > - **71-85:** Verified real. Clear problem that should be fixed.
340
+ > - **86-100:** Confirmed critical. Significant issue affecting correctness, security,
341
+ > or maintainability.
342
+ > 5. Render a **verdict**:
343
+ > - **CONFIRMED** -- this is a real finding. Explain why it survives scrutiny.
344
+ > - **DOWNGRADED** -- real but less severe than the scanner claimed. Adjust score and explain.
345
+ > - **DISMISSED** -- false positive or nitpick. Explain what the scanner got wrong.
346
+ > - **ESCALATED** -- worse than the scanner realized. Explain the additional concern.
347
+ > 6. **Re-tag** if the finding was categorized under the wrong lens (e.g., an idiom
348
+ > finding tagged `[CODE_QUALITY]` should be re-tagged `[IDIOM]`).
349
+ > 7. Explicitly answer the solution-fit questions:
350
+ > - Could this code be locally acceptable but still the wrong solution?
351
+ > - Did the implementation choose the wrong owner or abstraction boundary?
352
+ > - Did reviewer comments reveal a system-level objection the code lenses missed?
353
+ > - Are there signs the engineer or AI assistant misunderstood a tool, framework, or
354
+ > repo convention?
355
+ > - Should the grade change because the solution is strategically poor even if the diff
356
+ > is small?
357
+ >
358
+ > **Cross-finding analysis:**
359
+ >
360
+ > After processing individual findings, perform cross-lens analysis:
361
+ > - **Missed findings:** Flag anything the Phase 1 scanners missed that you notice while
362
+ > verifying. The scanners may have been so focused on their checklists that they
363
+ > overlooked issues hiding in plain sight.
364
+ > - **Cross-lens patterns:** Identify cases where findings from different lenses
365
+ > reinforce each other (e.g., an `[AI_AUTHORSHIP]` contextual blindness finding
366
+ > combined with an `[IDIOM]` finding on the same code strongly suggests AI generation).
367
+ > Note these correlations explicitly.
368
+ > - **Solution-fit patterns:** Do not treat `[SOLUTION_FIT]` findings as optional
369
+ > appendices. If the implementation strategy is wrong, it must affect the top-line grade.
370
+ > - **Reviewer comment classification:** Classify each substantive human reviewer comment:
371
+ >
372
+ > | Status | Meaning |
373
+ > |--------|---------|
374
+ > | Supported | Evidence confirms the reviewer is raising a real solution or code issue. |
375
+ > | Partially supported | The concern is directionally right, but narrower or lower severity. |
376
+ > | Not supported | The reviewer concern does not hold after checking repo reality. |
377
+ > | Needs clarification | The PR does not contain enough evidence to decide. |
378
+ >
379
+ > **File-level authorship table:**
380
+ >
381
+ > Produce a per-file authorship assessment for EVERY file in scope, incorporating
382
+ > Phase 1a's assessments and your own calibration:
383
+ >
384
+ > | File | AI Likelihood (0-100) | Calibrated Confidence | Key Signals | Verdict |
385
+ > |------|----------------------|----------------------|-------------|---------|
386
+ >
387
+ > Your output is the complete calibrated finding list with scores, verdicts, reasoning,
388
+ > cross-lens correlations, reviewer-comment classifications, solution_fit_score, and the
389
+ > file-level authorship table.
390
+
391
+ Provide the subagent with:
392
+ - All Phase 1a, 1b, 1c, and 1d findings
393
+ - The original files under review (so it can re-read them independently)
394
+ - The problem reconstruction, reviewer comments, codebase context, and idiom baseline from Step 0
395
+
396
+ ---
397
+
398
+ ### Phase 3: Synthesize, grade, and report
399
+
400
+ Merge the calibrated findings into the output format below. Apply these thresholds
401
+ for finding inclusion:
402
+
403
+ - **Score >= 70:** Include in the main report sections
404
+ - **Score 50-69:** Include in a borderline appendix
405
+ - **Score < 50:** Include in the dismissed findings section
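
The inclusion thresholds above amount to a three-way split. A minimal sketch follows; the function name and the returned section labels are illustrative, not names the skill defines.

```python
def report_section(score: float) -> str:
    """Pick the report section for a calibrated finding score."""
    if score >= 70:
        return "main"
    if score >= 50:
        return "borderline"
    return "dismissed"
```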
406
+
407
+ #### Grading algorithm
408
+
409
+ Compute local code scores first, then combine them with solution-fit for the final grade.
410
+
411
+ **Step 1: Per-file dimension scores**
412
+
413
+ - **AI Likelihood** -- use the calibrated per-file score from Phase 2 (0-100)
414
+ - **Idiom Score** -- aggregate confirmed `[IDIOM]` findings for the file using
415
+ density-weighted mean: `mean(finding_scores) * (1 + log2(count))`, capped at 100.
416
+ If no idiom findings, score is 0.
417
+ - **Quality Score** -- aggregate confirmed `[CODE_QUALITY]` findings the same way:
418
+ `mean(finding_scores) * (1 + log2(count))`, capped at 100. If no quality findings,
419
+ score is 0.
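
One way to read the density-weighted mean is sketched below. The function name `density_weighted_score` is hypothetical, and the input is assumed to be the calibrated scores of the confirmed findings for one file and one lens.

```python
import math


def density_weighted_score(finding_scores: list[float]) -> float:
    """mean(finding_scores) * (1 + log2(count)), capped at 100; 0 if no findings."""
    if not finding_scores:
        return 0.0
    count = len(finding_scores)
    mean = sum(finding_scores) / count
    return min(100.0, mean * (1 + math.log2(count)))
```

A single finding keeps its own score because `log2(1)` is 0. Two findings that average 40 already reach 80, so the cap is hit quickly as findings accumulate.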
420
+
421
+ **Step 2: Weighted file score**
422
+
423
+ ```
424
+ file_score = (0.10 * ai_likelihood) + (0.40 * idiom_score) + (0.50 * quality_score)
425
+ ```
426
+
427
+ Weights reflect that this is a *slop* review, not an *authorship* review. Good
428
+ AI-written code that follows idioms and has no quality issues should score well.
429
+ Authorship signals serve as corroborating evidence, not a primary driver.
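
The weighted file score is small enough to express directly. In this sketch the function name is an assumption; the weights are the ones stated in Step 2.

```python
def file_score(ai_likelihood: float, idiom_score: float, quality_score: float) -> float:
    """Combine the three per-file dimension scores using the Step 2 weights."""
    return (0.10 * ai_likelihood) + (0.40 * idiom_score) + (0.50 * quality_score)
```

With the inputs 72, 65, and 80, this gives approximately 73.2, matching the sample row in the File-Level Assessment table of the output format.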
430
+
431
+ **Step 3: Local code rollup**
432
+
433
+ ```
434
+ code_local_score = Σ(file_score * file_loc) / Σ(file_loc)
435
+ ```
436
+
437
+ Weight by lines of code so a 500-line file with issues matters more than a 10-line
438
+ utility.
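
A sketch of the LOC-weighted rollup, assuming each file is represented as a `(file_score, file_loc)` pair. Returning 0 when the total line count is zero is a choice made here to avoid dividing by zero; the skill does not address that case.

```python
def code_local_score(files: list[tuple[float, int]]) -> float:
    """LOC-weighted mean of file scores: sum(score * loc) / sum(loc)."""
    total_loc = sum(loc for _, loc in files)
    if total_loc == 0:
        return 0.0  # assumption: nothing to weigh, so report a clean score
    return sum(score * loc for score, loc in files) / total_loc
```

For example, a 300-line file scoring 80 and a 100-line file scoring 20 roll up to 65, not the unweighted mean of 50.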
439
+
440
+ **Step 4: Solution-fit score**
441
+
442
+ Use the calibrated Phase 1d and Phase 2 result as `solution_fit_score` (0-100). If Phase
443
+ 1d was not applicable because the scope was a tiny local edit, omit `solution_fit_score`
444
+ and use `code_local_score` as the final score.
445
+
446
+ For PRs and non-trivial changes:
447
+
448
+ ```
449
+ final_score = (0.60 * code_local_score) + (0.40 * solution_fit_score)
450
+ ```
451
+
452
+ For PRs whose purpose is architecture, tooling, workflows, infrastructure, developer
453
+ experience, or process, solution fit matters equally:
454
+
455
+ ```
456
+ final_score = (0.50 * code_local_score) + (0.50 * solution_fit_score)
457
+ ```
458
+
459
+ This matters because AI-generated PRs often have clean syntax and decent local hygiene
460
+ while choosing the wrong overall approach.
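
The three cases in Step 4 can be sketched as one function. The parameter `structural_pr` is a hypothetical flag standing in for "PRs whose purpose is architecture, tooling, workflows, infrastructure, developer experience, or process". Passing `None` for `solution_fit_score` models the tiny-local-edit case where Phase 1d was skipped.

```python
from typing import Optional


def final_score(
    code_local_score: float,
    solution_fit_score: Optional[float] = None,
    structural_pr: bool = False,
) -> float:
    """Blend local code score with solution fit using the Step 4 weights."""
    if solution_fit_score is None:
        # Phase 1d not applicable: the local score is the final score.
        return code_local_score
    if structural_pr:
        return (0.50 * code_local_score) + (0.50 * solution_fit_score)
    return (0.60 * code_local_score) + (0.40 * solution_fit_score)
```

Because higher scores mean more slop, a diff with a clean local score of 20 but a solution-fit score of 90 still lands at about 48, which is the effect the paragraph above describes.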
461
+
462
+ **Step 5: Letter grade and verdict**
463
+
464
+ | Grade | Score | Verdict |
465
+ |-------|-------|---------|
466
+ | A | 0-20 | Clean |
467
+ | B | 21-40 | Mild concerns |
468
+ | C | 41-60 | Significant concerns |
469
+ | D | 61-80 | Strong slop signals |
470
+ | F | 81-100 | Pervasive slop |
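
The grade table can be applied with a simple upper-bound lookup. The table lists integer ranges, while computed scores such as 73.2 are fractional. This sketch assumes each band extends up to and including its upper bound, so 20.5 falls in B; that tie-breaking rule is an assumption, not something the skill states.

```python
def letter_grade(score: float) -> tuple[str, str]:
    """Return (grade, verdict) for a final score on the 0-100 slop scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    bands = [
        (20, "A", "Clean"),
        (40, "B", "Mild concerns"),
        (60, "C", "Significant concerns"),
        (80, "D", "Strong slop signals"),
        (100, "F", "Pervasive slop"),
    ]
    for upper_bound, grade, verdict in bands:
        if score <= upper_bound:
            return grade, verdict
    raise AssertionError("unreachable: score was validated to be <= 100")
```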
471
+
472
+ ---
473
+
474
+ ## Universal Slop Signals
475
+
476
+ These apply to every language. The language-specific reference files add to these;
477
+ they do not replace them.
478
+
479
+ ### Structural tells
480
+ - Functions named after *what they do* rather than *what they represent*
481
+ (`processDataAndValidateInput`, `handleRequestAndReturnResponse`)
482
+ - Comments that restate the code verbatim -- no "why", only "what"
483
+ - Abstractions with exactly one implementation (premature interface, protocol, or trait invention)
483
+ - Happy-path-only logic -- edge cases (nil, null, empty, zero, overflow) simply absent
485
+ - Hardcoded values that belong in config or named constants
486
+ - Inconsistent error message casing or formatting vs. the rest of the codebase
487
+
488
+ ### Defensive over-engineering
489
+ - `try/except` or error handling around operations that cannot fail in context
490
+ - Redundant nil or null checks on values the type system or caller already guarantees
491
+ - Validation of arguments to internal functions that are only called from trusted code
492
+ - Feature flags, backwards-compatibility shims, or configuration for single-use code
493
+ - Factory, builder, or strategy patterns used exactly once
494
+
495
+ ### Documentation noise
496
+ - Docstrings that restate the function signature in prose ("Takes an X and returns a Y")
497
+ - `# increment counter` above `counter += 1`
498
+ - Module-level docstrings that describe what the file contains rather than why it exists
499
+ - Every function documented even when the name and signature are self-explanatory
500
+ - Type annotations in docstrings that duplicate the actual type annotations
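
To make the documentation-noise signals concrete, here is a hypothetical before-and-after pair. Both functions are invented for illustration and come from no reviewed codebase. They compute the same capped backoff delay, but only the second keeps a comment that explains why.

```python
def next_delay_noisy(attempt: int) -> float:
    """Takes an attempt number and returns a float."""  # restates the signature
    # multiply 0.5 by 2 to the power of attempt
    delay = 0.5 * 2 ** attempt
    # return the smaller of delay and 30.0
    return min(delay, 30.0)


def next_delay(attempt: int) -> float:
    # Cap the backoff so a long outage does not push retries minutes apart.
    return min(0.5 * 2 ** attempt, 30.0)
```

The first version shows the docstring-restates-signature and comment-restates-code patterns from the list above. The second drops both and spends its one comment on intent.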
501
+
502
+ ### Copy-paste signatures
503
+ - Multiple functions with near-identical parameter lists suggesting generated boilerplate
504
+ - Repeated structural patterns (same try/catch shape, same logging preamble) across
505
+ unrelated functions -- human code tends to vary more
506
+ - Suspiciously uniform formatting that does not match the rest of the file
507
+
508
+ ### Test quality signals
509
+ - Tests named `TestSuccess`, `TestFailure`, or `test_basic` with no scenario specificity
510
+ - Mocks that mock so much they do not test anything real
511
+ - No property-based, table-driven, or parametrized tests where the problem calls for them
512
+ - Assertions that only check happy-path return values, never error payloads or side effects
513
+ - Missing coverage for concurrency, timeout, and cancellation paths
514
+ - Test functions that verify the code compiles or runs, not that it *behaves* correctly
515
+
516
+ ### The strongest signal: contextual blindness
517
+
518
+ Code that would pass review in isolation but is clearly unaware of its surroundings:
519
+ - Different error handling style than the file it lives in
520
+ - A new utility function that duplicates one nearby
521
+ - A new abstraction that ignores the established codebase pattern
522
+ - A different logger, serializer, HTTP client, or ORM pattern than everything else uses
523
+ - Import style that does not match the rest of the project
524
+
525
+ AI generates locally coherent code. It rarely generates *contextually* coherent code.
526
+ This is the single most reliable signal and should be weighted heavily.
527
+
528
+ ### Solution-level slop signals
529
+
530
+ Generated work can look competent file-by-file while still choosing the wrong solution.
531
+ Flag these as `[SOLUTION_FIT]` when evidence supports them:
532
+
533
+ | Signal | Description |
534
+ |--------|-------------|
535
+ | Symptom patching | The PR fixes the observed error but not the root cause. |
536
+ | Wrong owner | Logic is added outside the component, tool, or layer that should own it. |
537
+ | Custom wrapper over managed tool | New scripts parse or enforce behavior already owned by a package manager, framework, or platform. |
538
+ | Multi-surface workaround | One issue is patched in code, scripts, docs, and CI without proving why all are needed. |
539
+ | Evidence-free root cause | The PR assumes a cause but does not reproduce or verify it. |
540
+ | Defensive generality | A generic framework is created before there is a repeated need. |
541
+ | Policy split | Two commands or code paths now enforce different rules for the same concern. |
542
+ | Documentation as retrofit | Docs are updated to justify the new workaround rather than explain established team workflow. |
543
+
544
+ Concrete regression scenario: `BACtrack/bacstack#430`
545
+ (`https://github.com/BACtrack/bacstack/pull/430`) should be treated as a pressure test.
546
+ The improved review should identify PATH/tool resolution drift as the actual problem,
547
+ check whether `mise exec -- ...` already provides the command execution boundary, mark a
548
+ custom `scripts/check_tool_version.sh` wrapper as the wrong solution boundary if evidence
549
+ confirms it, classify reviewer comments as solution-level signals, and downgrade the
550
+ overall grade even if local shell quality is acceptable.
551
+
552
+ When identifying a skill or mental-model gap, phrase it as an education opportunity, not
553
+ personal criticism. Good: "The PR suggests a mise mental-model gap: `mise.toml` was
554
+ treated as a manifest to parse manually rather than making `mise exec` the execution
555
+ boundary for managed tools." Bad: "The author does not understand mise."
556
+
557
+ ---
558
+
559
+ ## Output Format
560
+
561
+ ```markdown
562
+ ## AI Slop Review: <filename, directory, or PR scope>
563
+
564
+ **Scope:** <what was reviewed -- files, line count, language(s)>
565
+ **Grade:** [A-F] (<final_score>/100)
566
+ **Local Code Score:** <code_local_score>/100
567
+ **Solution-Fit Score:** <solution_fit_score>/100 or "Not applicable for this scope"
568
+ **Verdict:** [Clean / Mild concerns / Significant concerns / Strong slop signals / Pervasive slop]
569
+ **Confidence:** [High / Medium / Low] -- how confident the review is in its verdict
570
+
571
+ ### Solution-Level Assessment
572
+
573
+ | Dimension | Score | Finding | Better Direction |
574
+ |-----------|------:|---------|------------------|
575
+ | Problem understanding | 70 | ... | ... |
576
+ | Solution fit | 86 | ... | ... |
577
+ | Maintenance burden | 82 | ... | ... |
578
+ | Target ownership | 76 | ... | ... |
579
+ | Documentation scope | 65 | ... | ... |
580
+
581
+ ### Evidence Checked
582
+
583
+ | Check | Observed Result | Assessment |
584
+ |-------|-----------------|------------|
585
+ | command, repo fact, reviewer comment, or code path | output/result | why it matters |
586
+
587
+ ### Reviewer Comment Classification
588
+
589
+ | Comment | Status | Evidence | Assessment |
590
+ |---------|--------|----------|------------|
591
+ | reviewer concern | Supported / Partially supported / Not supported / Needs clarification | checked fact | what it means |
592
+
593
+ ### Education Opportunity
594
+
595
+ <If the author appears to misunderstand a tool, framework, or architecture boundary,
596
+ call it out factually and non-personally. Focus on the missing mental model and how to
597
+ teach it. Omit this section if there is no evidence of a teachable misunderstanding.>
598
+
599
+ ### Solution-Fit Findings
600
+
601
+ | # | Area | Signal | Finding | Better Direction | Confidence | Verdict |
602
+ |---|------|--------|---------|------------------|------------|---------|
603
+ | 1 | Makefile/scripts/docs | Wrong owner | description | use existing mechanism | 86 | CONFIRMED |
604
+
605
+ ### File-Level Assessment
606
+
607
+ | File | LOC | AI (0.10) | Idiom (0.40) | Quality (0.50) | Score | Grade |
608
+ |------|-----|-----------|--------------|----------------|-------|-------|
609
+ | path/to/file.go | 245 | 72 | 65 | 80 | 73.2 | D |
610
+
611
+ ### AI Authorship Signals
612
+ | # | File:Line | Signal | Finding | Confidence | Verdict |
613
+ |---|-----------|--------|---------|------------|---------|
614
+ | 1 | path:42 | Contextual blindness | description | 85 | CONFIRMED |
615
+
616
+ ### Idiom Violations
617
+ | # | File:Line | Signal | Finding | Idiomatic Alternative | Confidence | Verdict |
618
+ |---|-----------|--------|---------|----------------------|------------|---------|
619
+ | 1 | path:17 | Modern features | description | what it should look like | 78 | CONFIRMED |
620
+
621
+ ### Code Quality
622
+ | # | File:Line | Signal | Finding | Confidence | Verdict |
623
+ |---|-----------|--------|---------|------------|---------|
624
+ | 1 | path:99 | Dead code | description | 90 | CONFIRMED |
625
+
626
+ ### Positive Signals
627
+ - <things done well that indicate human authorship or good AI-assisted practice>
628
+
629
+ ### Borderline Findings (score 50-69)
630
+ | # | File:Line | Lens | Finding | Confidence | Verdict |
631
+ |---|-----------|------|---------|------------|---------|
632
+
633
+ ### Dismissed Findings
634
+ <collapsed or brief -- shows what the scanners flagged but calibration removed,
635
+ so the user can see the review was thorough without being noisy>
636
+ ```
637
+
638
+ If the code is clean or only has minor issues, say so directly. The goal is an honest,
639
+ calibrated assessment -- not finding problems for their own sake.
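
As a worked check on the File-Level Assessment row in the template above, the sketch below assumes that the `Score` column is a weighted sum of the three lens scores, with the weights read from the column headers (`AI (0.10)`, `Idiom (0.40)`, `Quality (0.50)`). The function and constant names are hypothetical. The letter-grade thresholds are defined elsewhere in the skill and are not reproduced here.

```python
# Weights are read from the File-Level Assessment column headers:
# AI (0.10), Idiom (0.40), Quality (0.50).
LENS_WEIGHTS = {"ai": 0.10, "idiom": 0.40, "quality": 0.50}


def file_score(lens_scores: dict[str, float]) -> float:
    """Return the weighted file score, rounded to one decimal place."""
    total = sum(LENS_WEIGHTS[lens] * lens_scores[lens] for lens in LENS_WEIGHTS)
    return round(total, 1)


# The example row from the template: AI 72, Idiom 65, Quality 80.
example = file_score({"ai": 72, "idiom": 65, "quality": 80})
print(example)  # 73.2
```

The result matches the `73.2` shown in the template row, which suggests the weighted-sum reading of the headers is the intended one.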
640
+
641
+ ---
642
+
643
+ ## Language Reference Files
644
+
645
+ Language-specific signals live in `.opencode/skills/ai-slop-review/references/`.
646
+ Only read the ones relevant to the code under review. Each reference file includes a
647
+ "What Idiomatic Looks Like" section that Phase 1b uses alongside the project's idiom baseline:
648
+
649
+ - `.opencode/skills/ai-slop-review/references/go.md` -- Go idioms, error handling, context propagation, concurrency
650
+ - `.opencode/skills/ai-slop-review/references/python.md` -- Python idioms, type hints, async, common footguns
651
+ - `.opencode/skills/ai-slop-review/references/rust.md` -- Rust ownership, error handling, type system, unsafe
652
+ - `.opencode/skills/ai-slop-review/references/svelte-ts.md` -- Svelte reactivity, SvelteKit patterns, TypeScript usage
653
+
654
+ If the code is in a language not covered by a reference file, rely on the universal
655
+ signals and your general knowledge of that language's idioms.
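
One way to decide which reference files to read is to map the file extensions under review to the four references listed above. This is a minimal sketch: the extension mapping and the `references_for` helper are assumptions for illustration, since the skill names the reference files but does not specify how they are selected.

```python
from pathlib import PurePosixPath

REFERENCE_DIR = ".opencode/skills/ai-slop-review/references"

# Assumption: this extension-to-reference mapping is illustrative; the skill
# itself only lists the four reference files.
EXTENSION_TO_REFERENCE = {
    ".go": "go.md",
    ".py": "python.md",
    ".rs": "rust.md",
    ".svelte": "svelte-ts.md",
    ".ts": "svelte-ts.md",
}


def references_for(paths: list[str]) -> list[str]:
    """Return the sorted, de-duplicated reference files relevant to the given paths."""
    needed = set()
    for path in paths:
        name = EXTENSION_TO_REFERENCE.get(PurePosixPath(path).suffix)
        if name is not None:
            needed.add(f"{REFERENCE_DIR}/{name}")
    return sorted(needed)


# Files in an uncovered language (here, a hypothetical build.zig) map to no
# reference, so the review falls back to the universal signals for them.
print(references_for(["cmd/main.go", "ui/App.svelte", "ui/util.ts", "build.zig"]))
```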
656
+
657
+ ---
658
+
659
+ ## Adapting to the codebase
660
+
661
+ Every codebase has its own conventions. Before flagging something as slop, check:
662
+
663
+ 1. **Project guidance** -- Do `AGENTS.md`, `CLAUDE.md`, `README.md`, or nearby contributor docs make this pattern OK?
664
+ 2. **Existing code** -- Is this pattern used elsewhere in the project? If yes, it is a
665
+ convention, not slop -- even if it would not be idiomatic in a greenfield project.
666
+ 3. **Framework conventions** -- Some frameworks encourage patterns that look odd in
667
+ isolation (e.g., Dagster's `@asset` decorators, Django's class-based views).
668
+ Do not flag framework-conventional code as slop.
669
+ 4. **Team size and stage** -- A 2-person startup codebase has different quality norms
670
+ than a 50-person team's production system. Calibrate accordingly.
671
+
672
+ The Phase 1 scanners should flag potential issues regardless. The Phase 2 calibration
673
+ reviewer is where this nuance gets applied.
674
+
675
+ ---
676
+
677
+ ## Step 4: Output Actions
678
+
679
+ After the review is synthesized, determine how to surface the findings based on the
680
+ review scope. Apply these defaults, then let the user override:
681
+
682
+ - **PR review:** Default to Option B (PR inline comments)
683
+ - **Codebase audit:** Default to Option A (review branch with markdown)
684
+
685
+ Present the user with the options below and the recommended default. Explain the
686
+ trade-offs and let them choose.
687
+
688
+ ### Option A: Review branch with markdown report
689
+
690
+ Best for: full-codebase audits, team-wide visibility, archival.
691
+
692
+ 1. Create a new branch from the current HEAD: `<user>/ai-slop-review`
693
+ 2. Write the full review to `AI_SLOP_REVIEW.md` in the repo root
694
+ 3. Commit and push the branch
695
+ 4. Tell the user the branch is ready -- they can open a PR for team discussion
696
+ or keep it as a reference artifact
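
The steps above can be sketched as a list of git commands. The sketch only builds and prints the commands rather than running them. The remote name `origin`, the commit message, and the `review_branch_commands` helper are assumptions, not part of the skill.

```python
import shlex


def review_branch_commands(user: str, remote: str = "origin") -> list[list[str]]:
    """Build the git commands for Option A without executing them."""
    branch = f"{user}/ai-slop-review"
    return [
        # Step 1: create the review branch from the current HEAD.
        ["git", "switch", "-c", branch],
        # Step 2 (writing AI_SLOP_REVIEW.md to the repo root) happens here.
        # Step 3: commit and push the report.
        ["git", "add", "AI_SLOP_REVIEW.md"],
        ["git", "commit", "-m", "Add AI slop review report"],  # hypothetical message
        ["git", "push", "--set-upstream", remote, branch],
    ]


for argv in review_branch_commands("alice"):
    print(shlex.join(argv))
```

Returning argument lists instead of shell strings keeps quoting explicit, and lets the caller show the plan to the user before anything is pushed.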
697
+
698
+ This creates a durable record that does not clutter the main branch but is
699
+ accessible to the whole team.
700
+
701
+ ### Option B: PR inline comments
702
+
703
+ Best for: PR-scoped reviews, when findings map to specific changed lines,
704
+ when the team does code review via GitHub.
705
+
706
+ 1. Identify the PR (from user input or current branch's open PR)
707
+ 2. For each confirmed finding, post an inline review comment at the exact
708
+ file and line using `gh api` to create a pull request review:
709
+ ```bash
710
+ gh api repos/{owner}/{repo}/pulls/{pr}/reviews -f event=COMMENT \
711
+ -f body="AI Slop Review: found N issues" \
712
+ -f 'comments[][path]=...' -F 'comments[][line]=...' \
713
+ -f 'comments[][body]=...'
714
+ ```
715
+ 3. Group related findings into a single review submission
716
+ 4. Include the verdict and confidence in the review summary comment
717
+
718
+ Format each inline comment as:
719
+
720
+ ```text
721
+ **[Signal: <category>]** <finding description>
722
+
723
+ <why this matters and what idiomatic code would look like>
724
+ ```
725
+
726
+ Keep comments concise -- write like a reviewer, not an essay writer.
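
As an alternative to passing many `-f` flags, the review body can be assembled as JSON first. The sketch below formats each finding with the inline comment template above and builds a payload with the same fields as the `gh api` example (`event`, `body`, and `comments[]` with `path`, `line`, `body`). The finding dictionary keys and both helper names are hypothetical.

```python
import json


def format_comment(category: str, finding: str, rationale: str) -> str:
    """Render one inline comment using the template above."""
    return f"**[Signal: {category}]** {finding}\n\n{rationale}"


def build_review_payload(findings: list[dict], verdict: str, confidence: str) -> dict:
    """Group all confirmed findings into a single review submission."""
    comments = [
        {
            "path": f["path"],
            "line": int(f["line"]),  # kept as an integer, not a string
            "body": format_comment(f["category"], f["finding"], f["rationale"]),
        }
        for f in findings
    ]
    summary = (
        f"AI Slop Review: found {len(comments)} issue(s). "
        f"Verdict: {verdict}. Confidence: {confidence}."
    )
    return {"event": "COMMENT", "body": summary, "comments": comments}


# Hypothetical finding, for illustration only.
findings = [
    {
        "path": "pkg/store/cache.go",
        "line": 42,
        "category": "Dead code",
        "finding": "Helper is defined but never called.",
        "rationale": "Unused helpers add maintenance cost; remove it or wire it in.",
    }
]
payload = build_review_payload(findings, verdict="Mild concerns", confidence="Medium")
print(json.dumps(payload, indent=2))
```

The printed JSON could then be piped to `gh api --method POST repos/{owner}/{repo}/pulls/<number>/reviews --input -`, where `<number>` stands for the PR number.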
727
+
728
+ ### Option C: GitHub Issues
729
+
730
+ Best for: tech debt tracking, when findings need to be assigned and scheduled,
731
+ when the team uses issues for work management.
732
+
733
+ 1. Ask the user if they want a milestone created (e.g., "AI Slop Cleanup")
734
+ 2. Create labels if they do not exist: `ai-slop`, plus severity labels
735
+ 3. For each confirmed finding (or group of related findings), create a
736
+ GitHub issue with:
737
+ - Descriptive title
738
+ - SHA-pinned permalink(s) to the offending code
739
+ - The signal category and severity from the review
740
+ - Suggested fix approach
741
+ - The appropriate labels and milestone
742
+ 4. Pin critical issues if there are 3 or fewer
743
+ 5. Report the created issue numbers back to the user
744
+
745
+ Group related findings into single issues where it makes sense (e.g.,
746
+ "4 instances of bare except Exception: pass" is one issue, not four).
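
The grouping rule above can be sketched as follows. The finding fields (`signal`, `pattern`, `location`), the title format, and the `group_findings` helper are assumptions for illustration. Only the `ai-slop` label comes from the skill.

```python
from collections import defaultdict


def group_findings(findings: list[dict]) -> list[dict]:
    """Collapse findings that share a signal and pattern into one issue draft."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for finding in findings:
        groups[(finding["signal"], finding["pattern"])].append(finding)

    issues = []
    for (signal, pattern), members in groups.items():
        count = len(members)
        noun = "instance" if count == 1 else "instances"
        issues.append(
            {
                "title": f"{count} {noun} of {pattern}",
                "signal": signal,
                "labels": ["ai-slop"],
                "locations": [m["location"] for m in members],
            }
        )
    return issues


# Hypothetical findings: four occurrences of one pattern plus one unrelated item.
findings = [
    {"signal": "Error handling", "pattern": "bare except Exception: pass", "location": f"app/io.py:{n}"}
    for n in (10, 48, 77, 120)
] + [{"signal": "Dead code", "pattern": "unused helper", "location": "app/util.py:5"}]

for issue in group_findings(findings):
    print(issue["title"], issue["locations"])
```

Keying on both signal and pattern keeps unrelated findings in the same category from being merged into one vague issue.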
747
+
748
+ ### Option D: Combined (Review branch + one of the above)
749
+
750
+ The user may want both the archival markdown AND actionable items. If they
751
+ choose this, do the review branch first (Option A), then apply Option B or C.
752
+ Link each issue or comment back to the review markdown so readers can find the full context.
753
+
754
+ ### Asking the user
755
+
756
+ After presenting the review summary, ask:
757
+
758
+ > How would you like to surface these findings?
759
+ > - **Branch** -- commit the review to a `<user>/ai-slop-review` branch
760
+ > - **PR comments** -- post inline comments on a PR
761
+ > - **Issues** -- create GitHub issues for tracking
762
+ > - **Branch + Issues** or **Branch + PR comments** -- both