npm - swift-code-reviewer-skill - Versions diffs - 1.2.1 → 1.3.0 - Mend

swift-code-reviewer-skill 1.2.1 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/CHANGELOG.md +8 -0
package/SKILL.md +122 -4
package/package.json +1 -1
package/references/agent-loop-feedback.md +148 -0
package/references/spec-adherence.md +157 -0

package/CHANGELOG.md CHANGED Viewed

@@ -7,6 +7,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [1.3.0] - 2026-05-07
+### Added
+- **Spec adherence review** (`references/spec-adherence.md`) — validates implementation against PR description and linked issues, flagging scope drift and unimplemented requirements
+- **Agent-loop feedback** (`references/agent-loop-feedback.md`) — meta-review layer that identifies recurring patterns suggesting gaps in the agent's own instructions, improving future AI-generated code quality
 ## [1.2.1] - 2026-04-21
 ### Fixed
@@ -113,6 +120,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## Version History Summary
+- **1.3.0** (2026-05-07): Add spec adherence review and agent-loop meta-feedback layer
 - **1.2.1** (2026-04-21): Fix installer not copying `skills/` and `templates/` directories
 - **1.2.0** (2026-04-21): Bundle five companion Swift skills, add `init` scaffolding command, skill-review CI action, SKILL.md condensed 71%
 - **1.1.1** (2026-03-24): Fix incorrect `install-skill.sh` (was XcodeBuildMCP installer)

package/SKILL.md CHANGED Viewed

@@ -1,26 +1,68 @@
 ---
 name: swift-code-reviewer
-description: "Multi-layer code review agent for Swift and SwiftUI projects. Analyzes PRs, diffs, and files across six dimensions: Swift 6+ concurrency safety, SwiftUI state management and modern APIs, performance (view updates, ForEach identity, lazy loading), security (force unwraps, Keychain, input validation), architecture compliance (MVVM/MVI/TCA, dependency injection), and project-specific standards from .claude/CLAUDE.md. Outputs structured reports with Critical/High/Medium/Low severity, positive feedback, and prioritized action items with file:line references. Use when the user says review this PR, review my code, review my changes, check this file, code review, audit this codebase, check code quality, review uncommitted changes, review all ViewModels, or mentions reviewing .swift files, navigation, sheets, theming, or async patterns."
+description: "Perform thorough code reviews for Swift/SwiftUI code, including spec adherence (PR description + linked issues), code quality, architecture, performance, security, Swift 6+ best practices, project standards from .claude/CLAUDE.md, and meta-feedback on recurring patterns that suggest gaps in the agent's instructions. Use when reviewing PRs/MRs (especially AI-generated ones), performing quality audits, validating against original spec, or providing structured feedback with severity levels and improvement suggestions for both the code and the agent loop that produced it."
 ---
 # Swift/SwiftUI Code Review Skill
 Multi-layer review covering Swift 6+ concurrency, SwiftUI patterns, performance, security, architecture, and project-specific standards. Reads `.claude/CLAUDE.md` and outputs Critical/High/Medium/Low severity findings with `file:line` references and before/after code examples.
+## When to Use This Skill
+- "Review this PR"
+- "Review my code" / "Review my changes" / "Review uncommitted changes"
+- "Code review for [component]"
+- "Audit this codebase" / "Check code quality"
+- "Review against .claude/CLAUDE.md" / "Check if this follows our coding standards"
+- "Architecture review" / "Performance audit" / "Security review"
+- "Review this PR against the spec"
+- "Did the agent miss anything from issue #123?"
+- "What rules am I missing in CLAUDE.md based on this PR?"
+- "Review this AI-generated PR"
 ## Workflow
 ### Phase 1 — Context Gathering
-1. Try to load `.claude/CLAUDE.md`.
+1. **Read the Spec**
+   - For PRs: `gh pr view <num> --json title,body,closingIssuesReferences,labels`
+   - For linked issues: `gh issue view <num> --json title,body,labels`
+   - For MRs: `glab mr view <num>` and `glab issue view <num>`
+   - Extract:
+     - Stated goal / problem being solved
+     - Explicit acceptance criteria (look for checkboxes, "should", "must", "Given/When/Then")
+     - Edge cases or non-goals mentioned
+     - Out-of-scope items
+   - If no PR/issue context is available, note this and fall back to inferring intent from the diff.
+2. Try to load `.claude/CLAUDE.md`.
    - **If missing**: add a note to the report — _"No project standards file found — review uses default Apple guidelines"_ — then continue.
-2. Obtain the changeset: `git diff`, `git diff --cached`, or `gh pr diff <n>`.
+3. Obtain the changeset: `git diff`, `git diff --cached`, or `gh pr diff <n>`.
    - **If diff is empty**: stop and ask the user to specify files, a PR number, or a directory.
-3. Read each changed file plus key related files (imports, protocols it conforms to, corresponding test file if present).
+4. Read each changed file plus key related files (imports, protocols it conforms to, corresponding test file if present).
 ### Phase 2 — Analysis
 For each category, load the reference file before writing findings:
+#### 0. Spec Adherence
+Reference: `references/spec-adherence.md`
+- **Requirement Coverage**
+  - Does each acceptance criterion map to a concrete code change?
+  - Are edge cases mentioned in the spec handled?
+  - Are tests covering the scenarios described?
+- **Scope Discipline**
+  - Flag changes outside the stated scope (scope creep)
+  - Flag unrelated refactors bundled into the PR
+- **Missing Work**
+  - TODOs, `fatalError("not implemented")`, empty function bodies
+  - Stubbed mocks that should be real implementations
+  - Acceptance criteria with no corresponding diff
+- **Intent Drift**
+  - Code solves a *similar* but different problem than stated
+  - Naming/structure suggests a different mental model than the spec
 1. **Swift Quality** — concurrency, error handling, optionals, naming → `references/swift-quality-checklist.md`; for concurrency findings also read `skills/swift-concurrency/references/sendable.md` and `actors.md`
 2. **SwiftUI Patterns** — property wrappers, state management, deprecated APIs → `references/swiftui-review-checklist.md`; for wrapper selection read `skills/swiftui-expert-skill/references/state-management.md`
 3. **Performance** — view body cost, ForEach identity, lazy loading, retain cycles → `references/performance-review.md`
@@ -31,6 +73,28 @@ For each category, load the reference file before writing findings:
 For test file findings, consult `skills/swift-testing/references/test-organization.md`.
 For navigation/routing findings, consult `skills/swiftui-ui-patterns/references/navigationstack.md`.
+### Phase 2.5 — Pattern Detection (for Agent Loop Feedback)
+**Objective**: Identify recurring issues that point to gaps in the agent's
+instructions, not just the code.
+After collecting per-file findings, aggregate them:
+1. Group findings by rule (e.g., "force-unwrap", "deprecated NavigationView",
+   "missing @MainActor on UI mutation").
+2. Mark any rule that fires **≥2 times across the diff** as a recurring pattern.
+3. For each recurring pattern, draft a one-line rule suitable for
+   `.claude/CLAUDE.md` or an agent system prompt — written as a directive,
+   not a description.
+4. If the same recurring pattern appeared in past reviews (check git log of
+   `.claude/CLAUDE.md`), escalate priority — the existing rule isn't strong
+   enough or isn't being read.
+Threshold rationale: one occurrence is a slip; two is a pattern; three+ means
+the agent's instructions are silent on this and need an explicit rule.
+Reference: `references/agent-loop-feedback.md`.
 ### Phase 3 — Report
 Group findings by file → sort by severity within each file → write prioritized action items.
@@ -102,6 +166,22 @@ Also migrate from `ObservableObject`/`@Published` to `@Observable` (iOS 17+) —
 ## Summary
 Files: N | Critical: N | High: N | Medium: N | Low: N
+## Spec Adherence
+**Source**: PR #123 / Issue #456
+| Requirement | Status | Location |
+|-------------|--------|----------|
+| User can log in with email | ✅ Implemented | LoginView.swift:23 |
+| Show error on invalid credentials | ⚠️ Partial — missing 401 case | LoginViewModel.swift:67 |
+| Persist session in Keychain | ❌ Not implemented | — |
+| Rate limit retries | ❌ Not implemented | — |
+**Scope creep**: 1 unrelated change (UserSettings.swift refactor) — recommend
+splitting into a separate PR.
+---
 ## <Filename.swift>
 [Severity] **<Category>** (line N)
@@ -115,6 +195,33 @@ Fix: <explanation + corrected snippet>
 - [Must fix] ...
 - [Should fix] ...
 - [Consider] ...
+---
+## Agent Loop Feedback
+Recurring patterns suggest the following rules are missing or under-emphasized
+in `.claude/CLAUDE.md`:
+### Pattern: Force-unwraps (4 occurrences)
+**Files**: LoginView.swift:89, NetworkService.swift:34, UserRepo.swift:12,78
+**Suggested rule**:
+> Never use `!`, `try!`, or `as!`. Use `guard let` with explicit early return,
+> typed throws, or `as?` with handling. Force-unwraps are crashes waiting to happen.
+### Pattern: Deprecated NavigationView (2 occurrences)
+**Files**: ProfileView.swift:15, SettingsView.swift:22
+**Suggested rule**:
+> Use `NavigationStack` exclusively. `NavigationView` is deprecated as of iOS 16.
+### Pattern: Business logic in View body (3 occurrences)
+**Files**: LoginView.swift:45, ProfileView.swift:78, FeedView.swift:34
+**Suggested rule**:
+> Views must not contain business logic, network calls, or data transformations.
+> Move all such work into the @Observable view model.
 ```
 Full templates and severity classification: `references/feedback-templates.md`.
@@ -149,10 +256,21 @@ git diff HEAD~1      # last commit
 git diff -- path/to/file.swift
 ```
+## Limitations
+- Spec adherence checks require an accessible PR description or linked issue.
+  When reviewing local changes with no PR context, mark spec adherence as
+  "not assessed" rather than guessing intent.
+- Agent loop feedback assumes the code was AI-generated or AI-assisted. For
+  fully human-written code, recurring patterns are still useful but should be
+  framed as team coding standards rather than agent instructions.
 ## Reference Files
 - `references/review-workflow.md` — detailed process, diff parsing, git commands
 - `references/feedback-templates.md` — output templates, severity classification
+- `references/spec-adherence.md` — parsing PR/issue specs, requirement coverage tables, scope creep classification
+- `references/agent-loop-feedback.md` — recurring-pattern threshold, directive phrasing, suggested-rule template
 - `references/swift-quality-checklist.md` — Swift 6+, concurrency, optionals, naming
 - `references/swiftui-review-checklist.md` — property wrappers, state, modern APIs
 - `references/performance-review.md` — view optimization, ForEach, resource management

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "swift-code-reviewer-skill",
-  "version": "1.2.1",
+  "version": "1.3.0",
   "description": "Claude Code skill for comprehensive Swift/SwiftUI code reviews with multi-layer analysis",
   "keywords": [
     "claude",

package/references/agent-loop-feedback.md ADDED Viewed

@@ -0,0 +1,148 @@
+# Agent Loop Feedback Reference
+When the code under review was generated by an AI agent, recurring mistakes
+are not just *code* problems — they are *instruction* problems. This document
+defines how the reviewer aggregates per-finding signals into rule suggestions
+that can be added to `.claude/CLAUDE.md` or an agent system prompt to prevent
+the same class of issue next time.
+---
+## 1. The ≥2 Threshold
+A single instance of a mistake is a slip. Two is a pattern. Three or more
+means the agent's instructions are silent on the topic and need an explicit
+rule.
+Rules of thumb:
+| Occurrences in diff | Treatment                                                                         |
+| ------------------- | --------------------------------------------------------------------------------- |
+| 1                   | Per-file finding only. Do not surface in Agent Loop Feedback.                     |
+| 2                   | Recurring pattern. Suggest a rule. Mark priority **medium**.                      |
+| 3+                  | Strong signal. Suggest a rule. Mark priority **high**.                            |
+| 2+ across PRs       | If the same rule has been suggested before (see §3), escalate to **high**.        |
+Count occurrences by **rule**, not by raw findings. For example, four force-unwrap
+findings in different files count as four occurrences of the rule
+*"never force-unwrap"*, not four separate rules.
+---
+## 2. Phrasing Rules as Directives
+Rules go in an instruction file the agent will *read*. Write them so the
+reader knows what to do without further interpretation.
+### Strong forms
+- **Never X.** — bans an action outright. Best for safety/security/crashes.
+- **Always Y.** — mandates an action. Best for required patterns.
+- **Prefer X over Y.** — gives a default with an implicit escape hatch. Best
+  for stylistic or modernization rules.
+- **Use X. Y is deprecated / forbidden.** — adds the *why* in five words.
+### Weak forms (avoid)
+- **"Try to..."** — agents will skip it under pressure.
+- **"It's a good idea to..."** — descriptive, not directive.
+- **"Consider..."** — fine in code review prose, useless as a rule.
+- **"X is bad"** — diagnostic, not prescriptive. Doesn't tell the agent what
+  to do instead.
+### Examples
+| Weak                                                   | Strong                                                                                              |
+| ------------------------------------------------------ | --------------------------------------------------------------------------------------------------- |
+| Force unwraps are dangerous.                           | Never use `!`, `try!`, or `as!`. Use `guard let` with an early return, typed throws, or `as?`.      |
+| It's better to use `NavigationStack`.                  | Use `NavigationStack` exclusively. `NavigationView` is deprecated as of iOS 16.                     |
+| Try to keep views simple.                              | Views must not contain business logic, network calls, or data transformations. Move all such work into the `@Observable` view model. |
+| Make sure UI updates happen on the main thread.        | Always annotate types that mutate `@Observable`/`@Published` state with `@MainActor`.               |
+| Don't put secrets in logs.                             | Never log values from `KeychainService`, `URLRequest.httpBody`, or types annotated `@Sensitive`.    |
+A good rule answers three questions in one sentence: *what is forbidden*,
+*what is the alternative*, and (briefly) *why*.
+---
+## 3. Checking Past Reviews
+Before suggesting a rule, check whether something similar was already
+suggested. If yes, the existing wording is not landing — escalate priority
+and consider strengthening the wording rather than restating it.
+```bash
+# Has anyone touched the rules file recently, and how?
+git log --oneline --follow .claude/CLAUDE.md
+git log -p --follow .claude/CLAUDE.md | grep -i "<keyword from new rule>"
+# Search for existing wording on the topic
+grep -in "force.unwrap\|navigationview\|mainactor" .claude/CLAUDE.md
+```
+If a rule on the same topic exists:
+1. Quote the current rule in the suggestion block.
+2. Explain why it is not preventing the pattern (too soft, too narrow,
+   buried, conditional).
+3. Propose a replacement, not an addition.
+If no rule exists, propose adding one in the most relevant section
+(`Concurrency`, `SwiftUI`, `Security`, `Architecture`, etc.).
+---
+## 4. Suggested-Rule Block — Template
+One block per recurring pattern. Place all blocks under a single
+`## Agent Loop Feedback` heading at the bottom of the report.
+```markdown
+### Pattern: <short name> (<N> occurrences)
+**Files**: <file:line>, <file:line>, ...
+**Suggested rule**:
+> <One-sentence directive in strong form. What is forbidden, what to do
+> instead, and one-clause why.>
+**Existing rule** (if any): <quote, with line reference into `.claude/CLAUDE.md`>
+**Why it's not landing** (only if existing rule): <too soft / too narrow / buried / etc.>
+**Priority**: <medium | high>
+```
+Worked example:
+```markdown
+### Pattern: Force-unwraps (4 occurrences)
+**Files**: LoginView.swift:89, NetworkService.swift:34, UserRepo.swift:12, UserRepo.swift:78
+**Suggested rule**:
+> Never use `!`, `try!`, or `as!`. Use `guard let` with explicit early return,
+> typed throws, or `as?` with handling. Force-unwraps are crashes waiting to happen.
+**Existing rule**: _.claude/CLAUDE.md:42_ — "Avoid force unwrapping when possible."
+**Why it's not landing**: "When possible" gives the agent a built-in opt-out.
+The replacement above bans the syntax outright and names the alternatives.
+**Priority**: high
+```
+---
+## 5. Human-Authored Code
+If the PR was written by a human (no AI assistance disclosed, no agent
+session metadata in commit messages), the same recurring patterns are still
+useful — but frame them as **team coding standards**, not agent instructions:
+- Replace "Suggested rule for the agent" with "Suggested team standard".
+- Drop the "Why it's not landing" clause; humans benefit more from a short
+  rationale than from instruction-tuning analysis.
+- Leave the directive phrasing intact — strong forms read better in human
+  style guides too.
+When unsure whether the code is AI-generated, default to the team-standards
+framing.

package/references/spec-adherence.md ADDED Viewed

@@ -0,0 +1,157 @@
+# Spec Adherence Reference
+This document describes how the reviewer extracts the *intent* of a change from
+its PR description and linked issues, and how it then judges whether the code
+delivers on that intent. Spec adherence runs before the language- and
+framework-level checks: a clean diff that misses the point shouldn't pass.
+---
+## 1. Parsing `gh` / `glab` JSON Output
+Always prefer the JSON output of the platform CLI over scraping the web UI —
+it is stable, scriptable, and includes linked-issue metadata.
+### GitHub
+```bash
+# PR body, title, labels, and the issues this PR closes
+gh pr view <num> --json title,body,closingIssuesReferences,labels
+# Linked issue (one per closing reference)
+gh issue view <num> --json title,body,labels
+# Reviewer-friendly summary in one shot
+gh pr view <num> --json title,body,closingIssuesReferences \
+  --jq '{title, body, issues: [.closingIssuesReferences[].number]}'
+```
+Fields to read:
+| Field                       | Use                                               |
+| --------------------------- | ------------------------------------------------- |
+| `title`                     | Short statement of intent — start here.           |
+| `body`                      | Acceptance criteria, scope, non-goals.            |
+| `closingIssuesReferences`   | Numbers of issues that this PR will close.        |
+| `labels`                    | `bug`, `feature`, `tech-debt` shape expectations. |
+### GitLab
+```bash
+glab mr view <num>           # human-readable; pipe to less
+glab mr view <num> --output json
+glab issue view <num> --output json
+```
+GitLab's MR description and linked issues serve the same role as GitHub's PR
+body and `closingIssuesReferences`.
+---
+## 2. Finding Acceptance Criteria
+Acceptance criteria are rarely labeled as such. Look for these patterns, in
+roughly this order:
+1. **Markdown checkboxes** — `- [ ] ...` or `- [x] ...`. The most reliable
+   signal. Each box is a discrete requirement.
+2. **Gherkin / Given-When-Then** — phrases starting with `Given`, `When`,
+   `Then`, or `And`. Common in BDD-flavored teams.
+3. **Modal verbs** — `must`, `should`, `shall`, `will`, `needs to`. Each
+   sentence is a candidate requirement; `must`/`shall` outrank `should`.
+4. **Numbered or bulleted lists** under headings like `Acceptance Criteria`,
+   `Requirements`, `Scope`, `Goals`, `What this PR does`.
+5. **"Closes #N" / "Fixes #N"** — pull the linked issue and repeat 1–4 there.
+If the PR has none of the above, treat the **title** as the single requirement
+and note the lack of explicit criteria in the report.
+---
+## 3. Handling PRs With No Description
+A blank or near-blank description is itself a finding. Do not invent intent.
+1. Note in the report: _"PR description is empty / minimal — spec adherence
+   inferred from diff and commit messages, may be incomplete."_
+2. Use, in order: linked issues, commit messages (`git log <base>..HEAD`),
+   branch name, file paths touched.
+3. List every inferred requirement explicitly so the author can correct any
+   misreading, prefixed with `(inferred)`.
+4. Do not penalize the diff for failing to satisfy a requirement that was
+   only inferred — flag the missing description instead.
+---
+## 4. Scope Creep vs. Legitimate Adjacent Fixes
+Not every out-of-spec change is scope creep. Use this rubric:
+| Change type                                                                  | Verdict                               |
+| ---------------------------------------------------------------------------- | ------------------------------------- |
+| Touches a file required by the spec, fixes an obvious nearby bug, < ~10 LOC  | **Allow** — note it; don't flag.      |
+| Renames or restructures a file the spec requires editing                     | **Allow if minimal**, otherwise flag. |
+| Drive-by formatting / style changes across many files                        | **Flag** — recommend separate PR.     |
+| Refactor of a module unrelated to the spec                                   | **Flag** — scope creep.               |
+| New feature not mentioned anywhere in spec                                   | **Flag** — scope creep, must justify. |
+| Dependency version bumps                                                     | **Flag** — separate PR by convention. |
+| Test additions for the spec'd code                                           | **Allow** — expected.                 |
+| Test additions for unrelated existing code                                   | **Allow but note** — usually welcome. |
+When flagging scope creep, always recommend the concrete remediation
+("split out into a follow-up PR" or "move to a separate commit if the team
+allows partial review").
+---
+## 5. Intent Drift
+The trickier failure mode: the diff *runs* but solves a subtly different
+problem than the spec. Symptoms:
+- Naming uses different domain terms than the spec (e.g., spec says
+  "session", code says "token").
+- Data flow contradicts the spec's mental model (e.g., spec says the server
+  is the source of truth, code caches and treats local as authoritative).
+- Edge cases the spec called out are silently excluded by an early `return`.
+- The PR title says "fix" but the diff is a rewrite, or vice versa.
+When you suspect intent drift, quote both the spec sentence and the code
+location side-by-side in the finding.
+---
+## 6. Requirement Coverage Table — Template
+Drop this into the Spec Adherence section of the report, one row per
+requirement extracted in step 2.
+```markdown
+## Spec Adherence
+**Source**: PR #<num> / Issue #<num>
+| Requirement                              | Status                              | Location                       |
+|------------------------------------------|-------------------------------------|--------------------------------|
+| <verbatim or paraphrased criterion>      | ✅ Implemented                       | `<file>:<line>`                |
+| <criterion with edge case>               | ⚠️ Partial — <what's missing>        | `<file>:<line>`                |
+| <criterion>                              | ❌ Not implemented                   | —                              |
+| <inferred criterion>                     | ✅ Implemented (inferred)            | `<file>:<line>`                |
+**Scope creep**: <count> unrelated change(s) — <one-line summary, recommend split>.
+**Spec gaps**: <count> criterion/criteria not addressed — see "Must fix" in
+prioritized action items.
+```
+Status legend (use these exact glyphs for greppability):
+- ✅ Implemented — code satisfies the criterion and tests, if any, cover it.
+- ⚠️ Partial — happy path works, but at least one edge case or branch is
+  missing. Always say *what* is missing.
+- ❌ Not implemented — no code addresses the criterion.
+- ➖ Not assessed — no spec context available; do not guess.
+If `status` is anything other than ✅, the corresponding action item belongs
+in **Must fix** or **Should fix** depending on whether the criterion was
+flagged `must`/`shall` versus `should`.