npm - @groupby/ai-dev - Versions diffs - 0.5.1 → 0.5.4 - Mend

@groupby/ai-dev 0.5.1 → 0.5.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/package.json +1 -1
package/teams/fhr-core/github/pull_request_template.md +19 -0
package/teams/fhr-core/resources/spec-template.md +22 -0
package/teams/fhr-core/skills/jira-ticket/SKILL.md +47 -0
package/teams/fhr-core/skills/plan-spec/SKILL.md +57 -0
package/teams/fhr-core/skills/refine-spec/SKILL.md +47 -0
package/teams/fhr-core/skills/tdd-green/SKILL.md +61 -0
package/teams/fhr-core/skills/tdd-red/SKILL.md +50 -0
package/teams/fhr-core/skills/tdd-workflow/SKILL.md +51 -0
package/teams/snpd/skills/README.md +306 -0
package/teams/snpd/skills/council-review/SKILL.md +243 -0
package/teams/snpd/skills/council-review/references/output-format.md +108 -0
package/teams/snpd/skills/council-review/references/reviewer-prompt.md +99 -0
package/teams/snpd/skills/council-review/references/technology-profiles.md +54 -0
package/teams/snpd/skills/council-review/scripts/summarize_review_config.py +226 -0
package/teams/snpd/skills/docs-init/SKILL.md +71 -11
package/teams/snpd/skills/docs-init-v2/SKILL.md +0 -402

package/teams/snpd/skills/council-review/SKILL.md ADDED

@@ -0,0 +1,243 @@
+---
+name: council-review
+description: "Multi-model code review council inspired by Karpathy's LLM Council. Spawns 3 sub-agents on different models (Claude Opus 4.8, GPT-5.3 Codex, GPT-5.5) to independently review code changes, then synthesizes and votes on the best comments to produce a unified, high-signal review. Use when the user says /council-review, 'council review', 'multi-model review', 'review council', or 'LLM council'."
+---
+# LLM Council Code Review
+## Purpose
+Provide a high-quality, consensus-driven code review by running **three independent
+reviewers on different LLM models**, then synthesizing their findings into a single
+review ranked by agreement and severity — similar to
+[Karpathy's LLM Council](https://github.com/karpathy/llm-council).
+The insight: different models catch different things. One model may spot a race
+condition another misses; one may flag a security issue the others gloss over.
+By requiring consensus, noise drops and signal rises.
+## Models Used (The Council)
+| Seat       | Model ID            | Strengths                                      |
+|------------|---------------------|-------------------------------------------------|
+| Reviewer A | `claude-opus-4.8`   | Deep reasoning, architecture, subtle logic bugs |
+| Reviewer B | `gpt-5.3-codex`     | Code-native, practical fixes, test gaps          |
+| Reviewer C | `gpt-5.5`           | Broad knowledge, security, API design            |
+## Trigger
+Activate this skill when the user says any of:
+- `/council-review`
+- `council review my changes`
+- `multi-model review`
+- `LLM council review`
+- `review council`
+## Inputs
+The user may provide:
+- **No argument** → review local uncommitted changes (staged + unstaged)
+- **`--staged`** → review only staged changes
+- **`--branch`** → review current branch diff vs `origin/main`
+- **`--commits <N>`** → review the last N commits (ignores uncommitted changes)
+- **`--commits <sha>..<sha>`** → review a specific commit range
+- **`--pr <number>`** → review a specific GitHub PR
+- **A file path or glob** → review only those files
+Natural language also works:
+- "review my last 2 commits" → same as `--commits 2`
+- "review last 3 commits before I open a PR" → same as `--commits 3`
+## Workflow
+### Phase 0: Repo Discovery
+Before reviewing any code, discover the **current repo's own rules**. Do not
+carry assumptions from another repo.
+1. **Confirm repository scope:**
+   - Run `git status --short` and `git branch --show-current`.
+   - Identify the repo root and project type.
+2. **Discover review configuration:**
+   - Check for these files and read them if present:
+     - `.github/workflows/claude-pr-review.yml` or `.github/workflows/claude.yml`
+     - `.github/workflows/build-pr.yaml`
+     - `.github/PULL_REQUEST_TEMPLATE.md`
+     - `.github/CODEOWNERS`
+   - Run this skill's bundled `scripts/summarize_review_config.py` script for a
+     quick context summary — it is repo-agnostic and works on any repository.
+     Resolve the script from this skill's own base directory (shown in your skill
+     context; do not hard-code a personal/author path) and pass the current repo
+     root as its argument:
+     ```bash
+     # <skill-dir> = this skill's base directory, e.g.
+     #   macOS/Linux: ~/.copilot/skills/council-review (or ~/.agents/skills/...)
+     #   Windows:     %USERPROFILE%\.copilot\skills\council-review
+     python3 "<skill-dir>/scripts/summarize_review_config.py" .   # use `python` on Windows
+     ```
+     If Python is unavailable, fall back to reading the config files above manually.
+3. **Discover repo guidance (read if present):**
+   - `.github/copilot-instructions.md`
+   - `CLAUDE.md`, `AGENTS.md`
+   - `docs/conventions.md`, `docs/project-rule.md`, `docs/source-control.md`
+   - `README.md` (skim for architecture/setup sections)
+4. **Detect technology profile:**
+   Scan build files and guidance docs for technology markers. Apply the
+   matching profile from `references/technology-profiles.md`. Only apply
+   rules that the **current repo actually uses**.
+   Key markers to scan for:
+   - Java/Gradle: `build.gradle`, `build.gradle.kts`, `settings.gradle`
+   - Maven: `pom.xml`
+   - Go: `go.mod`, `Makefile`
+   - Python: `pyproject.toml`, `requirements*.txt`
+   - Node: `package.json`
+5. **Build a PROJECT_CONTEXT block** from all discovered information. This
+   block will be injected into every reviewer's prompt so all three models
+   review against the same repo-specific rules.
+### Phase 1: Gather the Diff
+1. Determine the review scope based on user input:
+   - **Local changes (default):** If working tree has edits, use `git diff HEAD`
+     (includes staged + unstaged). If working tree is clean but branch has
+     commits, compare against the PR base: `git diff origin/main...HEAD`.
+     If `origin/main` is not available, inspect upstream and available remotes.
+   - **Staged only:** `git diff --cached`
+   - **Branch diff:** `git diff origin/main...HEAD`
+   - **Last N commits:** `git diff HEAD~N..HEAD` (ignores working tree entirely)
+     Example: `--commits 2` → `git diff HEAD~2..HEAD`
+   - **Commit range:** `git diff <sha1>..<sha2>` for explicit ranges
+   - **PR:** `gh pr diff <number>`
+2. Also gather context:
+   - `git diff --stat` for the file change summary
+   - The PROJECT_CONTEXT block built in Phase 0
+3. If the diff is empty, tell the user and stop.
+4. If the diff is very large (>5000 lines), warn the user and suggest narrowing scope.
+5. **Classify the change** (helps reviewers focus):
+   - API/controller, service/orchestration, repository/database, search engine,
+     Mongo query/indexing, cache, Pub/Sub/messaging, auth/security, feature flags,
+     docs, tests, build/dependency, deployment, or tooling.
+### Phase 2: Deploy the Council (Parallel Sub-Agents)
+Launch **exactly 3 `code-review` agents in parallel** using the `task` tool, each
+with a different `model` parameter. All three receive the **identical prompt** so
+their reviews are directly comparable.
+**CRITICAL: Launch all 3 in a single response — they run in parallel.**
+Each agent receives the prompt from `references/reviewer-prompt.md`, with the
+diff and project context injected.
+```
+Agent A: task(agent_type="code-review", model="claude-opus-4.8", ...)
+Agent B: task(agent_type="code-review", model="gpt-5.3-codex", ...)
+Agent C: task(agent_type="code-review", model="gpt-5.5", ...)
+```
+All three agents run in `mode="background"`. Wait for all three to complete
+before proceeding to Phase 3.
+### Phase 3: Collect & Parse Reviews
+Read all three agent results. Each agent returns findings in the structured
+format defined in `references/reviewer-prompt.md`. Extract:
+- File path and line range for each comment
+- Severity (P1/P2/P3)
+- Category (bug, security, performance, design, test-gap, error-handling, concurrency, data-integrity)
+- The finding description and suggested fix
+### Phase 4: Council Vote — Synthesize & Rank
+This is the core "council" step. Process the three reviews:
+#### 4a. Deduplicate
+Group comments that refer to the **same issue** (same file, overlapping lines,
+same root cause). Two comments are "the same issue" if they:
+- Point to the same file and overlapping line range, AND
+- Describe the same underlying problem (even in different words)
+#### 4b. Score by Agreement
+For each unique issue, count how many of the 3 reviewers flagged it:
+| Agreement | Label        | Weight |
+|-----------|--------------|--------|
+| 3/3       | 🟢 Unanimous | High   |
+| 2/3       | 🟡 Majority  | Medium |
+| 1/3       | 🔵 Solo      | Low    |
+#### 4c. Rank
+Sort the final list by:
+1. **Agreement** (unanimous > majority > solo)
+2. **Severity** (P1 > P2 > P3) within each agreement tier
+3. Within the same tier+severity, keep the most actionable/clear version of
+   the comment (pick the best phrasing from whichever model wrote it)
+#### 4d. Solo Comment Filter
+Solo comments (1/3) are **not discarded** but are presented separately under
+a "Minority Opinions" section. They may contain genuine catches the other
+models missed, or they may be noise. Let the user decide.
+### Phase 5: Present the Council Review
+Output the review using the format in `references/output-format.md`.
+## Hard Rules
+- **Identical prompts.** All three reviewers get exactly the same input.
+  Do not customize prompts per model — the whole point is fair comparison.
+- **No model bias.** Do not weight one model's opinion over another during
+  voting. Rank by agreement count and severity only, never by which model
+  raised the issue.
+- **Parallel launch.** Always launch all 3 agents in a single response.
+  Never run them sequentially.
+- **Transparency.** Always show which models agreed on each finding.
+- **No hallucinated code.** Do not generate suggested replacement code
+  yourself during synthesis. Use the reviewers' suggestions as-is.
+- **Severity consistency.** If reviewers disagree on severity for the same
+  issue, use the highest severity any reviewer assigned.
+- **Signal over noise.** The council exists to reduce noise. If a comment
+  is unclear or contradictory across reviewers, note the disagreement rather
+  than forcing consensus.
+## Configuration
+The user can customize the council by telling the agent:
+- Different models: "use Opus 4.5 instead of Opus 4.8"
+- Different number of reviewers: "use 5 models" (but default is 3)
+- Focus areas: "focus on security" or "focus on performance"
+- Strictness: "be strict" (lower the noise threshold) or "only critical" (P1 only)
+## Error Handling
+- If one agent fails, proceed with the remaining 2. Note the failure.
+- If two agents fail, fall back to a single-model review and explain.
+- If all three fail, tell the user and suggest running a simple code-review instead.
+## Phase 6 (Optional): Post-Review Verification
+After presenting the council review, **offer** to run verification. Do not
+run automatically — the user may just want the review.
+If the user accepts:
+1. **Run targeted tests** for changed files using the repo's test command
+   (discovered in Phase 0). Prefer the narrowest test scope first.
+2. **Run the PR build command** when feasible (from `build-pr.yaml`).
+3. **Run `git diff --check`** for whitespace issues.
+4. **Check PR template compliance** — if the repo has a PR template with
+   checkboxes, note which items are affected by the change.
+5. **Report CODEOWNERS** — if the repo has CODEOWNERS, note which owners
+   are relevant for the changed files.
+Append verification results to the review output.
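
The Phase 4 vote in the SKILL.md above (deduplicate, score by agreement, rank) can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the package: `Finding`, `Issue`, `synthesize`, and the `root_cause` field are hypothetical names. In the skill, deciding that two comments describe "the same underlying problem" is a judgement made by the synthesizing agent; the sketch stands in for it with an exact match on a normalized `root_cause` label.

```python
from dataclasses import dataclass, field

# Lower number = more severe, so P1 sorts first.
SEVERITY_ORDER = {"P1": 0, "P2": 1, "P3": 2}


@dataclass
class Finding:
    """One comment from one reviewer (hypothetical shape)."""
    reviewer: str
    path: str
    start: int
    end: int
    severity: str
    root_cause: str  # assumed normalized label for the underlying problem
    summary: str


@dataclass
class Issue:
    """A deduplicated issue with the set of reviewers that flagged it."""
    path: str
    start: int
    end: int
    root_cause: str
    severity: str
    summary: str
    reviewers: set = field(default_factory=set)


def overlaps(a_start, a_end, b_start, b_end):
    """True when two inclusive line ranges share at least one line."""
    return a_start <= b_end and b_start <= a_end


def synthesize(findings):
    """Return (consensus, minority) issue lists, each ranked."""
    issues = []
    for f in findings:
        for issue in issues:
            same_place = issue.path == f.path and overlaps(
                issue.start, issue.end, f.start, f.end
            )
            if same_place and issue.root_cause == f.root_cause:
                issue.reviewers.add(f.reviewer)
                issue.start = min(issue.start, f.start)
                issue.end = max(issue.end, f.end)
                # Severity consistency: keep the highest severity assigned.
                if SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[issue.severity]:
                    issue.severity = f.severity
                break
        else:
            issues.append(
                Issue(f.path, f.start, f.end, f.root_cause,
                      f.severity, f.summary, {f.reviewer})
            )
    # Rank by agreement first, then by severity within each agreement tier.
    issues.sort(key=lambda i: (-len(i.reviewers), SEVERITY_ORDER[i.severity]))
    consensus = [i for i in issues if len(i.reviewers) >= 2]
    minority = [i for i in issues if len(i.reviewers) == 1]
    return consensus, minority
```

The sketch keeps the first summary it sees for each issue; choosing the clearest phrasing among reviewers, as step 4c asks, is left to the agent. Solo findings are returned separately rather than dropped, matching the "Minority Opinions" rule.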

package/teams/snpd/skills/council-review/references/output-format.md ADDED

@@ -0,0 +1,108 @@
+# Council Review Output Format
+Use this format when presenting the synthesized council review to the user.
+---
+## Header
+```
+# 🏛️ LLM Council Code Review
+**Scope:** <description of what was reviewed — branch, PR #, local changes>
+**Council:** Claude Opus 4.8 · GPT-5.3 Codex · GPT-5.5
+**Date:** <current date>
+**Verdict:** <PASS | PASS WITH COMMENTS | NEEDS CHANGES>
+```
+### Verdict Rules
+- **PASS** — No P1 or P2 issues found by any reviewer
+- **PASS WITH COMMENTS** — No P1 issues; some P2/P3 found
+- **NEEDS CHANGES** — At least one P1 issue found, OR 3+ P2 issues with majority agreement
+---
+## Consensus Findings (2/3 or 3/3 agreement)
+These are issues flagged by multiple models independently. High confidence.
+For each finding:
+```
+### <N>. <One-line summary>
+🟢 Unanimous (3/3) | 🟡 Majority (2/3)
+**Severity:** P1 | P2 | P3
+**Category:** <category>
+**File:** `<path>` (lines ~<range>)
+**Agreed by:** Opus 4.8 ✓ · Codex 5.3 ✓ · GPT-5.5 ✓
+<Best description from the reviewers. Pick the clearest, most actionable version.>
+**Suggested fix:**
+<Most concrete suggestion from any reviewer.>
+```
+---
+## Minority Opinions (1/3 — solo catches)
+These were flagged by only one model. They may be genuine catches the others
+missed, or false positives. Included for completeness.
+For each:
+```
+### <N>. <One-line summary>
+🔵 Solo — flagged by <model name> only
+**Severity:** P1 | P2 | P3
+**Category:** <category>
+**File:** `<path>` (lines ~<range>)
+<Description from the flagging model.>
+**Suggested fix:**
+<Suggestion if provided.>
+```
+---
+## Review Statistics
+```
+| Metric                    | Value |
+|---------------------------|-------|
+| Total unique issues       | <N>   |
+| Unanimous (3/3)           | <N>   |
+| Majority (2/3)            | <N>   |
+| Solo (1/3)                | <N>   |
+| P1 (Critical)             | <N>   |
+| P2 (Important)            | <N>   |
+| P3 (Minor)                | <N>   |
+| Files reviewed             | <N>   |
+| Lines changed              | +<N> / -<N> |
+```
+---
+## Model Agreement Matrix (optional, for large reviews)
+Show which model caught what. Only include for reviews with 5+ findings.
+```
+| # | Finding                        | Opus 4.8 | Codex 5.3 | GPT-5.5 |
+|---|--------------------------------|----------|-----------|---------|
+| 1 | Race condition in UserService   | ✓        | ✓         | ✓       |
+| 2 | Missing null check in parser    | ✓        | ✓         |         |
+| 3 | SQL injection in search filter  |          | ✓         | ✓       |
+| 4 | Unused import (solo)            | ✓        |           |         |
+```
+---
+## Footer
+```
+---
+*Review generated by LLM Council · 3 independent models · consensus-ranked*
+*Models may miss issues. This review supplements, not replaces, human judgment.*
+```
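
The verdict rules in output-format.md overlap in two places: a review with only P3 findings fits both PASS and PASS WITH COMMENTS, and three majority-agreed P2 findings fit both PASS WITH COMMENTS and NEEDS CHANGES. A minimal Python sketch of one possible reading follows. The function name `verdict`, the tuple input shape, and the precedence order (NEEDS CHANGES checked first, then PASS, with P3-only reviews treated as PASS) are assumptions made for illustration, not something the file specifies.

```python
def verdict(issues):
    """Map deduplicated issues to a verdict string.

    `issues` is an iterable of (severity, agreement_count) tuples, where
    agreement_count is how many of the 3 reviewers flagged the issue.
    """
    issues = list(issues)
    p1_count = sum(1 for severity, _ in issues if severity == "P1")
    p2_count = sum(1 for severity, _ in issues if severity == "P2")
    # "Majority agreement" is read here as 2 or more reviewers (assumption).
    p2_majority = sum(
        1 for severity, agreed in issues if severity == "P2" and agreed >= 2
    )

    # Assumed precedence: the blocking rule wins over the softer ones.
    if p1_count >= 1 or p2_majority >= 3:
        return "NEEDS CHANGES"
    if p2_count == 0:
        return "PASS"
    return "PASS WITH COMMENTS"
```

Under this reading, `verdict([("P2", 2)] * 3)` yields `"NEEDS CHANGES"`, while three solo P2 findings yield `"PASS WITH COMMENTS"`.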

package/teams/snpd/skills/council-review/references/reviewer-prompt.md ADDED

@@ -0,0 +1,99 @@
+# Reviewer Prompt Template
+You are one member of a 3-model code review council. Your job is to independently
+review the code changes below and produce high-signal findings. Another process
+will compare your review against two other models' reviews to find consensus.
+## Your Review Constraints
+- **Only flag things that genuinely matter.** Bugs, security issues, logic errors,
+  performance problems, missing error handling, test gaps for changed code.
+- **Never comment on style, formatting, naming preferences, or trivial matters**
+  unless they cause a real problem (e.g., a misleading variable name that could
+  cause a bug).
+- **Be specific.** Always include the file path, approximate line range, and a
+  concrete description of the problem.
+- **Suggest a fix** when possible. Don't just say "this is wrong" — say what to do.
+- **Don't be redundant.** If two issues share the same root cause, report it once.
+- **Respect repo-specific rules.** The project context below includes this repo's
+  own conventions, technology profile, and CI expectations. Review against THOSE
+  rules, not generic best practices. Do not assume patterns from other repos.
+## Project Context
+{PROJECT_CONTEXT}
+This context was discovered from the repo's own guidance files, CI workflows,
+build configuration, and technology markers. If the context mentions specific
+patterns (e.g., Micronaut DI, tenant isolation, cache key format), verify the
+diff follows them. If the context is silent on something, don't invent rules.
+## Technology Profile
+{TECHNOLOGY_PROFILE}
+## Change Classification
+{CHANGE_CLASSIFICATION}
+## Review Focus Areas (from repo's CI/review config)
+Review against these standard areas, but weight them based on the change
+classification above:
+1. **Code quality:** single responsibility, clarity, maintainability, unnecessary
+   complexity/nesting, redundant abstractions, local style conventions.
+2. **Security:** auth, authorization, tenant isolation, input validation, secrets,
+   sensitive data exposure.
+3. **Performance:** database/query shape, cache behavior, external calls,
+   async/blocking boundaries, memory/resource lifecycle.
+4. **Testing:** adequate coverage for changed code, edge cases, missing scenarios.
+5. **Documentation:** README/docs/OpenAPI/API docs accuracy when behavior changes.
+## Changed Files Summary
+{DIFF_STAT}
+## Full Diff
+{DIFF}
+## Output Format
+Return your findings as a structured list. Each finding must follow this exact format:
+```
+### Finding <N>
+- **File:** <path/to/file>
+- **Lines:** <start>-<end> (approximate)
+- **Severity:** P1 | P2 | P3
+- **Category:** bug | security | performance | design | test-gap | error-handling | concurrency | data-integrity
+- **Summary:** <one-line summary>
+- **Details:** <1-3 sentences explaining the issue>
+- **Suggestion:** <concrete fix or action>
+```
+### Severity Guide
+- **P1 — Critical:** Likely bug, security vulnerability, data loss, crash, race condition,
+  or production incident. Must fix before merge.
+- **P2 — Important:** Behavior regression, missing important test, incorrect error handling,
+  performance issue under realistic load. Should fix before merge.
+- **P3 — Minor:** Low-risk test gap, minor inefficiency, documentation inaccuracy.
+  Nice to fix but not blocking.
+### What NOT to Report
+- Style or formatting preferences
+- "Consider renaming X" suggestions
+- "Add a comment explaining Y" suggestions
+- Import ordering
+- Trailing whitespace
+- Suggestions that don't prevent a real problem
+If you find zero genuine issues, return:
+```
+### No Issues Found
+The changes look correct. No bugs, security issues, or significant concerns identified.
+```
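
Phase 3 of the skill extracts fields from each reviewer's reply in the format this prompt defines. A rough Python sketch of that extraction is shown below. The function name `parse_findings` and the dictionary keys are hypothetical, and the sketch assumes each field stays on a single line; wrapped continuation lines in `Details` would be lost.

```python
import re

# "### Finding 3" opens a new finding.
HEADING_RE = re.compile(r"^### Finding\s+\d+\s*$")
# "- **Severity:** P1" style field lines.
FIELD_RE = re.compile(r"^- \*\*(?P<key>[A-Za-z]+):\*\*\s*(?P<value>.*)$")
# "10-14 (approximate)" -> start and end line numbers.
LINES_RE = re.compile(r"(\d+)\s*-\s*(\d+)")


def parse_findings(text):
    """Return a list of dicts, one per '### Finding <N>' block."""
    findings = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if HEADING_RE.match(line):
            current = {}
            findings.append(current)
            continue
        match = FIELD_RE.match(line)
        if match and current is not None:
            current[match["key"].lower()] = match["value"].strip()

    # Add numeric line bounds when the "Lines" field has a start-end range.
    for finding in findings:
        match = LINES_RE.search(finding.get("lines", ""))
        if match:
            finding["start"] = int(match[1])
            finding["end"] = int(match[2])
    return findings
```

A reply consisting of the `### No Issues Found` block contains no `### Finding <N>` heading, so the function returns an empty list for it.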

package/teams/snpd/skills/council-review/references/technology-profiles.md ADDED

@@ -0,0 +1,54 @@
+# Technology Profiles
+Apply a profile **only** when discovered in the current repo's files.
+Do not carry assumptions from one repo to another.
+## Java / Micronaut
+- Check the Java version from workflow and Gradle config (do not assume 21 — some repos use 17).
+- Use Micronaut compile-time DI patterns; avoid Spring Boot assumptions unless the repo is Spring-based.
+- Prefer constructor injection and the Lombok annotations already used locally.
+- Follow the repo's `var` rule (many Rezolve Java repos use `var` for new variables created with `new`).
+- Check `@Singleton` vs `@Context` scope, `@Named` qualifiers, `@ExecuteOn` boundaries.
+- Verify blocking operations are not on the event loop thread.
+## JOOQ / Flyway / Database
+- Migration names and generated classes matter — check naming conventions.
+- Check tenant isolation on queries and mutation side effects (events, audit logs).
+- Verify transaction boundaries and connection management.
+## Search Services
+- Check strategy and engine selection order.
+- Ensure request builders, filters, refinements, biasing, pagination, and response builders preserve behavior.
+- For Google Retail: check proto conversion, request fields, fallback behavior.
+- For Mongo Atlas Search: check aggregation stages, index assumptions, field paths, unsupported Google-only features.
+- For Redis caches: check key composition, tenant/collection/area isolation, TTLs, skip-cache paths.
+## Mongo Data / Indexing
+- Check collection and tenant scoping.
+- Check aggregation pipeline correctness, projections, variant/inventory handling.
+- Check index definition generation, conditional indexing, feature flags.
+## Authentication / Security
+- Check token validation, claims, expiration, signature algorithms, public endpoint boundaries.
+- Check Redis key storage, key rotation, secret handling.
+- Check Pub/Sub credential update idempotency.
+## Go / Python / Node
+- Prefer repo-provided commands from Makefile, package files, workflow files, or docs.
+- Keep tests close to the changed package/module.
+- Check schema/serialization compatibility and environment variable handling.
+- Do not import Java/Micronaut assumptions into these repos.
+## General (all profiles)
+- **Code quality:** Single responsibility, clarity, maintainability, unnecessary complexity.
+- **Security:** Auth, authorization, tenant isolation, input validation, secrets.
+- **Performance:** Database/query shape, cache behavior, external calls, async/blocking.
+- **Testing:** Adequate coverage for changed code, edge cases, no superfluous tests.
+- **Documentation:** README/docs/OpenAPI accuracy when behavior changes.
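
Which of these profiles applies is decided by the marker scan listed in Phase 0 of SKILL.md. A minimal Python sketch of that scan follows. It is not the bundled `summarize_review_config.py`, whose contents are not shown in this diff; `MARKERS` and `detect_profiles` are hypothetical names, and the marker patterns are copied from the Phase 0 list.

```python
from fnmatch import fnmatchcase

# Marker file patterns per technology, as listed in SKILL.md Phase 0.
MARKERS = {
    "Java/Gradle": ["build.gradle", "build.gradle.kts", "settings.gradle"],
    "Maven": ["pom.xml"],
    "Go": ["go.mod", "Makefile"],
    "Python": ["pyproject.toml", "requirements*.txt"],
    "Node": ["package.json"],
}


def detect_profiles(filenames):
    """Map each detected technology to the marker files that triggered it.

    `filenames` is a list of file names found at the repo root.
    """
    found = {}
    for profile, patterns in MARKERS.items():
        hits = sorted(
            name
            for name in filenames
            if any(fnmatchcase(name, pattern) for pattern in patterns)
        )
        if hits:
            found[profile] = hits
    return found
```

Calling `detect_profiles(os.listdir("."))` from the repo root (after `import os`) would cover the simple case. Because `Makefile` appears under Go in the marker list, a lone Makefile is reported as a Go marker here, so the result is best treated as a hint to confirm against the repo's guidance docs.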