pr-context-engine 0.1.0__tar.gz → 0.1.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/workflows/pr-review.yml +6 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CHANGELOG.md +14 -0
- pr_context_engine-0.1.2/PKG-INFO +261 -0
- pr_context_engine-0.1.2/README.md +231 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/action.yml +6 -0
- pr_context_engine-0.1.2/docs/architecture.md +125 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/docs/design-decisions.md +13 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/pyproject.toml +1 -1
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/generator.py +35 -10
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/prompt_templates.py +2 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/cli.py +3 -1
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_briefing_generator.py +55 -0
- pr_context_engine-0.1.0/PKG-INFO +0 -211
- pr_context_engine-0.1.0/README.md +0 -181
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.env.example +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/pull_request_template.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/workflows/release.yml +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.gitignore +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.python-version +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CODE_OF_CONDUCT.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CONFIG.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CONTRIBUTING.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/LICENSE +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/PROJECT.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/ast_walker.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/diff_parser.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/risk_scorer.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/config.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/codebase_index.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/git_history.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/confidence.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/fix_generator.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/github_api/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/github_api/comment_poster.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/anthropic_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/base.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/gemini_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/groq_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/ollama_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/01-simple-refactor.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/02-auth-middleware.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/03-db-migration.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/04-config-update.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/05-public-api-deleted.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/06-hardcoded-api-key.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/07-token-in-url.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/08-retry-no-limit.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/09-missing-null-check.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/10-trivial-docfix.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/11-multi-flag.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/12-new-endpoint.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/13-auth-bypass.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/14-env-file-update.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/15-dependency-update.json +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/rubric.md +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/test_briefings.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/__init__.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_anthropic_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_ast_walker.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_codebase_index.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_config.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_diff_parser.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_failover_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_fix_generator.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_gemini_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_git_history.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_groq_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_ollama_provider.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_risk_scorer.py +0 -0
- {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/uv.lock +0 -0
@@ -37,6 +37,12 @@ jobs:
       - name: Install uv
         run: pip install uv
 
+      - name: Restore embedding model cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/fastembed
+          key: fastembed-bge-small-en-v1.5
+
       - name: Restore index cache
         uses: actions/cache@v4
         with:
@@ -6,6 +6,20 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Thi
 
 ## Unreleased
 
+## 0.1.2 — 2026-05-23
+
+### Fixed
+
+- **Briefing parser** — Section headers are now matched after stripping markdown decoration (`**`, `##`, `__`). Groq's llama-3.3-70b wraps headers in bold/heading markdown despite prompt instructions, causing all four sections to parse as empty. The parser now normalises headers before matching, and logs the raw LLM response when all sections fail to aid future debugging.
+- **Prompt template** — Added explicit instruction prohibiting markdown decoration on section headers.
+
+## 0.1.1 — 2026-05-20
+
+### Fixed
+
+- **RAG + history quality** — Restored full per-file RAG chunk retrieval and git history; only the file list shown in the prompt is capped at 20 to respect the token budget.
+- **CI** — Cache fastembed embedding model in CI workflows to reduce cold-start time.
+
 ## 0.1.0 — 2026-05-17
 
 ### Added
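Note on the 0.1.2 changelog entry: it describes stripping markdown decoration from section headers before matching, but the parser itself is not shown in this hunk. The following is a minimal Python sketch of that idea under stated assumptions, not the package's actual code. The four section names are borrowed from the sample briefing in the README; `normalise_header`, `parse_sections`, and the logger name are hypothetical.

```python
import logging
import re

logger = logging.getLogger("briefing_sketch")  # hypothetical logger name

# Assumed section names, taken from the README's sample briefing.
SECTION_HEADERS = (
    "what changed",
    "blast radius",
    "risk flags",
    "questions for the reviewer",
)


def normalise_header(line: str) -> str:
    """Strip heading/bold markers and a trailing colon, then lowercase."""
    text = re.sub(r"^#+\s*", "", line.strip())        # leading '#', '##', ...
    text = text.replace("**", "").replace("__", "")   # bold decoration
    return text.strip().rstrip(":").strip().lower()


def parse_sections(response: str) -> dict[str, str]:
    """Split an LLM response into the known sections, keyed by header."""
    bodies: dict[str, list[str]] = {name: [] for name in SECTION_HEADERS}
    current = None
    for line in response.splitlines():
        key = normalise_header(line)
        if key in bodies:
            current = key            # a header line starts a new section
        elif current is not None:
            bodies[current].append(line)
    parsed = {name: "\n".join(body).strip() for name, body in bodies.items()}
    if not any(parsed.values()):
        # Mirrors the changelog: keep the raw response for debugging.
        logger.warning("All sections empty; raw LLM response: %r", response)
    return parsed
```

With this shape, `**What changed**`, `## What changed`, and `What changed:` all resolve to the same key, so a provider that decorates headers no longer yields empty sections.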
@@ -0,0 +1,261 @@
+Metadata-Version: 2.4
+Name: pr-context-engine
+Version: 0.1.2
+Summary: An AI tool that reads every PR and posts a senior-engineer-style briefing.
+Project-URL: Homepage, https://github.com/paramahastha/pr-context-engine
+Project-URL: Repository, https://github.com/paramahastha/pr-context-engine
+Project-URL: Issues, https://github.com/paramahastha/pr-context-engine/issues
+Project-URL: Changelog, https://github.com/paramahastha/pr-context-engine/blob/main/CHANGELOG.md
+Author-email: Kautsar <paramahastha@gmail.com>
+License: MIT
+License-File: LICENSE
+Keywords: ai,code-review,github,llm,pull-request
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Version Control :: Git
+Requires-Python: >=3.12
+Requires-Dist: anthropic>=0.40
+Requires-Dist: fastembed>=0.4
+Requires-Dist: google-genai>=1.0
+Requires-Dist: groq>=0.13
+Requires-Dist: pygithub>=2.4
+Requires-Dist: python-dotenv>=1.0
+Requires-Dist: requests>=2.32
+Requires-Dist: sqlite-vec>=0.1
+Requires-Dist: typer>=0.12
+Description-Content-Type: text/markdown
+
+# PR Context Engine
+
+[](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml)
+[](https://pypi.org/project/pr-context-engine/)
+[](LICENSE)
+[](https://www.python.org/downloads/)
+
+> An AI tool that reads every PR and writes the briefing — and the fixes — a senior engineer would, with the calibration data to prove it's not just guessing.
+
+<!-- Demo GIF goes here once recorded: docs/demo.gif -->
+<!--  -->
+
+## What it does
+
+Every PR opens with three problems for the reviewer: _what is this actually doing_, _what could it break_, and _what should I push back on_. A diff doesn't answer any of those.
+
+PR Context Engine reads the diff plus surrounding code, recent git history, and semantically similar code from elsewhere in the repo, then posts a terse briefing written like a senior backend engineer would write it. No praise. No filler. No "this LGTM." Just the context a reviewer needs.
+
+With `ENABLE_FIXES=true`, it also generates confidence-gated patch suggestions for located issues — posted as collapsible GitHub suggestion blocks the maintainer can apply in one click. When it isn't sure, it says so in prose instead of guessing.
+
+## Quickstart (5 minutes)
+
+### Check your setup first
+
+```bash
+pipx install pr-context-engine
+export GROQ_API_KEY=<your-key> # free at console.groq.com/keys
+export GITHUB_TOKEN=$(gh auth token)
+pr-context-engine quickstart # checks keys, scopes, prints what's missing
+```
+
+### Option A — GitHub Action (recommended)
+
+1. Get a free [Groq API key](https://console.groq.com/keys) — no credit card.
+2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
+3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
+4. Add this to `.github/workflows/pr-briefing.yml`:
+
+```yaml
+name: PR Briefing
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]
+jobs:
+  brief:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+      contents: read
+    steps:
+      - uses: paramahastha/pr-context-engine@main
+        with:
+          groq-api-key: ${{ secrets.GROQ_API_KEY }}
+```
+
+That's it. Every new PR gets a briefing comment automatically.
+
+### Option B — CLI (any CI or local)
+
+```bash
+pipx install pr-context-engine
+export GROQ_API_KEY=<your-groq-key>
+export GITHUB_TOKEN=$(gh auth token)
+
+# Dry-run: see the briefing without posting it
+pr-context-engine review --pr 42 --repo owner/name --dry-run
+
+# Post the real comment
+pr-context-engine review --pr 42 --repo owner/name
+```
+
+## Live example
+
+A PR touching auth middleware produces this comment automatically:
+
+```markdown
+## 🤖 PR Briefing
+
+**What changed**
+Refactors session token storage from an in-memory dict to Redis, adding a configurable
+TTL. The auth middleware is updated to hit Redis on every request.
+
+**Blast radius**
+Any caller of `get_session()` now depends on Redis being reachable. If Redis is down,
+all authenticated requests will 401. The previous in-memory store had no such single
+point of failure.
+
+**Risk flags**
+- `modifies_auth`: src/auth/session.py line 42 — `token = generate_token(user_id)`
+
+**Questions for the reviewer**
+
+1. The Redis client is initialised once at import time — is there a reconnect strategy
+   if the connection drops mid-deploy?
+2. `SESSION_TTL` defaults to 3600 but the old in-memory store had no TTL — will existing
+   sessions all expire immediately after deploy?
+3. There are no tests for the Redis-down path — is 401-on-outage the intended degradation,
+   or should it fall back to the old store?
+
+---
+
+<sub>Generated by [PR Context Engine](https://github.com/paramahastha/pr-context-engine) via groq. Not a substitute for human review.</sub>
+```
+
+## Architecture
+
+```
+Front door A:                           Front door B:
+GitHub Action wrapper                   pipx install + run in any CI / locally
+(paramahastha/pr-context-engine@main)   (pr-context-engine review --pr 42 --repo …)
+│                                      │
+└──────────────┬───────────────────────┘
+               ▼
+┌──────────────────────────────────────┐
+│  CLI core — src/cli.py               │
+│  orchestrate: diff → analyze →       │
+│  brief → (fixes) → post              │
+└──────────────────────────────────────┘
+               │
+               ├──► analyzers/   diff → FileChange objects, AST symbols, risk flags
+               ├──► context/     git history + sqlite-vec RAG (fastembed, local)
+               ├──► briefing/    prompt assembly → LLM call → structured Briefing
+               ├──► fixes/       confidence-gated patch suggestions (opt-in)
+               ├──► llm/         FailoverProvider: Groq → Gemini → hard error
+               └──► github_api/  fetch diff, post comment + suggestion blocks
+```
+
+The CLI is the product; the GitHub Action is a thin wrapper. All logic lives in Python — no YAML logic.
+
+See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram and data-flow walkthrough.
+
+## Switching LLM providers
+
+Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
+
+| Provider | Key env var | Notes |
+|---|---|---|
+| `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
+| `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
+| `ollama` | — | Local, offline, no rate limits |
+| `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
+
+**Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
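Note on the failover paragraph above: the "Groq → Gemini → hard error" order can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the package's `FailoverProvider`; the `Provider` alias, `AllProvidersFailed`, and `complete_with_failover` are hypothetical names, and each provider is modelled as a plain callable rather than a real SDK client.

```python
from typing import Callable, Sequence

# Hypothetical shape: (provider name, function that turns a prompt into text).
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored (the 'hard error')."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (name of provider used, completion text)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # README: failover on "any Groq 429 or error"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the text is what would let a caller report which provider was used in the comment footer, as the README describes.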
+
+## Fix suggestions (opt-in)
+
+When `ENABLE_FIXES=true`, the tool generates confidence-gated patch suggestions for located issues (flags with a known file + line). Only `high`/`medium` confidence suggestions become one-click GitHub suggestion blocks; `low` confidence produces prose notes only. Max 3 suggestions per PR.
+
+```yaml
+- uses: paramahastha/pr-context-engine@main
+  with:
+    groq-api-key: ${{ secrets.GROQ_API_KEY }}
+    enable-fixes: "true"
+```
+
+See [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) for why this is opt-in and confidence-gated.
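Note on the gating rule above: a rough Python sketch of how `high`/`medium` fixes might be separated from `low` ones, with the cap of 3 applied. The `Fix` dataclass and `gate_fixes` are hypothetical names, and one behaviour is assumed rather than documented: the README does not say what happens to eligible fixes beyond the cap, so this sketch demotes them to prose notes.

```python
from dataclasses import dataclass

MAX_SUGGESTIONS = 3  # "Max 3 suggestions per PR" per the README


@dataclass(frozen=True)
class Fix:
    path: str
    line: int
    patch: str
    confidence: str  # "high", "medium", or "low"


def gate_fixes(fixes: list[Fix]) -> tuple[list[Fix], list[Fix]]:
    """Return (suggestion blocks, prose-only notes), preserving input order."""
    suggestions: list[Fix] = []
    notes: list[Fix] = []
    for fix in fixes:
        eligible = fix.confidence in ("high", "medium")
        if eligible and len(suggestions) < MAX_SUGGESTIONS:
            suggestions.append(fix)
        else:
            # Low confidence, or over the cap (assumed behaviour): prose only.
            notes.append(fix)
    return suggestions, notes
```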
+
+## Eval results
+
+`pytest tests/eval/` measures briefing quality across 15 real-world PR fixtures.
+
+**Static analysis (no API key needed):**
+
+| Metric | Score |
+|---|---|
+| Risk flag precision | **1.00** (0 false positives across 15 fixtures) |
+| Risk flag recall | **1.00** (all expected flags detected) |
+
+**LLM-as-judge scores** (run with `GROQ_API_KEY` + `ANTHROPIC_API_KEY`) assess five dimensions — Accuracy, Blast radius, Risk flags, Question quality, Brevity — on a 0–3 scale, plus Fix correctness and Calibration rate for the fix feature. Historical scores are committed to `tests/eval/scores.jsonl` so regressions are visible in git history.
+
+```bash
+# Analyzer-only (no API key needed):
+pytest tests/eval/ -v
+
+# Full eval with LLM-as-judge scoring:
+GROQ_API_KEY=... ANTHROPIC_API_KEY=... pytest tests/eval/ -v -s
+```
+
+The headline metrics are **fix correctness rate** (when the bot proposed a patch, was it actually correct?) and **false-confidence rate** (when it said `high` confidence, how often was the patch wrong?). These are the hardest-to-fake numbers in the scorecard.
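Note on the two headline metrics above: read literally, each is a simple ratio. The sketch below shows one way to compute them in Python from judged outcomes. It follows only the parenthetical definitions in the README; `FixOutcome` and both function names are hypothetical, and the eval harness in `tests/eval/` may compute these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixOutcome:
    confidence: str  # confidence the tool reported: "high", "medium", or "low"
    correct: bool    # whether the proposed patch was judged correct


def fix_correctness_rate(outcomes: list[FixOutcome]) -> float:
    """Share of proposed patches that were correct (0.0 if none were proposed)."""
    if not outcomes:
        return 0.0
    return sum(o.correct for o in outcomes) / len(outcomes)


def false_confidence_rate(outcomes: list[FixOutcome]) -> float:
    """Share of high-confidence patches that were wrong (0.0 if none were high)."""
    high = [o for o in outcomes if o.confidence == "high"]
    if not high:
        return 0.0
    return sum(not o.correct for o in high) / len(high)
```

A well-calibrated tool would pair a high correctness rate with a false-confidence rate near zero.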
+
+## Data & privacy
+
+**What leaves your machine:**
+
+- The PR diff and parsed metadata (file paths, function names, changed lines) are sent to the active LLM provider (Groq or Gemini by default).
+- No source code beyond the diff is sent to any external API. The codebase index (RAG) runs entirely locally via `fastembed` + `sqlite-vec` — no embedding API, no external call.
+- Git history and PR metadata are fetched from the GitHub API using your `GITHUB_TOKEN`.
+
+**Provider data policies:**
+
+- Groq and Gemini free tiers may use inputs for model improvement. Check their privacy policies before using on private or sensitive repos.
+- Use `LLM_PROVIDER=ollama` or `LLM_PROVIDER=anthropic` (BYO key) if you need stronger data-isolation guarantees.
+- The tool has no shared backend. Your API key, your quota, your data. Running it on 1 000 repos costs you nothing extra and costs me nothing.
+
+## Design decisions
+
+Short ADRs covering the tradeoffs that shaped the architecture:
+
+| ADR | Decision |
+|---|---|
+| [ADR-0](docs/design-decisions.md#adr-0-provider-abstraction-built-early) | Provider abstraction built in M2, not retrofitted later |
+| [ADR-1](docs/design-decisions.md#adr-1-cli-core-with-two-front-doors) | CLI-core with two front doors (Action + pipx) |
+| [ADR-2](docs/design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) | SQLite + sqlite-vec over Pinecone or Chroma |
+| [ADR-3](docs/design-decisions.md#adr-3-local-embeddings-via-fastembed) | Local embeddings via fastembed (no embedding API) |
+| [ADR-4](docs/design-decisions.md#adr-4-shallow-clone-tradeoff-in-ci-fetch-depth-50) | fetch-depth: 50 tradeoff in CI |
+| [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) | Fix suggestions opt-in and confidence-gated |
+| [ADR-6](docs/design-decisions.md#adr-6-mit-license) | MIT license |
+| [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) | Failover order: Groq → Gemini → hard error |
+| [ADR-8](docs/design-decisions.md#adr-8-python-312-as-the-implementation-language) | Python 3.12 over Go/TypeScript/Rust |
+
+## Cost
+
+**$0/month** for a portfolio-scale project on public repos.
+
+| Component | Cost |
+|---|---|
+| GitHub Actions | Free for public repos |
+| Groq (default LLM) | Free tier, ~1 000 req/day |
+| Gemini (failover) | Free tier, ~1 500 req/day |
+| Local embeddings (`fastembed`) | $0, no API, runs in-process |
+| Shared backend | None — your key, your quota |
+
+Free LLM tiers change without warning (Gemini cut 50–80% in Dec 2025). The [failover design](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) means a single provider's policy change degrades gracefully instead of breaking the tool.
+
+## Configuration
+
+See [CONFIG.md](CONFIG.md) for every env var, flag, default, and a minimal vs. full example.
+
+## Contributing
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, running tests, and the milestone philosophy. Bug reports and feature requests go in [Issues](https://github.com/paramahastha/pr-context-engine/issues).
@@ -0,0 +1,231 @@
|
|
|
1
|
+
# PR Context Engine
|
|
2
|
+
|
|
3
|
+
[](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml)
|
|
4
|
+
[](https://pypi.org/project/pr-context-engine/)
|
|
5
|
+
[](LICENSE)
|
|
6
|
+
[](https://www.python.org/downloads/)
|
|
7
|
+
|
|
8
|
+
> An AI tool that reads every PR and writes the briefing — and the fixes — a senior engineer would, with the calibration data to prove it's not just guessing.
|
|
9
|
+
|
|
10
|
+
<!-- Demo GIF goes here once recorded: docs/demo.gif -->
|
|
11
|
+
<!--  -->
|
|
12
|
+
|
|
13
|
+
## What it does
|
|
14
|
+
|
|
15
|
+
Every PR opens with three problems for the reviewer: _what is this actually doing_, _what could it break_, and _what should I push back on_. A diff doesn't answer any of those.
|
|
16
|
+
|
|
17
|
+
PR Context Engine reads the diff plus surrounding code, recent git history, and semantically similar code from elsewhere in the repo, then posts a terse briefing written like a senior backend engineer would write it. No praise. No filler. No "this LGTM." Just the context a reviewer needs.
|
|
18
|
+
|
|
19
|
+
With `ENABLE_FIXES=true`, it also generates confidence-gated patch suggestions for located issues — posted as collapsible GitHub suggestion blocks the maintainer can apply in one click. When it isn't sure, it says so in prose instead of guessing.
|
|
20
|
+
|
|
21
|
+
## Quickstart (5 minutes)
|
|
22
|
+
|
|
23
|
+
### Check your setup first
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
pipx install pr-context-engine
|
|
27
|
+
export GROQ_API_KEY=<your-key> # free at console.groq.com/keys
|
|
28
|
+
export GITHUB_TOKEN=$(gh auth token)
|
|
29
|
+
pr-context-engine quickstart # checks keys, scopes, prints what's missing
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
### Option A — GitHub Action (recommended)
|
|
33
|
+
|
|
34
|
+
1. Get a free [Groq API key](https://console.groq.com/keys) — no credit card.
|
|
35
|
+
2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
|
|
36
|
+
3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
|
|
37
|
+
4. Add this to `.github/workflows/pr-briefing.yml`:
|
|
38
|
+
|
|
39
|
+
```yaml
|
|
40
|
+
name: PR Briefing
|
|
41
|
+
on:
|
|
42
|
+
pull_request:
|
|
43
|
+
types: [opened, synchronize, reopened]
|
|
44
|
+
jobs:
|
|
45
|
+
brief:
|
|
46
|
+
runs-on: ubuntu-latest
|
|
47
|
+
permissions:
|
|
48
|
+
pull-requests: write
|
|
49
|
+
contents: read
|
|
50
|
+
steps:
|
|
51
|
+
- uses: paramahastha/pr-context-engine@main
|
|
52
|
+
with:
|
|
53
|
+
groq-api-key: ${{ secrets.GROQ_API_KEY }}
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
That's it. Every new PR gets a briefing comment automatically.
|
|
57
|
+
|
|
58
|
+
### Option B — CLI (any CI or local)
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
pipx install pr-context-engine
|
|
62
|
+
export GROQ_API_KEY=<your-groq-key>
|
|
63
|
+
export GITHUB_TOKEN=$(gh auth token)
|
|
64
|
+
|
|
65
|
+
# Dry-run: see the briefing without posting it
|
|
66
|
+
pr-context-engine review --pr 42 --repo owner/name --dry-run
|
|
67
|
+
|
|
68
|
+
# Post the real comment
|
|
69
|
+
pr-context-engine review --pr 42 --repo owner/name
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
## Live example
|
|
73
|
+
|
|
74
|
+
A PR touching auth middleware produces this comment automatically:
|
|
75
|
+
|
|
76
|
+
```markdown
|
|
77
|
+
## 🤖 PR Briefing
|
|
78
|
+
|
|
79
|
+
**What changed**
|
|
80
|
+
Refactors session token storage from an in-memory dict to Redis, adding a configurable
|
|
81
|
+
TTL. The auth middleware is updated to hit Redis on every request.
|
|
82
|
+
|
|
83
|
+
**Blast radius**
|
|
84
|
+
Any caller of `get_session()` now depends on Redis being reachable. If Redis is down,
|
|
85
|
+
all authenticated requests will 401. The previous in-memory store had no such single
|
|
86
|
+
point of failure.
|
|
87
|
+
|
|
88
|
+
**Risk flags**
|
|
89
|
+
- `modifies_auth`: src/auth/session.py line 42 — `token = generate_token(user_id)`
|
|
90
|
+
|
|
91
|
+
**Questions for the reviewer**
|
|
92
|
+
|
|
93
|
+
1. The Redis client is initialised once at import time — is there a reconnect strategy
|
|
94
|
+
if the connection drops mid-deploy?
|
|
95
|
+
2. `SESSION_TTL` defaults to 3600 but the old in-memory store had no TTL — will existing
|
|
96
|
+
sessions all expire immediately after deploy?
|
|
97
|
+
3. There are no tests for the Redis-down path — is 401-on-outage the intended degradation,
|
|
98
|
+
or should it fall back to the old store?
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
<sub>Generated by [PR Context Engine](https://github.com/paramahastha/pr-context-engine) via groq. Not a substitute for human review.</sub>
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
## Architecture
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
Front door A: Front door B:
|
|
109
|
+
GitHub Action wrapper pipx install + run in any CI / locally
|
|
110
|
+
(paramahastha/pr-context-engine@main) (pr-context-engine review --pr 42 --repo …)
|
|
111
|
+
│ │
|
|
112
|
+
└──────────────┬───────────────────────┘
|
|
113
|
+
▼
|
|
114
|
+
┌──────────────────────────────────────┐
|
|
115
|
+
│ CLI core — src/cli.py │
|
|
116
|
+
│ orchestrate: diff → analyze → │
|
|
117
|
+
│ brief → (fixes) → post │
|
|
118
|
+
└──────────────────────────────────────┘
|
|
119
|
+
│
|
|
120
|
+
├──► analyzers/ diff → FileChange objects, AST symbols, risk flags
|
|
121
|
+
├──► context/ git history + sqlite-vec RAG (fastembed, local)
|
|
122
|
+
├──► briefing/ prompt assembly → LLM call → structured Briefing
|
|
123
|
+
├──► fixes/ confidence-gated patch suggestions (opt-in)
|
|
124
|
+
├──► llm/ FailoverProvider: Groq → Gemini → hard error
|
|
125
|
+
└──► github_api/ fetch diff, post comment + suggestion blocks
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
The CLI is the product; the GitHub Action is a thin wrapper. All logic lives in Python — no YAML logic.
|
|
129
|
+
|
|
130
|
+
See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram and data-flow walkthrough.
|
|
131
|
+
|
|
132
|
+
## Switching LLM providers
|
|
133
|
+
|
|
134
|
+
Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
|
|
135
|
+
|
|
136
|
+
| Provider | Key env var | Notes |
|
|
137
|
+
|---|---|---|
|
|
138
|
+
| `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
|
|
139
|
+
| `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
|
|
140
|
+
| `ollama` | — | Local, offline, no rate limits |
|
|
141
|
+
| `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
|
|
142
|
+
|
|
143
|
+
**Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
|
|
144
|
+
|
|
145
|
+
## Fix suggestions (opt-in)
|
|
146
|
+
|
|
147
|
+
When `ENABLE_FIXES=true`, the tool generates confidence-gated patch suggestions for located issues (flags with a known file + line). Only `high`/`medium` confidence suggestions become one-click GitHub suggestion blocks; `low` confidence produces prose notes only. Max 3 suggestions per PR.
|
|
148
|
+
|
|
149
|
+
```yaml
|
|
150
|
+
- uses: paramahastha/pr-context-engine@main
|
|
151
|
+
with:
|
|
152
|
+
groq-api-key: ${{ secrets.GROQ_API_KEY }}
|
|
153
|
+
enable-fixes: "true"
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
See [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) for why this is opt-in and confidence-gated.
|
|
157
|
+
|
|
158
|
+
## Eval results
|
|
159
|
+
|
|
160
|
+
`pytest tests/eval/` measures briefing quality across 15 real-world PR fixtures.
|
|
161
|
+
|
|
162
|
+
**Static analysis (no API key needed):**
|
|
163
|
+
|
|
164
|
+
| Metric | Score |
|
|
165
|
+
|---|---|
|
|
166
|
+
| Risk flag precision | **1.00** (0 false positives across 15 fixtures) |
|
|
167
|
+
| Risk flag recall | **1.00** (all expected flags detected) |
|
|
168
|
+
|
|
169
|
+
**LLM-as-judge scores** (run with `GROQ_API_KEY` + `ANTHROPIC_API_KEY`) assess five dimensions — Accuracy, Blast radius, Risk flags, Question quality, Brevity — on a 0–3 scale, plus Fix correctness and Calibration rate for the fix feature. Historical scores are committed to `tests/eval/scores.jsonl` so regressions are visible in git history.
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
# Analyzer-only (no API key needed):
|
|
173
|
+
pytest tests/eval/ -v
|
|
174
|
+
|
|
175
|
+
# Full eval with LLM-as-judge scoring:
|
|
176
|
+
GROQ_API_KEY=... ANTHROPIC_API_KEY=... pytest tests/eval/ -v -s
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
The headline metrics are **fix correctness rate** (when the bot proposed a patch, was it actually correct?) and **false-confidence rate** (when it said `high` confidence, how often was the patch wrong?). These are the hardest-to-fake numbers in the scorecard.

## Data & privacy

**What leaves your machine:**

- The PR diff and parsed metadata (file paths, function names, changed lines) are sent to the active LLM provider (Groq or Gemini by default).
- No source code beyond the diff is sent to any external API. The codebase index (RAG) runs entirely locally via `fastembed` + `sqlite-vec`, with no embedding API and no external call.
- Git history and PR metadata are fetched from the GitHub API using your `GITHUB_TOKEN`.
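
To make the "runs entirely locally" point concrete, here is a plain-Python stand-in for the retrieval step: cosine similarity over in-memory vectors, returning the top-k chunk ids. It deliberately does not use `fastembed` or `sqlite-vec`; the vectors are toy values and `top_k` is a hypothetical helper, so treat it as a picture of the idea and not of the project's index code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """chunks: list of (chunk_id, vector). Returns the ids of the k most similar chunks."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

Nothing in this path touches the network: both the embedding vectors and the similarity search live in the same process.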

**Provider data policies:**

- Groq and Gemini free tiers may use inputs for model improvement. Check their privacy policies before using on private or sensitive repos.
- Use `LLM_PROVIDER=ollama` or `LLM_PROVIDER=anthropic` (BYO key) if you need stronger data-isolation guarantees.
- The tool has no shared backend. Your API key, your quota, your data. Running it on 1 000 repos costs you nothing extra and costs me nothing.

## Design decisions

Short ADRs covering the tradeoffs that shaped the architecture:

| ADR | Decision |
|---|---|
| [ADR-0](docs/design-decisions.md#adr-0-provider-abstraction-built-early) | Provider abstraction built in M2, not retrofitted later |
| [ADR-1](docs/design-decisions.md#adr-1-cli-core-with-two-front-doors) | CLI-core with two front doors (Action + pipx) |
| [ADR-2](docs/design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) | SQLite + sqlite-vec over Pinecone or Chroma |
| [ADR-3](docs/design-decisions.md#adr-3-local-embeddings-via-fastembed) | Local embeddings via fastembed (no embedding API) |
| [ADR-4](docs/design-decisions.md#adr-4-shallow-clone-tradeoff-in-ci-fetch-depth-50) | fetch-depth: 50 tradeoff in CI |
| [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) | Fix suggestions opt-in and confidence-gated |
| [ADR-6](docs/design-decisions.md#adr-6-mit-license) | MIT license |
| [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) | Failover order: Groq → Gemini → hard error |
| [ADR-8](docs/design-decisions.md#adr-8-python-312-as-the-implementation-language) | Python 3.12 over Go/TypeScript/Rust |

## Cost

**$0/month** for a portfolio-scale project on public repos.

| Component | Cost |
|---|---|
| GitHub Actions | Free for public repos |
| Groq (default LLM) | Free tier, ~1 000 req/day |
| Gemini (failover) | Free tier, ~1 500 req/day |
| Local embeddings (`fastembed`) | $0, no API, runs in-process |
| Shared backend | None (your key, your quota) |

Free LLM tiers change without warning (Gemini cut 50–80% in Dec 2025). The [failover design](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) means a single provider's policy change degrades gracefully instead of breaking the tool.
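
The failover behaviour can be sketched in a few lines of Python. The class name `FailoverProvider` comes from the project's architecture notes, but the `complete` method, the `ProviderError` exception, and the error message format are assumptions made for this illustration.

```python
class ProviderError(Exception):
    """Raised when a provider cannot serve the request (hypothetical)."""


class FailoverProvider:
    """Try each provider in order; raise if all of them fail."""

    def __init__(self, providers):
        self.providers = list(providers)  # e.g. [groq, gemini]

    def complete(self, prompt):
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except ProviderError as exc:
                # Remember why this provider failed, then try the next one.
                errors.append(f"{type(provider).__name__}: {exc}")
        # Hard error: once every provider has failed, nothing is silently skipped.
        raise RuntimeError("all LLM providers failed: " + "; ".join(errors))
```

Callers only ever see one object with one method, which is what lets a quota cut on the first provider show up as a slower request instead of a broken run.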

## Configuration

See [CONFIG.md](CONFIG.md) for every env var, flag, default, and a minimal vs. full example.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, running tests, and the milestone philosophy. Bug reports and feature requests go in [Issues](https://github.com/paramahastha/pr-context-engine/issues).

@@ -33,6 +33,12 @@ runs:
       with:
         python-version: "3.12"
 
+    - name: Restore embedding model cache
+      uses: actions/cache@v4
+      with:
+        path: ~/.cache/fastembed
+        key: fastembed-bge-small-en-v1.5
+
     - name: Restore index cache
       uses: actions/cache@v4
       with:
@@ -0,0 +1,125 @@
# Architecture

## Overview

PR Context Engine follows a **CLI-core with two front doors** design. The CLI (`src/cli.py`) is the product; the GitHub Action is a thin wrapper around it. No orchestration logic lives in YAML.

This means the tool is:
- Testable locally (`pr-context-engine review --pr 42 --repo owner/name --dry-run`)
- Runnable in any CI (GitLab, CircleCI, Jenkins) with no GitHub lock-in
- Installable as a standalone CLI (`pipx install pr-context-engine`)
- Identical behind both front doors (the GitHub Action just calls the same binary any user would call)

## System diagram

```mermaid
flowchart TD
    A["Front door A\nGitHub Action wrapper\n(paramahastha/pr-context-engine@main)"]
    B["Front door B\npipx install + run in any CI / locally\n(pr-context-engine review --pr 42 --repo …)"]

    A & B --> CLI

    subgraph CLI["CLI core: src/cli.py"]
        direction TB
        Orchestrator["orchestrate: fetch diff → analyze → brief → post"]
    end

    CLI --> Analyzers
    CLI --> Context
    CLI --> Briefing
    CLI --> Fixes
    CLI --> LLM
    CLI --> GitHub

    subgraph Analyzers["src/analyzers/"]
        DP["diff_parser.py\nUnified diff → FileChange objects\n(path, language, hunks, added/removed lines)"]
        AW["ast_walker.py\nAST symbol extraction\n(changed function/class names)"]
        RS["risk_scorer.py\nHeuristic flag detection\n(auth, migration, config, deleted APIs)"]
    end

    subgraph Context["src/context/"]
        GH["git_history.py\nLast 5 commits per touched file\nLast 3 merged PRs on same files"]
        CI["codebase_index.py\nsqlite-vec + fastembed RAG\nTop-5 semantically similar chunks"]
    end

    subgraph Briefing["src/briefing/"]
        PT["prompt_templates.py\nSenior-engineer system prompt"]
        BG["generator.py\nPrompt assembly + LLM call\n→ structured Briefing object"]
    end

    subgraph Fixes["src/fixes/"]
        FG["fix_generator.py\nLocated-issue → patch + rationale\n+ self-assessed confidence"]
        CF["confidence.py\nGate: high/medium → suggestion block\nlow → prose note only\nmax 3 per PR"]
    end

    subgraph LLM["src/llm/"]
        FP["FailoverProvider\nGroq → Gemini → hard error"]
        GP["groq_provider.py"]
        GMP["gemini_provider.py"]
        OP["ollama_provider.py"]
        AP["anthropic_provider.py"]
        FP --> GP & GMP & OP & AP
    end

    subgraph GitHub["src/github_api/"]
        CP["comment_poster.py\nFetch diff, post briefing comment\nPost line-anchored suggestion blocks\nin collapsed details sections"]
    end

    LLM --> Briefing
    LLM --> Fixes
```

## Data flow for a single PR

```
1. CLI receives --pr N --repo owner/name
2. github_api fetches the unified diff via REST
3. diff_parser converts raw diff → list[FileChange]
4. ast_walker extracts changed symbol names per file
5. risk_scorer emits located-issue objects {flag, file, line, snippet}
6. git_history fetches last 5 commits per touched file + last 3 merged PRs
7. codebase_index queries sqlite-vec for top-5 related chunks per FileChange
8. briefing/generator assembles all context into a structured prompt
9. llm/FailoverProvider calls Groq (falls back to Gemini on 429)
10. Generator parses the LLM response into a Briefing object
11. If ENABLE_FIXES=true: fix_generator generates patches for located issues
    confidence.py gates: high/medium → suggestion block, low → prose
12. github_api posts the comment (briefing + collapsed suggestion blocks)
```
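
Step 3 is the piece most of the later steps depend on, so here is a deliberately small Python sketch of turning a unified diff into `FileChange` objects. The field names are illustrative guesses and the parser skips details the real `diff_parser.py` presumably handles (language detection, hunk objects, deleted files, added lines whose text begins with `++ `).

```python
from dataclasses import dataclass, field


@dataclass
class FileChange:  # simplified; field names are assumptions
    path: str
    added: list = field(default_factory=list)    # (new_line_number, text)
    removed: list = field(default_factory=list)  # removed line texts


def parse_unified_diff(diff_text):
    changes, current, new_line = [], None, 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            # New-file header, e.g. "+++ b/src/app.py"
            path = raw[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            current = FileChange(path)
            changes.append(current)
        elif raw.startswith("@@") and current is not None:
            # Hunk header: @@ -old_start,old_len +new_start,new_len @@
            new_start = raw.split("+", 1)[1].split(" ", 1)[0].split(",")[0]
            new_line = int(new_start)
        elif current is None or raw.startswith("--- "):
            continue  # preamble or old-file header
        elif raw.startswith("+"):
            current.added.append((new_line, raw[1:]))
            new_line += 1
        elif raw.startswith("-"):
            current.removed.append(raw[1:])
        elif raw.startswith(" "):
            new_line += 1  # context line advances the new-file counter
    return changes
```

Tracking the new-file line number for every added line is what later allows a flag or a suggestion to be anchored to a specific line of the PR.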

## Module responsibilities

| Module | Single responsibility |
|---|---|
| `src/cli.py` | Typer entrypoint; orchestrates the pipeline; no business logic |
| `src/config.py` | Reads env vars, instantiates the right LLM provider |
| `src/analyzers/diff_parser.py` | Unified diff → `FileChange` data objects |
| `src/analyzers/ast_walker.py` | AST symbol extraction for Python/JS/TS/Go |
| `src/analyzers/risk_scorer.py` | Heuristic flags → located-issue objects |
| `src/context/git_history.py` | Commit history and merged-PR context per file |
| `src/context/codebase_index.py` | sqlite-vec RAG index; embedding via fastembed |
| `src/briefing/prompt_templates.py` | System prompt text (verbatim; no logic) |
| `src/briefing/generator.py` | Prompt assembly + LLM call + response parsing |
| `src/fixes/fix_generator.py` | Located issue → structured patch + confidence |
| `src/fixes/confidence.py` | Gate logic: which confidence levels produce suggestion blocks |
| `src/llm/base.py` | `LLMProvider` abstract interface |
| `src/llm/groq_provider.py` | Groq implementation |
| `src/llm/gemini_provider.py` | Gemini implementation |
| `src/llm/ollama_provider.py` | Local Ollama implementation |
| `src/llm/anthropic_provider.py` | Anthropic implementation |
| `src/llm/__init__.py` | `FailoverProvider` wrapping ordered provider list |
| `src/github_api/comment_poster.py` | Diff fetch + PR comment posting + suggestion blocks |
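
The "located-issue objects" that `risk_scorer.py` hands to the fix generator carry `{flag, file, line, snippet}`. A minimal Python sketch of that shape, with two toy regex heuristics, might look like this; the `LocatedIssue` class name, the `RULES` table, and `score_added_lines` are hypothetical, and the real rule set is not shown in this document.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LocatedIssue:
    flag: str
    file: str
    line: int
    snippet: str


# Toy heuristics for illustration only.
RULES = {
    "auth": re.compile(r"password|token|secret", re.IGNORECASE),
    "migration": re.compile(r"\b(ALTER|DROP)\s+TABLE\b", re.IGNORECASE),
}


def score_added_lines(file, added_lines):
    """added_lines: iterable of (line_number, text) pairs for one file."""
    issues = []
    for line_no, text in added_lines:
        for flag, pattern in RULES.items():
            if pattern.search(text):
                issues.append(LocatedIssue(flag, file, line_no, text.strip()))
    return issues
```

Because every issue already knows its file and line, a downstream fix generator can anchor a suggestion without re-parsing the diff.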

## Key design decisions

The five decisions that shaped everything else, with reasoning meant to still make sense after a six-month gap:

1. **CLI-core over Action-only**: makes the tool testable locally and portable across CI systems. See [ADR-1](design-decisions.md#adr-1-cli-core-with-two-front-doors).

2. **Provider abstraction built in M2, not last**: free LLM tiers change without warning (Gemini cut 50–80% in Dec 2025). Retrofitting abstraction later would have required touching every caller. See [ADR-0](design-decisions.md#adr-0-provider-abstraction-built-early).

3. **sqlite-vec + fastembed over a hosted vector store**: $0/month, no external service, no second API key, no latency for network round-trips. The index file is cached across Action runs. See [ADR-2](design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) and [ADR-3](design-decisions.md#adr-3-local-embeddings-via-fastembed).

4. **Located-issue data shape in M3**: `risk_scorer` emits `{flag, file, line, snippet}` objects from the start. M8's fix generator depends on `file` and `line`; a bare string list would have forced a painful refactor five milestones later.

5. **Fix suggestions opt-in and confidence-gated**: a confidently wrong auto-fix erodes trust faster than no fix. The calibration metric in the eval harness is the accountability mechanism. See [ADR-5](design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8).