pr-context-engine 0.1.0__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/workflows/pr-review.yml +6 -0
  2. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CHANGELOG.md +14 -0
  3. pr_context_engine-0.1.2/PKG-INFO +261 -0
  4. pr_context_engine-0.1.2/README.md +231 -0
  5. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/action.yml +6 -0
  6. pr_context_engine-0.1.2/docs/architecture.md +125 -0
  7. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/docs/design-decisions.md +13 -0
  8. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/pyproject.toml +1 -1
  9. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/generator.py +35 -10
  10. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/prompt_templates.py +2 -0
  11. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/cli.py +3 -1
  12. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_briefing_generator.py +55 -0
  13. pr_context_engine-0.1.0/PKG-INFO +0 -211
  14. pr_context_engine-0.1.0/README.md +0 -181
  15. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.env.example +0 -0
  16. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  17. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  18. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/pull_request_template.md +0 -0
  19. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.github/workflows/release.yml +0 -0
  20. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.gitignore +0 -0
  21. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/.python-version +0 -0
  22. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CODE_OF_CONDUCT.md +0 -0
  23. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CONFIG.md +0 -0
  24. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/CONTRIBUTING.md +0 -0
  25. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/LICENSE +0 -0
  26. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/PROJECT.md +0 -0
  27. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/__init__.py +0 -0
  28. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/__init__.py +0 -0
  29. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/ast_walker.py +0 -0
  30. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/diff_parser.py +0 -0
  31. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/analyzers/risk_scorer.py +0 -0
  32. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/briefing/__init__.py +0 -0
  33. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/config.py +0 -0
  34. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/__init__.py +0 -0
  35. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/codebase_index.py +0 -0
  36. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/context/git_history.py +0 -0
  37. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/__init__.py +0 -0
  38. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/confidence.py +0 -0
  39. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/fixes/fix_generator.py +0 -0
  40. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/github_api/__init__.py +0 -0
  41. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/github_api/comment_poster.py +0 -0
  42. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/__init__.py +0 -0
  43. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/anthropic_provider.py +0 -0
  44. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/base.py +0 -0
  45. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/gemini_provider.py +0 -0
  46. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/groq_provider.py +0 -0
  47. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/src/llm/ollama_provider.py +0 -0
  48. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/__init__.py +0 -0
  49. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/__init__.py +0 -0
  50. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/01-simple-refactor.json +0 -0
  51. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/02-auth-middleware.json +0 -0
  52. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/03-db-migration.json +0 -0
  53. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/04-config-update.json +0 -0
  54. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/05-public-api-deleted.json +0 -0
  55. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/06-hardcoded-api-key.json +0 -0
  56. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/07-token-in-url.json +0 -0
  57. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/08-retry-no-limit.json +0 -0
  58. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/09-missing-null-check.json +0 -0
  59. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/10-trivial-docfix.json +0 -0
  60. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/11-multi-flag.json +0 -0
  61. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/12-new-endpoint.json +0 -0
  62. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/13-auth-bypass.json +0 -0
  63. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/14-env-file-update.json +0 -0
  64. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/fixtures/15-dependency-update.json +0 -0
  65. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/rubric.md +0 -0
  66. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/eval/test_briefings.py +0 -0
  67. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/__init__.py +0 -0
  68. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_anthropic_provider.py +0 -0
  69. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_ast_walker.py +0 -0
  70. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_codebase_index.py +0 -0
  71. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_config.py +0 -0
  72. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_diff_parser.py +0 -0
  73. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_failover_provider.py +0 -0
  74. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_fix_generator.py +0 -0
  75. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_gemini_provider.py +0 -0
  76. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_git_history.py +0 -0
  77. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_groq_provider.py +0 -0
  78. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_ollama_provider.py +0 -0
  79. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/tests/unit/test_risk_scorer.py +0 -0
  80. {pr_context_engine-0.1.0 → pr_context_engine-0.1.2}/uv.lock +0 -0
@@ -37,6 +37,12 @@ jobs:
       - name: Install uv
         run: pip install uv
 
+      - name: Restore embedding model cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/fastembed
+          key: fastembed-bge-small-en-v1.5
+
       - name: Restore index cache
         uses: actions/cache@v4
         with:
@@ -6,6 +6,20 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Thi
 
 ## Unreleased
 
+## 0.1.2 — 2026-05-23
+
+### Fixed
+
+- **Briefing parser** — Section headers are now matched after stripping markdown decoration (`**`, `##`, `__`). Groq's llama-3.3-70b wraps headers in bold/heading markdown despite prompt instructions, causing all four sections to parse as empty. The parser now normalises headers before matching, and logs the raw LLM response when all sections fail to aid future debugging.
+- **Prompt template** — Added explicit instruction prohibiting markdown decoration on section headers.
+
+## 0.1.1 — 2026-05-20
+
+### Fixed
+
+- **RAG + history quality** — Restored full per-file RAG chunk retrieval and git history; only the file list shown in the prompt is capped at 20 to respect the token budget.
+- **CI** — Cache fastembed embedding model in CI workflows to reduce cold-start time.
+
 ## 0.1.0 — 2026-05-17
 
 ### Added
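The briefing-parser fix recorded in the 0.1.2 changelog entry above can be illustrated with a small sketch. This is not the code in `src/briefing/generator.py`, which is not shown in this part of the diff; the function names `normalise_header` and `parse_sections` are hypothetical, and the four section names are taken from the example briefing in the README. It shows one plausible way to strip `**`, `##` and `__` decoration before matching a header, and to log the raw response when every section comes back empty.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Section names as they appear in the README's example briefing.
SECTION_HEADERS = (
    "What changed",
    "Blast radius",
    "Risk flags",
    "Questions for the reviewer",
)

# Leading '#', '*', '_' and whitespace; trailing ones plus an optional colon.
_DECORATION = re.compile(r"^[\s#*_]+|[\s#*_:]+$")


def normalise_header(line: str) -> str:
    """Strip markdown decoration so '**What changed**' matches 'What changed'."""
    return _DECORATION.sub("", line).casefold()


def parse_sections(raw: str) -> dict[str, str]:
    """Split an LLM response into the four briefing sections (hypothetical helper)."""
    lookup = {name.casefold(): name for name in SECTION_HEADERS}
    bodies: dict[str, list[str]] = {name: [] for name in SECTION_HEADERS}
    current = None
    for line in raw.splitlines():
        header = lookup.get(normalise_header(line))
        if header is not None:
            current = header  # a (possibly decorated) header starts a new section
        elif current is not None:
            bodies[current].append(line)
    parsed = {name: "\n".join(lines).strip() for name, lines in bodies.items()}
    if not any(parsed.values()):
        # Mirrors the changelog: keep the raw response around for debugging.
        logger.warning("No briefing sections parsed; raw LLM response: %r", raw)
    return parsed
```

Normalising only for the comparison, and storing body lines untouched, means inline markdown in the section text is preserved.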
@@ -0,0 +1,261 @@
+Metadata-Version: 2.4
+Name: pr-context-engine
+Version: 0.1.2
+Summary: An AI tool that reads every PR and posts a senior-engineer-style briefing.
+Project-URL: Homepage, https://github.com/paramahastha/pr-context-engine
+Project-URL: Repository, https://github.com/paramahastha/pr-context-engine
+Project-URL: Issues, https://github.com/paramahastha/pr-context-engine/issues
+Project-URL: Changelog, https://github.com/paramahastha/pr-context-engine/blob/main/CHANGELOG.md
+Author-email: Kautsar <paramahastha@gmail.com>
+License: MIT
+License-File: LICENSE
+Keywords: ai,code-review,github,llm,pull-request
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Version Control :: Git
+Requires-Python: >=3.12
+Requires-Dist: anthropic>=0.40
+Requires-Dist: fastembed>=0.4
+Requires-Dist: google-genai>=1.0
+Requires-Dist: groq>=0.13
+Requires-Dist: pygithub>=2.4
+Requires-Dist: python-dotenv>=1.0
+Requires-Dist: requests>=2.32
+Requires-Dist: sqlite-vec>=0.1
+Requires-Dist: typer>=0.12
+Description-Content-Type: text/markdown
+
+# PR Context Engine
+
+[![CI](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml/badge.svg)](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml)
+[![PyPI version](https://img.shields.io/pypi/v/pr-context-engine)](https://pypi.org/project/pr-context-engine/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+
+> An AI tool that reads every PR and writes the briefing — and the fixes — a senior engineer would, with the calibration data to prove it's not just guessing.
+
+<!-- Demo GIF goes here once recorded: docs/demo.gif -->
+<!-- ![Demo: PR getting briefed and a suggestion being one-click applied](docs/demo.gif) -->
+
+## What it does
+
+Every PR opens with three problems for the reviewer: _what is this actually doing_, _what could it break_, and _what should I push back on_. A diff doesn't answer any of those.
+
+PR Context Engine reads the diff plus surrounding code, recent git history, and semantically similar code from elsewhere in the repo, then posts a terse briefing written like a senior backend engineer would write it. No praise. No filler. No "this LGTM." Just the context a reviewer needs.
+
+With `ENABLE_FIXES=true`, it also generates confidence-gated patch suggestions for located issues — posted as collapsible GitHub suggestion blocks the maintainer can apply in one click. When it isn't sure, it says so in prose instead of guessing.
+
+## Quickstart (5 minutes)
+
+### Check your setup first
+
+```bash
+pipx install pr-context-engine
+export GROQ_API_KEY=<your-key> # free at console.groq.com/keys
+export GITHUB_TOKEN=$(gh auth token)
+pr-context-engine quickstart # checks keys, scopes, prints what's missing
+```
+
+### Option A — GitHub Action (recommended)
+
+1. Get a free [Groq API key](https://console.groq.com/keys) — no credit card.
+2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
+3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
+4. Add this to `.github/workflows/pr-briefing.yml`:
+
+```yaml
+name: PR Briefing
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]
+jobs:
+  brief:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+      contents: read
+    steps:
+      - uses: paramahastha/pr-context-engine@main
+        with:
+          groq-api-key: ${{ secrets.GROQ_API_KEY }}
+```
+
+That's it. Every new PR gets a briefing comment automatically.
+
+### Option B — CLI (any CI or local)
+
+```bash
+pipx install pr-context-engine
+export GROQ_API_KEY=<your-groq-key>
+export GITHUB_TOKEN=$(gh auth token)
+
+# Dry-run: see the briefing without posting it
+pr-context-engine review --pr 42 --repo owner/name --dry-run
+
+# Post the real comment
+pr-context-engine review --pr 42 --repo owner/name
+```
+
+## Live example
+
+A PR touching auth middleware produces this comment automatically:
+
+```markdown
+## 🤖 PR Briefing
+
+**What changed**
+Refactors session token storage from an in-memory dict to Redis, adding a configurable
+TTL. The auth middleware is updated to hit Redis on every request.
+
+**Blast radius**
+Any caller of `get_session()` now depends on Redis being reachable. If Redis is down,
+all authenticated requests will 401. The previous in-memory store had no such single
+point of failure.
+
+**Risk flags**
+- `modifies_auth`: src/auth/session.py line 42 — `token = generate_token(user_id)`
+
+**Questions for the reviewer**
+
+1. The Redis client is initialised once at import time — is there a reconnect strategy
+   if the connection drops mid-deploy?
+2. `SESSION_TTL` defaults to 3600 but the old in-memory store had no TTL — will existing
+   sessions all expire immediately after deploy?
+3. There are no tests for the Redis-down path — is 401-on-outage the intended degradation,
+   or should it fall back to the old store?
+
+---
+
+<sub>Generated by [PR Context Engine](https://github.com/paramahastha/pr-context-engine) via groq. Not a substitute for human review.</sub>
+```
+
+## Architecture
+
+```
+Front door A:                           Front door B:
+GitHub Action wrapper                   pipx install + run in any CI / locally
+(paramahastha/pr-context-engine@main)   (pr-context-engine review --pr 42 --repo …)
+        │                                      │
+        └──────────────┬───────────────────────┘
+
+        ┌──────────────────────────────────────┐
+        │  CLI core — src/cli.py               │
+        │  orchestrate: diff → analyze →       │
+        │               brief → (fixes) → post │
+        └──────────────────────────────────────┘
+
+                       ├──► analyzers/   diff → FileChange objects, AST symbols, risk flags
+                       ├──► context/     git history + sqlite-vec RAG (fastembed, local)
+                       ├──► briefing/    prompt assembly → LLM call → structured Briefing
+                       ├──► fixes/       confidence-gated patch suggestions (opt-in)
+                       ├──► llm/         FailoverProvider: Groq → Gemini → hard error
+                       └──► github_api/  fetch diff, post comment + suggestion blocks
+```
+
+The CLI is the product; the GitHub Action is a thin wrapper. All logic lives in Python — no YAML logic.
+
+See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram and data-flow walkthrough.
+
+## Switching LLM providers
+
+Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
+
+| Provider | Key env var | Notes |
+|---|---|---|
+| `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
+| `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
+| `ollama` | — | Local, offline, no rate limits |
+| `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
+
+**Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
+
+## Fix suggestions (opt-in)
+
+When `ENABLE_FIXES=true`, the tool generates confidence-gated patch suggestions for located issues (flags with a known file + line). Only `high`/`medium` confidence suggestions become one-click GitHub suggestion blocks; `low` confidence produces prose notes only. Max 3 suggestions per PR.
+
+```yaml
+- uses: paramahastha/pr-context-engine@main
+  with:
+    groq-api-key: ${{ secrets.GROQ_API_KEY }}
+    enable-fixes: "true"
+```
+
+See [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) for why this is opt-in and confidence-gated.
+
+## Eval results
+
+`pytest tests/eval/` measures briefing quality across 15 real-world PR fixtures.
+
+**Static analysis (no API key needed):**
+
+| Metric | Score |
+|---|---|
+| Risk flag precision | **1.00** (0 false positives across 15 fixtures) |
+| Risk flag recall | **1.00** (all expected flags detected) |
+
+**LLM-as-judge scores** (run with `GROQ_API_KEY` + `ANTHROPIC_API_KEY`) assess five dimensions — Accuracy, Blast radius, Risk flags, Question quality, Brevity — on a 0–3 scale, plus Fix correctness and Calibration rate for the fix feature. Historical scores are committed to `tests/eval/scores.jsonl` so regressions are visible in git history.
+
+```bash
+# Analyzer-only (no API key needed):
+pytest tests/eval/ -v
+
+# Full eval with LLM-as-judge scoring:
+GROQ_API_KEY=... ANTHROPIC_API_KEY=... pytest tests/eval/ -v -s
+```
+
+The headline metrics are **fix correctness rate** (when the bot proposed a patch, was it actually correct?) and **false-confidence rate** (when it said `high` confidence, how often was the patch wrong?). These are the hardest-to-fake numbers in the scorecard.
+
+## Data & privacy
+
+**What leaves your machine:**
+
+- The PR diff and parsed metadata (file paths, function names, changed lines) are sent to the active LLM provider (Groq or Gemini by default).
+- No source code beyond the diff is sent to any external API. The codebase index (RAG) runs entirely locally via `fastembed` + `sqlite-vec` — no embedding API, no external call.
+- Git history and PR metadata are fetched from the GitHub API using your `GITHUB_TOKEN`.
+
+**Provider data policies:**
+
+- Groq and Gemini free tiers may use inputs for model improvement. Check their privacy policies before using on private or sensitive repos.
+- Use `LLM_PROVIDER=ollama` or `LLM_PROVIDER=anthropic` (BYO key) if you need stronger data-isolation guarantees.
+- The tool has no shared backend. Your API key, your quota, your data. Running it on 1 000 repos costs you nothing extra and costs me nothing.
+
+## Design decisions
+
+Short ADRs covering the tradeoffs that shaped the architecture:
+
+| ADR | Decision |
+|---|---|
+| [ADR-0](docs/design-decisions.md#adr-0-provider-abstraction-built-early) | Provider abstraction built in M2, not retrofitted later |
+| [ADR-1](docs/design-decisions.md#adr-1-cli-core-with-two-front-doors) | CLI-core with two front doors (Action + pipx) |
+| [ADR-2](docs/design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) | SQLite + sqlite-vec over Pinecone or Chroma |
+| [ADR-3](docs/design-decisions.md#adr-3-local-embeddings-via-fastembed) | Local embeddings via fastembed (no embedding API) |
+| [ADR-4](docs/design-decisions.md#adr-4-shallow-clone-tradeoff-in-ci-fetch-depth-50) | fetch-depth: 50 tradeoff in CI |
+| [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) | Fix suggestions opt-in and confidence-gated |
+| [ADR-6](docs/design-decisions.md#adr-6-mit-license) | MIT license |
+| [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) | Failover order: Groq → Gemini → hard error |
+| [ADR-8](docs/design-decisions.md#adr-8-python-312-as-the-implementation-language) | Python 3.12 over Go/TypeScript/Rust |
+
+## Cost
+
+**$0/month** for a portfolio-scale project on public repos.
+
+| Component | Cost |
+|---|---|
+| GitHub Actions | Free for public repos |
+| Groq (default LLM) | Free tier, ~1 000 req/day |
+| Gemini (failover) | Free tier, ~1 500 req/day |
+| Local embeddings (`fastembed`) | $0, no API, runs in-process |
+| Shared backend | None — your key, your quota |
+
+Free LLM tiers change without warning (Gemini cut 50–80% in Dec 2025). The [failover design](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) means a single provider's policy change degrades gracefully instead of breaking the tool.
+
+## Configuration
+
+See [CONFIG.md](CONFIG.md) for every env var, flag, default, and a minimal vs. full example.
+
+## Contributing
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, running tests, and the milestone philosophy. Bug reports and feature requests go in [Issues](https://github.com/paramahastha/pr-context-engine/issues).
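The failover order described in the package description above (Groq first, then Gemini, then a hard error) can be sketched roughly as follows. This is an illustration only: the package's real `FailoverProvider` class is not part of this diff, so `complete_with_failover`, `Completion` and `ProviderError` are hypothetical names, and providers are modelled as plain callables rather than the actual SDK clients.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(RuntimeError):
    """Raised when every configured provider has failed (the 'hard error')."""


@dataclass(frozen=True)
class Completion:
    text: str
    provider: str  # recorded so the comment footer can say which provider answered


# A provider is modelled as (name, callable taking a prompt and returning text).
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Completion:
    """Try providers in order; fall through to the next one on any exception."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return Completion(text=call(prompt), provider=name)
        except Exception as exc:  # rate limit (429) or any other provider error
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Under this sketch, the Gemini entry would only be added to `providers` when `GEMINI_API_KEY` is set, and the footer text would be built from `Completion.provider`.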
@@ -0,0 +1,231 @@
1
+ # PR Context Engine
2
+
3
+ [![CI](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml/badge.svg)](https://github.com/paramahastha/pr-context-engine/actions/workflows/pr-review.yml)
4
+ [![PyPI version](https://img.shields.io/pypi/v/pr-context-engine)](https://pypi.org/project/pr-context-engine/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
6
+ [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
7
+
8
+ > An AI tool that reads every PR and writes the briefing — and the fixes — a senior engineer would, with the calibration data to prove it's not just guessing.
9
+
10
+ <!-- Demo GIF goes here once recorded: docs/demo.gif -->
11
+ <!-- ![Demo: PR getting briefed and a suggestion being one-click applied](docs/demo.gif) -->
12
+
13
+ ## What it does
14
+
15
+ Every PR opens with three problems for the reviewer: _what is this actually doing_, _what could it break_, and _what should I push back on_. A diff doesn't answer any of those.
16
+
17
+ PR Context Engine reads the diff plus surrounding code, recent git history, and semantically similar code from elsewhere in the repo, then posts a terse briefing written like a senior backend engineer would write it. No praise. No filler. No "this LGTM." Just the context a reviewer needs.
18
+
19
+ With `ENABLE_FIXES=true`, it also generates confidence-gated patch suggestions for located issues — posted as collapsible GitHub suggestion blocks the maintainer can apply in one click. When it isn't sure, it says so in prose instead of guessing.
20
+
21
+ ## Quickstart (5 minutes)
22
+
23
+ ### Check your setup first
24
+
25
+ ```bash
26
+ pipx install pr-context-engine
27
+ export GROQ_API_KEY=<your-key> # free at console.groq.com/keys
28
+ export GITHUB_TOKEN=$(gh auth token)
29
+ pr-context-engine quickstart # checks keys, scopes, prints what's missing
30
+ ```
31
+
32
+ ### Option A — GitHub Action (recommended)
33
+
34
+ 1. Get a free [Groq API key](https://console.groq.com/keys) — no credit card.
35
+ 2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
36
+ 3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
37
+ 4. Add this to `.github/workflows/pr-briefing.yml`:
38
+
39
+ ```yaml
40
+ name: PR Briefing
41
+ on:
42
+ pull_request:
43
+ types: [opened, synchronize, reopened]
44
+ jobs:
45
+ brief:
46
+ runs-on: ubuntu-latest
47
+ permissions:
48
+ pull-requests: write
49
+ contents: read
50
+ steps:
51
+ - uses: paramahastha/pr-context-engine@main
52
+ with:
53
+ groq-api-key: ${{ secrets.GROQ_API_KEY }}
54
+ ```
55
+
56
+ That's it. Every new PR gets a briefing comment automatically.
57
+
58
+ ### Option B — CLI (any CI or local)
59
+
60
+ ```bash
61
+ pipx install pr-context-engine
62
+ export GROQ_API_KEY=<your-groq-key>
63
+ export GITHUB_TOKEN=$(gh auth token)
64
+
65
+ # Dry-run: see the briefing without posting it
66
+ pr-context-engine review --pr 42 --repo owner/name --dry-run
67
+
68
+ # Post the real comment
69
+ pr-context-engine review --pr 42 --repo owner/name
70
+ ```
71
+
72
+ ## Live example
73
+
74
+ A PR touching auth middleware produces this comment automatically:
75
+
76
+ ```markdown
77
+ ## 🤖 PR Briefing
78
+
79
+ **What changed**
80
+ Refactors session token storage from an in-memory dict to Redis, adding a configurable
81
+ TTL. The auth middleware is updated to hit Redis on every request.
82
+
83
+ **Blast radius**
84
+ Any caller of `get_session()` now depends on Redis being reachable. If Redis is down,
85
+ all authenticated requests will 401. The previous in-memory store had no such single
86
+ point of failure.
87
+
88
+ **Risk flags**
89
+ - `modifies_auth`: src/auth/session.py line 42 — `token = generate_token(user_id)`
90
+
91
+ **Questions for the reviewer**
92
+
93
+ 1. The Redis client is initialised once at import time — is there a reconnect strategy
94
+ if the connection drops mid-deploy?
95
+ 2. `SESSION_TTL` defaults to 3600 but the old in-memory store had no TTL — will existing
96
+ sessions all expire immediately after deploy?
97
+ 3. There are no tests for the Redis-down path — is 401-on-outage the intended degradation,
98
+ or should it fall back to the old store?
99
+
100
+ ---
101
+
102
+ <sub>Generated by [PR Context Engine](https://github.com/paramahastha/pr-context-engine) via groq. Not a substitute for human review.</sub>
103
+ ```
104
+
105
+ ## Architecture
106
+
107
+ ```
108
+ Front door A: Front door B:
109
+ GitHub Action wrapper pipx install + run in any CI / locally
110
+ (paramahastha/pr-context-engine@main) (pr-context-engine review --pr 42 --repo …)
111
+ │ │
112
+ └──────────────┬───────────────────────┘
113
+
114
+ ┌──────────────────────────────────────┐
115
+ │ CLI core — src/cli.py │
116
+ │ orchestrate: diff → analyze → │
117
+ │ brief → (fixes) → post │
118
+ └──────────────────────────────────────┘
119
+
120
+ ├──► analyzers/ diff → FileChange objects, AST symbols, risk flags
121
+ ├──► context/ git history + sqlite-vec RAG (fastembed, local)
122
+ ├──► briefing/ prompt assembly → LLM call → structured Briefing
123
+ ├──► fixes/ confidence-gated patch suggestions (opt-in)
124
+ ├──► llm/ FailoverProvider: Groq → Gemini → hard error
125
+ └──► github_api/ fetch diff, post comment + suggestion blocks
126
+ ```
127
+
128
+ The CLI is the product; the GitHub Action is a thin wrapper. All logic lives in Python — no YAML logic.
129
+
130
+ See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram and data-flow walkthrough.
131
+
132
+ ## Switching LLM providers
133
+
134
+ Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
135
+
136
+ | Provider | Key env var | Notes |
137
+ |---|---|---|
138
+ | `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
139
+ | `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
140
+ | `ollama` | — | Local, offline, no rate limits |
141
+ | `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
142
+
143
+ **Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
144
+
145
+ ## Fix suggestions (opt-in)
146
+
147
+ When `ENABLE_FIXES=true`, the tool generates confidence-gated patch suggestions for located issues (flags with a known file + line). Only `high`/`medium` confidence suggestions become one-click GitHub suggestion blocks; `low` confidence produces prose notes only. Max 3 suggestions per PR.
148
+
149
+ ```yaml
150
+ - uses: paramahastha/pr-context-engine@main
151
+ with:
152
+ groq-api-key: ${{ secrets.GROQ_API_KEY }}
153
+ enable-fixes: "true"
154
+ ```
155
+
156
+ See [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) for why this is opt-in and confidence-gated.
157
+
158
+ ## Eval results
159
+
160
+ `pytest tests/eval/` measures briefing quality across 15 real-world PR fixtures.
161
+
162
+ **Static analysis (no API key needed):**
163
+
164
+ | Metric | Score |
165
+ |---|---|
166
+ | Risk flag precision | **1.00** (0 false positives across 15 fixtures) |
167
+ | Risk flag recall | **1.00** (all expected flags detected) |
168
+
169
+ **LLM-as-judge scores** (run with `GROQ_API_KEY` + `ANTHROPIC_API_KEY`) assess five dimensions — Accuracy, Blast radius, Risk flags, Question quality, Brevity — on a 0–3 scale, plus Fix correctness and Calibration rate for the fix feature. Historical scores are committed to `tests/eval/scores.jsonl` so regressions are visible in git history.
170
+
171
+ ```bash
172
+ # Analyzer-only (no API key needed):
173
+ pytest tests/eval/ -v
174
+
175
+ # Full eval with LLM-as-judge scoring:
176
+ GROQ_API_KEY=... ANTHROPIC_API_KEY=... pytest tests/eval/ -v -s
177
+ ```
178
+
179
+ The headline metrics are **fix correctness rate** (when the bot proposed a patch, was it actually correct?) and **false-confidence rate** (when it said `high` confidence, how often was the patch wrong?). These are the hardest-to-fake numbers in the scorecard.
180
+
181
+ ## Data & privacy
182
+
183
+ **What leaves your machine:**
184
+
185
+ - The PR diff and parsed metadata (file paths, function names, changed lines) are sent to the active LLM provider (Groq or Gemini by default).
186
+ - No source code beyond the diff is sent to any external API. The codebase index (RAG) runs entirely locally via `fastembed` + `sqlite-vec` — no embedding API, no external call.
187
+ - Git history and PR metadata are fetched from the GitHub API using your `GITHUB_TOKEN`.
188
+
189
+ **Provider data policies:**
190
+
191
+ - Groq and Gemini free tiers may use inputs for model improvement. Check their privacy policies before using on private or sensitive repos.
192
+ - Use `LLM_PROVIDER=ollama` or `LLM_PROVIDER=anthropic` (BYO key) if you need stronger data-isolation guarantees.
193
+ - The tool has no shared backend. Your API key, your quota, your data. Running it on 1 000 repos costs you nothing extra and costs me nothing.
194
+
195
+ ## Design decisions
196
+
197
+ Short ADRs covering the tradeoffs that shaped the architecture:
198
+
199
+ | ADR | Decision |
200
+ |---|---|
201
+ | [ADR-0](docs/design-decisions.md#adr-0-provider-abstraction-built-early) | Provider abstraction built in M2, not retrofitted later |
202
+ | [ADR-1](docs/design-decisions.md#adr-1-cli-core-with-two-front-doors) | CLI-core with two front doors (Action + pipx) |
203
+ | [ADR-2](docs/design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) | SQLite + sqlite-vec over Pinecone or Chroma |
204
+ | [ADR-3](docs/design-decisions.md#adr-3-local-embeddings-via-fastembed) | Local embeddings via fastembed (no embedding API) |
205
+ | [ADR-4](docs/design-decisions.md#adr-4-shallow-clone-tradeoff-in-ci-fetch-depth-50) | fetch-depth: 50 tradeoff in CI |
206
+ | [ADR-5](docs/design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8) | Fix suggestions opt-in and confidence-gated |
207
+ | [ADR-6](docs/design-decisions.md#adr-6-mit-license) | MIT license |
208
+ | [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) | Failover order: Groq → Gemini → hard error |
209
+ | [ADR-8](docs/design-decisions.md#adr-8-python-312-as-the-implementation-language) | Python 3.12 over Go/TypeScript/Rust |
210
+
211
+ ## Cost
212
+
213
+ **$0/month** for a portfolio-scale project on public repos.
214
+
215
+ | Component | Cost |
216
+ |---|---|
217
+ | GitHub Actions | Free for public repos |
218
+ | Groq (default LLM) | Free tier, ~1 000 req/day |
219
+ | Gemini (failover) | Free tier, ~1 500 req/day |
220
+ | Local embeddings (`fastembed`) | $0, no API, runs in-process |
221
+ | Shared backend | None — your key, your quota |
222
+
223
+ Free LLM tiers change without warning (Gemini cut its free-tier limits by 50–80% in Dec 2025). The [failover design](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation) means a policy change at a single provider degrades the tool gracefully instead of breaking it.
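The failover order (Groq, then Gemini, then a hard error) can be pictured with a small sketch. Everything named here is an assumption for illustration: `RateLimitError`, `AllProvidersFailed`, `complete_with_failover`, and the fake provider functions are not the project's `FailoverProvider` API.

```python
from collections.abc import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider answering HTTP 429."""


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the ordered list was rate limited."""


# A provider is modelled as (name, function from prompt to completion).
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order; move on only when one is rate limited."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return call(prompt)
        except RateLimitError as exc:
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")


def fake_groq(prompt: str) -> str:
    raise RateLimitError("quota exhausted")  # simulate a 429


def fake_gemini(prompt: str) -> str:
    return f"gemini: {prompt}"


answer = complete_with_failover("hi", [("groq", fake_groq), ("gemini", fake_gemini)])
```

Only rate-limit errors trigger the fallback in this sketch; any other exception propagates immediately, which keeps genuine bugs from being hidden behind a silent provider switch.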
224
+
225
+ ## Configuration
226
+
227
+ See [CONFIG.md](CONFIG.md) for every env var, flag, default, and a minimal vs. full example.
228
+
229
+ ## Contributing
230
+
231
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, running tests, and the milestone philosophy. Bug reports and feature requests go in [Issues](https://github.com/paramahastha/pr-context-engine/issues).
@@ -33,6 +33,12 @@ runs:
33
33
  with:
34
34
  python-version: "3.12"
35
35
 
36
+ - name: Restore embedding model cache
37
+ uses: actions/cache@v4
38
+ with:
39
+ path: ~/.cache/fastembed
40
+ key: fastembed-bge-small-en-v1.5
41
+
36
42
  - name: Restore index cache
37
43
  uses: actions/cache@v4
38
44
  with:
@@ -0,0 +1,125 @@
1
+ # Architecture
2
+
3
+ ## Overview
4
+
5
+ PR Context Engine follows a **CLI-core with two front doors** design. The CLI (`src/cli.py`) is the product; the GitHub Action is a thin wrapper around it. No orchestration logic lives in YAML.
6
+
7
+ This means the tool is:
8
+ - Testable locally (`pr-context-engine review --pr 42 --repo owner/name --dry-run`)
9
+ - Runnable in any CI (GitLab, CircleCI, Jenkins) with no GitHub lock-in
10
+ - Installable as a standalone CLI (`pipx install pr-context-engine`)
11
+ - Invoked by the GitHub Action through the same binary any user would call
12
+
13
+ ## System diagram
14
+
15
+ ```mermaid
16
+ flowchart TD
17
+ A["Front door A\nGitHub Action wrapper\n(paramahastha/pr-context-engine@main)"]
18
+ B["Front door B\npipx install + run in any CI / locally\n(pr-context-engine review --pr 42 --repo …)"]
19
+
20
+ A & B --> CLI
21
+
22
+ subgraph CLI["CLI core — src/cli.py"]
23
+ direction TB
24
+ Orchestrator["orchestrate: fetch diff → analyze → brief → post"]
25
+ end
26
+
27
+ CLI --> Analyzers
28
+ CLI --> Context
29
+ CLI --> Briefing
30
+ CLI --> Fixes
31
+ CLI --> LLM
32
+ CLI --> GitHub
33
+
34
+ subgraph Analyzers["src/analyzers/"]
35
+ DP["diff_parser.py\nUnified diff → FileChange objects\n(path, language, hunks, added/removed lines)"]
36
+ AW["ast_walker.py\nAST symbol extraction\n(changed function/class names)"]
37
+ RS["risk_scorer.py\nHeuristic flag detection\n(auth, migration, config, deleted APIs)"]
38
+ end
39
+
40
+ subgraph Context["src/context/"]
41
+ GH["git_history.py\nLast 5 commits per touched file\nLast 3 merged PRs on same files"]
42
+ CI["codebase_index.py\nsqlite-vec + fastembed RAG\nTop-5 semantically similar chunks"]
43
+ end
44
+
45
+ subgraph Briefing["src/briefing/"]
46
+ PT["prompt_templates.py\nSenior-engineer system prompt"]
47
+ BG["generator.py\nPrompt assembly + LLM call\n→ structured Briefing object"]
48
+ end
49
+
50
+ subgraph Fixes["src/fixes/"]
51
+ FG["fix_generator.py\nLocated-issue → patch + rationale\n+ self-assessed confidence"]
52
+ CF["confidence.py\nGate: high/medium → suggestion block\nlow → prose note only\nmax 3 per PR"]
53
+ end
54
+
55
+ subgraph LLM["src/llm/"]
56
+ FP["FailoverProvider\nGroq → Gemini → hard error"]
57
+ GP["groq_provider.py"]
58
+ GMP["gemini_provider.py"]
59
+ OP["ollama_provider.py"]
60
+ AP["anthropic_provider.py"]
61
+ FP --> GP & GMP & OP & AP
62
+ end
63
+
64
+ subgraph GitHub["src/github_api/"]
65
+ CP["comment_poster.py\nFetch diff, post briefing comment\nPost line-anchored suggestion blocks\nin collapsed details sections"]
66
+ end
67
+
68
+ LLM --> Briefing
69
+ LLM --> Fixes
70
+ ```
71
+
72
+ ## Data flow for a single PR
73
+
74
+ ```
75
+ 1. CLI receives --pr N --repo owner/name
76
+ 2. github_api fetches the unified diff via REST
77
+ 3. diff_parser converts raw diff → list[FileChange]
78
+ 4. ast_walker extracts changed symbol names per file
79
+ 5. risk_scorer emits located-issue objects {flag, file, line, snippet}
80
+ 6. git_history fetches last 5 commits per touched file + last 3 merged PRs
81
+ 7. codebase_index queries sqlite-vec for top-5 related chunks per FileChange
82
+ 8. briefing/generator assembles all context into a structured prompt
83
+ 9. llm/FailoverProvider calls Groq (falls back to Gemini on HTTP 429, i.e. rate limiting)
84
+ 10. Generator parses the LLM response into a Briefing object
85
+ 11. If ENABLE_FIXES=true: fix_generator generates patches for located issues
86
+ confidence.py gates: high/medium → suggestion block, low → prose
87
+ 12. github_api posts the comment (briefing + collapsed suggestion blocks)
88
+ ```
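Step 3 of the flow above (raw diff to `list[FileChange]`) might look roughly like this. It is a simplified sketch, not the project's `diff_parser.py`: it assumes git-style diffs with `diff --git` headers, keeps only the path and the added/removed lines (no language or hunk objects), and does not special-case deleted files, whose new path is `/dev/null`.

```python
from dataclasses import dataclass, field


@dataclass
class FileChange:
    """Reduced, hypothetical version of the FileChange data object."""
    path: str
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def parse_unified_diff(diff_text: str) -> list[FileChange]:
    """Collect added and removed lines per file from a git-style diff."""
    changes: list[FileChange] = []
    current: FileChange | None = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; header lines follow until "@@".
            current, in_hunk = None, False
        elif line.startswith("+++ ") and not in_hunk:
            path = line[4:].strip()
            current = FileChange(path=path[2:] if path.startswith("b/") else path)
            changes.append(current)
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None:
            if line.startswith("+"):
                current.added.append(line[1:])
            elif line.startswith("-"):
                current.removed.append(line[1:])
    return changes


SAMPLE = """\
diff --git a/src/app.py b/src/app.py
index 1111111..2222222 100644
--- a/src/app.py
+++ b/src/app.py
@@ -1,3 +1,3 @@
 import os
-DEBUG = True
+DEBUG = False
 print(os.getcwd())
"""

changes = parse_unified_diff(SAMPLE)
```

Tracking `in_hunk` is what keeps the `---` and `+++` header lines from being mistaken for removed or added content.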
89
+
90
+ ## Module responsibilities
91
+
92
+ | Module | Single responsibility |
93
+ |---|---|
94
+ | `src/cli.py` | Typer entrypoint; orchestrates the pipeline; no business logic |
95
+ | `src/config.py` | Reads env vars, instantiates the right LLM provider |
96
+ | `src/analyzers/diff_parser.py` | Unified diff → `FileChange` data objects |
97
+ | `src/analyzers/ast_walker.py` | AST symbol extraction for Python/JS/TS/Go |
98
+ | `src/analyzers/risk_scorer.py` | Heuristic flags → located-issue objects |
99
+ | `src/context/git_history.py` | Commit history and merged-PR context per file |
100
+ | `src/context/codebase_index.py` | sqlite-vec RAG index; embedding via fastembed |
101
+ | `src/briefing/prompt_templates.py` | System prompt text (verbatim; no logic) |
102
+ | `src/briefing/generator.py` | Prompt assembly + LLM call + response parsing |
103
+ | `src/fixes/fix_generator.py` | Located issue → structured patch + confidence |
104
+ | `src/fixes/confidence.py` | Gate logic: which confidence levels produce suggestion blocks |
105
+ | `src/llm/base.py` | `LLMProvider` abstract interface |
106
+ | `src/llm/groq_provider.py` | Groq implementation |
107
+ | `src/llm/gemini_provider.py` | Gemini implementation |
108
+ | `src/llm/ollama_provider.py` | Local Ollama implementation |
109
+ | `src/llm/anthropic_provider.py` | Anthropic implementation |
110
+ | `src/llm/__init__.py` | `FailoverProvider` wrapping ordered provider list |
111
+ | `src/github_api/comment_poster.py` | Diff fetch + PR comment posting + suggestion blocks |
112
+
113
+ ## Key design decisions
114
+
115
+ The five decisions that shaped everything else, each recorded with enough reasoning to still make sense after a six-month gap:
116
+
117
+ 1. **CLI-core over Action-only** — makes the tool testable locally and portable across CI systems. See [ADR-1](design-decisions.md#adr-1-cli-core-with-two-front-doors).
118
+
119
+ 2. **Provider abstraction built in M2, not last** — free LLM tiers change without warning (Gemini cut 50–80% in Dec 2025). Retrofitting abstraction later would have required touching every caller. See [ADR-0](design-decisions.md#adr-0-provider-abstraction-built-early).
120
+
121
+ 3. **sqlite-vec + fastembed over a hosted vector store** — $0/month, no external service, no second API key, no latency for network round-trips. The index file is cached across Action runs. See [ADR-2](design-decisions.md#adr-2-sqlite--sqlite-vec-over-a-hosted-vector-store) and [ADR-3](design-decisions.md#adr-3-local-embeddings-via-fastembed).
122
+
123
+ 4. **Located-issue data shape in M3** — `risk_scorer` emits `{flag, file, line, snippet}` objects from the start. M8's fix generator depends on `file` and `line`; a bare string list would have forced a painful refactor eight milestones later.
124
+
125
+ 5. **Fix suggestions opt-in and confidence-gated** — a confidently-wrong auto-fix erodes trust faster than no fix. The calibration metric in the eval harness is the accountability mechanism. See [ADR-5](design-decisions.md#adr-5-opt-in-fix-suggestions-with-confidence-gating-milestone-8).
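Decisions 4 and 5 fit together: the located-issue shape gives each fix a `file` and `line` to anchor to, and the gate decides how it is presented. A minimal sketch follows, assuming the cap of 3 applies to suggestion blocks and that anything over the cap is demoted to a prose note. The class names, `gate`, and the overflow behavior are hypothetical and are not taken from `confidence.py`.

```python
from dataclasses import dataclass

MAX_SUGGESTIONS_PER_PR = 3            # "max 3 per PR" from the diagram
SUGGESTION_LEVELS = {"high", "medium"}  # levels that earn a suggestion block


@dataclass(frozen=True)
class LocatedIssue:
    """The {flag, file, line, snippet} shape emitted by risk_scorer."""
    flag: str
    file: str
    line: int
    snippet: str


@dataclass(frozen=True)
class Fix:
    """Hypothetical fix record: an issue, a patch, and a confidence."""
    issue: LocatedIssue
    patch: str
    confidence: str  # "high", "medium", or "low"


def gate(fixes: list[Fix]) -> tuple[list[Fix], list[Fix]]:
    """Split fixes into (suggestion blocks, prose notes)."""
    suggestions: list[Fix] = []
    notes: list[Fix] = []
    for fix in fixes:
        eligible = fix.confidence in SUGGESTION_LEVELS
        if eligible and len(suggestions) < MAX_SUGGESTIONS_PER_PR:
            suggestions.append(fix)
        else:
            # Low confidence, or over the cap (assumed behavior).
            notes.append(fix)
    return suggestions, notes


issue = LocatedIssue("auth", "src/login.py", 42, "token = request.args['t']")
fixes = [Fix(issue, "patch", level) for level in ("high", "low", "medium", "high", "high")]
suggestions, notes = gate(fixes)
```

In this sample the first three eligible fixes become suggestion blocks, while the low-confidence fix and the fourth eligible fix end up as prose notes.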