pr-context-engine 0.1.2.tar.gz → 0.1.3.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (79)
  1. pr_context_engine-0.1.3/.claude/commands/publish-pypi.md +58 -0
  2. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/workflows/pr-review.yml +4 -2
  3. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/CHANGELOG.md +8 -0
  4. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/PKG-INFO +42 -12
  5. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/README.md +41 -11
  6. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/action.yml +28 -5
  7. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/pyproject.toml +1 -1
  8. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/github_api/comment_poster.py +18 -2
  9. pr_context_engine-0.1.2/PROJECT.md +0 -481
  10. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.env.example +0 -0
  11. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  12. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  13. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/pull_request_template.md +0 -0
  14. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/workflows/release.yml +0 -0
  15. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.gitignore +0 -0
  16. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.python-version +0 -0
  17. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/CODE_OF_CONDUCT.md +0 -0
  18. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/CONFIG.md +0 -0
  19. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/CONTRIBUTING.md +0 -0
  20. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/LICENSE +0 -0
  21. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/docs/architecture.md +0 -0
  22. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/docs/design-decisions.md +0 -0
  23. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/__init__.py +0 -0
  24. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/analyzers/__init__.py +0 -0
  25. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/analyzers/ast_walker.py +0 -0
  26. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/analyzers/diff_parser.py +0 -0
  27. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/analyzers/risk_scorer.py +0 -0
  28. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/briefing/__init__.py +0 -0
  29. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/briefing/generator.py +0 -0
  30. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/briefing/prompt_templates.py +0 -0
  31. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/cli.py +0 -0
  32. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/config.py +0 -0
  33. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/context/__init__.py +0 -0
  34. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/context/codebase_index.py +0 -0
  35. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/context/git_history.py +0 -0
  36. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/fixes/__init__.py +0 -0
  37. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/fixes/confidence.py +0 -0
  38. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/fixes/fix_generator.py +0 -0
  39. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/github_api/__init__.py +0 -0
  40. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/__init__.py +0 -0
  41. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/anthropic_provider.py +0 -0
  42. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/base.py +0 -0
  43. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/gemini_provider.py +0 -0
  44. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/groq_provider.py +0 -0
  45. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/llm/ollama_provider.py +0 -0
  46. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/__init__.py +0 -0
  47. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/__init__.py +0 -0
  48. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/01-simple-refactor.json +0 -0
  49. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/02-auth-middleware.json +0 -0
  50. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/03-db-migration.json +0 -0
  51. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/04-config-update.json +0 -0
  52. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/05-public-api-deleted.json +0 -0
  53. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/06-hardcoded-api-key.json +0 -0
  54. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/07-token-in-url.json +0 -0
  55. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/08-retry-no-limit.json +0 -0
  56. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/09-missing-null-check.json +0 -0
  57. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/10-trivial-docfix.json +0 -0
  58. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/11-multi-flag.json +0 -0
  59. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/12-new-endpoint.json +0 -0
  60. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/13-auth-bypass.json +0 -0
  61. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/14-env-file-update.json +0 -0
  62. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/fixtures/15-dependency-update.json +0 -0
  63. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/rubric.md +0 -0
  64. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/eval/test_briefings.py +0 -0
  65. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/__init__.py +0 -0
  66. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_anthropic_provider.py +0 -0
  67. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_ast_walker.py +0 -0
  68. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_briefing_generator.py +0 -0
  69. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_codebase_index.py +0 -0
  70. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_config.py +0 -0
  71. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_diff_parser.py +0 -0
  72. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_failover_provider.py +0 -0
  73. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_fix_generator.py +0 -0
  74. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_gemini_provider.py +0 -0
  75. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_git_history.py +0 -0
  76. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_groq_provider.py +0 -0
  77. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_ollama_provider.py +0 -0
  78. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/tests/unit/test_risk_scorer.py +0 -0
  79. {pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/uv.lock +0 -0
pr_context_engine-0.1.3/.claude/commands/publish-pypi.md
@@ -0,0 +1,58 @@
1
+ ---
2
+ description: Publish a new release to PyPI — bumps version, updates CHANGELOG, commits, tags, and merges to main. Usage: /publish-pypi [patch|minor|major] (default: patch)
3
+ ---
4
+
5
+ Publish a new version of pr-context-engine to PyPI using the project's GitHub Actions release flow.
6
+
7
+ **Bump type**: $ARGUMENTS (default to `patch` if empty)
8
+
9
+ Follow these steps exactly, in order:
10
+
11
+ ## 1. Read current version
12
+ Read `pyproject.toml` and extract the current `version` field.
13
+
14
+ ## 2. Compute new version
15
+ Apply the bump type to the current version (semver):
16
+ - `patch` — increment the third number (0.1.2 → 0.1.3)
17
+ - `minor` — increment the second number, reset patch (0.1.2 → 0.2.0)
18
+ - `major` — increment the first number, reset minor and patch (0.1.2 → 1.0.0)
19
+
20
+ ## 3. Update pyproject.toml
21
+ Edit the `version` field in `pyproject.toml` to the new version.
22
+
23
+ ## 4. Update CHANGELOG.md
24
+ Read `CHANGELOG.md`. Under `## Unreleased`, look at what's there.
25
+ - If `## Unreleased` has content, move it into a new dated section `## X.Y.Z — YYYY-MM-DD` (use today's date) inserted below `## Unreleased`.
26
+ - If `## Unreleased` is empty, create the new section anyway and populate it by summarising the commits since the last tag: run `git log $(git describe --tags --abbrev=0)..HEAD --oneline` to get them, then write a short changelog entry under `### Fixed`, `### Added`, or `### Changed` as appropriate.
27
+ - Leave `## Unreleased` as an empty section above the new version section.
28
+
29
+ ## 5. Commit the version bump
30
+ Stage only `pyproject.toml` and `CHANGELOG.md`, then commit with message:
31
+ `chore: bump version to X.Y.Z`
32
+
33
+ Do NOT commit anything else.
34
+
35
+ ## 6. Push the current branch
36
+ Run `git push origin <current-branch>`.
37
+
38
+ ## 7. Tag and push the tag
39
+ ```
40
+ git tag vX.Y.Z
41
+ git push origin vX.Y.Z
42
+ ```
43
+ This triggers the `release.yml` GitHub Actions workflow, which builds and publishes to PyPI via OIDC trusted publishing — no credentials needed.
44
+
45
+ ## 8. Merge to main
46
+ ```
47
+ git checkout main
48
+ git merge <previous-branch> --ff-only
49
+ git push origin main
50
+ git checkout <previous-branch>
51
+ ```
52
+
53
+ ## 9. Confirm
54
+ Report:
55
+ - New version published: `X.Y.Z`
56
+ - Tag pushed: `vX.Y.Z`
57
+ - GitHub Actions release workflow URL: `https://github.com/paramahastha/pr-context-engine/actions`
58
+ - PyPI package URL: `https://pypi.org/project/pr-context-engine/X.Y.Z/`
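Step 2 leaves the version arithmetic to whoever runs the command. Below is a minimal Python sketch of it. It assumes plain `X.Y.Z` versions with no pre-release or build suffix, because the command file does not cover those. `bump_version` is a hypothetical helper, not something the package ships.

```python
def bump_version(current: str, bump: str = "patch") -> str:
    """Apply a semver bump to a plain X.Y.Z version string (hypothetical helper)."""
    if not bump:
        bump = "patch"  # the command defaults to patch when $ARGUMENTS is empty
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "patch":
        patch += 1
    elif bump == "minor":
        minor, patch = minor + 1, 0
    elif bump == "major":
        major, minor, patch = major + 1, 0, 0
    else:
        raise ValueError(f"unknown bump type: {bump!r}")
    return f"{major}.{minor}.{patch}"


# The three examples given in step 2:
print(bump_version("0.1.2", "patch"))  # 0.1.3
print(bump_version("0.1.2", "minor"))  # 0.2.0
print(bump_version("0.1.2", "major"))  # 1.0.0
```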
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/.github/workflows/pr-review.yml
@@ -21,6 +21,8 @@ jobs:
21
21
 
22
22
  brief:
23
23
  needs: lint
24
+ # Fork PRs get a read-only GITHUB_TOKEN regardless of `permissions:` — skip rather than 401.
25
+ if: github.event.pull_request.head.repo.full_name == github.repository
24
26
  runs-on: ubuntu-latest
25
27
  permissions:
26
28
  pull-requests: write
@@ -56,9 +58,9 @@ jobs:
56
58
  - name: Generate briefing
57
59
  env:
58
60
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
59
- LLM_PROVIDER: groq
61
+ LLM_PROVIDER: gemini
62
+ GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
60
63
  GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
61
- GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} # fallback (Milestone 7)
62
64
  run: >
63
65
  uv run pr-context-engine review
64
66
  --pr ${{ github.event.pull_request.number }}
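The new `if:` guard on the `brief` job compares the PR's head repository with the repository the workflow runs in. The two differ only for PRs opened from a fork. The sketch below performs the same comparison in Python. It assumes you already have the `pull_request` event payload as a dict; inside a workflow run, GitHub writes that payload as JSON to the file named by `GITHUB_EVENT_PATH`. `is_fork_pr` is a hypothetical name.

```python
def is_fork_pr(event: dict, repository: str) -> bool:
    """True when the PR's head branch lives outside `repository` (owner/name)."""
    head_repo = event["pull_request"]["head"].get("repo") or {}
    # A null head repo (e.g. a deleted fork) fails the workflow's `==` test too,
    # so it is treated as a fork and the job is skipped.
    return head_repo.get("full_name") != repository


# Trimmed-down payload; real pull_request events carry many more fields.
event = {"pull_request": {"head": {"repo": {"full_name": "contributor/pr-context-engine"}}}}
print(is_fork_pr(event, "paramahastha/pr-context-engine"))  # True
```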
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/CHANGELOG.md
@@ -6,6 +6,14 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Thi
6
6
 
7
7
  ## Unreleased
8
8
 
9
+ ## 0.1.3 — 2026-05-23
10
+
11
+ ### Fixed
12
+
13
+ - **GitHub Action 401 error** — `action.yml` now falls back to `github.token` when the `github-token` input is not explicitly passed, preventing Bad credentials errors in consumer workflows.
14
+ - **Error messages** — `post_pr_comment` now catches `GithubException` 401/403 and raises a `RuntimeError` with an actionable message (missing `permissions: pull-requests: write` vs. invalid token) instead of a raw PyGithub traceback.
15
+ - **Fork PR 401** — `pr-review.yml` skips the `brief` job for fork PRs; GitHub's security model forces a read-only token on `pull_request` events from forks regardless of the `permissions:` block.
16
+
9
17
  ## 0.1.2 — 2026-05-23
10
18
 
11
19
  ### Fixed
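The 0.1.3 section above has the shape that step 4 of `/publish-pypi` describes. The body of `## Unreleased` sits under a new dated heading, and an empty `## Unreleased` stays above it. The sketch below performs that text transformation. It assumes release headings of the form used in this file (`## <version>`, an em dash, then the ISO date) and assumes every release heading is a level-2 `## ` line. `cut_release` is hypothetical. It does not implement the empty-Unreleased branch that summarises `git log` output.

```python
DASH = "\u2014"  # the em dash used in this changelog's release headings


def cut_release(changelog: str, version: str, date: str) -> str:
    """Move the body of `## Unreleased` into a new `## <version> <dash> <date>` section."""
    lines = changelog.splitlines()
    start = lines.index("## Unreleased")  # ValueError if the heading is missing
    # The Unreleased body runs until the next level-2 heading (or end of file).
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    body = lines[start + 1 : end]
    while body and not body[0].strip():  # trim surrounding blank lines
        body.pop(0)
    while body and not body[-1].strip():
        body.pop()
    section = [f"## {version} {DASH} {date}", ""]
    if body:
        section += [*body, ""]
    # Keep an empty `## Unreleased` above the new section, as step 4 requires.
    return "\n".join([*lines[: start + 1], "", *section, *lines[end:]]) + "\n"


sample = (
    "# Changelog\n\n## Unreleased\n\n### Fixed\n\n- Example entry.\n\n"
    f"## 0.1.2 {DASH} 2026-05-23\n"
)
updated = cut_release(sample, "0.1.3", "2026-05-23")
print(updated)
```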
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/PKG-INFO
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pr-context-engine
3
- Version: 0.1.2
3
+ Version: 0.1.3
4
4
  Summary: An AI tool that reads every PR and posts a senior-engineer-style briefing.
5
5
  Project-URL: Homepage, https://github.com/paramahastha/pr-context-engine
6
6
  Project-URL: Repository, https://github.com/paramahastha/pr-context-engine
@@ -61,8 +61,8 @@ pr-context-engine quickstart # checks keys, scopes, prints what's missin
61
61
 
62
62
  ### Option A — GitHub Action (recommended)
63
63
 
64
- 1. Get a free [Groq API key](https://console.groq.com/keys) no credit card.
65
- 2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
64
+ 1. Pick a provider and get an API key (see table below).
65
+ 2. Add it as a secret: **Settings → Secrets → Actions → New secret**.
66
66
  3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
67
67
  4. Add this to `.github/workflows/pr-briefing.yml`:
68
68
 
@@ -80,11 +80,13 @@ jobs:
80
80
  steps:
81
81
  - uses: paramahastha/pr-context-engine@main
82
82
  with:
83
- groq-api-key: ${{ secrets.GROQ_API_KEY }}
83
+ groq-api-key: ${{ secrets.GROQ_API_KEY }} # default provider
84
84
  ```
85
85
 
86
86
  That's it. Every new PR gets a briefing comment automatically.
87
87
 
88
+ > **Using a different provider?** Set `llm-provider` to match your key — see [Switching LLM providers](#switching-llm-providers) below.
89
+
88
90
  ### Option B — CLI (any CI or local)
89
91
 
90
92
  ```bash
@@ -161,16 +163,44 @@ See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram an
161
163
 
162
164
  ## Switching LLM providers
163
165
 
164
- Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
166
+ | Provider | Secret name | `llm-provider` value | Notes |
167
+ |---|---|---|---|
168
+ | `groq` *(default)* | `GROQ_API_KEY` | `groq` | Free, ~1 000 req/day, fast |
169
+ | `gemini` | `GEMINI_API_KEY` | `gemini` | Free-tier, ~1 500 req/day |
170
+ | `anthropic` | `ANTHROPIC_API_KEY` | `anthropic` | BYO key, no free tier |
171
+ | `ollama` | — | `ollama` | Local, offline, no rate limits |
172
+
173
+ **You must set both `llm-provider` and the matching API key input.** Providing only the key without `llm-provider` will fail because the default provider is `groq`.
174
+
175
+ **GitHub Action examples:**
165
176
 
166
- | Provider | Key env var | Notes |
167
- |---|---|---|
168
- | `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
169
- | `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
170
- | `ollama` | — | Local, offline, no rate limits |
171
- | `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
177
+ ```yaml
178
+ # Gemini
179
+ - uses: paramahastha/pr-context-engine@main
180
+ with:
181
+ llm-provider: gemini
182
+ gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
183
+
184
+ # Anthropic
185
+ - uses: paramahastha/pr-context-engine@main
186
+ with:
187
+ llm-provider: anthropic
188
+ anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
189
+
190
+ # Ollama (self-hosted)
191
+ - uses: paramahastha/pr-context-engine@main
192
+ with:
193
+ llm-provider: ollama
194
+ ollama-base-url: http://my-ollama-host:11434
195
+ ```
196
+
197
+ **CLI / env var:**
198
+
199
+ ```bash
200
+ LLM_PROVIDER=gemini GEMINI_API_KEY=<key> pr-context-engine review --pr 42 --repo owner/name
201
+ ```
172
202
 
173
- **Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
203
+ **Automatic failover:** if `GEMINI_API_KEY` is set alongside any other provider, Gemini is used as a fallback on rate-limit errors. The PR comment footer shows which provider was actually used. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
174
204
 
175
205
  ## Fix suggestions (opt-in)
176
206
 
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/README.md
@@ -31,8 +31,8 @@ pr-context-engine quickstart # checks keys, scopes, prints what's missin
31
31
 
32
32
  ### Option A — GitHub Action (recommended)
33
33
 
34
- 1. Get a free [Groq API key](https://console.groq.com/keys) no credit card.
35
- 2. Add it as a secret: **Settings → Secrets → Actions → New secret** → `GROQ_API_KEY`.
34
+ 1. Pick a provider and get an API key (see table below).
35
+ 2. Add it as a secret: **Settings → Secrets → Actions → New secret**.
36
36
  3. Enable write permissions: **Settings → Actions → General → Workflow permissions → Read and write**.
37
37
  4. Add this to `.github/workflows/pr-briefing.yml`:
38
38
 
@@ -50,11 +50,13 @@ jobs:
50
50
  steps:
51
51
  - uses: paramahastha/pr-context-engine@main
52
52
  with:
53
- groq-api-key: ${{ secrets.GROQ_API_KEY }}
53
+ groq-api-key: ${{ secrets.GROQ_API_KEY }} # default provider
54
54
  ```
55
55
 
56
56
  That's it. Every new PR gets a briefing comment automatically.
57
57
 
58
+ > **Using a different provider?** Set `llm-provider` to match your key — see [Switching LLM providers](#switching-llm-providers) below.
59
+
58
60
  ### Option B — CLI (any CI or local)
59
61
 
60
62
  ```bash
@@ -131,16 +133,44 @@ See [docs/architecture.md](docs/architecture.md) for the full Mermaid diagram an
131
133
 
132
134
  ## Switching LLM providers
133
135
 
134
- Set `LLM_PROVIDER` to any of `groq` (default), `gemini`, `ollama`, or `anthropic`. Nothing downstream changes.
136
+ | Provider | Secret name | `llm-provider` value | Notes |
137
+ |---|---|---|---|
138
+ | `groq` *(default)* | `GROQ_API_KEY` | `groq` | Free, ~1 000 req/day, fast |
139
+ | `gemini` | `GEMINI_API_KEY` | `gemini` | Free-tier, ~1 500 req/day |
140
+ | `anthropic` | `ANTHROPIC_API_KEY` | `anthropic` | BYO key, no free tier |
141
+ | `ollama` | — | `ollama` | Local, offline, no rate limits |
142
+
143
+ **You must set both `llm-provider` and the matching API key input.** Providing only the key without `llm-provider` will fail because the default provider is `groq`.
144
+
145
+ **GitHub Action examples:**
135
146
 
136
- | Provider | Key env var | Notes |
137
- |---|---|---|
138
- | `groq` *(default)* | `GROQ_API_KEY` | Free, ~1 000 req/day, fast |
139
- | `gemini` | `GEMINI_API_KEY` | Free-tier fallback; auto-engaged on Groq 429 |
140
- | `ollama` | — | Local, offline, no rate limits |
141
- | `anthropic` | `ANTHROPIC_API_KEY` | BYO key, no free tier |
147
+ ```yaml
148
+ # Gemini
149
+ - uses: paramahastha/pr-context-engine@main
150
+ with:
151
+ llm-provider: gemini
152
+ gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
153
+
154
+ # Anthropic
155
+ - uses: paramahastha/pr-context-engine@main
156
+ with:
157
+ llm-provider: anthropic
158
+ anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
159
+
160
+ # Ollama (self-hosted)
161
+ - uses: paramahastha/pr-context-engine@main
162
+ with:
163
+ llm-provider: ollama
164
+ ollama-base-url: http://my-ollama-host:11434
165
+ ```
166
+
167
+ **CLI / env var:**
168
+
169
+ ```bash
170
+ LLM_PROVIDER=gemini GEMINI_API_KEY=<key> pr-context-engine review --pr 42 --repo owner/name
171
+ ```
142
172
 
143
- **Automatic failover:** if `GEMINI_API_KEY` is set, the tool fails over to Gemini on any Groq 429 or error and logs which provider was used in the PR comment footer. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
173
+ **Automatic failover:** if `GEMINI_API_KEY` is set alongside any other provider, Gemini is used as a fallback on rate-limit errors. The PR comment footer shows which provider was actually used. See [ADR-7](docs/design-decisions.md#adr-7-provider-failover-order-and-motivation).
144
174
 
145
175
  ## Fix suggestions (opt-in)
146
176
 
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/action.yml
@@ -4,13 +4,27 @@ author: paramahastha
4
4
 
5
5
  inputs:
6
6
  groq-api-key:
7
- description: Groq API key (free at https://console.groq.com/keys). Required unless gemini-api-key is set.
7
+ description: Groq API key (free at https://console.groq.com/keys).
8
8
  required: false
9
9
  gemini-api-key:
10
- description: Google Gemini API key. Used as failover when Groq is rate-limited.
10
+ description: Google Gemini API key (https://aistudio.google.com/apikey).
11
11
  required: false
12
+ anthropic-api-key:
13
+ description: Anthropic API key. Required when llm-provider=anthropic.
14
+ required: false
15
+ ollama-base-url:
16
+ description: Ollama server URL. Used when llm-provider=ollama.
17
+ required: false
18
+ default: "http://localhost:11434"
19
+ ollama-model:
20
+ description: Ollama model name. Used when llm-provider=ollama.
21
+ required: false
22
+ default: "qwen2.5-coder:7b"
12
23
  github-token:
13
- description: GitHub token with pull-requests:write. Defaults to the built-in GITHUB_TOKEN.
24
+ description: >
25
+ GitHub token with pull-requests:write permission. Defaults to the built-in GITHUB_TOKEN.
26
+ Your calling workflow MUST include `permissions: pull-requests: write` or the comment
27
+ post will fail with a 401/403 error.
14
28
  required: false
15
29
  default: ${{ github.token }}
16
30
  enable-fixes:
@@ -18,7 +32,10 @@ inputs:
18
32
  required: false
19
33
  default: "false"
20
34
  llm-provider:
21
- description: Primary LLM provider — groq | gemini | ollama | anthropic (default groq).
35
+ description: >
36
+ Primary LLM provider — groq | gemini | anthropic | ollama (default: groq).
37
+ Must match the API key you provide: set gemini + gemini-api-key,
38
+ anthropic + anthropic-api-key, etc.
22
39
  required: false
23
40
  default: "groq"
24
41
 
@@ -53,9 +70,15 @@ runs:
53
70
  - name: Generate briefing
54
71
  shell: bash
55
72
  env:
56
- GITHUB_TOKEN: ${{ inputs.github-token }}
73
+ # Prefer the explicit input; fall back to the calling workflow's built-in token.
74
+ # The calling workflow MUST grant `permissions: pull-requests: write` for
75
+ # the token to have enough scope to post a comment.
76
+ GITHUB_TOKEN: ${{ inputs.github-token || github.token }}
57
77
  GROQ_API_KEY: ${{ inputs.groq-api-key }}
58
78
  GEMINI_API_KEY: ${{ inputs.gemini-api-key }}
79
+ ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }}
80
+ OLLAMA_BASE_URL: ${{ inputs.ollama-base-url }}
81
+ OLLAMA_MODEL: ${{ inputs.ollama-model }}
59
82
  LLM_PROVIDER: ${{ inputs.llm-provider }}
60
83
  ENABLE_FIXES: ${{ inputs.enable-fixes }}
61
84
  run: >
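In GitHub Actions expressions, `a || b` evaluates to `a` when `a` is truthy and to `b` otherwise. An empty string is falsy. The `github-token` input already defaults to `github.token`. The fallback therefore mainly matters when a caller passes the input explicitly and it resolves to an empty string, for example a secret that is not defined in the repository. Python's `or` treats strings the same way, so the step's `env:` block can be approximated as below. `build_step_env` is a hypothetical helper. It models input defaults only for inputs the caller omits.

```python
def build_step_env(inputs: dict[str, str], github_token: str) -> dict[str, str]:
    """Approximate the env: block of the action's "Generate briefing" step.

    `inputs` holds only the inputs the caller actually passed; omitted ones
    take the defaults declared in action.yml.
    """
    return {
        # ${{ inputs.github-token || github.token }}: an empty string is falsy,
        # so it falls through to the workflow's built-in token.
        "GITHUB_TOKEN": inputs.get("github-token", github_token) or github_token,
        "GROQ_API_KEY": inputs.get("groq-api-key", ""),
        "GEMINI_API_KEY": inputs.get("gemini-api-key", ""),
        "ANTHROPIC_API_KEY": inputs.get("anthropic-api-key", ""),
        "OLLAMA_BASE_URL": inputs.get("ollama-base-url", "http://localhost:11434"),
        "OLLAMA_MODEL": inputs.get("ollama-model", "qwen2.5-coder:7b"),
        "LLM_PROVIDER": inputs.get("llm-provider", "groq"),
        "ENABLE_FIXES": inputs.get("enable-fixes", "false"),
    }


# A caller passed `github-token: ${{ secrets.MISSING }}`, which expands to "".
env = build_step_env({"github-token": "", "llm-provider": "gemini"}, "builtin-token")
print(env["GITHUB_TOKEN"])  # builtin-token
```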
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/pyproject.toml
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "pr-context-engine"
3
- version = "0.1.2"
3
+ version = "0.1.3"
4
4
  description = "An AI tool that reads every PR and posts a senior-engineer-style briefing."
5
5
  readme = "README.md"
6
6
  requires-python = ">=3.12"
{pr_context_engine-0.1.2 → pr_context_engine-0.1.3}/src/github_api/comment_poster.py
@@ -11,6 +11,7 @@ from typing import TYPE_CHECKING
11
11
 
12
12
  import requests
13
13
  from github import Auth, Github
14
+ from github import GithubException
14
15
 
15
16
  from src.github_api import GITHUB_API_URL
16
17
 
@@ -53,8 +54,23 @@ def post_pr_comment(repo: str, pr_number: int, body: str, github_token: str) ->
53
54
  """
54
55
  logger.info("Posting comment to %s PR #%d", repo, pr_number)
55
56
  gh = Github(auth=Auth.Token(github_token))
56
- pull_request = gh.get_repo(repo).get_pull(pr_number)
57
- pull_request.create_issue_comment(body)
57
+ try:
58
+ pull_request = gh.get_repo(repo).get_pull(pr_number)
59
+ pull_request.create_issue_comment(body)
60
+ except GithubException as exc:
61
+ if exc.status == 401:
62
+ raise RuntimeError(
63
+ "GitHub returned 401 Bad credentials. "
64
+ "Ensure your workflow grants `permissions: pull-requests: write` "
65
+ "and that GITHUB_TOKEN (or github-token input) is a valid token."
66
+ ) from exc
67
+ if exc.status == 403:
68
+ raise RuntimeError(
69
+ "GitHub returned 403 Forbidden. "
70
+ "The token lacks `pull-requests: write` permission. "
71
+ "Add `permissions: pull-requests: write` to the calling workflow job."
72
+ ) from exc
73
+ raise
58
74
 
59
75
 
60
76
  def format_fix_section(suggestions: list[FixSuggestion], extra_count: int = 0) -> str:
pr_context_engine-0.1.2/PROJECT.md
@@ -1,481 +0,0 @@
1
- # PR Context Engine
2
-
3
- > An AI tool that reads every new pull request and posts a senior-engineer-style briefing as a comment: what actually changed, blast radius, risk flags, and three sharp review questions — with optional, confidence-gated fix suggestions. A `pipx`-installable CLI at its core, with a one-line GitHub Action wrapper. Runs in any CI or locally. Free to run.
4
-
5
- ---
6
-
7
- ## How to use this file with Claude Code
8
-
9
- 1. Create a new GitHub repo (public, free).
10
- 2. Drop this `PROJECT.md` at the root.
11
- 3. Open the repo in Claude Code and run:
12
-
13
- > "Read PROJECT.md and build the project. Start with Milestone 1 only. Stop after each milestone so I can review and test."
14
-
15
- 4. Work through milestones one at a time. Do not let Claude Code skip or reorder them — the value of this project is in the eval, the architecture, and the adoption layer, each of which only lands if the layers beneath it are solid. In particular: the CLI-core entrypoint (M1) and provider abstraction (M2) are deliberately early because retrofitting them later is expensive.
16
-
17
- ---
18
-
19
- ## Vision (one paragraph)
20
-
21
- Every PR opens with three problems for the reviewer: _what is this actually doing_, _what could it break_, and _what should I push back on_. A diff doesn't answer any of those — it just shows lines. PR Context Engine reads the diff plus surrounding code, recent history, and similar past PRs, then produces a terse briefing written like a senior backend engineer would write it. No praise, no filler, no "this LGTM" — just the context a reviewer needs and the questions worth asking.
22
-
23
- ## Non-goals
24
-
25
- - Not an auto-approver. Never approves or blocks PRs.
26
- - Not a linter. Style/format issues are out of scope — existing tools do that.
27
- - Not an auto-fixer. It _proposes_ fixes (opt-in, confidence-gated, Milestone 8); it never edits or commits. The human always applies.
28
- - Not a "the AI thinks your code is great" bot. If there's nothing risky, the briefing is short. A wrong fix is treated as worse than no fix.
29
-
30
- ---
31
-
32
- ## Architecture
33
-
34
- ```
35
- Front door A: Front door B:
36
- GitHub Action wrapper pipx install + run in any CI / locally
37
- (yourname/pr-context-engine@v1) (pr-context-engine review --pr 123)
38
- │ │
39
- └────────────┬────────────────────┘
40
-
41
- ┌─────────────────────────────────────┐
42
- │ CLI core (src/cli.py → orchestrator)│
43
- └─────────────────────────────────────┘
44
-
45
- ├──► analyzers/ (diff → semantic chunks, AST, risk score)
46
- ├──► context/ (git history, codebase index via sqlite-vec)
47
- ├──► briefing/ (prompt assembly → LLM call)
48
- ├──► fixes/ (opt-in: confidence-gated patch suggestions)
49
- ├──► llm/ (pluggable: Groq / Gemini / Anthropic / Ollama)
50
- └──► github/ (post comment + collapsed suggestion blocks)
51
- ```
52
-
53
- Design principle: **the CLI is the product; the GitHub Action is a thin wrapper around it.** No logic lives in the workflow YAML. This makes the tool testable locally, runnable in any CI (GitLab, CircleCI, Jenkins), and not hostage to GitHub — while still giving newcomers a one-line install.
54
-
55
- ### Why this stack
56
-
57
- | Decision | Choice | Reason |
58
- | ------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
59
- | Runtime | GitHub Actions (wrapping the CLI) | Free for public repos. No server to host. |
60
- | Distribution | `pipx`-installable CLI + published GitHub Action | CLI is the engine; Action is the easy on-ramp. Works in any CI, not just GitHub. |
61
- | CLI framework | `typer` | Minimal, type-hint-driven, auto-generates `--help`. |
62
- | Language | Python 3.12 | Best AST + LLM SDK ecosystem. |
63
- | Vector store | SQLite + `sqlite-vec` | File-based, commits with the repo, no external DB. |
64
- | LLM access | Pluggable provider interface (built early, Milestone 2) | This is the project's key design decision — see ADR below. Free LLM tiers are volatile; the abstraction is insurance, not polish. |
65
- | Default LLM (prod) | Groq (`llama-3.3-70b-versatile`) | Free tier, no credit card, ~1,000 req/day, strong code reasoning, very fast. |
66
- | Fallback LLM (prod) | Google Gemini (`gemini-2.5-flash`) | Free tier still generous (~1,500 req/day) but Google cut free quotas 50-80% in Dec 2025 with little warning — hence not the default. |
67
- | Default LLM (dev) | Ollama (`qwen2.5-coder:7b`) | Local, free, offline, no rate limits while iterating. |
68
- | Package manager | `uv` | Fast, modern, single-binary. |
69
-
70
- ### ADR-0: Why provider abstraction is built early, not last
71
-
72
- In December 2025, Google cut Gemini's free-tier rate limits by 50–80% overnight, and in April 2026 moved Pro models behind a paywall. Any portfolio project hard-wired to one free provider is one policy change away from a broken demo. Building the `LLMProvider` interface in Milestone 2 (not as a final flourish) means the entire project is provider-agnostic from the start, the eval harness can compare providers, and the live demo can fail over. Treating provider risk as an architectural concern — rather than an afterthought — is the single clearest senior-engineer signal in this project. This reasoning belongs verbatim in `docs/design-decisions.md`.
73
-
74
- ---
75
-
76
- ## Repo layout (target)
77
-
78
- ```
79
- pr-context-engine/
80
- ├── .github/
81
- │ ├── workflows/
82
- │ │ └── pr-review.yml
83
- │ ├── ISSUE_TEMPLATE/
84
- │ └── pull_request_template.md
85
- ├── action.yml # makes this a usable published GitHub Action
86
- ├── src/
87
- │ ├── __init__.py
88
- │ ├── cli.py # typer entrypoint — the product
89
- │ ├── config.py
90
- │ ├── llm/
91
- │ │ ├── __init__.py
92
- │ │ ├── base.py
93
- │ │ ├── groq_provider.py
94
- │ │ ├── gemini_provider.py
95
- │ │ ├── ollama_provider.py
96
- │ │ └── anthropic_provider.py
97
- │ ├── analyzers/
98
- │ │ ├── __init__.py
99
- │ │ ├── diff_parser.py
100
- │ │ ├── ast_walker.py
101
- │ │ └── risk_scorer.py
102
- │ ├── context/
103
- │ │ ├── __init__.py
104
- │ │ ├── git_history.py
105
- │ │ └── codebase_index.py
106
- │ ├── briefing/
107
- │ │ ├── __init__.py
108
- │ │ ├── prompt_templates.py
109
- │ │ └── generator.py
110
- │ ├── fixes/
111
- │ │ ├── __init__.py
112
- │ │ ├── fix_generator.py
113
- │ │ └── confidence.py
114
- │ └── github_api/
115
- │ ├── __init__.py
116
- │ └── comment_poster.py
117
- ├── tests/
118
- │ ├── unit/
119
- │ └── eval/
120
- │ ├── fixtures/ # captured real PRs (sanitized)
121
- │ ├── rubric.md
122
- │ └── test_briefings.py
123
- ├── docs/
124
- │ ├── architecture.md
125
- │ ├── design-decisions.md
126
- │ └── demo.gif
127
- ├── pyproject.toml # includes [project.scripts] for the CLI
128
- ├── LICENSE # MIT
129
- ├── CONTRIBUTING.md
130
- ├── CODE_OF_CONDUCT.md
131
- ├── CONFIG.md
132
- ├── CHANGELOG.md
133
- ├── README.md
134
- └── PROJECT.md (this file)
135
- ```
136
-
137
- ---
138
-
139
- ## Milestones (build in this order)
140
-
141
- ### Milestone 1 — End-to-end skeleton (target: 1 evening)
142
-
143
- Goal: prove the whole loop works with the dumbest possible logic.
144
-
145
- - [ ] Init `uv` project, set up `pyproject.toml` with deps: `requests`, `groq`, `pygithub`, `python-dotenv`.
146
- - [ ] `src/llm/base.py` — abstract `LLMProvider` class with `generate(prompt: str) -> str`.
147
- - [ ] `src/llm/groq_provider.py` — minimal Groq implementation using `llama-3.3-70b-versatile`.
148
- - [ ] `src/github_api/comment_poster.py` — function to post a comment to a PR via REST.
149
- - [ ] `src/cli.py` — a `typer` CLI with one command: `pr-context-engine review --pr <N> --repo <owner/name>`. Reads config from flags **or** env vars (CI-friendly). This is the entrypoint from day one — there is no separate `main.py` script. It fetches the diff, sends `"Summarize this diff in 3 bullets:\n\n{diff}"` to Groq, posts the result.
150
- - [ ] `pyproject.toml` — declare a `[project.scripts]` entry: `pr-context-engine = "src.cli:app"` so `pipx install` yields a working command.
151
- - [ ] `.github/workflows/pr-review.yml` — triggers on `pull_request: [opened, synchronize, reopened]` and simply _calls the CLI_ (`uv run pr-context-engine review --pr ${{ github.event.pull_request.number }} --repo ${{ github.repository }}`). No logic in the YAML.
152
- - [ ] Open a test PR. Confirm comment appears. Also confirm `pipx install .` then running the command locally against a real PR works.
153
-
154
- **Definition of done:** a real PR gets a comment both (a) via the workflow and (b) by running the installed CLI locally. The CLI is the single entrypoint.
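A minimal sketch of the `src/llm/base.py` contract from Milestone 1 follows. The `EchoProvider` class and the `summarize_diff` helper are hypothetical and not part of the plan. They exist only so the interface can be exercised with no network access and no API key:

```python
"""src/llm/base.py (sketch): the one contract every provider implements."""
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Plain string in, plain string out; no provider-specific types leak out."""

    name: str = "base"

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for `prompt`."""


class EchoProvider(LLMProvider):
    """Hypothetical offline stand-in for tests: echoes the prompt's first line."""

    name = "echo"

    def generate(self, prompt: str) -> str:
        lines = prompt.splitlines()
        return lines[0] if lines else ""


def summarize_diff(provider: LLMProvider, diff: str) -> str:
    """Milestone 1's deliberately dumb logic: one fixed prompt wrapped around the raw diff."""
    return provider.generate(f"Summarize this diff in 3 bullets:\n\n{diff}")
```

Keeping the contract to `str -> str` is what lets later milestones swap backends (M2) or wrap them (M7) without touching any caller.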
155
-
156
- ### Milestone 2 — Pluggable LLM providers (moved up — see ADR-0)
157
-
158
- Goal: lock in provider independence _before_ building everything else on top of one API. This is deliberately early.
159
-
160
- - [ ] Confirm `src/llm/base.py` interface is clean: one method, `generate(prompt: str) -> str`, no provider-specific types leaking out.
161
- - [ ] `src/llm/gemini_provider.py` — Gemini (`gemini-2.5-flash`) implementation.
162
- - [ ] `src/llm/ollama_provider.py` — local model support (`qwen2.5-coder:7b`), used for dev.
163
- - [ ] `src/llm/anthropic_provider.py` — Claude support (for BYO-key users).
164
- - [ ] `src/config.py` — reads `LLM_PROVIDER` env var (`groq` | `gemini` | `ollama` | `anthropic`) and instantiates the right provider. Default `groq`.
165
- - [ ] One unit test per provider using a mocked HTTP response — verify the interface contract, not the model.
166
- - [ ] README documents how to switch providers with one env var.
167
-
168
- **Definition of done:** the same PR can be briefed by Groq, Gemini, or local Ollama by changing one environment variable. Nothing downstream knows which provider is active.
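One way `src/config.py` could map `LLM_PROVIDER` to a provider is a registry of factories. This is a sketch under assumptions: `_StubProvider` is a hypothetical placeholder for the real Groq/Gemini/Ollama/Anthropic classes. Raising on an unknown value follows the "fail loudly" rule later in this plan.

```python
"""src/config.py (sketch): choose the LLM provider from one environment variable."""
import os
from abc import ABC, abstractmethod
from collections.abc import Callable, Mapping


class LLMProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class _StubProvider(LLMProvider):
    """Hypothetical placeholder for a concrete provider class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def _stub_factory(name: str) -> Callable[[], LLMProvider]:
    return lambda: _StubProvider(name)


# Registry: provider id -> zero-argument factory. Real code would import the concrete classes.
PROVIDERS: dict[str, Callable[[], LLMProvider]] = {
    name: _stub_factory(name) for name in ("groq", "gemini", "ollama", "anthropic")
}


def get_provider(env: Mapping[str, str] = os.environ) -> LLMProvider:
    """Instantiate the provider named by LLM_PROVIDER (default: groq); reject typos loudly."""
    choice = env.get("LLM_PROVIDER", "groq").strip().lower()
    if choice not in PROVIDERS:
        valid = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unknown LLM_PROVIDER={choice!r}; expected one of: {valid}")
    return PROVIDERS[choice]()
```

Passing `env` explicitly keeps the unit tests free of global environment mutation.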
169
-
170
- ### Milestone 3 — Real diff analysis
171
-
172
- Goal: stop sending raw diffs to the LLM. Send structured context.
173
-
174
- - [ ] `src/analyzers/diff_parser.py` — parse unified diff into `FileChange` objects: `path`, `language`, `added_lines`, `removed_lines`, `hunks`.
175
- - [ ] `src/analyzers/ast_walker.py` — for Python/JS/TS/Go files, extract the _names_ of changed functions and classes (use `ast` for Python and tree-sitter for the others if time allows; otherwise fall back to regex).
176
- - [ ] `src/analyzers/risk_scorer.py` — heuristic flags. Each flag is a **located issue object**, not a bare string: `{flag: str, file: str, line: int | None, snippet: str}`. (Milestone 8's fix generator depends on `file` + `line` being present — design this data shape now, even though fixes come later. A bare string list would force a painful refactor in M8.) Flags to detect:
177
- - file paths matching `migrations/`, `alembic/`, `*.sql` → `touches_migration`
178
- - keywords `auth`, `token`, `password`, `secret`, `permission` → `modifies_auth` (capture the line)
179
- - `.env*`, `config.*`, `*.yaml` at repo root → `changes_config`
180
- - deletions of top-level functions → `deletes_public_api` (capture the function name + line)
181
- Flags where a specific line genuinely doesn't apply (e.g. a whole-file config change) may set `line: None`; M8 will treat `line: None` flags as briefing-only, never fix-eligible.
182
- - [ ] Update the CLI orchestrator (`src/cli.py` and the module it calls) to assemble structured context before prompting.
183
-
184
- **Definition of done:** the prompt sent to the LLM is now assembled from structured context, not raw diff text.
185
-
186
- ### Milestone 4 — Senior-voice prompt + structured output
187
-
188
- Goal: make the comment actually feel like a senior wrote it.
189
-
190
- - [ ] `src/briefing/prompt_templates.py` — system prompt below.
191
- - [ ] `src/briefing/generator.py` — assembles final prompt, calls LLM, parses response.
192
- - [ ] Briefing must follow this exact markdown structure:
193
-
194
- ```markdown
195
- ## 🤖 PR Briefing
196
-
197
- **What changed**
198
- <2-3 sentences, semantic not line-by-line>
199
-
200
- **Blast radius**
201
- <which callers, services, or contracts could break — omit if trivial>
202
-
203
- **Risk flags**
204
- <bullets, only present if risk_scorer found something>
205
-
206
- **Questions for the reviewer**
207
-
208
- 1. <sharp question>
209
- 2. <sharp question>
210
- 3. <sharp question>
211
-
212
- ---
213
-
214
- <sub>Generated by [PR Context Engine](https://github.com/YOUR_USERNAME/pr-context-engine). Not a substitute for human review.</sub>
215
- ```
216
-
217
- - [ ] Prompt must include the instruction: _"Be terse. No praise. No 'this looks good.' If a section has nothing meaningful to say, write 'None.' and move on."_
218
-
219
- **Definition of done:** the comments now read like a senior reviewer wrote them.
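To "parse the response," `generator.py` needs some way to check the model's output against the template above. One possible approach is a small section parser. This is a sketch under assumptions: it keys on the bold headings from the template, stops at the `---` footer separator, and treats a body of exactly `None.` as empty. What the generator does when a section is missing (re-prompt, or post as-is) is left open.

```python
"""Sketch: split a Milestone 4 briefing into its bold-headed sections."""
import re

SECTIONS = ("What changed", "Blast radius", "Risk flags", "Questions for the reviewer")
_HEADING_RE = re.compile(r"^\*\*(.+?)\*\*\s*$")


def parse_briefing(markdown: str) -> dict[str, str]:
    """Map each known heading to its body text; a 'None.' body becomes ''."""
    sections: dict[str, list[str]] = {}
    current: str | None = None
    for line in markdown.splitlines():
        if line.strip() == "---":  # footer separator: ignore the <sub> footer
            break
        m = _HEADING_RE.match(line.strip())
        if m and m.group(1) in SECTIONS:
            current = m.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    parsed: dict[str, str] = {}
    for name, lines in sections.items():
        body = "\n".join(lines).strip()
        parsed[name] = "" if body == "None." else body
    return parsed


def count_questions(parsed: dict[str, str]) -> int:
    """Count numbered items ('1. ...') under the questions section."""
    body = parsed.get("Questions for the reviewer", "")
    return sum(1 for line in body.splitlines() if re.match(r"^\s*\d+\.\s+\S", line))
```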
220
-
221
- ### Milestone 5 — Codebase index (RAG)
222
-
223
- Goal: pull in context from the rest of the repo, not just the diff.
224
-
225
- - [ ] `src/context/codebase_index.py` — uses `sqlite-vec`. On first run, walks the repo, chunks files by function/class, embeds, stores in `index.db`. Note: Groq has no embeddings API — use a local embedding model (`fastembed` with `BAAI/bge-small-en-v1.5`, runs in-process, no API, no key) so indexing stays provider-independent and free.
226
- - [ ] On subsequent runs, only re-embeds files whose git hash changed.
227
- - [ ] For each `FileChange`, query the index for top-5 semantically similar chunks elsewhere in the repo. Include these in the prompt as "related code."
228
- - [ ] `index.db` is cached across Action runs via `actions/cache`.
229
-
230
- **Definition of done:** the briefing references functions and patterns from elsewhere in the repo when relevant.
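The "only re-embed files whose git hash changed" step can be sketched with the standard library alone. The sketch below computes the same blob SHA-1 that `git hash-object` prints and keeps the last-seen hashes in a SQLite table. The table name is an assumption, and so is hashing in-process (reading the hashes from `git ls-files -s` would also work). Chunking, embedding with `fastembed`, `sqlite-vec` storage, and pruning of deleted files are all omitted.

```python
"""Sketch: pick files to re-embed by comparing git blob hashes across runs."""
import hashlib
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, sha TEXT NOT NULL)"


def git_blob_sha(content: bytes) -> str:
    """SHA-1 of b'blob <size>' + NUL + content, i.e. what `git hash-object` prints."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def files_to_reembed(db: sqlite3.Connection, files: dict[str, bytes]) -> list[str]:
    """Paths that are new, or whose content changed since the last recorded run."""
    db.execute(SCHEMA)
    stored = dict(db.execute("SELECT path, sha FROM file_hashes"))
    return sorted(p for p, c in files.items() if stored.get(p) != git_blob_sha(c))


def record_embedded(db: sqlite3.Connection, files: dict[str, bytes]) -> None:
    """Remember hashes for files whose embeddings were just (re)written."""
    db.execute(SCHEMA)
    db.executemany(
        "INSERT OR REPLACE INTO file_hashes (path, sha) VALUES (?, ?)",
        [(p, git_blob_sha(c)) for p, c in files.items()],
    )
    db.commit()
```

Because the hash table lives in the same `index.db` that `actions/cache` restores, a cache hit means only the files changed since the cached base commit get re-embedded.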
231
-
232
- ### Milestone 6 — Git history context
233
-
234
- Goal: use the past to inform the present.
235
-
236
- - [ ] `src/context/git_history.py` — for each touched file, fetch the last 5 commit messages that modified it. Include in prompt as "recent activity on these files."
237
- - [ ] Bonus: find the last 3 _merged_ PRs that touched any of the same files. Include their titles + first line of description.
238
- - [ ] Note the shallow-clone tradeoff: the workflow uses `fetch-depth: 50`, so on large/old repos history for rarely-touched files may be truncated. This is an accepted tradeoff (full clones are slow/expensive in CI). Degrade gracefully — if history is unavailable, say "limited history" rather than erroring. Document this in `docs/design-decisions.md` as a deliberate CI-cost-vs-completeness call.
239
-
240
- **Definition of done:** briefing can say things like "this is the third migration to `users` this month; previous one introduced [issue]."
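Below is a sketch of the per-file history lookup with the graceful degradation described above. It shells out to `git log -n<N> --format=%s -- <path>`. An empty result covers three cases: git is missing, the directory is not a repo, or the shallow `fetch-depth: 50` clone simply contains no commits for the file. In all three, the prompt gets "limited history" instead of an error. The function names are hypothetical.

```python
"""Sketch: recent commit subjects per file, degrading to "limited history"."""
import logging
import subprocess

logger = logging.getLogger(__name__)


def recent_commit_subjects(path: str, repo_dir: str = ".", n: int = 5) -> list[str]:
    """Last `n` commit subjects touching `path`; [] when history is unavailable."""
    cmd = ["git", "log", f"-n{n}", "--format=%s", "--", path]
    try:
        result = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Not a repo, git missing, bad cwd: degrade, but leave a trace in the logs.
        logger.warning("git history unavailable for %s: %s", path, exc)
        return []
    return [s for s in result.stdout.splitlines() if s.strip()]


def history_context(path: str, subjects: list[str]) -> str:
    """Prompt fragment; an empty list is reported explicitly, never silently dropped."""
    if not subjects:
        return f"{path}: limited history"
    return f"{path}:\n" + "\n".join(f"- {s}" for s in subjects)
```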
241
-
242
- ### Milestone 7 — Provider failover & resilience
243
-
244
- Goal: turn the early provider abstraction (Milestone 2) into real resilience — the payoff for ADR-0.
245
-
246
- - [ ] `src/llm/__init__.py` — a `FailoverProvider` that wraps an ordered list of providers: try Groq, fall back to Gemini on rate-limit/error, fall back to a clear error comment if all fail.
247
- - [ ] Detect 429 / quota errors specifically, and name the provider that actually answered in the PR comment footer (e.g. "Generated by Groq" / "Generated by Gemini (Groq rate-limited)").
248
- - [ ] Add a unit test that simulates the primary provider 429-ing and asserts failover fires.
249
- - [ ] Document the failover order and the December 2025 Gemini-quota-cut anecdote in `docs/design-decisions.md` as the concrete motivation.
250
-
251
- **Definition of done:** kill the primary provider's key and the bot still posts a briefing via the fallback, noting which model it used.
252
-
253
- ### Milestone 8 — Suggested fixes (opt-in, guard-railed)
254
-
255
- Goal: go from "here's a problem" to "here's a fix you can apply in one click" — **without becoming a noisy, confidently-wrong bot.** The discipline here is the portfolio point. A bad version of this feature is worse than not having it; the guardrails below are not optional.
256
-
257
- **Dependencies (read before starting):** This milestone consumes the located-issue objects from Milestone 3's `risk_scorer` (the `file` + `line` fields). Only issues with a non-null `line` are fix-eligible; everything else stays briefing-only. This milestone also _extends_ the Milestone 4 system prompt — it must **add** a fix-format contract as an appended section, not rewrite the briefing prompt. The M4 briefing behavior must remain byte-for-byte unchanged when `ENABLE_FIXES=false`. Verify M4's eval still passes after this milestone.
258
-
259
- **Hard rules (build these as actual code constraints, not prompt suggestions):**
260
-
261
- - A fix is **only** generated when `risk_scorer` produced a located issue (`line` is not null) for a _concrete, specific_ problem. No fixes for vague observations or `line: None` flags.
262
- - Every suggestion is posted as a GitHub **suggestion block** (` ```suggestion `) inside a **collapsed `<details>`** section, so the diff isn't cluttered and the human opts in by expanding + clicking "Commit suggestion."
263
- - Every suggestion carries a one-line **rationale** and a **confidence label** (`high` / `medium` / `low`). `low` confidence suggestions are _described in prose only_ — no auto-applicable block. The model must never present a guess as a fix.
264
- - **Max 3 suggestions per PR.** If more issues exist, say "N more issues — see briefing" rather than flooding the diff.
265
- - The bot never edits; it only proposes. The human always commits.
266
-
267
- **Tasks:**
268
-
269
- - [ ] `src/fixes/fix_generator.py` — takes a flagged issue + surrounding code, asks the LLM for a minimal patch + rationale + self-assessed confidence. Returns structured output, not raw text.
270
- - [ ] `src/fixes/confidence.py` — gate logic: only `high`/`medium` become suggestion blocks; `low` becomes a prose note.
271
- - [ ] `src/github_api/comment_poster.py` — extend to post line-anchored suggestion blocks inside collapsed `<details>`.
272
- - [ ] Extend the system prompt with a strict fix-format contract, **appended as a separate section** so the Milestone 4 briefing prompt is untouched. Include the instruction: _"If you are not confident the patch is correct and complete, label it 'low' and do not produce a suggestion block. A wrong fix is worse than no fix."_ The fix section is only included in the prompt when `ENABLE_FIXES=true`.
273
- - [ ] Add a kill switch: `ENABLE_FIXES` env var (default `false`) so the feature is explicitly opt-in per repo.
274
-
275
- **Definition of done:** on a PR with a real, located bug, the bot posts a collapsed suggestion the maintainer can apply in one click — and on a PR where it's unsure, it says so in prose instead of guessing.
276
-
277
- ### Milestone 9 — Eval harness (the portfolio differentiator)
278
-
279
- This is what separates "AI side project" from "engineer who knows what they're doing." **Do not skip.** With Milestone 8 added, this milestone now also measures whether the _fixes are actually correct_ — which is the hardest and most credible thing to measure in the whole project.
280
-
281
- - [ ] `tests/eval/fixtures/` — 15-20 real PRs you've captured (diff + actual review comments, sanitized). Pull from open-source repos if your own are private. Include several with a _known, real bug_ so fix-correctness can be scored.
282
- - [ ] `tests/eval/rubric.md` — a scoring rubric. Briefing dimensions (0-3 each):
283
- 1. **Accuracy** — does the "what changed" actually describe the change?
284
- 2. **Blast radius** — does it identify real risk areas?
285
- 3. **Risk flags** — are flags present when they should be? false positives?
286
- 4. **Question quality** — would a senior reviewer actually ask these?
287
- 5. **Brevity** — is it terse, or does it pad?
288
- - Plus, for the fix feature: **Fix correctness** — does the patch actually resolve the flagged issue without breaking anything? (0 = wrong/harmful, 1 = partial, 2 = correct but non-minimal, 3 = correct and minimal). And **Calibration** — when the bot said `high` confidence, was it actually right? Track false-confidence rate explicitly; this number is the headline metric.
289
- - [ ] `tests/eval/test_briefings.py` — runs each fixture, generates briefing + fixes, uses an LLM-as-judge (different model from the one being evaluated) to score against the rubric. Where possible, run the suggested patch against the repo's tests to verify fix correctness empirically, not just by judge opinion.
290
- - [ ] Print a summary table. Commit historical scores so improvements are visible in git history.
291
-
292
- **Definition of done:** `pytest tests/eval/` produces a scorecard including a fix-correctness rate and a false-confidence rate. README shows both.
293
-
294
- ### Milestone 10 — Open-source readiness (the adoption layer)
295
-
296
- Goal: make a stranger able to install and trust this in 5 minutes with zero prior context. A tool nobody can install is just a private script. These items are what separate "starred and forgotten" from "actually used."
297
-
298
- **The 5-minute install path (must work exactly as written in the README):**
299
-
300
- - [ ] Publish the GitHub Action: add `action.yml` at repo root so others can use `uses: YOUR_USERNAME/pr-context-engine@v1` with just a `GROQ_API_KEY` secret. This is the newcomer on-ramp.
301
- - [ ] Publish the CLI to PyPI so `pipx install pr-context-engine` works for the power-user / other-CI path. (Test on TestPyPI first.)
302
- - [ ] A `quickstart` CLI command that interactively checks: is a provider key set? Does the GitHub token have the right scope? It then prints exactly what's missing. A failed first run is one of the most common reasons OSS tools get abandoned; this check catches it early.
303
- - [ ] `--dry-run` flag: generate the briefing and print it to stdout _without_ posting. Lets a new user see value before granting write access. Critical for trust.
304
-
305
- **Trust & safety for adopters:**
306
-
307
- - [ ] A clear, prominent statement in the README: what data leaves their machine, which provider sees their code, and that free-tier providers may train on inputs (link the design-decisions ADR). Engineers will not adopt a code tool that's vague about this.
308
- - [ ] `CONFIG.md` documenting every env var / flag, defaults, and a minimal vs. full example.
309
- - [ ] Sensible defaults so the tool is useful with _zero_ config beyond one API key. Every required-config item is an adoption tax.
310
-
311
- **Project hygiene (signals the project is alive and safe to depend on):**
312
-
313
- - [ ] `LICENSE` — MIT (most permissive, lowest adoption friction; state this choice in an ADR).
314
- - [ ] `CONTRIBUTING.md` — how to set up dev env, run tests, the milestone philosophy.
315
- - [ ] `CODE_OF_CONDUCT.md` — standard Contributor Covenant.
316
- - [ ] Issue + PR templates in `.github/`.
317
- - [ ] CI badge, PyPI version badge, license badge in README.
318
- - [ ] A `CHANGELOG.md` and real semver git tags (`v0.1.0` …). Dependents need to pin versions.
319
- - [ ] Dogfood it: the repo runs its own Action on its own PRs. The best possible demo is the tool reviewing its own development.
320
-
321
- **Definition of done:** a fresh machine with only `pipx` can install the tool and get a `--dry-run` briefing on a public PR using only a free Groq key, following only the README, in under 5 minutes — verified by actually doing it (or having someone else do it).
322
-
323
- ### Milestone 11 — Portfolio polish
324
-
325
- - [ ] `docs/architecture.md` — the architecture section above, expanded, with a Mermaid diagram.
326
- - [ ] `docs/design-decisions.md` — short ADRs for the top choices: why SQLite over Pinecone, why Actions over a server, why Python, why fixes are opt-in and confidence-gated, why CLI-core over Action-only, why MIT. Each ADR shows you understood a tradeoff.
327
- - [ ] `docs/demo.gif` — record a real PR getting briefed _and_ a suggestion being one-click applied. Embed in README.
328
- - [ ] README sections in order: Demo GIF → What it does → 5-minute Quickstart → Live example → Architecture diagram → Eval results (incl. fix-correctness + calibration) → Data & privacy → Design decisions → Cost analysis → Contributing.
329
- - [ ] Pin the repo on your GitHub profile.
330
-
331
- ---
332
-
333
- ## The system prompt (Milestone 4)
334
-
335
- Paste this verbatim into `prompt_templates.py`:
336
-
337
- ```
338
- You are a senior backend engineer reviewing a pull request. You have 90 seconds.
339
- Your job is to brief the human reviewer so they can review effectively.
340
-
341
- You will receive:
342
- - A list of changed files with parsed function/class names
343
- - Risk flags detected by static heuristics
344
- - Recent commit history on touched files
345
- - Semantically related code from elsewhere in the repo
346
-
347
- Produce a briefing with exactly four sections:
348
-
349
- 1. WHAT CHANGED — 2-3 sentences. Describe the *intent* of the change, not the
350
- lines. Do not list files. If you can't tell the intent, say so.
351
-
352
- 2. BLAST RADIUS — Which callers, services, contracts, or data could break?
353
- Be specific. If the change is internal and self-contained, write "Self-contained."
354
-
355
- 3. RISK FLAGS — Bullet list. Only include flags that are actually present.
356
- If none, write "None."
357
-
358
- 4. QUESTIONS — Exactly three questions a senior reviewer would ask before
359
- approving. Questions must be answerable and specific. Bad question:
360
- "Did you test this?" Good question: "The new retry loop in fetch_user
361
- has no backoff — is that intentional given this is called per-request?"
362
-
363
- Rules:
364
- - Be terse. Aim for under 200 words total.
365
- - No praise. No "this looks good." No emojis except the section icons.
366
- - If the PR is trivial (typo fix, doc change), say so in one line and skip
367
- the other sections.
368
- - Never speculate about things you can't see. If you don't have the context,
369
- say "Cannot tell from diff."
370
- ```
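Milestone 8 requires that this prompt stay byte-for-byte unchanged when `ENABLE_FIXES=false`. Appending the fix contract as a separate constant enforces that. The sketch below abbreviates the briefing prompt, which in real code is the full verbatim text above. Only the last two sentences of `FIX_CONTRACT` are quoted from Milestone 8; its other lines and the user-prompt headings are assumptions.

```python
"""Sketch: build the prompts; the fix contract is appended only when ENABLE_FIXES=true."""
from collections.abc import Mapping

# Abbreviated here; real code holds the full verbatim Milestone 4 prompt.
BRIEFING_SYSTEM_PROMPT = (
    "You are a senior backend engineer reviewing a pull request. You have 90 seconds.\n"
    "Your job is to brief the human reviewer so they can review effectively."
)
FIX_CONTRACT = (
    "FIX SUGGESTIONS\n"
    "For each located issue, return a minimal patch, a one-line rationale, and a\n"
    "confidence label (high, medium, or low).\n"
    "If you are not confident the patch is correct and complete, label it 'low' and\n"
    "do not produce a suggestion block. A wrong fix is worse than no fix."
)
# Mirrors the "You will receive:" list in the prompt; heading names are assumptions.
CONTEXT_SECTIONS = ("Changed files", "Risk flags", "Recent history", "Related code")


def fixes_enabled(env: Mapping[str, str]) -> bool:
    return env.get("ENABLE_FIXES", "false").strip().lower() == "true"


def build_system_prompt(env: Mapping[str, str]) -> str:
    if not fixes_enabled(env):
        return BRIEFING_SYSTEM_PROMPT  # untouched Milestone 4 prompt
    return BRIEFING_SYSTEM_PROMPT + "\n\n" + FIX_CONTRACT


def build_user_prompt(context: Mapping[str, str]) -> str:
    """Render structured context under fixed headings; empty sections say 'None.'."""
    return "\n\n".join(f"## {name}\n{context.get(name) or 'None.'}" for name in CONTEXT_SECTIONS)
```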
371
-
372
- ---
373
-
374
- ## GitHub Actions workflow (Milestone 1)
375
-
376
- `.github/workflows/pr-review.yml`:
377
-
378
- ```yaml
- name: PR Context Briefing
-
- on:
-   pull_request:
-     types: [opened, synchronize, reopened]
-
- jobs:
-   brief:
-     runs-on: ubuntu-latest
-     permissions:
-       pull-requests: write
-       contents: read
-     steps:
-       - uses: actions/checkout@v4
-         with:
-           fetch-depth: 50 # tradeoff: enough history for most files, fast clone; see Milestone 6 note
-
-       - uses: actions/setup-python@v5
-         with:
-           python-version: "3.12"
-
-       - name: Install uv
-         run: pip install uv
-
-       - name: Restore index cache
-         uses: actions/cache@v4
-         with:
-           path: index.db
-           key: pr-engine-index-${{ github.event.pull_request.base.sha }}
-           restore-keys: pr-engine-index-
-
-       - name: Install dependencies
-         run: uv sync
-
-       - name: Generate briefing
-         env:
-           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-           LLM_PROVIDER: groq
-           GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
-           GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} # fallback (Milestone 7)
-         run: >
-           uv run pr-context-engine review
-           --pr ${{ github.event.pull_request.number }}
-           --repo ${{ github.repository }}
- ```
424
-
425
- Note: the workflow contains zero logic — it just invokes the CLI. That's the point of the CLI-core design (see ADR). Once `action.yml` is published (Milestone 10), other repos skip even this and use a single `uses:` line.
426
-
427
- ---
428
-
429
- ## Setup checklist (one-time, by you)
430
-
431
- - [ ] Create a new public repo on GitHub.
432
- - [ ] Get a free Groq API key at https://console.groq.com/keys (no credit card).
433
- - [ ] (Optional, for failover) Get a free Gemini key at https://aistudio.google.com/apikey.
434
- - [ ] In repo Settings → Secrets and variables → Actions → add `GROQ_API_KEY` (and `GEMINI_API_KEY` if using failover).
435
- - [ ] In repo Settings → Actions → General → enable "Read and write permissions" for `GITHUB_TOKEN`.
436
- - [ ] Drop this file in. Open Claude Code. Tell it to start Milestone 1.
437
-
438
- ---
439
-
440
- ## Rules for Claude Code while building this
441
-
442
- - **One milestone per session.** Stop and let the human test before moving on.
443
- - **No new dependencies without asking.** Justify each addition.
444
- - **Every module gets a docstring** explaining its single responsibility.
445
- - **Unit tests for `analyzers/` and `context/`** — pure functions, easy to test.
446
- - **No try/except: pass.** If something fails, fail loudly with context.
447
- - **Type hints on every public function.**
448
- - **No print statements in `src/`** — use `logging`.
449
- - **Commit message format:** `feat(milestone-N): description` or `fix(milestone-N): description`.
450
-
451
- ---
452
-
453
- ## Cost expectation
454
-
455
- - GitHub Actions: $0 (standard GitHub-hosted runners are free for public repos; the 2,000-min/month Free-plan quota applies only to private repos).
456
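- - Groq free tier: $0 (~1,000 requests/day, i.e. roughly 1,000 PRs/day at one request per briefing; with fixes enabled or large RAG prompts, the daily token limit will likely bind first, so check Groq's current rate-limit page).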
457
- - Local embeddings (`fastembed`): $0, no API.
458
- - Gemini fallback: $0 (only hit if Groq is rate-limited).
459
- - Total: $0/month for a portfolio-scale project.
460
-
461
- Free LLM tiers change without warning (Google cut Gemini's by 50–80% in Dec 2025). The Milestone 7 failover design means a single provider's policy change degrades gracefully instead of breaking. And because the tool is BYO-key, anyone who adopts it runs on their _own_ free Groq key at their own $0 — there's no shared backend you pay for as it gets popular. The project's cost does not scale with its adoption, which is exactly what you want for an OSS tool.
462
-
463
- ---
464
-
465
- ## README guidance
466
-
467
- The authoritative README structure lives in **Milestone 11** (it accounts for the fix feature, eval/calibration results, the data & privacy section, and the 5-minute quickstart). Do not use an older/shorter structure — build the README from the Milestone 11 checklist.
468
-
469
- The one-line pitch to lead with: _"An AI tool that reads every PR and writes the briefing — and the fixes — a senior engineer would, with the calibration data to prove it's not just guessing."_
470
-
471
- ---
472
-
473
- ## Why this project works as a portfolio piece
474
-
475
- It checks all three boxes a senior backend role looks for:
476
-
477
- - **AI depth** — RAG over a real codebase (M5), context engineering (M3–M6), and an eval harness with an LLM-as-judge and empirical fix verification (M9).
478
- - **Product thinking** — the terse-senior-reviewer voice, the no-praise prompt rules, opt-in confidence-gated fixes (M8), and a deliberate 5-minute install path (M10).
479
- - **Systems design** — a CLI-core with two front doors, provider abstraction built early (M2), and real failover (M7).
480
-
481
- The two hardest-to-fake signals: the fix feature _plus its calibration metric_ (M8 + M9) shows you can ship a risky capability and the discipline to measure and bound it; and real adoption (M10) — strangers installing and depending on it — is something no résumé bullet can manufacture. Build it in order; the early decisions (CLI-core, provider abstraction, located-issue data shape) exist specifically so the later milestones don't require painful refactors.