@aarushpandey/gitagent 1.0.0
- package/CONTRIBUTING.md +104 -0
- package/LICENSE +21 -0
- package/README.md +570 -0
- package/TESTING.md +290 -0
- package/action.yml +113 -0
- package/examples/README.md +124 -0
- package/examples/sample-audit-trail-issue-4.md +112 -0
- package/examples/sample-review-tqec-pr894-v1-raw-flawed.md +71 -0
- package/examples/sample-review-tqec-pr894-v2-raw.md +48 -0
- package/examples/sample-review-tqec-pr894-v3-curated.md +118 -0
- package/examples/verify-marker-precedence/README.md +97 -0
- package/examples/verify-marker-precedence/conftest.py +15 -0
- package/examples/verify-marker-precedence/pyproject.toml +8 -0
- package/examples/verify-marker-precedence/test_marker_precedence.py +56 -0
- package/examples/verify-marker-precedence/verify_precedence.py +67 -0
- package/examples/workflows/issue-fix.yml +32 -0
- package/examples/workflows/pr-review.yml +34 -0
- package/package.json +75 -0
- package/scripts/verify.js +478 -0
- package/src/agents/agentLoop.js +176 -0
- package/src/agents/engineeringAgent.js +51 -0
- package/src/agents/reviewCopilot.js +79 -0
- package/src/agents/tools.js +486 -0
- package/src/cli/output.js +137 -0
- package/src/config.js +22 -0
- package/src/mapper/fileRelevance.js +113 -0
- package/src/mapper/repoMap.js +105 -0
- package/src/orchestrator.js +336 -0
- package/src/pipeline.js +985 -0
- package/src/prompts/engineering.js +189 -0
- package/src/prompts/review.js +149 -0
- package/src/utils/cost.js +47 -0
- package/src/utils/diffLines.js +67 -0
- package/src/utils/githubUrl.js +8 -0
- package/src/web/public/index.html +128 -0
- package/src/web/server.js +51 -0
package/README.md
ADDED
|
@@ -0,0 +1,570 @@
|
|
|
1
|
+
<h1 align="center">
|
|
2
|
+
<br>
|
|
3
|
+
🤖 github-agent
|
|
4
|
+
<br>
|
|
5
|
+
</h1>
|
|
6
|
+
|
|
7
|
+
<h3 align="center">An AI that ships pull requests and reviews its own work before opening them.</h3>
|
|
8
|
+
|
|
9
|
+
<p align="center">
|
|
10
|
+
<a href="#-quick-start">Quick Start</a> •
|
|
11
|
+
<a href="#-what-makes-this-different">Why github-agent</a> •
|
|
12
|
+
<a href="#-built-for-big-open-source-projects">Big Projects</a> •
|
|
13
|
+
<a href="#️-architecture">Architecture</a> •
|
|
14
|
+
<a href="#️-safety-guardrails">Safety</a> •
|
|
15
|
+
<a href="#️-roadmap">Roadmap</a>
|
|
16
|
+
</p>
|
|
17
|
+
|
|
18
|
+
<p align="center">
|
|
19
|
+
<img src="https://img.shields.io/badge/model-Claude%20Sonnet%204.6-blueviolet?style=flat-square&logo=anthropic" alt="Claude Sonnet 4.6">
|
|
20
|
+
<img src="https://img.shields.io/badge/tests-146%20passing-brightgreen?style=flat-square" alt="146 tests passing">
|
|
21
|
+
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen?style=flat-square&logo=node.js" alt="Node 18+">
|
|
22
|
+
<img src="https://img.shields.io/badge/license-MIT-green?style=flat-square" alt="MIT License">
|
|
23
|
+
<img src="https://img.shields.io/badge/CI-Linux%20%7C%20macOS%20%7C%20Windows-555?style=flat-square&logo=githubactions" alt="CI matrix">
|
|
24
|
+
</p>
|
|
25
|
+
|
|
26
|
+
---
|
|
27
|
+
|
|
28
|
+
`github-agent` is an **autonomous engineering pipeline** built on Claude. Give it a GitHub issue URL; it clones the repo, edits the code, runs the tests, **has a second AI instance review the diff**, refuses to ship a PR that fails its own review, and opens a pull request, all in one command.
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
node src/pipeline.js issue https://github.com/your/repo/issues/42
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## ✨ See it in action
|
|
37
|
+
|
|
38
|
+
```
|
|
39
|
+
$ node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment
|
|
40
|
+
|
|
41
|
+
ββββββββββββββββββββββββββββββββββββββββββββββ
|
|
42
|
+
β github-agent β autonomous PR engineer β
|
|
43
|
+
β engineering β self-review β ship β
|
|
44
|
+
ββββββββββββββββββββββββββββββββββββββββββββββ
|
|
45
|
+
|
|
46
|
+
βΈ Issue qiskit/qiskit#9421
|
|
47
|
+
title: Transpiler drops global phase on conditional gates
|
|
48
|
+
default branch: main
|
|
49
|
+
|
|
50
|
+
βΈ Cloning + branching
|
|
51
|
+
β branch: fix/issue-9421
|
|
52
|
+
test command: tox
|
|
53
|
+
lint commands: ruff check ., black --check ., mypy .
|
|
54
|
+
monorepo sub-packages: terra, aer, ibmq
|
|
55
|
+
guessed sub-package for issue: terra
|
|
56
|
+
CONTRIBUTING.md found at CONTRIBUTING.md
|
|
57
|
+
Project requires DCO Signed-off-by β will auto-sign commits.
|
|
58
|
+
20 file(s) prefiltered as likely relevant
|
|
59
|
+
pre-fix HEAD: 3f4a1b2
|
|
60
|
+
|
|
61
|
+
βΈ Engineering agent β autonomous fix loop
|
|
62
|
+
π [turn 1] Scoring the shortlist β transpiler/passes/optimization looks like the hit.
|
|
63
|
+
π§ find_relevant_files(query="transpiler global phase conditional gates")
|
|
64
|
+
π§ read_file(qiskit/transpiler/passes/optimization/consolidate_blocks.py)
|
|
65
|
+
π [turn 2] Found it β line 142 drops .global_phase on IfElseOp. Patching.
|
|
66
|
+
π§ apply_patch(qiskit/transpiler/passes/optimization/consolidate_blocks.py, ...)
|
|
67
|
+
π§ run_tests(tox) β PASS
|
|
68
|
+
π§ run_lint(ruff check .) β PASS
|
|
69
|
+
π§ run_lint(mypy .) β PASS
|
|
70
|
+
π§ finish({"pr_summary":"Preserve global_phase through IfElseOp consolidation..."})
|
|
71
|
+
β Agent finished after 6 turn(s)
|
|
72
|
+
|
|
73
|
+
βΈ Self-review β auditing the diff
|
|
74
|
+
β Review verdict: APPROVE
|
|
75
|
+
|
|
76
|
+
Token usage (engineering + revision)
|
|
77
|
+
input: 18,204 tok Β· output: 2,131 tok Β· cache_read: 14,067 tok
|
|
78
|
+
cost: $0.4912
|
|
79
|
+
|
|
80
|
+
βΈ Committing + pushing
|
|
81
|
+
β added DCO Signed-off-by trailer
|
|
82
|
+
β pushed fix/issue-9421 to Hadar01/qiskit
|
|
83
|
+
|
|
84
|
+
βΈ Opening pull request
|
|
85
|
+
β PR opened: https://github.com/qiskit/qiskit/pull/11504
|
|
86
|
+
β commented on issue: https://github.com/qiskit/qiskit/issues/9421#issuecomment-...
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## π What makes this different
|
|
92
|
+
|
|
93
|
+
Most AI coding tools **generate code and hand it to a human.** `github-agent` **ships it**: it audits itself first, refuses to ship bad work, and handles OSS repos you don't own.
|
|
94
|
+
|
|
95
|
+
| | Copilot / Cursor | Devin / SWE-agent | **github-agent** |
|
|
96
|
+
|---|:---:|:---:|:---:|
|
|
97
|
+
| Generates code | ✅ | ✅ | ✅ |
|
|
98
|
+
| Runs tests autonomously | ❌ | ✅ | ✅ |
|
|
99
|
+
| Runs project linters autonomously | ❌ | partial | ✅ |
|
|
100
|
+
| Opens the PR for you | ❌ | ✅ | ✅ |
|
|
101
|
+
| **Reviews its own diff before shipping** | ❌ | ❌ | ✅ |
|
|
102
|
+
| **Refuses to ship on bad self-review** | ❌ | ❌ | ✅ |
|
|
103
|
+
| **Revises based on its own review** | ❌ | ❌ | ✅ |
|
|
104
|
+
| Knows when to give up | ❌ | ❌ | ✅ |
|
|
105
|
+
| Works on repos you don't own (fork + PR) | ❌ | ❌ | ✅ |
|
|
106
|
+
| Human-readable audit trail in PR body | ❌ | partial | ✅ |
|
|
107
|
+
| Cost estimate + kill switch per run | ❌ | ❌ | ✅ |
|
|
108
|
+
|
|
109
|
+
### The self-review loop: the killer feature
|
|
110
|
+
|
|
111
|
+
A **second Claude instance**, with a completely fresh context and a different system prompt, audits the diff for:
|
|
112
|
+
|
|
113
|
+
- 🐛 **Bug risk**: logic errors, off-by-ones, null dereferences, drift from the original issue intent
|
|
114
|
+
- 🎲 **Edge cases**: inputs the engineering agent didn't consider
|
|
115
|
+
- 🧪 **Test coverage**: is the change actually tested?
|
|
116
|
+
- 🎯 **Scope creep**: did the agent touch things it shouldn't?
|
|
117
|
+
|
|
118
|
+
Verdict is one of `APPROVE` / `REQUEST_CHANGES` / `NEEDS_DISCUSSION`. On `REQUEST_CHANGES` the engineering agent does a **revision pass** with the review as input. On anything that isn't `APPROVE`, **the pipeline refuses to open the PR**; you have to pass `--force-pr` to override. No silent bad PRs.
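The gate described above can be sketched in a few lines. This is a minimal illustration, not the code in `src/orchestrator.js`: the `parseVerdict` and `decidePr` names, the `Verdict:` line format, and the option names are all assumptions made for the example.

```javascript
// Hypothetical helpers; names and the "Verdict: X" format are assumptions.

// Extract the verdict from reviewer text. Anything unparseable is UNKNOWN.
function parseVerdict(reviewText) {
  const match = /verdict:\s*(APPROVE|REQUEST_CHANGES|NEEDS_DISCUSSION)\b/i.exec(reviewText || '');
  return match ? match[1].toUpperCase() : 'UNKNOWN';
}

// Decide whether the pipeline may open a PR.
// Only APPROVE plus an observed passing test run gets through,
// unless the caller explicitly forces it (the --force-pr case).
function decidePr({ verdict, testsPassed, forcePr = false }) {
  if (forcePr) return { open: true, reason: 'forced via --force-pr' };
  if (verdict !== 'APPROVE') return { open: false, reason: `verdict was ${verdict}` };
  if (!testsPassed) return { open: false, reason: 'no passing test run observed' };
  return { open: true, reason: 'approved with passing tests' };
}

// Usage: a REQUEST_CHANGES verdict blocks the PR even when tests pass.
const verdict = parseVerdict('Verdict: REQUEST_CHANGES');
console.log(decidePr({ verdict, testsPassed: true }));
```

Treating an unparseable verdict the same as a negative one keeps the gate fail-closed: a malformed review can never be mistaken for an approval.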
|
|
119
|
+
|
|
120
|
+
---
|
|
121
|
+
|
|
122
|
+
## 🔬 Built for big open-source projects
|
|
123
|
+
|
|
124
|
+
Working on a 50-file toy repo is easy. Working on Qiskit, Cirq, VIO is not. `github-agent` has specific affordances for large scientific-Python-class codebases:
|
|
125
|
+
|
|
126
|
+
| Problem on a Qiskit-scale repo | What github-agent does |
|
|
127
|
+
|---|---|
|
|
128
|
+
| Thousands of files; context blows up | **Keyword relevance prefilter** scores every file against the issue text; the top 20 are injected as a starting hint. No embeddings API needed. |
|
|
129
|
+
| Narrow language support misses `.pyx`/`.pxd`/`.pyi`/`.rst`/config | Walks all of them, plus `Makefile`, `tox.ini`, `noxfile.py`, `CONTRIBUTING.md`, PR templates. |
|
|
130
|
+
| Monorepos with sub-packages (`qiskit-terra`, `qiskit-aer`, …) | **Auto-detects sub-packages**, guesses from issue text which one the change belongs to, tells the agent. |
|
|
131
|
+
| Test command isn't bare `pytest`; it's `tox`, `nox`, or `make test` | Priority-ordered detection: Makefile `test:` target → `make test`. `tox.ini` → `tox`. `noxfile.py` → `nox`. Then Python/Node/Rust. |
|
|
132
|
+
| CI gates on `ruff`, `black`, `mypy`, not just tests | **Lint gate**: auto-detects configured linters and the agent must pass them all before `finish()`. |
|
|
133
|
+
| Deeply-indented Python makes `apply_patch` brittle | **Whitespace-normalized fallback** + `apply_patch_range` (replace by line numbers) when strings won't disambiguate. |
|
|
134
|
+
| DCO sign-off / PR templates / CONTRIBUTING.md rules | All read and honored. `Signed-off-by:` trailer appended automatically. PR template preserved at top of PR body. |
|
|
135
|
+
| Scientific deps fail to install (BLAS/CUDA/compiled extensions) | `run_tests` detects `ModuleNotFoundError`/`ImportError` and flags `env_error:true`. The agent **gives up gracefully** instead of thrashing. |
|
|
136
|
+
| Complex issues need human judgment | The agent can call `give_up({reason, explanation, blockers})`. With `--comment` it posts the reason on the issue so a human picks up with full context. |
|
|
137
|
+
| Duplicate runs open duplicate PRs | **Duplicate-PR guard**: scans open PRs for `Resolves/Fixes/Closes #N` or matching `fix/issue-N` branch before cloning. |
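As a rough illustration of the keyword relevance prefilter in the table above, the sketch below scores files against the issue text and keeps the top 20. It is not the scorer in `src/mapper/fileRelevance.js`: the `tokenize` and `rankFiles` names, the stop-word list, and the 3:1 weighting of path hits over content hits are assumptions chosen for the example.

```javascript
// Illustrative only; weights, stop words, and function names are assumptions.
const STOP_WORDS = new Set(['the', 'and', 'for', 'with', 'that', 'this', 'from', 'are', 'not']);

// Lowercase, split on anything that is not a letter or digit, drop short/common words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2 && !STOP_WORDS.has(word));
}

// files: [{ path, content }]. Returns the `limit` best-scoring files, best first.
function rankFiles(issueText, files, limit = 20) {
  const keywords = [...new Set(tokenize(issueText))];
  return files
    .map(({ path, content }) => {
      const pathWords = new Set(tokenize(path));
      const bodyWords = new Set(tokenize(content));
      let score = 0;
      for (const keyword of keywords) {
        if (pathWords.has(keyword)) score += 3; // a path hit is a strong signal
        if (bodyWords.has(keyword)) score += 1; // a content hit is a weaker one
      }
      return { path, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, limit);
}

// Usage with two in-memory files standing in for a repo walk.
const shortlist = rankFiles('Transpiler drops global phase on conditional gates', [
  { path: 'qiskit/transpiler/passes/consolidate_blocks.py', content: 'global_phase = 0' },
  { path: 'docs/index.rst', content: 'welcome' },
]);
console.log(shortlist);
```

A pure keyword score like this needs no network call, which is the trade-off the table points at: it is cheap and deterministic, but it can miss files when the issue is phrased abstractly.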
|
|
138
|
+
|
|
139
|
+
> **Honest limitation:** we don't provision test environments. If a repo needs GPU / BLAS / conda, you'll want to run the agent inside a pre-warmed Docker image. That executor is on the roadmap.
|
|
140
|
+
|
|
141
|
+
---
|
|
142
|
+
|
|
143
|
+
## π§ββοΈ For maintainers wary of AI-generated PR noise
|
|
144
|
+
|
|
145
|
+
If you maintain a repo and you're (rightly) sceptical about AI tools dumping
|
|
146
|
+
generic *"consider error handling"* comments into your PR threads, read this.
|
|
147
|
+
|
|
148
|
+
**The `review` subcommand is offline by default.**
|
|
149
|
+
|
|
150
|
+
```bash
|
|
151
|
+
node src/pipeline.js review https://github.com/your/repo/pull/123
|
|
152
|
+
# → writes review-report.md to disk; never posts anywhere
|
|
153
|
+
# → exits 1 on REQUEST_CHANGES, 2 on NEEDS_DISCUSSION/UNKNOWN
|
|
154
|
+
# → exits 0 only on APPROVE
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
Posting to the PR requires an explicit `--post` flag. The default workflow is:
|
|
158
|
+
|
|
159
|
+
1. Run `review` offline on a PR you'd otherwise review by hand.
|
|
160
|
+
2. Read `review-report.md`. Cut anything speculative.
|
|
161
|
+
3. **Manually** decide whether the curated output is worth pasting into the
|
|
162
|
+
thread. If not, throw it away: nothing was posted, no noise added.
|
|
163
|
+
|
|
164
|
+
Bug-risk findings must cite `file:line`. The verdict prompt biases toward
|
|
165
|
+
`NEEDS_DISCUSSION` rather than rubber-stamping `APPROVE`. The exit-code-on-
|
|
166
|
+
verdict design makes it CI-gateable as a *"block merge until a human
|
|
167
|
+
acknowledges the bot's concerns"* check, without ever opening a PR comment.
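The exit-code contract above is small enough to sketch directly. The mapping of verdicts to codes comes from this README; the `exitCodeForVerdict` name and the `advisory` option (mirroring the documented `--advisory` flag) are hypothetical.

```javascript
// Hypothetical helper: maps a review verdict to a process exit code.
function exitCodeForVerdict(verdict, { advisory = false } = {}) {
  if (advisory) return 0; // advisory mode reports findings but never fails the run
  switch (verdict) {
    case 'APPROVE':
      return 0;
    case 'REQUEST_CHANGES':
      return 1;
    default:
      return 2; // NEEDS_DISCUSSION, UNKNOWN, or anything unparseable
  }
}

// Usage: set the exit code and let Node exit naturally so pending writes flush.
process.exitCode = exitCodeForVerdict('APPROVE');
```

Keeping `REQUEST_CHANGES` and `NEEDS_DISCUSSION` on separate codes lets a CI job treat "fix this" differently from "a human should look at this".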
|
|
168
|
+
|
|
169
|
+
When you *do* post (`--post`), findings land as **inline comments anchored to
|
|
170
|
+
the exact diff line**, not one wall-of-text blob. Each finding's `(file, line)`
|
|
171
|
+
is validated against the PR's diff hunks before posting, so a hallucinated line
|
|
172
|
+
number can never 422 the whole review; anything that won't anchor is folded into
|
|
173
|
+
the review summary with its `file:line` instead of being dropped.
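One way to picture the anchor validation described above: parse each file's unified-diff patch, collect the new-file line numbers that appear in its hunks, and split findings into those that can be posted inline and those that fall back to the summary. This is a sketch under assumptions, not the parser in `src/utils/diffLines.js`; `commentableLines`, `partitionFindings`, and the finding shape are hypothetical, and the input is assumed to be a single file's patch without `---`/`+++` headers.

```javascript
// Sketch only; function names and the { file, line, body } shape are assumptions.

// Return the set of new-file line numbers (added and context lines) in one file's patch.
function commentableLines(patch) {
  const lines = new Set();
  let nextLine = 0;
  let inHunk = false;
  for (const row of patch.split('\n')) {
    const header = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(row);
    if (header) {
      nextLine = Number(header[1]); // first new-file line of this hunk
      inHunk = true;
    } else if (inHunk && (row.startsWith('+') || row.startsWith(' '))) {
      lines.add(nextLine);
      nextLine += 1;
    }
    // '-' rows and "\ No newline at end of file" do not advance the new-file counter.
  }
  return lines;
}

// patches: Map of file path -> patch text. Findings that cannot anchor go to summaryOnly.
function partitionFindings(findings, patches) {
  const inline = [];
  const summaryOnly = [];
  for (const finding of findings) {
    const patch = patches.get(finding.file);
    const valid = patch ? commentableLines(patch) : new Set();
    (valid.has(finding.line) ? inline : summaryOnly).push(finding);
  }
  return { inline, summaryOnly };
}

// Usage: line 3 is inside the hunk, line 10 is not.
const patches = new Map([['src/a.js', '@@ -1,3 +1,4 @@\n a\n-b\n+B\n+C\n d']]);
console.log(partitionFindings([
  { file: 'src/a.js', line: 3, body: 'possible off-by-one' },
  { file: 'src/a.js', line: 10, body: 'line not in the diff' },
], patches));
```

Nothing is discarded in this design: a finding whose line cannot be anchored still reaches the reader through the summary list.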
|
|
174
|
+
|
|
175
|
+
See [`examples/`](examples/) for sample artifacts produced by real runs.
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
## π€ Contributing to repos you don't own
|
|
180
|
+
|
|
181
|
+
You can run `github-agent` on any public open-source project, even without write access. A `public_repo`-scoped PAT is enough.
|
|
182
|
+
|
|
183
|
+
```bash
|
|
184
|
+
# Fork-and-PR: pushes to your own fork, opens PR upstream, links back to the issue.
|
|
185
|
+
node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment
|
|
186
|
+
|
|
187
|
+
# Review a PR in a project you're not a maintainer of.
|
|
188
|
+
# --post submits the review as a PR comment (falls back to issue comment if permissions block).
|
|
189
|
+
node src/pipeline.js review https://github.com/qiskit/qiskit/pull/11504 --post
|
|
190
|
+
|
|
191
|
+
# Triage multiple issues in one shot.
|
|
192
|
+
node src/pipeline.js triage https://github.com/qiskit/qiskit --label=bug --max=5 --fork --comment
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
The review subcommand **exits non-zero** on any verdict other than `APPROVE` (`1` on `REQUEST_CHANGES`, `2` on `NEEDS_DISCUSSION`/`UNKNOWN`), so you can wire it straight into CI as a pre-merge gate.
|
|
196
|
+
|
|
197
|
+
---
|
|
198
|
+
|
|
199
|
+
## π Quick start
|
|
200
|
+
|
|
201
|
+
### Prerequisites
|
|
202
|
+
|
|
203
|
+
- Node.js 18+
|
|
204
|
+
- An [Anthropic API key](https://console.anthropic.com/)
|
|
205
|
+
- A [GitHub Personal Access Token](https://github.com/settings/tokens): `public_repo` for OSS work, `repo` for private repos
|
|
206
|
+
|
|
207
|
+
### Install
|
|
208
|
+
|
|
209
|
+
**No clone, just run it** (published on npm as [`@aarushpandey/gitagent`](https://www.npmjs.com/package/@aarushpandey/gitagent)):
|
|
210
|
+
|
|
211
|
+
```bash
|
|
212
|
+
# one-off, no install:
|
|
213
|
+
ANTHROPIC_API_KEY=sk-ant-... GITHUB_TOKEN=ghp_... \
|
|
214
|
+
npx @aarushpandey/gitagent review https://github.com/your/repo/pull/123
|
|
215
|
+
|
|
216
|
+
# or install the `github-agent` command globally:
|
|
217
|
+
npm install -g @aarushpandey/gitagent
|
|
218
|
+
github-agent review https://github.com/your/repo/pull/123
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
> The npm package is named `@aarushpandey/gitagent`; the command it installs is `github-agent`.
|
|
222
|
+
|
|
223
|
+
**Or clone for development:**
|
|
224
|
+
|
|
225
|
+
```bash
|
|
226
|
+
git clone https://github.com/Hadar01/github-agents.git
|
|
227
|
+
cd github-agents
|
|
228
|
+
npm install
|
|
229
|
+
cp .env.example .env
|
|
230
|
+
# edit .env:
|
|
231
|
+
# ANTHROPIC_API_KEY=sk-ant-...
|
|
232
|
+
# GITHUB_TOKEN=ghp_...
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
### Your first run
|
|
236
|
+
|
|
237
|
+
```bash
|
|
238
|
+
# Dry run first: full pipeline, no commits/push/PR
|
|
239
|
+
node src/pipeline.js issue https://github.com/your/repo/issues/42 --dry-run
|
|
240
|
+
|
|
241
|
+
# Ship it for real
|
|
242
|
+
node src/pipeline.js issue https://github.com/your/repo/issues/42
|
|
243
|
+
|
|
244
|
+
# Review an existing PR (no editing, just the audit)
|
|
245
|
+
node src/pipeline.js review https://github.com/your/repo/pull/123
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
Or use the npm shorthand scripts:
|
|
249
|
+
|
|
250
|
+
```bash
|
|
251
|
+
npm run issue -- https://github.com/your/repo/issues/42
|
|
252
|
+
npm run review -- https://github.com/your/repo/pull/123
|
|
253
|
+
```
|
|
254
|
+
|
|
255
|
+
---
|
|
256
|
+
|
|
257
|
+
## ⚡ Run it in CI: the GitHub Action
|
|
258
|
+
|
|
259
|
+
The fastest way to get a whole team using this: don't make anyone install
|
|
260
|
+
anything. Drop a workflow into your repo and `github-agent` reviews every PR
|
|
261
|
+
(and can auto-fix labeled issues) on GitHub's runners. No clone, no `.env`,
|
|
262
|
+
just one secret.
|
|
263
|
+
|
|
264
|
+
**Auto-review every PR** (`.github/workflows/pr-review.yml`):
|
|
265
|
+
|
|
266
|
+
```yaml
|
|
267
|
+
name: PR review (github-agent)
|
|
268
|
+
on:
|
|
269
|
+
pull_request_target:
|
|
270
|
+
types: [opened, synchronize, reopened]
|
|
271
|
+
permissions:
|
|
272
|
+
contents: read
|
|
273
|
+
pull-requests: write
|
|
274
|
+
jobs:
|
|
275
|
+
review:
|
|
276
|
+
runs-on: ubuntu-latest
|
|
277
|
+
steps:
|
|
278
|
+
- uses: Hadar01/github-agents@v1
|
|
279
|
+
with:
|
|
280
|
+
command: review
|
|
281
|
+
anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
|
|
282
|
+
# Advisory by default: posts findings, never blocks merge.
|
|
283
|
+
# Set 'true' to make REQUEST_CHANGES fail the check (a merge gate).
|
|
284
|
+
fail-on-request-changes: 'false'
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
**Auto-fix labeled issues**: apply the `agent-fix` label and it opens a PR
|
|
288
|
+
(`.github/workflows/issue-fix.yml`):
|
|
289
|
+
|
|
290
|
+
```yaml
|
|
291
|
+
on:
|
|
292
|
+
issues:
|
|
293
|
+
types: [labeled]
|
|
294
|
+
permissions: { contents: write, issues: write, pull-requests: write }
|
|
295
|
+
jobs:
|
|
296
|
+
fix:
|
|
297
|
+
if: github.event.label.name == 'agent-fix'
|
|
298
|
+
runs-on: ubuntu-latest
|
|
299
|
+
steps:
|
|
300
|
+
- uses: Hadar01/github-agents@v1
|
|
301
|
+
with:
|
|
302
|
+
command: issue
|
|
303
|
+
anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
|
|
304
|
+
comment: 'true'
|
|
305
|
+
```
|
|
306
|
+
|
|
307
|
+
Add one secret (**Settings → Secrets → Actions → `ANTHROPIC_API_KEY`**); the
|
|
308
|
+
built-in `GITHUB_TOKEN` handles the rest. Ready-to-copy files live in
|
|
309
|
+
[`examples/workflows/`](examples/workflows/).
|
|
310
|
+
|
|
311
|
+
| Action input | Default | Effect |
|
|
312
|
+
|---|---|---|
|
|
313
|
+
| `command` | (none) | `review`, `issue`, or `triage`. |
|
|
314
|
+
| `target` | event URL | PR/issue/repo URL. Auto-derived from the trigger if omitted. |
|
|
315
|
+
| `anthropic-api-key` | (none) | **Required.** Store as a repo/org secret. |
|
|
316
|
+
| `github-token` | `${{ github.token }}` | Token for GitHub API calls. |
|
|
317
|
+
| `post` | `true` | (review) Post the review back to the PR. |
|
|
318
|
+
| `fail-on-request-changes` | `false` | (review) `true` = block merge on a bad verdict; `false` = advisory. |
|
|
319
|
+
| `comment` | `false` | (issue) Link the PR back on the source issue. |
|
|
320
|
+
| `fork` | `false` | (issue) Push to your fork and PR from there. |
|
|
321
|
+
| `max-cost` | project default | USD ceiling for the run. |
|
|
322
|
+
|
|
323
|
+
The job exposes the verdict as an output (`steps.<id>.outputs.verdict`) and
|
|
324
|
+
renders it in the job summary, so you can branch on it in later steps.
|
|
325
|
+
|
|
326
|
+
> **Why `pull_request_target`?** Review fetches the PR diff via the GitHub
|
|
327
|
+
> API and sends it to Claude; it never checks out or runs the PR's code. That
|
|
328
|
+
> makes `pull_request_target` safe here, and it's what lets reviews on
|
|
329
|
+
> **fork PRs** read the API-key secret (plain `pull_request` can't).
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
## π Commands & flags
|
|
334
|
+
|
|
335
|
+
```
|
|
336
|
+
node src/pipeline.js issue <issue-url> [flags]
|
|
337
|
+
node src/pipeline.js review <pr-url> [flags]
|
|
338
|
+
node src/pipeline.js triage <repo-url> [flags]
|
|
339
|
+
```
|
|
340
|
+
|
|
341
|
+
| Flag | Subcommand | Effect |
|
|
342
|
+
|---|---|---|
|
|
343
|
+
| `--dry-run` | `issue`, `triage` | Full pipeline, but skip commit/push/PR. |
|
|
344
|
+
| `--fork` | `issue`, `triage` | Push to your fork; open PR from fork to upstream. |
|
|
345
|
+
| `--comment` | `issue`, `triage` | Post a link-back comment on the original issue after PR opens. |
|
|
346
|
+
| `--post` | `review` | Submit the review, with bug findings as **inline `file:line` comments** anchored to the diff (issue-comment fallback if blocked). |
|
|
347
|
+
| `--advisory` | `review` | Always exit 0 (post findings without failing the run). Powers the Action's non-blocking mode. |
|
|
348
|
+
| `--force-pr` | `issue`, `triage` | Override PR safety gate. Ship on `REQUEST_CHANGES` / no passing tests. |
|
|
349
|
+
| `--web` | any | Start a **live dashboard** at `http://localhost:3000`. |
|
|
350
|
+
| `--port=N` | any | Dashboard port (default `3000`). |
|
|
351
|
+
| `--max-cost=2.50` | any | Hard-abort agent if run cost (USD) exceeds this. Default `$5.00`. |
|
|
352
|
+
| `--label=bug` | `triage` | Only process issues with this label. |
|
|
353
|
+
| `--max=5` | `triage` | Cap batch size. |
|
|
354
|
+
|
|
355
|
+
---
|
|
356
|
+
|
|
357
|
+
## ποΈ Architecture
|
|
358
|
+
|
|
359
|
+
```
|
|
360
|
+
βββββββββββββββββββ
|
|
361
|
+
β GitHub Issue β
|
|
362
|
+
ββββββββββ¬βββββββββ
|
|
363
|
+
β
|
|
364
|
+
βΌ
|
|
365
|
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
366
|
+
β Project discovery (zero-cost, local) β
|
|
367
|
+
β Β· detect test command (make/tox/nox/pytest/npm/...) β
|
|
368
|
+
β Β· detect linters (ruff/black/mypy/eslint/...) β
|
|
369
|
+
β Β· detect monorepo sub-packages + guess target β
|
|
370
|
+
β Β· read CONTRIBUTING.md, PR template, DCO requirement β
|
|
371
|
+
β Β· prefilter top-20 relevant files by keyword score β
|
|
372
|
+
β Β· check for duplicate open PR β
|
|
373
|
+
ββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
374
|
+
β
|
|
375
|
+
βΌ
|
|
376
|
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
377
|
+
β Engineering Agent (Claude + tool use, cost-capped) β
|
|
378
|
+
β β
|
|
379
|
+
β Tools: read_file list_files find_relevant_files β
|
|
380
|
+
β write_file apply_patch apply_patch_range β
|
|
381
|
+
β run_tests run_lint git_diff β
|
|
382
|
+
β git_status finish give_up β
|
|
383
|
+
β β
|
|
384
|
+
β Loop: explore β patch β test β lint β repeat β
|
|
385
|
+
ββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
386
|
+
β diff
|
|
387
|
+
βΌ
|
|
388
|
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
389
|
+
β Self-Review (Claude, fresh context + issue text) β
|
|
390
|
+
β β
|
|
391
|
+
β Audits: bug risk Β· edge cases β
|
|
392
|
+
β test coverage Β· scope creep β
|
|
393
|
+
β drift from original issue intent β
|
|
394
|
+
β β
|
|
395
|
+
β Verdict: APPROVE / REQUEST_CHANGES / NEEDS_DISCUSSION β
|
|
396
|
+
ββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
397
|
+
β
|
|
398
|
+
βββββββ΄ββββββββββββββββββββββββββ
|
|
399
|
+
β APPROVE β REQUEST_CHANGES
|
|
400
|
+
β βΌ
|
|
401
|
+
β βββββββββββββββββββββββββ
|
|
402
|
+
β β Revision Pass β
|
|
403
|
+
β β (engineering agent β
|
|
404
|
+
β β + review feedback) β
|
|
405
|
+
β ββββββββββββ¬βββββββββββββ
|
|
406
|
+
β β
|
|
407
|
+
βΌ βΌ
|
|
408
|
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
409
|
+
β Safety gate: require passing tests + clean verdict β
|
|
410
|
+
β On pass β commit (with DCO) β push (fork or upstream) β
|
|
411
|
+
β β open PR (honors PR template) β
|
|
412
|
+
β β optional: comment on source issue β
|
|
413
|
+
β On fail β audit-trail.md written, PR blocked β
|
|
414
|
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
415
|
+
```
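The "detect test command" step in the project-discovery box can be sketched as a priority-ordered series of file checks. The Makefile, `tox.ini`, and `noxfile.py` order follows this README; the `detectTestCommand` name, the marker files used for the Python/Node/Rust fallbacks, and the fallback commands themselves are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical sketch of priority-ordered test command detection.
// Returns a command string, or null when nothing recognisable is found.
function detectTestCommand(repoRoot) {
  const has = (name) => fs.existsSync(path.join(repoRoot, name));
  const read = (name) => fs.readFileSync(path.join(repoRoot, name), 'utf8');

  // 1. A Makefile with a `test:` target wins.
  if (has('Makefile') && /^test:/m.test(read('Makefile'))) return 'make test';
  // 2. Then the Python task runners.
  if (has('tox.ini')) return 'tox';
  if (has('noxfile.py')) return 'nox';
  // 3. Language-level fallbacks (assumed marker files and commands).
  if (has('pyproject.toml') || has('setup.py')) return 'pytest';
  if (has('package.json')) return 'npm test';
  if (has('Cargo.toml')) return 'cargo test';
  return null;
}

// Usage: inspect the current working directory.
console.log(detectTestCommand(process.cwd()));
```

Checking project-specific runners before language defaults matters on large repos, where a bare `pytest` often skips the environment setup that `tox` or `make test` performs.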
|
|
416
|
+
|
|
417
|
+
---
|
|
418
|
+
|
|
419
|
+
## 🛡️ Safety guardrails
|
|
420
|
+
|
|
421
|
+
The agent has real write access to files on disk, real API tokens, and real cost. We've put real fences around it:
|
|
422
|
+
|
|
423
|
+
| Guardrail | Detail |
|
|
424
|
+
|---|---|
|
|
425
|
+
| **Path traversal blocked** | `read_file`, `write_file`, `apply_patch*` reject any path escaping the repo root |
|
|
426
|
+
| **No shell interpretation** | `run_tests` / `run_lint` tokenize the command, reject shell metacharacters (`;`, `&&`, backticks, `$(…)`), and spawn with `shell: false` |
|
|
427
|
+
| **PR gate on bad self-review** | `REQUEST_CHANGES`, `NEEDS_DISCUSSION`, unparseable verdict, or no passing tests → PR is **blocked**. Pass `--force-pr` to override |
|
|
428
|
+
| **Review exits non-zero for CI** | `pipeline.js review` exits `1` on `REQUEST_CHANGES`, `2` on `NEEDS_DISCUSSION`/`UNKNOWN` |
|
|
429
|
+
| **Iteration cap** | Hard stop at 18 agent turns per pass |
|
|
430
|
+
| **Cost kill-switch** | Configurable per-run USD ceiling (default $5.00) that aborts the run before it overspends |
|
|
431
|
+
| **Token leak prevention** | GitHub PAT used for clone + push but never written to `.git/config` (remote URL stripped after clone) |
|
|
432
|
+
| **Patch uniqueness** | `apply_patch` requires a unique match; fallback to whitespace-normalized match; errors include closest-line hints |
|
|
433
|
+
| **No accidental file wipes** | `write_file` refuses to overwrite an existing file unless `overwrite:true` is explicitly passed |
|
|
434
|
+
| **Pre-fix HEAD in audit** | Every run records the starting SHA with a ready-to-paste `git reset --hard <sha>` revert |
|
|
435
|
+
| **Flaky-test tolerance** | `run_tests` retries 3× on failure; passes on retry are flagged `flaky:true`, not treated as clean |
|
|
436
|
+
| **Graceful give-up** | Agent can abort with `give_up({reason, explanation, blockers})`, so no half-fixes are shipped |
|
|
437
|
+
| **API retries** | Anthropic calls retry with exponential backoff on 429/529/network errors |
|
|
438
|
+
| **`--dry-run` mode** | Full pipeline simulation without committing, pushing, or opening anything |
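Two of the guardrails in the table, path-traversal blocking and shell-free command execution, can be sketched as follows. This is a minimal illustration under assumptions rather than the handlers in `src/agents/tools.js`: `resolveInsideRepo`, `tokenizeCommand`, and the exact metacharacter set are hypothetical, and the path check does not resolve symlinks.

```javascript
const path = require('node:path');

// Hypothetical guard: resolve a tool-supplied path, refusing anything outside the repo root.
// Note: this checks the lexical path only; symlinks are not resolved in this sketch.
function resolveInsideRepo(repoRoot, requestedPath) {
  const root = path.resolve(repoRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === '..' || relative.startsWith('..' + path.sep) || path.isAbsolute(relative);
  if (escapes) throw new Error(`path escapes repo root: ${requestedPath}`);
  return target;
}

// Assumed metacharacter set: ; & | ` $ < > ( ) { } backslash and newline.
const SHELL_META = /[;&|`$<>(){}\\\n]/;

// Reject shell metacharacters, then split on whitespace. The result is meant for
// child_process.spawn(cmd, args, { shell: false }), so no shell ever parses it.
// Quoted arguments containing spaces are not supported in this sketch.
function tokenizeCommand(command) {
  if (SHELL_META.test(command)) {
    throw new Error(`shell metacharacters are not allowed: ${command}`);
  }
  const [cmd, ...args] = command.trim().split(/\s+/);
  if (!cmd) throw new Error('empty command');
  return { cmd, args };
}

// Usage: a lint command is accepted and split into a program plus arguments.
console.log(tokenizeCommand('ruff check .'));
```

Rejecting metacharacters up front and spawning without a shell are complementary: the first gives a clear error for suspicious input, the second ensures that anything slipping through is still passed as a literal argument.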
|
|
439
|
+
|
|
440
|
+
---
|
|
441
|
+
|
|
442
|
+
## 💰 Cost transparency
|
|
443
|
+
|
|
444
|
+
Every run prints a token breakdown and a USD estimate. The same numbers land in the audit trail and the PR body.
|
|
445
|
+
|
|
446
|
+
**Typical cost per issue:** $0.20 to $1.50, depending on repo size and whether the self-review triggers a revision pass. Bigger repos (Qiskit-scale) trend toward the upper end.
|
|
447
|
+
|
|
448
|
+
```
|
|
449
|
+
Token usage (engineering + revision)
|
|
450
|
+
input: 18,204 tok Β· output: 2,131 tok
|
|
451
|
+
cache_read: 14,067 tok Β· cache_create: 0 tok
|
|
452
|
+
βββββββββββββββββββββββββββββββββββββββββββββββ
|
|
453
|
+
cost: $0.4912 (in $0.2731 + out $0.1598 + cache_r $0.0211 + cache_c $0.0000)
|
|
454
|
+
```
|
|
455
|
+
|
|
456
|
+
> Rates live in `src/config.js` (`COST_INPUT_PER_MTOK`, `COST_OUTPUT_PER_MTOK`, `COST_CACHE_READ_PER_MTOK`, `COST_CACHE_CREATION_PER_MTOK`). Update them if Anthropic pricing changes.
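The pricing math behind the breakdown is a per-category multiply-and-sum. The sketch below reuses the four rate names from `src/config.js`, but the numeric rates are placeholders chosen for illustration, and `estimateCost`, `exceedsCeiling`, and the usage field names are hypothetical.

```javascript
// Placeholder USD rates per million tokens; the real values live in src/config.js.
const RATES = {
  COST_INPUT_PER_MTOK: 15,
  COST_OUTPUT_PER_MTOK: 75,
  COST_CACHE_READ_PER_MTOK: 1.5,
  COST_CACHE_CREATION_PER_MTOK: 18.75,
};

// usage: { input, output, cacheRead, cacheCreation } token counts (missing fields count as 0).
function estimateCost(usage, rates = RATES) {
  const usd = (tokens, ratePerMTok) => ((tokens || 0) * ratePerMTok) / 1_000_000;
  const parts = {
    input: usd(usage.input, rates.COST_INPUT_PER_MTOK),
    output: usd(usage.output, rates.COST_OUTPUT_PER_MTOK),
    cacheRead: usd(usage.cacheRead, rates.COST_CACHE_READ_PER_MTOK),
    cacheCreation: usd(usage.cacheCreation, rates.COST_CACHE_CREATION_PER_MTOK),
  };
  const total = parts.input + parts.output + parts.cacheRead + parts.cacheCreation;
  return { ...parts, total };
}

// Kill-switch check: true once the running estimate passes the ceiling (default $5.00).
function exceedsCeiling(usage, maxCostUsd = 5.0) {
  return estimateCost(usage).total > maxCostUsd;
}

// Usage with the token counts from the sample above.
const cost = estimateCost({ input: 18204, output: 2131, cacheRead: 14067, cacheCreation: 0 });
console.log(`cost: $${cost.total.toFixed(4)}`);
```

Evaluating the ceiling against the running total after every model call is what lets a cap like `--max-cost` stop a run mid-loop instead of reporting the overspend afterwards.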
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## π Audit trail
|
|
461
|
+
|
|
462
|
+
Every run writes `audit-trail.md` (gitignored). Designed to be skimmable by a human reviewer in under a minute:
|
|
463
|
+
|
|
464
|
+
```
|
|
465
|
+
# Audit trail β issue #9421: Transpiler drops global phase on conditional gates
|
|
466
|
+
|
|
467
|
+
**Issue:** https://github.com/qiskit/qiskit/issues/9421
|
|
468
|
+
**Branch:** fix/issue-9421
|
|
469
|
+
**Pre-fix HEAD:** 3f4a1b2 β revert with git reset --hard 3f4a1b2
|
|
470
|
+
**Turns used:** 6 of 18
|
|
471
|
+
**Cost:** $0.4912
|
|
472
|
+
|
|
473
|
+
## Outcome
|
|
474
|
+
✅ Finished in single pass
|
|
475
|
+
Preserve global_phase through IfElseOp consolidation...
|
|
476
|
+
|
|
477
|
+
## Safety gates
|
|
478
|
+
- Self-review verdict: APPROVE
|
|
479
|
+
- Tests observed passing: YES
|
|
480
|
+
- Lint observed passing: YES
|
|
481
|
+
|
|
482
|
+
## Files touched
|
|
483
|
+
- qiskit/transpiler/passes/optimization/consolidate_blocks.py β 1 edit via apply_patch
|
|
484
|
+
|
|
485
|
+
## Test runs
|
|
486
|
+
- Total invocations: 1 Β· Passed: 1 Β· Failed: 0
|
|
487
|
+
|
|
488
|
+
## Timeline (condensed)
|
|
489
|
+
- Turn 1 β Scoring the shortlistβ¦
|
|
490
|
+
- ranked files for: "transpiler global phase conditional gates"
|
|
491
|
+
- read qiskit/transpiler/passes/optimization/consolidate_blocks.py
|
|
492
|
+
- Turn 2 β Found it β line 142 drops .global_phaseβ¦
|
|
493
|
+
- patched qiskit/transpiler/passes/optimization/consolidate_blocks.py
|
|
494
|
+
- Turn 3 β ran tests: tox β PASS; ran lint: ruff check . β PASS; ran lint: mypy . β PASS
|
|
495
|
+
- Turn 4 β signalled finish
|
|
496
|
+
|
|
497
|
+
## Self-review report
|
|
498
|
+
[full reviewer output]
|
|
499
|
+
|
|
500
|
+
## Full tool transcript
|
|
501
|
+
<details>β¦raw trace for debuggingβ¦</details>
|
|
502
|
+
```
|
|
503
|
+
|
|
504
|
+
---
|
|
505
|
+
|
|
506
|
+
## π Project structure
|
|
507
|
+
|
|
508
|
+
```
|
|
509
|
+
github-agent/
|
|
510
|
+
βββ src/
|
|
511
|
+
β βββ pipeline.js β CLI entry + subcommands
|
|
512
|
+
β βββ orchestrator.js β engineering β review β revision β PR + project discovery
|
|
513
|
+
β βββ config.js β model, limits, cost rates
|
|
514
|
+
β βββ agents/
|
|
515
|
+
β β βββ engineeringAgent.js β issue β autonomous fix
|
|
516
|
+
β β βββ reviewCopilot.js β diff β structured audit
|
|
517
|
+
β β βββ agentLoop.js β multi-turn tool-use loop, retries, cost ceiling
|
|
518
|
+
β β βββ tools.js β tool schemas + sandboxed handlers
|
|
519
|
+
β βββ prompts/
|
|
520
|
+
β β βββ engineering.js β agentic system prompt, monorepo/lint/contrib hints
|
|
521
|
+
β β βββ review.js β review system prompt + verdict format
|
|
522
|
+
β βββ mapper/
|
|
523
|
+
β β βββ repoMap.js β big-project file walker, ignore-dirs, truncation
|
|
524
|
+
β β βββ fileRelevance.js β keyword scorer β starting-file prefilter
|
|
525
|
+
β βββ utils/
|
|
526
|
+
β β βββ cost.js β pricing math (input/output/cache)
|
|
527
|
+
β β βββ diffLines.js β unified-diff parser β valid inline-comment anchors
|
|
528
|
+
β β βββ githubUrl.js β parse owner/repo/number from URLs
|
|
529
|
+
β βββ cli/
|
|
530
|
+
β β βββ output.js β pretty terminal + cost summary
|
|
531
|
+
β βββ web/
|
|
532
|
+
β βββ server.js β Express SSE dashboard
|
|
533
|
+
β βββ public/index.html β live agent feed
|
|
534
|
+
├── tests/ ← 146 tests across 12 suites
|
|
535
|
+
βββ .github/workflows/test.yml β CI matrix: Linux/macOS/Windows Γ Node 18/20/22
|
|
536
|
+
```
|
|
537
|
+
|
|
538
|
+
---
|
|
539
|
+
|
|
540
|
+
## 🧪 Tests
|
|
541
|
+
|
|
542
|
+
```bash
|
|
543
|
+
npm test
|
|
544
|
+
```
|
|
545
|
+
|
|
546
|
+
**146 tests across 12 suites** covering path traversal, shell-injection guards, patch fallback strategies, repo walker truncation, big-project ignore-dirs, orchestrator verdict parsing, monorepo detection, CONTRIBUTING/DCO reading, cost math (including cache creation), audit trail structure, PR body + template honoring, GitHub Action verdict reporting, unified-diff line anchoring, inline-comment parsing/partitioning, and a mocked-SDK end-to-end run with retry semantics.
|
|
547
|
+
|
|
548
|
+
CI runs the full suite on **Linux / macOS / Windows × Node 18 / 20 / 22** for every push and pull request. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the contributor workflow and [`TESTING.md`](TESTING.md) for live, end-to-end feature testing recipes.
|
|
549
|
+
|
|
550
|
+
---
|
|
551
|
+
|
|
552
|
+
## 🗺️ Roadmap
|
|
553
|
+
|
|
554
|
+
- [ ] **Docker/devcontainer executor**: so `pytest` works on Qiskit-class repos that need BLAS / CUDA / compiled extensions
|
|
555
|
+
- [ ] **Embedding-based relevance**: drop-in replacement for the keyword prefilter on very abstract issues
|
|
556
|
+
- [ ] **Parallel triage**: one dashboard pane per issue when batching
|
|
557
|
+
- [ ] **LangSmith / Helicone telemetry export**
|
|
558
|
+
- [ ] **Pluggable language adapters**: `rustfmt`+`cargo`, `gofmt`+`go vet`, etc.
|
|
559
|
+
|
|
560
|
+
---
|
|
561
|
+
|
|
562
|
+
## π€ Contributing
|
|
563
|
+
|
|
564
|
+
See [`CONTRIBUTING.md`](CONTRIBUTING.md). Short version: one behaviour change per PR, add a test with every behaviour change, `npm test` must be green on Node 18/20/22.
|
|
565
|
+
|
|
566
|
+
---
|
|
567
|
+
|
|
568
|
+
## π License
|
|
569
|
+
|
|
570
|
+
[MIT](LICENSE): use it, fork it, ship it.
|