@aarushpandey/gitagent 1.0.0

Files changed (36)
  1. package/CONTRIBUTING.md +104 -0
  2. package/LICENSE +21 -0
  3. package/README.md +570 -0
  4. package/TESTING.md +290 -0
  5. package/action.yml +113 -0
  6. package/examples/README.md +124 -0
  7. package/examples/sample-audit-trail-issue-4.md +112 -0
  8. package/examples/sample-review-tqec-pr894-v1-raw-flawed.md +71 -0
  9. package/examples/sample-review-tqec-pr894-v2-raw.md +48 -0
  10. package/examples/sample-review-tqec-pr894-v3-curated.md +118 -0
  11. package/examples/verify-marker-precedence/README.md +97 -0
  12. package/examples/verify-marker-precedence/conftest.py +15 -0
  13. package/examples/verify-marker-precedence/pyproject.toml +8 -0
  14. package/examples/verify-marker-precedence/test_marker_precedence.py +56 -0
  15. package/examples/verify-marker-precedence/verify_precedence.py +67 -0
  16. package/examples/workflows/issue-fix.yml +32 -0
  17. package/examples/workflows/pr-review.yml +34 -0
  18. package/package.json +75 -0
  19. package/scripts/verify.js +478 -0
  20. package/src/agents/agentLoop.js +176 -0
  21. package/src/agents/engineeringAgent.js +51 -0
  22. package/src/agents/reviewCopilot.js +79 -0
  23. package/src/agents/tools.js +486 -0
  24. package/src/cli/output.js +137 -0
  25. package/src/config.js +22 -0
  26. package/src/mapper/fileRelevance.js +113 -0
  27. package/src/mapper/repoMap.js +105 -0
  28. package/src/orchestrator.js +336 -0
  29. package/src/pipeline.js +985 -0
  30. package/src/prompts/engineering.js +189 -0
  31. package/src/prompts/review.js +149 -0
  32. package/src/utils/cost.js +47 -0
  33. package/src/utils/diffLines.js +67 -0
  34. package/src/utils/githubUrl.js +8 -0
  35. package/src/web/public/index.html +128 -0
  36. package/src/web/server.js +51 -0
package/README.md ADDED
@@ -0,0 +1,570 @@
<h1 align="center">
<br>
🤖 github-agent
<br>
</h1>

<h3 align="center">An AI that ships pull requests and reviews its own work before opening them.</h3>

<p align="center">
<a href="#-quick-start">Quick Start</a> •
<a href="#-what-makes-this-different">Why github-agent</a> •
<a href="#-built-for-big-open-source-projects">Big Projects</a> •
<a href="#️-architecture">Architecture</a> •
<a href="#️-safety-guardrails">Safety</a> •
<a href="#️-roadmap">Roadmap</a>
</p>

<p align="center">
<img src="https://img.shields.io/badge/model-Claude%20Sonnet%204.6-blueviolet?style=flat-square&logo=anthropic" alt="Claude Sonnet 4.6">
<img src="https://img.shields.io/badge/tests-146%20passing-brightgreen?style=flat-square" alt="146 tests passing">
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen?style=flat-square&logo=node.js" alt="Node 18+">
<img src="https://img.shields.io/badge/license-MIT-green?style=flat-square" alt="MIT License">
<img src="https://img.shields.io/badge/CI-Linux%20%7C%20macOS%20%7C%20Windows-555?style=flat-square&logo=githubactions" alt="CI matrix">
</p>

---

`github-agent` is an **autonomous engineering pipeline** built on Claude. Give it a GitHub issue URL and, in one command, it clones the repo, edits the code, runs the tests, **has a second AI instance review the diff**, and opens a pull request only if the change passes that review.

```bash
node src/pipeline.js issue https://github.com/your/repo/issues/42
```

---

## ✨ See it in action

```
$ node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

╔════════════════════════════════════════════╗
║   github-agent · autonomous PR engineer    ║
║      engineering → self-review → ship      ║
╚════════════════════════════════════════════╝

▸ Issue qiskit/qiskit#9421
  title: Transpiler drops global phase on conditional gates
  default branch: main

▸ Cloning + branching
  ✓ branch: fix/issue-9421
  test command: tox
  lint commands: ruff check ., black --check ., mypy .
  monorepo sub-packages: terra, aer, ibmq
  guessed sub-package for issue: terra
  CONTRIBUTING.md found at CONTRIBUTING.md
  Project requires DCO Signed-off-by; will auto-sign commits.
  20 file(s) prefiltered as likely relevant
  pre-fix HEAD: 3f4a1b2

▸ Engineering agent: autonomous fix loop
  💭 [turn 1] Scoring the shortlist: transpiler/passes/optimization looks like the hit.
  🔧 find_relevant_files(query="transpiler global phase conditional gates")
  🔧 read_file(qiskit/transpiler/passes/optimization/consolidate_blocks.py)
  💭 [turn 2] Found it: line 142 drops .global_phase on IfElseOp. Patching.
  🔧 apply_patch(qiskit/transpiler/passes/optimization/consolidate_blocks.py, ...)
  🔧 run_tests(tox) → PASS
  🔧 run_lint(ruff check .) → PASS
  🔧 run_lint(mypy .) → PASS
  🔧 finish({"pr_summary":"Preserve global_phase through IfElseOp consolidation..."})
  ✓ Agent finished after 6 turn(s)

▸ Self-review: auditing the diff
  ✓ Review verdict: APPROVE

  Token usage (engineering + revision)
  input: 18,204 tok · output: 2,131 tok · cache_read: 14,067 tok
  cost: $0.4912

▸ Committing + pushing
  ✓ added DCO Signed-off-by trailer
  ✓ pushed fix/issue-9421 to Hadar01/qiskit

▸ Opening pull request
  ✓ PR opened: https://github.com/qiskit/qiskit/pull/11504
  ✓ commented on issue: https://github.com/qiskit/qiskit/issues/9421#issuecomment-...
```

---

## 🏆 What makes this different

Most AI coding tools **generate code and hand it to a human.** `github-agent` **ships it**. It audits its own diff first, refuses to ship work that fails that audit, and handles OSS repos you don't own.

| | Copilot / Cursor | Devin / SWE-agent | **github-agent** |
|---|:---:|:---:|:---:|
| Generates code | ✅ | ✅ | ✅ |
| Runs tests autonomously | ❌ | ✅ | ✅ |
| Runs project linters autonomously | ❌ | partial | ✅ |
| Opens the PR for you | ❌ | ✅ | ✅ |
| **Reviews its own diff before shipping** | ❌ | ❌ | ✅ |
| **Refuses to ship on bad self-review** | ❌ | ❌ | ✅ |
| **Revises based on its own review** | ❌ | ❌ | ✅ |
| Knows when to give up | ❌ | ❌ | ✅ |
| Works on repos you don't own (fork + PR) | ❌ | ❌ | ✅ |
| Human-readable audit trail in PR body | ❌ | partial | ✅ |
| Cost estimate + kill switch per run | ❌ | ❌ | ✅ |

### The self-review loop: the killer feature

A **second Claude instance**, with a completely fresh context and a different system prompt, audits the diff for:

- 🐛 **Bug risk**: logic errors, off-by-ones, null dereferences, drift from the original issue intent
- 🔲 **Edge cases**: inputs the engineering agent didn't consider
- 🧪 **Test coverage**: is the change actually tested?
- 🎯 **Scope creep**: did the agent touch things it shouldn't?

The verdict is one of `APPROVE` / `REQUEST_CHANGES` / `NEEDS_DISCUSSION`. On `REQUEST_CHANGES`, the engineering agent does a **revision pass** with the review as input. On any verdict other than `APPROVE`, **the pipeline refuses to open the PR** unless you pass `--force-pr` to override it. No silent bad PRs.
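
The gate rule above can be sketched as a small pure function. This is a minimal illustration and not the project's code. The names `parseVerdict` and `decidePrGate` are hypothetical, and the sketch assumes the reviewer ends its report with a plain `Verdict: <VALUE>` line.

```javascript
// Hypothetical sketch of the PR safety gate; names are illustrative only.
const VERDICTS = ["APPROVE", "REQUEST_CHANGES", "NEEDS_DISCUSSION"];

// Assumes the review contains a line like "Verdict: APPROVE". Anything that
// can't be parsed becomes UNKNOWN, which the gate treats like a rejection.
function parseVerdict(reviewText) {
  const match = /^\s*verdict:\s*([A-Z_]+)\s*$/im.exec(reviewText);
  const verdict = match ? match[1].toUpperCase() : "";
  return VERDICTS.includes(verdict) ? verdict : "UNKNOWN";
}

// The PR opens only on APPROVE plus an observed passing test run, unless forcePr
// (mirroring --force-pr) overrides the gate.
function decidePrGate({ verdict, testsPassed, forcePr = false }) {
  const reasons = [];
  if (verdict !== "APPROVE") reasons.push(`self-review verdict is ${verdict}`);
  if (!testsPassed) reasons.push("no passing test run was observed");
  return { open: reasons.length === 0 || forcePr, reasons };
}

const gate = decidePrGate({
  verdict: parseVerdict("...\nVerdict: REQUEST_CHANGES\n"),
  testsPassed: true,
});
console.log(gate.open); // false
```

Returning the list of reasons rather than a bare boolean makes it easy to write the reasons into an audit trail when the PR is blocked.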

---

## 🔬 Built for big open-source projects

Working on a 50-file toy repo is easy. Working on Qiskit, Cirq, or VIO is not. `github-agent` has specific affordances for large scientific Python codebases:

| Problem on a Qiskit-scale repo | What github-agent does |
|---|---|
| Thousands of files, so the context blows up | **Keyword relevance prefilter** scores every file against the issue text; the top 20 are injected as a starting hint. No embeddings API needed. |
| Narrow language support misses `.pyx`/`.pxd`/`.pyi`/`.rst`/config files | Walks all of them, plus `Makefile`, `tox.ini`, `noxfile.py`, `CONTRIBUTING.md`, and PR templates. |
| Monorepos with sub-packages (`qiskit-terra`, `qiskit-aer`, …) | **Auto-detects sub-packages**, guesses from the issue text which one the change belongs to, and tells the agent. |
| The test command isn't bare `pytest` but `tox`, `nox`, or `make test` | Priority-ordered detection: a Makefile `test:` target → `make test`, then `tox.ini` → `tox`, then `noxfile.py` → `nox`, then Python/Node/Rust. |
| CI gates on `ruff`, `black`, and `mypy`, not just tests | **Lint gate**: auto-detects configured linters, and the agent must pass them all before `finish()`. |
| Deeply indented Python makes `apply_patch` brittle | **Whitespace-normalized fallback**, plus `apply_patch_range` (replace by line numbers) when strings won't disambiguate. |
| DCO sign-off, PR templates, CONTRIBUTING.md rules | All read and honored. A `Signed-off-by:` trailer is appended automatically, and the PR template is preserved at the top of the PR body. |
| Scientific deps fail to install (BLAS/CUDA/compiled extensions) | `run_tests` detects `ModuleNotFoundError`/`ImportError` and flags `env_error:true`. The agent **gives up gracefully** instead of thrashing. |
| Complex issues need human judgment | The agent can call `give_up({reason, explanation, blockers})`. With `--comment`, it posts the reason on the issue so a human can pick up with full context. |
| Duplicate runs open duplicate PRs | **Duplicate-PR guard**: before cloning, it scans open PRs for `Resolves/Fixes/Closes #N` or a matching `fix/issue-N` branch. |
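
The priority-ordered test-command detection in the table can be sketched as follows. This is a hypothetical helper, not the project's implementation. It assumes `files` is a plain object that maps root-level file names to their contents. The Python/Node/Rust fallbacks (`pytest`, `npm test`, `cargo test`) are also assumptions about what "then Python/Node/Rust" covers.

```javascript
// Hypothetical sketch of priority-ordered test-command detection.
// `files` maps repo-root file names to their text contents (an assumption).
function detectTestCommand(files) {
  const has = (name) => Object.prototype.hasOwnProperty.call(files, name);

  // 1. A Makefile with a `test:` target wins.
  if (has("Makefile") && /^test\s*:/m.test(files["Makefile"])) return "make test";
  // 2-3. Python task runners.
  if (has("tox.ini")) return "tox";
  if (has("noxfile.py")) return "nox";
  // 4. Plain Python projects (assumed fallback).
  if (["pyproject.toml", "setup.py", "setup.cfg", "pytest.ini"].some(has)) return "pytest";
  // 5. Node projects that declare a test script (assumed fallback).
  if (has("package.json")) {
    try {
      if (JSON.parse(files["package.json"]).scripts?.test) return "npm test";
    } catch {
      // Malformed package.json: fall through to the next check.
    }
  }
  // 6. Rust crates (assumed fallback).
  if (has("Cargo.toml")) return "cargo test";
  return null; // Nothing recognizable.
}

console.log(detectTestCommand({ "tox.ini": "[tox]\n", "pyproject.toml": "" })); // tox
```

Checking the order explicitly, rather than collecting every match, keeps project-specific runners such as `tox` ahead of a bare `pytest` that would skip the project's own environment setup.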

> 🛑 **Honest limitation:** we don't provision test environments. If a repo needs a GPU, BLAS, or conda, run the agent inside a pre-warmed Docker image. That executor is on the roadmap.

---

## 🧑‍⚖️ For maintainers wary of AI-generated PR noise

If you maintain a repo and you're (rightly) sceptical about AI tools dumping
generic *"consider error handling"* comments into your PR threads, read this.

**The `review` subcommand is offline by default.**

```bash
node src/pipeline.js review https://github.com/your/repo/pull/123
# → writes review-report.md to disk; never posts anywhere
# → exits 1 on REQUEST_CHANGES, 2 on NEEDS_DISCUSSION/UNKNOWN
# → exits 0 only on APPROVE
```
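
Here is a sketch of that verdict-to-exit-code mapping. It assumes the verdict has already been parsed into one of the strings above. `exitCodeForVerdict` is a hypothetical name. In a real CLI, assigning the result to `process.exitCode`, rather than calling `process.exit()`, lets pending output flush before Node exits.

```javascript
// Hypothetical sketch of the verdict → exit-code mapping listed above.
// `advisory` mirrors the --advisory flag, which always exits 0.
function exitCodeForVerdict(verdict, { advisory = false } = {}) {
  if (advisory) return 0;
  switch (verdict) {
    case "APPROVE":
      return 0;
    case "REQUEST_CHANGES":
      return 1;
    default:
      return 2; // NEEDS_DISCUSSION, or UNKNOWN when the verdict couldn't be parsed
  }
}

console.log(exitCodeForVerdict("NEEDS_DISCUSSION")); // 2
```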

Posting to the PR requires an explicit `--post` flag. The default workflow is:

1. Run `review` offline on a PR you'd otherwise review by hand.
2. Read `review-report.md` and cut anything speculative.
3. **Manually** decide whether the curated output is worth pasting into the
   thread. If not, throw it away: nothing was posted, so no noise was added.

Bug-risk findings must cite `file:line`. The verdict prompt biases toward
`NEEDS_DISCUSSION` rather than rubber-stamping `APPROVE`. Because the exit code
follows the verdict, the command can act as a CI check that *"blocks merge until a
human acknowledges the bot's concerns"* without ever opening a PR comment.
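
Because findings must cite `file:line`, a tool can extract those citations mechanically before it checks them against the diff. The sketch below assumes a report format with one finding per line and paths that carry a file extension. Both that format and the helper name `extractCitedFindings` are hypothetical.

```javascript
// Hypothetical sketch: pull `file:line` citations out of bug-risk findings.
// Assumes one finding per line, citing a path with an extension, e.g.
// "- src/app.py:42 possible None dereference". The real format may differ.
const CITATION = /([\w./-]+\.[A-Za-z0-9]+):(\d+)/;

function extractCitedFindings(reportText) {
  const findings = [];
  for (const raw of reportText.split("\n")) {
    const text = raw.trim();
    const m = CITATION.exec(text);
    if (m) findings.push({ file: m[1], line: Number(m[2]), text });
    // Lines without a citation are skipped; a stricter tool could reject them.
  }
  return findings;
}

const found = extractCitedFindings("- src/app.py:42 possible None dereference\n- style nit");
console.log(found.length); // 1
```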

When you *do* post (`--post`), findings land as **inline comments anchored to
the exact diff line**, not as one wall-of-text blob. Each finding's `(file, line)`
is validated against the PR's diff hunks before posting, so a hallucinated line
number can't make GitHub reject the whole review with an HTTP 422. Anything that
won't anchor is folded into the review summary with its `file:line` instead of being dropped.
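
The validation step can be sketched as two helpers. The first collects, per file, the new-side line numbers that a review comment can anchor to. The second splits findings into inline comments and summary fallbacks. This illustrates the idea only; it is not the code in `src/utils/diffLines.js`, and it assumes comments target the right (new) side of a standard unified diff.

```javascript
// Hypothetical sketch of diff-hunk anchoring. A RIGHT-side review comment can only
// target new-file lines that appear inside a hunk: added ("+") or context (" ").
function anchorableLines(unifiedDiff) {
  const anchors = new Map(); // file path -> Set of new-side line numbers
  const lines = unifiedDiff.split("\n");
  let file = null;
  let newLine = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // "+++ b/path" follows "--- a/path" and names the new file; /dev/null = deleted.
    if (line.startsWith("+++ ") && i > 0 && lines[i - 1].startsWith("--- ")) {
      const target = line.slice(4);
      file = target === "/dev/null" ? null : target.replace(/^b\//, "");
      if (file && !anchors.has(file)) anchors.set(file, new Set());
      continue;
    }
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (hunk) {
      newLine = Number(hunk[1]);
      continue;
    }
    if (file && (line.startsWith("+") || line.startsWith(" "))) {
      anchors.get(file).add(newLine);
      newLine += 1;
    }
    // "-" lines exist only on the old side, so they don't advance newLine.
  }
  return anchors;
}

// Anchorable findings become inline comments; the rest go to the review summary.
function partitionFindings(findings, anchors) {
  const inline = [];
  const summary = [];
  for (const f of findings) {
    if (anchors.get(f.file)?.has(f.line)) inline.push(f);
    else summary.push(f);
  }
  return { inline, summary };
}
```

Usage: build the map once per PR with `anchorableLines(diffText)`, then call `partitionFindings(findings, map)` and post only `inline` as line comments.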

See [`examples/`](examples/) for sample artifacts produced by real runs.

---

## 🤝 Contributing to repos you don't own

You can run `github-agent` on any public open-source project, even without write access. A `public_repo`-scoped PAT is enough.

```bash
# Fork-and-PR: pushes to your own fork, opens the PR upstream, and links back to the issue.
node src/pipeline.js issue https://github.com/qiskit/qiskit/issues/9421 --fork --comment

# Review a PR in a project you're not a maintainer of.
# --post submits the review to the PR (falls back to an issue comment if permissions block it).
node src/pipeline.js review https://github.com/qiskit/qiskit/pull/11504 --post

# Triage multiple issues in one shot.
node src/pipeline.js triage https://github.com/qiskit/qiskit --label=bug --max=5 --fork --comment
```

The review subcommand **exits non-zero** on any verdict other than `APPROVE` (`1` on `REQUEST_CHANGES`, `2` on `NEEDS_DISCUSSION`/`UNKNOWN`), so you can wire it straight into CI as a pre-merge gate.

---

## 🚀 Quick start

### Prerequisites

- Node.js 18+
- An [Anthropic API key](https://console.anthropic.com/)
- A [GitHub Personal Access Token](https://github.com/settings/tokens): `public_repo` scope for OSS work, `repo` for private repos

### Install

**No clone, just run it** (published on npm as [`@aarushpandey/gitagent`](https://www.npmjs.com/package/@aarushpandey/gitagent)):

```bash
# one-off, no install:
ANTHROPIC_API_KEY=sk-ant-... GITHUB_TOKEN=ghp_... \
  npx @aarushpandey/gitagent review https://github.com/your/repo/pull/123

# or install the `github-agent` command globally:
npm install -g @aarushpandey/gitagent
github-agent review https://github.com/your/repo/pull/123
```

> The npm package is named `@aarushpandey/gitagent`; the command it installs is `github-agent`.

**Or clone for development:**

```bash
git clone https://github.com/Hadar01/github-agents.git
cd github-agents
npm install
cp .env.example .env
# edit .env:
#   ANTHROPIC_API_KEY=sk-ant-...
#   GITHUB_TOKEN=ghp_...
```

### Your first run

```bash
# Dry run first: full pipeline, no commits/push/PR
node src/pipeline.js issue https://github.com/your/repo/issues/42 --dry-run

# Ship it for real
node src/pipeline.js issue https://github.com/your/repo/issues/42

# Review an existing PR (no editing, just the audit)
node src/pipeline.js review https://github.com/your/repo/pull/123
```

Or use the npm shorthand scripts:

```bash
npm run issue -- https://github.com/your/repo/issues/42
npm run review -- https://github.com/your/repo/pull/123
```

---

## ⚡ Run it in CI: the GitHub Action

The fastest way to get a whole team using this is to not make anyone install
anything. Drop a workflow into your repo and `github-agent` reviews every PR
(and can auto-fix labeled issues) on GitHub's runners. No clone, no `.env`,
just one secret.

**Auto-review every PR** (`.github/workflows/pr-review.yml`):

```yaml
name: PR review (github-agent)
on:
  pull_request_target:
    types: [opened, synchronize, reopened]
permissions:
  contents: read
  pull-requests: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: Hadar01/github-agents@v1
        with:
          command: review
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          # Advisory by default: posts findings, never blocks merge.
          # Set 'true' to make REQUEST_CHANGES fail the check (a merge gate).
          fail-on-request-changes: 'false'
```

**Auto-fix labeled issues**: apply the `agent-fix` label and it opens a PR
(`.github/workflows/issue-fix.yml`):

```yaml
on:
  issues:
    types: [labeled]
permissions: { contents: write, issues: write, pull-requests: write }
jobs:
  fix:
    if: github.event.label.name == 'agent-fix'
    runs-on: ubuntu-latest
    steps:
      - uses: Hadar01/github-agents@v1
        with:
          command: issue
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          comment: 'true'
```

Add one secret (**Settings → Secrets and variables → Actions → `ANTHROPIC_API_KEY`**); the
built-in `GITHUB_TOKEN` handles the rest. Ready-to-copy files live in
[`examples/workflows/`](examples/workflows/).

| Action input | Default | Effect |
|---|---|---|
| `command` | (none) | `review`, `issue`, or `triage`. |
| `target` | event URL | PR/issue/repo URL. Derived from the triggering event if omitted. |
| `anthropic-api-key` | (none) | **Required.** Store it as a repo or org secret. |
| `github-token` | `${{ github.token }}` | Token for GitHub API calls. |
| `post` | `true` | (review) Post the review back to the PR. |
| `fail-on-request-changes` | `false` | (review) `true` blocks merge on a bad verdict; `false` is advisory. |
| `comment` | `false` | (issue) Link the PR back on the source issue. |
| `fork` | `false` | (issue) Push to your fork and open the PR from there. |
| `max-cost` | project default | USD ceiling for the run. |

The action exposes the verdict as a step output (`steps.<id>.outputs.verdict`) and
renders it in the job summary, so you can branch on it in later steps.
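
A Node-based step can publish that kind of output and summary through GitHub Actions' `GITHUB_OUTPUT` and `GITHUB_STEP_SUMMARY` environment files. The sketch below is an assumption about the mechanism and is not copied from `action.yml`. `reportVerdict` is a hypothetical name.

```javascript
// Hypothetical sketch: expose a verdict as a step output and in the job summary
// using GitHub Actions' environment files. Not necessarily what action.yml does.
const fs = require("node:fs");

function reportVerdict(verdict, env = process.env) {
  if (env.GITHUB_OUTPUT) {
    // Each "name=value" line becomes steps.<id>.outputs.<name>.
    fs.appendFileSync(env.GITHUB_OUTPUT, `verdict=${verdict}\n`);
  }
  if (env.GITHUB_STEP_SUMMARY) {
    // Markdown appended here is rendered on the workflow run's summary page.
    fs.appendFileSync(
      env.GITHUB_STEP_SUMMARY,
      `### github-agent review\n\n**Verdict:** \`${verdict}\`\n`,
    );
  }
}
```

A later step could then react with a condition such as `if: steps.<id>.outputs.verdict == 'REQUEST_CHANGES'`. Outside Actions, both variables are unset and the function does nothing.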

> 🔒 **Why `pull_request_target`?** Review fetches the PR diff via the GitHub
> API and sends it to Claude; it never checks out or runs the PR's code. That
> makes `pull_request_target` safe here, and it's what lets reviews on
> **fork PRs** read the API-key secret (plain `pull_request` can't).

---

## 📖 Commands & flags

```
node src/pipeline.js issue <issue-url> [flags]
node src/pipeline.js review <pr-url> [flags]
node src/pipeline.js triage <repo-url> [flags]
```

| Flag | Subcommand | Effect |
|---|---|---|
| `--dry-run` | `issue`, `triage` | Run the full pipeline but skip commit/push/PR. |
| `--fork` | `issue`, `triage` | Push to your fork and open the PR from the fork to upstream. |
| `--comment` | `issue`, `triage` | Post a link-back comment on the original issue after the PR opens. |
| `--post` | `review` | Submit the review, with bug findings as **inline `file:line` comments** anchored to the diff (issue-comment fallback if blocked). |
| `--advisory` | `review` | Always exit 0 (post findings without failing the run). Powers the Action's non-blocking mode. |
| `--force-pr` | `issue`, `triage` | Override the PR safety gate: ship despite a non-`APPROVE` verdict or no passing tests. |
| `--web` | any | Start a **live dashboard** at `http://localhost:3000`. |
| `--port=N` | any | Dashboard port (default `3000`). |
| `--max-cost=2.50` | any | Hard-abort the agent if the run cost (USD) exceeds this. Default `$5.00`. |
| `--label=bug` | `triage` | Only process issues with this label. |
| `--max=5` | `triage` | Cap the batch size. |
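
If you script around these flags, Node's built-in `util.parseArgs` (Node 18.3+) handles the `--name=value` forms shown in the table. This is a sketch and not necessarily how `pipeline.js` parses its arguments. `parseCli` is a hypothetical name, and only some flags are mapped onto named fields.

```javascript
// Hypothetical sketch: parsing the flags above with util.parseArgs (Node 18.3+).
const { parseArgs } = require("node:util");

function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // <subcommand> <url>
    options: {
      "dry-run": { type: "boolean" },
      fork: { type: "boolean" },
      comment: { type: "boolean" },
      post: { type: "boolean" },
      advisory: { type: "boolean" },
      "force-pr": { type: "boolean" },
      web: { type: "boolean" },
      port: { type: "string" },
      "max-cost": { type: "string" },
      label: { type: "string" },
      max: { type: "string" },
    },
  }); // strict mode (the default) throws on unknown flags
  const [command, target] = positionals;
  return {
    command,
    target,
    dryRun: values["dry-run"] === true,
    forcePr: values["force-pr"] === true,
    port: Number(values.port ?? 3000), // documented default: 3000
    maxCost: Number(values["max-cost"] ?? 5), // documented default: $5.00
    label: values.label ?? null,
    max: values.max === undefined ? null : Number(values.max),
  };
}

const opts = parseCli(["triage", "https://github.com/qiskit/qiskit", "--label=bug", "--max=5"]);
console.log(opts.label, opts.max); // bug 5
```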

---

## 🏗️ Architecture

```
┌─────────────────┐
│  GitHub Issue   │
└────────┬────────┘
         │
         ▼
┌──────────────────────────────────────────────────────────┐
│  Project discovery (zero-cost, local)                    │
│  · detect test command (make/tox/nox/pytest/npm/...)     │
│  · detect linters (ruff/black/mypy/eslint/...)           │
│  · detect monorepo sub-packages + guess target           │
│  · read CONTRIBUTING.md, PR template, DCO requirement    │
│  · prefilter top-20 relevant files by keyword score      │
│  · check for duplicate open PR                           │
└────────┬─────────────────────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────────────────────┐
│  Engineering Agent (Claude + tool use, cost-capped)      │
│                                                          │
│  Tools: read_file   list_files   find_relevant_files     │
│         write_file  apply_patch  apply_patch_range       │
│         run_tests   run_lint     git_diff                │
│         git_status  finish       give_up                 │
│                                                          │
│  Loop: explore → patch → test → lint → repeat            │
└────────┬─────────────────────────────────────────────────┘
         │ diff
         ▼
┌──────────────────────────────────────────────────────────┐
│  Self-Review (Claude, fresh context + issue text)        │
│                                                          │
│  Audits: bug risk · edge cases                           │
│          test coverage · scope creep                     │
│          drift from original issue intent                │
│                                                          │
│  Verdict: APPROVE / REQUEST_CHANGES / NEEDS_DISCUSSION   │
└────────┬─────────────────────────────────────────────────┘
         │
   ┌─────┴────────────────────────┐
   │ APPROVE                      │ REQUEST_CHANGES
   │                              ▼
   │                 ┌─────────────────────────┐
   │                 │ Revision Pass           │
   │                 │ (engineering agent      │
   │                 │  + review feedback)     │
   │                 └────────────┬────────────┘
   │                              │
   ▼                              ▼
┌──────────────────────────────────────────────────────────┐
│  Safety gate: require passing tests + clean verdict      │
│  On pass → commit (with DCO) → push (fork or upstream)   │
│          → open PR (honors PR template)                  │
│          → optional: comment on source issue             │
│  On fail → audit-trail.md written, PR blocked            │
└──────────────────────────────────────────────────────────┘
```
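
The "prefilter top-20 relevant files by keyword score" step in the discovery box can be approximated as shown below. This is a hypothetical sketch of keyword scoring, not the code in `src/mapper/fileRelevance.js`. The stopword list and the weighting of path hits against content hits are assumptions.

```javascript
// Hypothetical sketch of a keyword relevance prefilter: score each file by how
// many issue keywords appear in its path or contents, then keep the top N.
const STOPWORDS = new Set(["the", "and", "for", "with", "when", "this", "that", "from"]);

function keywords(text) {
  const words = text.toLowerCase().split(/[^a-z0-9_]+/);
  return [...new Set(words.filter((w) => w.length > 2 && !STOPWORDS.has(w)))];
}

// files: array of { path, content }. The weights (3 for a path hit, 1 for a
// content hit) are arbitrary assumptions.
function rankFiles(issueText, files, topN = 20) {
  const terms = keywords(issueText);
  return files
    .map(({ path, content = "" }) => {
      const p = path.toLowerCase();
      const c = content.toLowerCase();
      let score = 0;
      for (const t of terms) {
        if (p.includes(t)) score += 3;
        if (c.includes(t)) score += 1;
      }
      return { path, score };
    })
    .filter((f) => f.score > 0)
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, topN);
}
```

Plain substring matching on lowercased text needs no embeddings API, which is the trade-off the big-project table describes. The roadmap's embedding-based relevance would replace this scoring for abstract issues.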

---

## 🛡️ Safety guardrails

The agent has real write access to files on disk, real API tokens, and real cost, so each of those is fenced in:

| Guardrail | Detail |
|---|---|
| **Path traversal blocked** | `read_file`, `write_file`, and `apply_patch*` reject any path that escapes the repo root. |
| **No shell interpretation** | `run_tests` / `run_lint` tokenize the command, reject shell metacharacters (`;`, `&&`, backticks, `$(…)`), and spawn with `shell: false`. |
| **PR gate on bad self-review** | `REQUEST_CHANGES`, `NEEDS_DISCUSSION`, an unparseable verdict, or no passing tests → the PR is **blocked**. Pass `--force-pr` to override. |
| **Review exits non-zero for CI** | `pipeline.js review` exits `1` on `REQUEST_CHANGES` and `2` on `NEEDS_DISCUSSION`/`UNKNOWN`. |
| **Iteration cap** | Hard stop at 18 agent turns per pass. |
| **Cost kill-switch** | Configurable per-run USD ceiling (default $5.00); the run aborts before overspending. |
| **Token leak prevention** | The GitHub PAT is used for clone and push but never written to `.git/config` (the remote URL is stripped after clone). |
| **Patch uniqueness** | `apply_patch` requires a unique match and falls back to a whitespace-normalized match; errors include closest-line hints. |
| **No accidental file wipes** | `write_file` refuses to overwrite an existing file unless `overwrite:true` is explicitly passed. |
| **Pre-fix HEAD in audit** | Every run records the starting SHA with a ready-to-paste `git reset --hard <sha>` revert command. |
| **Flaky-test tolerance** | `run_tests` retries 3× on failure; a pass on retry is flagged `flaky:true`, not treated as clean. |
| **Graceful give-up** | The agent can abort with `give_up({reason, explanation, blockers})`, so no half-fixes are shipped. |
| **API retries** | Anthropic calls retry with exponential backoff on 429/529/network errors. |
| **`--dry-run` mode** | Full pipeline simulation without committing, pushing, or opening anything. |
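
The first two guardrails can be sketched with Node's `path` and `child_process` modules. These helpers are hypothetical and deliberately simple. They don't resolve symlinks (a real guard would likely also compare `fs.realpathSync` results), and the tokenizer doesn't handle quoted arguments.

```javascript
// Hypothetical sketch of the path-traversal and no-shell guardrails.
const path = require("node:path");
const { spawnSync } = require("node:child_process");

// Path traversal guard: resolve against the repo root and reject anything outside it.
function resolveInsideRepo(repoRoot, requested) {
  const root = path.resolve(repoRoot);
  const full = path.resolve(root, requested);
  const rel = path.relative(root, full);
  const escapes = rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path escapes repo root: ${requested}`);
  return full;
}

// No-shell guard: refuse metacharacters, split on whitespace, spawn without a shell.
const SHELL_META = /[;&|`$<>(){}\n]/;

function tokenizeCommand(command) {
  if (SHELL_META.test(command)) {
    throw new Error(`shell metacharacters are not allowed: ${command}`);
  }
  const tokens = command.trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) throw new Error("empty command");
  return tokens;
}

function runCommand(command, cwd) {
  const [bin, ...args] = tokenizeCommand(command);
  // shell: false is spawnSync's default; it is spelled out because it is the point.
  return spawnSync(bin, args, { cwd, shell: false, encoding: "utf8" });
}
```

Rejecting metacharacters is defense in depth. With `shell: false`, characters like `;` would only reach the program as literal arguments, but refusing them makes an injected command fail loudly instead of behaving strangely.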

---

## 💰 Cost transparency

Every run prints a token breakdown and a USD estimate. The same numbers land in the audit trail and the PR body.

**Typical cost per issue:** $0.20 to $1.50, depending on repo size and whether the self-review triggers a revision pass. Bigger repos (Qiskit-scale) trend toward the upper end.

```
Token usage (engineering + revision)
  input: 18,204 tok · output: 2,131 tok
  cache_read: 14,067 tok · cache_create: 0 tok
  ───────────────────────────────────────────────
  cost: $0.4912 (in $0.2731 + out $0.1598 + cache_r $0.0211 + cache_c $0.0000)
```

> Rates live in `src/config.js` (`COST_INPUT_PER_MTOK`, `COST_OUTPUT_PER_MTOK`, `COST_CACHE_READ_PER_MTOK`, `COST_CACHE_CREATION_PER_MTOK`). Update them if Anthropic pricing changes.
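
The cost math is a multiply-and-sum over token types, followed by a comparison against the `--max-cost` ceiling. The sketch assumes token counts arrive in the shape of the Anthropic Messages API `usage` object (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). The rate values are illustrative and not read from `src/config.js`, so check current pricing before relying on them. The function names are hypothetical.

```javascript
// Hypothetical sketch of per-run cost math and a --max-cost kill switch.
// Rate names mirror src/config.js; the VALUES are illustrative only.
const RATES = {
  COST_INPUT_PER_MTOK: 3.0,
  COST_OUTPUT_PER_MTOK: 15.0,
  COST_CACHE_READ_PER_MTOK: 0.3,
  COST_CACHE_CREATION_PER_MTOK: 3.75,
};
const MTOK = 1_000_000;

// `usage` uses the field names of the Anthropic Messages API usage object.
function estimateCost(usage, rates = RATES) {
  const input = (usage.input_tokens * rates.COST_INPUT_PER_MTOK) / MTOK;
  const output = (usage.output_tokens * rates.COST_OUTPUT_PER_MTOK) / MTOK;
  const cacheRead =
    ((usage.cache_read_input_tokens ?? 0) * rates.COST_CACHE_READ_PER_MTOK) / MTOK;
  const cacheCreation =
    ((usage.cache_creation_input_tokens ?? 0) * rates.COST_CACHE_CREATION_PER_MTOK) / MTOK;
  return { input, output, cacheRead, cacheCreation, total: input + output + cacheRead + cacheCreation };
}

// Kill switch: call with the running total after each agent turn.
function assertUnderBudget(totalUsd, maxCostUsd = 5.0) {
  if (totalUsd > maxCostUsd) {
    throw new Error(`run cost $${totalUsd.toFixed(4)} exceeds --max-cost $${maxCostUsd.toFixed(2)}`);
  }
}
```

At these illustrative rates, `estimateCost({ input_tokens: 1_000_000, output_tokens: 0 }).total` is `3`. Keeping the four components separate is what lets a report print the `in / out / cache_r / cache_c` breakdown shown above.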

---

## 📋 Audit trail

Every run writes `audit-trail.md` (gitignored). It is designed so a human reviewer can skim it in under a minute:

```
# Audit trail · issue #9421: Transpiler drops global phase on conditional gates

**Issue:** https://github.com/qiskit/qiskit/issues/9421
**Branch:** fix/issue-9421
**Pre-fix HEAD:** 3f4a1b2 (revert with git reset --hard 3f4a1b2)
**Turns used:** 6 of 18
**Cost:** $0.4912

## Outcome
✅ Finished in a single pass
Preserve global_phase through IfElseOp consolidation...

## Safety gates
- Self-review verdict: APPROVE
- Tests observed passing: YES
- Lint observed passing: YES

## Files touched
- qiskit/transpiler/passes/optimization/consolidate_blocks.py (1 edit via apply_patch)

## Test runs
- Total invocations: 1 · Passed: 1 · Failed: 0

## Timeline (condensed)
- Turn 1: Scoring the shortlist…
  - ranked files for: "transpiler global phase conditional gates"
  - read qiskit/transpiler/passes/optimization/consolidate_blocks.py
- Turn 2: Found it: line 142 drops .global_phase…
  - patched qiskit/transpiler/passes/optimization/consolidate_blocks.py
- Turn 3: ran tests: tox → PASS; ran lint: ruff check . → PASS; ran lint: mypy . → PASS
- Turn 4: signalled finish

## Self-review report
[full reviewer output]

## Full tool transcript
<details>…raw trace for debugging…</details>
```

---

## 📁 Project structure

```
github-agent/
├── src/
│   ├── pipeline.js              ← CLI entry + subcommands
│   ├── orchestrator.js          ← engineering → review → revision → PR + project discovery
│   ├── config.js                ← model, limits, cost rates
│   ├── agents/
│   │   ├── engineeringAgent.js  ← issue → autonomous fix
│   │   ├── reviewCopilot.js     ← diff → structured audit
│   │   ├── agentLoop.js         ← multi-turn tool-use loop, retries, cost ceiling
│   │   └── tools.js             ← tool schemas + sandboxed handlers
│   ├── prompts/
│   │   ├── engineering.js       ← agentic system prompt, monorepo/lint/contrib hints
│   │   └── review.js            ← review system prompt + verdict format
│   ├── mapper/
│   │   ├── repoMap.js           ← big-project file walker, ignore-dirs, truncation
│   │   └── fileRelevance.js     ← keyword scorer for the starting-file prefilter
│   ├── utils/
│   │   ├── cost.js              ← pricing math (input/output/cache)
│   │   ├── diffLines.js         ← unified-diff parser for valid inline-comment anchors
│   │   └── githubUrl.js         ← parse owner/repo/number from URLs
│   ├── cli/
│   │   └── output.js            ← pretty terminal + cost summary
│   └── web/
│       ├── server.js            ← Express SSE dashboard
│       └── public/index.html    ← live agent feed
├── tests/                       ← 146 tests across 12 suites
└── .github/workflows/test.yml   ← CI matrix: Linux/macOS/Windows × Node 18/20/22
```

---

## 🧪 Tests

```bash
npm test
```

**146 tests across 12 suites** covering path traversal, shell-injection guards, patch fallback strategies, repo walker truncation, big-project ignore-dirs, orchestrator verdict parsing, monorepo detection, CONTRIBUTING/DCO reading, cost math (including cache creation), audit trail structure, PR body + template honoring, GitHub Action verdict reporting, unified-diff line anchoring, inline-comment parsing/partitioning, and a mocked-SDK end-to-end run with retry semantics.

CI runs the full suite on **Linux / macOS / Windows × Node 18 / 20 / 22** for every push and pull request. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the contributor workflow and [`TESTING.md`](TESTING.md) for live, end-to-end feature testing recipes.

---

## 🗺️ Roadmap

- [ ] **Docker/devcontainer executor**, so `pytest` works on Qiskit-class repos that need BLAS / CUDA / compiled extensions
- [ ] **Embedding-based relevance**: a drop-in replacement for the keyword prefilter on very abstract issues
- [ ] **Parallel triage**: one dashboard pane per issue when batching
- [ ] **LangSmith / Helicone telemetry export**
- [ ] **Pluggable language adapters**: `rustfmt`+`cargo`, `gofmt`+`go vet`, etc.

---

## 🤝 Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md). Short version: one behaviour change per PR, a test with every behaviour change, and `npm test` must be green on Node 18/20/22.

---

## 📄 License

[MIT](LICENSE). Use it, fork it, ship it.