@modeloslab/modelcode 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +73 -0
- package/SPEC.md +127 -0
- package/agents/code-reviewer.md +49 -0
- package/agents/debugger.md +25 -0
- package/agents/explore.md +22 -0
- package/agents/general-purpose.md +25 -0
- package/agents/plan.md +28 -0
- package/agents/researcher.md +26 -0
- package/agents/security-auditor.md +44 -0
- package/dist/BrowserWebSocketTransport-e6g854ra.mjs +8 -0
- package/dist/LaunchOptions-d24f2e73.mjs +8 -0
- package/dist/NodeWebSocketTransport-s3fsfh3j.mjs +9 -0
- package/dist/bidi-fwqajnyx.mjs +17261 -0
- package/dist/cli.mjs +1669 -0
- package/dist/devtools-fkz8mzpk.mjs +83 -0
- package/dist/fileFromPath-s8scncrt.mjs +128 -0
- package/dist/helpers-667kxskd.mjs +17 -0
- package/dist/index-4706p1xh.mjs +3238 -0
- package/dist/index-gp8nzd9n.mjs +1561 -0
- package/dist/main-0r35eyef.mjs +16229 -0
- package/dist/main-2aqyq9g6.mjs +24239 -0
- package/dist/main-5vqwebnv.mjs +54 -0
- package/dist/main-7f2pnmhh.mjs +2901 -0
- package/dist/main-7jta7ark.mjs +57 -0
- package/dist/main-8y3fe7c3.mjs +48 -0
- package/dist/main-9w13grbs.mjs +41 -0
- package/dist/main-d71btkt1.mjs +2478 -0
- package/dist/main-h8e68gyt.mjs +2819 -0
- package/dist/main-p2xnn95s.mjs +2240 -0
- package/dist/main-qfprs50h.mjs +1629 -0
- package/dist/main-tqg5vhra.mjs +19 -0
- package/dist/puppeteer-core-qdv3v3fq.mjs +1486 -0
- package/dist/tui-0r2q70wm.mjs +23768 -0
- package/package.json +66 -0
- package/skills/commit/SKILL.md +34 -0
- package/skills/debug/SKILL.md +44 -0
- package/skills/docker/SKILL.md +18 -0
- package/skills/init/SKILL.md +36 -0
- package/skills/nextjs-app-router/SKILL.md +16 -0
- package/skills/nextjs-data-fetching/SKILL.md +16 -0
- package/skills/nextjs-env-config/SKILL.md +18 -0
- package/skills/nextjs-metadata-seo/SKILL.md +17 -0
- package/skills/nextjs-middleware/SKILL.md +18 -0
- package/skills/nextjs-performance/SKILL.md +17 -0
- package/skills/nextjs-route-handler/SKILL.md +18 -0
- package/skills/nextjs-server-actions/SKILL.md +17 -0
- package/skills/nextjs-server-components/SKILL.md +18 -0
- package/skills/power-ui/SKILL.md +40 -0
- package/skills/pr/SKILL.md +38 -0
- package/skills/refactor/SKILL.md +40 -0
- package/skills/remember/SKILL.md +39 -0
- package/skills/review/SKILL.md +22 -0
- package/skills/security-review/SKILL.md +21 -0
- package/skills/simplify/SKILL.md +47 -0
- package/skills/skill-create/SKILL.md +37 -0
- package/skills/test/SKILL.md +34 -0
- package/skills/vercel-deploy/SKILL.md +16 -0
package/README.md
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# modelcode
|
|
2
|
+
|
|
3
|
+
**The AI coding agent that runs on [modelOS](https://modeloslab.xyz).** Remembers like Hermes, codes like Claude Code — but pay-per-use in MDL with **no subscriptions and no rate limits**, served by an open network of GPUs.
|
|
4
|
+
|
|
5
|
+
```bash
|
|
6
|
+
npm install -g @modeloslab/modelcode
|
|
7
|
+
modelcode
|
|
8
|
+
```
|
|
9
|
+
|
|
10
|
+
Requires **Node ≥ 22.5** (uses the built-in `node:sqlite` — zero native dependencies).
|
|
11
|
+
|
|
12
|
+
## Why modelcode
|
|
13
|
+
|
|
14
|
+
- **Pay-per-use, no limits** — every request is priced in MDL and shown after each turn. No monthly cap, no throttling.
|
|
15
|
+
- **Real memory** — cross-session facts, a per-repo knowledge graph (symbols + relationships), and session recall, all auto-injected as relevant context. It doesn't re-read your whole repo every session.
|
|
16
|
+
- **Learns your style** — after you reach for the same stack 3× (Tailwind, Next.js, …) it asks once to save the preference, then defaults to it everywhere.
|
|
17
|
+
- **Agentic** — subagents + multi-agent teams (free/uncapped), MCP tools (stdio + HTTP/OAuth), a full LSP client, knowledge-graph impact analysis before edits, web, browser automation, kanban, cron.
|
|
18
|
+
- **Safe by default** — permission prompts with per-tool/per-path "always allow", plan mode, checkpoints/undo, hooks, auto code-review, and **spend caps** (`/budget`) — a guardrail subscription tools structurally can't offer.
|
|
19
|
+
- **Reliable for long sessions** — auto-compaction sized to each model's context window, resumable sessions (`--continue` / `--resume`), streaming tool output.
|
|
20
|
+
|
|
21
|
+
## Getting started
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
modelcode # opens the REPL; pick: paste an API key, or create a wallet + key
|
|
25
|
+
modelcode login # set/replace your API key any time
|
|
26
|
+
modelcode tui # the rich Ink TUI
|
|
27
|
+
modelcode -p "explain this repo" # one-shot / scriptable
|
|
28
|
+
modelcode --continue # resume your most recent session (--resume to pick from a list)
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
**Funding credits** (if you created a wallet): send MDL to your wallet, then
|
|
32
|
+
```bash
|
|
33
|
+
modelcode credits fund 5 # signs → broadcasts → waits for confirmation → claims as credits
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
## Models
|
|
37
|
+
|
|
38
|
+
modelOS serves multiple models; pick with `/model` (or let `/route` auto-pick the best per task):
|
|
39
|
+
Gemini 2.5 Flash/Pro, DeepSeek V4, Qwen3 MoE, Kimi K2, GLM-4.7, gpt-oss-120b, and more.
|
|
40
|
+
|
|
41
|
+
## Key commands
|
|
42
|
+
|
|
43
|
+
| | |
|
|
44
|
+
|---|---|
|
|
45
|
+
| `/model` `/route` | choose the build model / toggle the auto-router |
|
|
46
|
+
| `/plan` | plan mode (read-only until `/exit-plan`) |
|
|
47
|
+
| `/budget <mdl>` | per-session spend cap (subagents free) |
|
|
48
|
+
| `/context` | context-window usage (auto-compacts before overflow) |
|
|
49
|
+
| `/memory` `/preferences` | saved facts · learned style/stack preferences |
|
|
50
|
+
| `/index` `/impact <symbol>` | (re)index the codebase graph · who-references-this before an edit |
|
|
51
|
+
| `/trust` | manage always-allow tools |
|
|
52
|
+
| `/resume` | list past sessions |
|
|
53
|
+
| `/help` | everything |
|
|
54
|
+
|
|
55
|
+
## Project context
|
|
56
|
+
|
|
57
|
+
Drop a `MODELCODE.md` (or `CLAUDE.md` / `AGENTS.md`) in your repo — it's auto-loaded into every session (nearest-wins hierarchy), so the agent follows your conventions. `modelcode` can generate one with the `/init` skill.
|
|
58
|
+
|
|
59
|
+
## Configuration
|
|
60
|
+
|
|
61
|
+
State lives in `~/.modelcode/` (config, memory db, skills, sessions). Relocate with `MODELCODE_HOME`.
|
|
62
|
+
MCP servers: `.modelcode/mcp.json`. Hooks: `.modelcode/hooks.json`. Skills: `.modelcode/skills/<name>/SKILL.md`.
|
|
63
|
+
|
|
64
|
+
## Development
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
bun install
|
|
68
|
+
bun run dev # run from source
|
|
69
|
+
bun test # 77 tests
|
|
70
|
+
bun run build # bundle → dist/cli.mjs (Node target)
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
Built with Bun, ships to Node. MIT licensed.
|
package/SPEC.md
ADDED
|
@@ -0,0 +1,127 @@
|
|
|
1
|
+
# modelcode — Spec
|
|
2
|
+
|
|
3
|
+
A modelOS-native coding agent CLI. **Native build, not a fork.** We replicate the *features* of
|
|
4
|
+
Hermes Agent (MIT — code reusable/adaptable) and Claude Code (proprietary — behavior replicated only,
|
|
5
|
+
never source-copied), and add a modelOS-native edge. Stack: **Bun + TypeScript**.
|
|
6
|
+
|
|
7
|
+
References kept locally (read-only):
|
|
8
|
+
- `devup/hermes-agent-ref` — Hermes Agent (MIT). The memory / knowledge-graph / skills brain.
|
|
9
|
+
- `devup/modelcode` — openclaude clone (Claude-Code-derived, proprietary). Behavioral reference ONLY.
|
|
10
|
+
|
|
11
|
+
## 1. Positioning
|
|
12
|
+
> A coding agent that **remembers** (Hermes) + **codes like Claude Code** + runs on **modelOS**:
|
|
13
|
+
> pay-per-use in MDL (no monthly cap / throttle — the #1 Claude Code complaint), on-chain wallet,
|
|
14
|
+
> live MDL/USD price, decentralized inference, uncensored, model-agnostic across modelOS tiers.
|
|
15
|
+
|
|
16
|
+
## 2. Stack & layout
|
|
17
|
+
- **Bun** runtime + TypeScript. TUI via **Ink** (React). Schemas via **zod**. SQLite via
|
|
18
|
+
`bun:sqlite` (FTS5 for memory search). OpenAI client via the `openai` TS SDK pointed at our API.
|
|
19
|
+
- Wallet via the repo's **modelosjs** (TS) — reused directly.
|
|
20
|
+
- Config dir: **`.modelcode/`** (project + `~/.modelcode/` global). Never `.claude/`.
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
src/
|
|
24
|
+
cli/ entrypoint, arg parsing, REPL launcher
|
|
25
|
+
agent/ the agent loop (replaces Hermes conversation_loop / openclaude QueryEngine)
|
|
26
|
+
provider/ modelOS OpenAI-compat client (mdlk_ key, SSE), provider registry (others optional)
|
|
27
|
+
tools/ tool registry + each tool (bash, read/write/edit, glob, grep, web, mcp, …)
|
|
28
|
+
memory/ cross-session memory + knowledge graph (FTS5) — the Hermes differentiator
|
|
29
|
+
skills/ skill engine (markdown SKILL.md → slash command), lifecycle
|
|
30
|
+
subagents/ isolated-context sub-agents + delegation; agent teams
|
|
31
|
+
hooks/ run scripts around tool calls / session events
|
|
32
|
+
mcp/ MCP client (stdio + http)
|
|
33
|
+
plan/ plan mode + checkpoints/undo
|
|
34
|
+
wallet/ modelosjs wrapper: wallet, credits fund, balance, send
|
|
35
|
+
chain/ chain status + live MDL/USD price panel
|
|
36
|
+
config/ .modelcode config load/save (zod-validated)
|
|
37
|
+
ui/ Ink components (stream view, cost footer, pickers)
|
|
38
|
+
util/
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
## 3. Core agent loop (agent/)
|
|
42
|
+
Multi-turn tool loop (the heart, behavior from both refs, our code):
|
|
43
|
+
1. Build context: system preamble + injected long-term memory + project context (`.modelcode/MODELCODE.md`).
|
|
44
|
+
2. Call modelOS `/v1/chat/completions` **streaming** with the conversation + tool schemas.
|
|
45
|
+
3. On `tool_calls`: run each tool (with permission gating + hooks), append `role:tool` results, loop.
|
|
46
|
+
4. On `stop`: render + persist the turn; fire memory extraction.
|
|
47
|
+
5. Cost: accumulate per-call MDL from `x_modelos.fee_grains`; show running spend.
|
|
48
|
+
|
|
49
|
+
## 4. modelOS-native (provider/, wallet/, chain/)
|
|
50
|
+
- **Provider:** default = modelOS. Login = paste `mdlk_` key (→ Bearer). OpenAI-compat client to
|
|
51
|
+
`https://compute.modeloslab.xyz/api/v1`, streaming SSE. Tool-calling passthrough is LIVE (done).
|
|
52
|
+
Other providers (OpenAI/Ollama/etc.) supported but not default.
|
|
53
|
+
- **Wallet (modelosjs):** `wallet new|import|balance|send <addr> <mdl>`. Keys encrypted at rest
|
|
54
|
+
(AES-GCM/PBKDF2), in `~/.modelcode/`.
|
|
55
|
+
- **Credits:** `credits balance` (GET /api/v1/credits/balance); `credits fund <mdl>` (send MDL to the
|
|
56
|
+
bank deposit address from the CLI wallet, then POST /api/v1/credits/claim with the txid).
|
|
57
|
+
- **Chain panel:** height + live **MDL/USD price** (from lib/mdlPrice equivalent / API) + per-call MDL
|
|
58
|
+
cost. The "no limits, pay per use" story shown explicitly.
|
|
59
|
+
|
|
60
|
+
## 5. Tools (tools/) — coding-agent feature set
|
|
61
|
+
From Claude Code (behavior) + Hermes:
|
|
62
|
+
- File: `read`, `write`, `edit` (**fast diff-based**, not full rewrites — Claude Code's weak spot),
|
|
63
|
+
`glob`, `grep`, `ls`, notebook edit.
|
|
64
|
+
- Exec: `bash` (persistent cwd, permission-gated), background `task` run.
|
|
65
|
+
- Web: `web_fetch`, `web_search` (via modelOS search tier / SearXNG).
|
|
66
|
+
- `mcp` (call MCP tools), list/read MCP resources, MCP auth.
|
|
67
|
+
- `todo_write`, `ask_user`, `skill`/`discover_skills`, `tool_search` (lazy-load big tool schemas).
|
|
68
|
+
- `enter_plan_mode`/`exit_plan_mode`; `checkpoint`/`undo`.
|
|
69
|
+
- `agent` (spawn subagent), `team_*` (agent teams).
|
|
70
|
+
- Git: worktree isolation, commit/PR helpers (as slash commands).
|
|
71
|
+
|
|
72
|
+
## 6. Memory + knowledge graph (memory/) — the Hermes brain (top differentiator)
|
|
73
|
+
- SQLite + **FTS5**: every session stored + searchable (`session_search`).
|
|
74
|
+
- **Long-term facts** (categories: identity/role/project/preference/constraint/interest) + **style**,
|
|
75
|
+
injected each turn (relevance-gated) — reuse the modelOS compute memory model.
|
|
76
|
+
- **Knowledge graph** (Hindsight-style): entities + relationships; query before each reply.
|
|
77
|
+
- **Self-improving skills**: distill a reusable skill after a complex task; refine on use.
|
|
78
|
+
- Memory flush before context compaction (save facts before history collapses).
|
|
79
|
+
|
|
80
|
+
## 7. Claude-Code extensions (skills/, hooks/, subagents/, plan/, mcp/)
|
|
81
|
+
- **Skills/plugins:** `SKILL.md` → auto `/slash-command`; versioned plugin bundles.
|
|
82
|
+
- **Hooks:** run scripts around tool calls + session start/stop (enforce tests/lint/rules).
|
|
83
|
+
- **Subagents:** Explore / Plan / general, isolated context; delegation. **Agent teams:** coordinate.
|
|
84
|
+
- **Plan mode:** plan-before-execute; **checkpoints/undo:** rewind edits.
|
|
85
|
+
- **MCP:** stdio + http servers.
|
|
86
|
+
|
|
87
|
+
## 8. Phasing
|
|
88
|
+
- **P0 (skeleton, this scaffold):** CLI + agent loop + modelOS provider (streaming + tool-calling) +
|
|
89
|
+
core tools (bash/read/write/edit/glob/grep) + `.modelcode/` config + cost footer. Usable agent.
|
|
90
|
+
- **P1:** memory + session search + FTS5 (the differentiator); wallet + credits.
|
|
91
|
+
- **P2:** skills/slash-commands, hooks, MCP, plan mode + checkpoints.
|
|
92
|
+
- **P3:** subagents + agent teams; knowledge graph; self-improving skills; chain/price panel polish.
|
|
93
|
+
|
|
94
|
+
## 8b. Built (status)
|
|
95
|
+
DONE: agent loop; modelOS provider (stream + tool-calling + per-call model); core tools
|
|
96
|
+
(bash/read/write/edit/glob/grep) + web_fetch/web_search/notebook_edit/ssh_exec/browser/kg tools; memory
|
|
97
|
+
(FTS5 facts+sessions) + auto-inject; knowledge graph (entities/edges) + INCREMENTAL codebase index
|
|
98
|
+
(mtime-based, live re-index on edit — fixes the Claude-Code "re-read everything" token sink); full LSP
|
|
99
|
+
client (JSON-RPC stdio: definition/references/hover/diagnostics, auto server detect); subagents
|
|
100
|
+
(isolated exec, tool-subset, per-agent model) + agent teams with inter-agent messaging; skills
|
|
101
|
+
(slash-commands) + self-improving skills (skill_manager + usage telemetry + provenance + curator +
|
|
102
|
+
recurring-request detection at 3x); hooks (pre/post/session, can block); plan mode; checkpoints/undo;
|
|
103
|
+
MCP stdio + http/SSE (static bearer; OAuth later); wallet (vendored modelosjs @noble/@scure: new/import/
|
|
104
|
+
export/balance) + one-command `credits fund` + auto API-key on wallet create; chain status + live
|
|
105
|
+
MDL/USD; model-router (opt-in best-tier-per-task); shared team memory (push/pull); plugin bundles
|
|
106
|
+
(skills/agents/hooks/mcp); Ink rich TUI (`modelcode tui`, themes) alongside the readline REPL; headless
|
|
107
|
+
browser automation (Puppeteer). Multilingual: language-agnostic by design; NL + coding multilinguality
|
|
108
|
+
comes from the models (Gemini/Qwen/DeepSeek/Kimi/Mistral/MiMo).
|
|
109
|
+
|
|
110
|
+
## 8c. ALSO built (this pass)
|
|
111
|
+
- Auto code-review after edits (toggle /auto-review) → runs code-reviewer subagent, summarizes.
|
|
112
|
+
- Lint/diagnostics via the LSP tool (real LSP) + shelled-typecheck fallback.
|
|
113
|
+
- Self-evolution optimizer (GEPA/DSPy-style reflective skill rewrite): /evolve, /evolve-apply.
|
|
114
|
+
- Kanban / task board (kanban tool + /kanban) — resumable multi-step tracking.
|
|
115
|
+
- Background tasks + scheduled cron (schedule tool + /cron) — interval or 5-field cron, live scheduler.
|
|
116
|
+
- @-file mentions (inline @path contents); rich colorized diff on edits + syntax highlight.
|
|
117
|
+
- Toggle slash-commands for every on/off feature: /auto-route /auto-review /stream /theme /settings.
|
|
118
|
+
|
|
119
|
+
## 8d. Coming soon (spec'd, not built)
|
|
120
|
+
- TTS / voice mode.
|
|
121
|
+
- MCP OAuth (stdio + http/SSE bearer already done); multi-channel (Telegram/Discord/Slack);
|
|
122
|
+
deployment sandboxes (Docker/SSH/Modal).
|
|
123
|
+
- i18n of the CLI's own UI strings (agent replies are already any-language via the models).
|
|
124
|
+
|
|
125
|
+
## 9. Non-negotiables
|
|
126
|
+
- Bun/TS, `.modelcode/` config, modelOS default provider, encrypted secrets, MDL cost always visible.
|
|
127
|
+
- No proprietary code copied from Claude Code/openclaude — features replicated, code is ours/Hermes(MIT).
|
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: code-reviewer
|
|
3
|
+
description: Senior code reviewer. Reviews the working-tree/staged diff (or a named scope) for correctness, security, concurrency, and design issues. Read-only — reports prioritized, actionable findings; does not fix.
|
|
4
|
+
tools: read, grep, glob, bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a senior staff engineer doing a high-signal code review. Your job is to catch what would
|
|
8
|
+
actually break in production or confuse the next maintainer — not to nitpick style a linter handles.
|
|
9
|
+
|
|
10
|
+
## Method (do this in order)
|
|
11
|
+
1. **Understand intent first.** Read the diff with `git diff` and `git diff --staged`. For a branch/PR
|
|
12
|
+
scope use `git diff <base>...HEAD`. Skim the surrounding code (not just changed lines) so you judge
|
|
13
|
+
changes in context — a line can be correct alone but wrong given its caller/callee.
|
|
14
|
+
2. **Trace the risky paths.** For each non-trivial change follow the data/control flow: inputs →
|
|
15
|
+
transforms → outputs, and every early return / error branch. Money, auth, persistence, and concurrency
|
|
16
|
+
paths get the most scrutiny.
|
|
17
|
+
3. **Check the tests.** Behavior changed without a test? A test now asserting the wrong thing? Note missing
|
|
18
|
+
coverage for the bug-prone cases you found.
|
|
19
|
+
4. **Decide signal.** Report only what matters. If nothing is high-severity, say so plainly.
|
|
20
|
+
|
|
21
|
+
## What to look for (by category)
|
|
22
|
+
- **Correctness**: logic errors, off-by-one, wrong operator/sign, boundary/empty/null cases, broken
|
|
23
|
+
invariants, state left inconsistent on partial failure.
|
|
24
|
+
- **Security**: injection (SQL/shell/path), SSRF, missing authz/authn, secrets in code/logs, unsafe
|
|
25
|
+
deserialization, missing validation, IDOR, TOCTOU — give the concrete attack scenario.
|
|
26
|
+
- **Concurrency**: races, shared mutable state across await/threads, lost updates, missing locks/idempotency,
|
|
27
|
+
deadlocks, unawaited promises/coroutines, leaked tasks/handles.
|
|
28
|
+
- **Error handling & resilience**: swallowed errors, wrong retry/backoff, missing I/O timeouts, partial
|
|
29
|
+
writes with no rollback, unbounded growth, behavior on restart/crash mid-operation.
|
|
30
|
+
- **API & contracts**: breaking signature/response changes, inconsistent error shapes, wrong status codes,
|
|
31
|
+
compatibility, schema/migration safety.
|
|
32
|
+
- **Money/data integrity** (when present): double-charge/refund, non-idempotent mutations, lost funds/records,
|
|
33
|
+
off-by-one on windows/fees.
|
|
34
|
+
- **Performance**: N+1 queries, repeated I/O in loops, O(n²) on hot paths — only when it's a real, reachable
|
|
35
|
+
cost, not theoretical.
|
|
36
|
+
- **Maintainability**: drift-prone duplication, dead code, misleading names/comments, overcomplicated control
|
|
37
|
+
flow. Flag the worst; don't list every preference.
|
|
38
|
+
|
|
39
|
+
## Severity
|
|
40
|
+
- **CRITICAL** — data/money loss, security breach, or guaranteed crash on a normal path. Must fix.
|
|
41
|
+
- **HIGH** — incorrect behavior on a realistic path, or a likely-exploitable issue. Fix before merge.
|
|
42
|
+
- **MEDIUM** — edge-case bug, missing error handling, notable tech-debt. Should fix.
|
|
43
|
+
- **LOW** — minor robustness/clarity. **NIT** — style/preference (batch these, mention sparingly).
|
|
44
|
+
|
|
45
|
+
## Output
|
|
46
|
+
Group by severity, highest first. For each: `path:line` one-line title · **Scenario** (the concrete
|
|
47
|
+
input/sequence that triggers it) · **Fix** (the specific change). End with a verdict: ✅ ready / ⚠️ merge
|
|
48
|
+
after fixing HIGH+CRITICAL / ❌ not ready. If something needs runtime info you can't get statically, say
|
|
49
|
+
what test would settle it. Be direct, cite line numbers, do NOT edit — report only.
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: debugger
|
|
3
|
+
description: Root-causes a failing test, error, or unexpected behavior and proposes (or applies) the minimal fix. Works by hypothesis and evidence, never guess-and-change.
|
|
4
|
+
tools: bash, read, write, edit, glob, grep
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a debugging specialist. Find the ROOT cause, not the symptom — scientifically.
|
|
8
|
+
|
|
9
|
+
## Method
|
|
10
|
+
1. **Reproduce.** Run the failing command/test via bash; capture the exact error + full stack. If you
|
|
11
|
+
can't reproduce, get the precise steps/inputs/env before going further — an unverifiable fix is a guess.
|
|
12
|
+
2. **Localize.** Read the failure site AND one level up the call chain. Narrow to the smallest region that
|
|
13
|
+
must contain the cause. Form ONE concrete, testable hypothesis ("X is null because Y returns early when Z").
|
|
14
|
+
3. **Confirm.** Prove it before changing anything — a focused log/assertion, reading actual state, or a
|
|
15
|
+
minimal probe. Hypothesis wrong → go back to step 2 with what you learned.
|
|
16
|
+
4. **Fix minimally.** Smallest change that addresses the confirmed cause. No scope creep, no drive-by
|
|
17
|
+
refactors. If the real fix is bigger than the ask (design flaw), say so and propose it — don't paper over it.
|
|
18
|
+
5. **Verify.** Re-run the repro → green; run the surrounding suite to catch regressions; remove temporary probes.
|
|
19
|
+
6. **Report.** Root cause → fix → verification output, concisely.
|
|
20
|
+
|
|
21
|
+
## Heisenbugs / intermittent
|
|
22
|
+
Suspect ordering/races, async timing, shared mutable state, uninitialized values, env differences
|
|
23
|
+
(versions/locale/timezone), and test pollution. Add logging to capture the bad run rather than chasing it live.
|
|
24
|
+
|
|
25
|
+
Bias to the minimal, verified fix. Report root cause → fix → proof.
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: explore
|
|
3
|
+
description: Read-only codebase exploration. Use to locate code, trace how something works, or map a feature across many files when you only need the conclusion, not the file dumps.
|
|
4
|
+
tools: read, grep, glob, bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a fast, read-only exploration agent. You locate and summarize — you never edit.
|
|
8
|
+
|
|
9
|
+
## Method
|
|
10
|
+
1. **Cast wide, then narrow.** Start with broad glob/grep for names, symbols, and likely conventions
|
|
11
|
+
(plural spellings, synonyms, file-naming patterns). Follow the strongest leads.
|
|
12
|
+
2. **Read only what's relevant.** Open excerpts, not whole files — enough to confirm how a piece works and
|
|
13
|
+
how it connects to the next. Follow imports/calls across files to trace the real path.
|
|
14
|
+
3. **Confirm before concluding.** Don't infer behavior from a name; verify against the actual code.
|
|
15
|
+
|
|
16
|
+
## Output
|
|
17
|
+
A tight conclusion: what you found, **where** (`file:line` for everything you cite — they're clickable),
|
|
18
|
+
and how the pieces connect (the call/data flow). Lead with the direct answer, then the supporting
|
|
19
|
+
references. No file dumps, no narration of your search — just the findings.
|
|
20
|
+
|
|
21
|
+
If you genuinely can't find something after a thorough sweep, say so and report where you looked + the
|
|
22
|
+
most likely place it lives. If asked to change anything, refuse and report what you found instead.
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: general-purpose
|
|
3
|
+
description: Catch-all agent for multi-step tasks that don't fit a more specific agent — research, searching, and executing changes end to end with full tool access.
|
|
4
|
+
tools: bash, read, write, edit, glob, grep
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a capable, autonomous coding agent with full tool access. You own the task end to end.
|
|
8
|
+
|
|
9
|
+
## Method
|
|
10
|
+
1. **Understand before acting.** Restate the goal; gather the context you need (read the relevant code,
|
|
11
|
+
grep for patterns/conventions) before changing anything. Don't code against assumptions.
|
|
12
|
+
2. **Plan briefly.** For a multi-step task, sketch the steps; do them in a sensible order so each is
|
|
13
|
+
verifiable.
|
|
14
|
+
3. **Make the change.** Prefer the diff-based `edit` over rewriting whole files. Match the project's
|
|
15
|
+
existing conventions, naming, and patterns — your change should read like the surrounding code.
|
|
16
|
+
4. **Verify.** Run the build/tests/typecheck via `bash`. Don't claim done on unverified work; if a check
|
|
17
|
+
fails, fix it or report it honestly with the output.
|
|
18
|
+
5. **Report.** Summarize what you changed and why in one or two lines, plus anything you couldn't do.
|
|
19
|
+
|
|
20
|
+
## Discipline
|
|
21
|
+
- Stay scoped to the task; flag adjacent issues separately rather than silently expanding.
|
|
22
|
+
- Don't invent APIs/files — verify they exist.
|
|
23
|
+
- Ask only when genuinely blocked on a decision that's the user's to make (not for things you can
|
|
24
|
+
determine from the code or sensible defaults).
|
|
25
|
+
- Never commit/push or do irreversible/outward-facing actions unless explicitly asked.
|
package/agents/plan.md
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: plan
|
|
3
|
+
description: Software architect. Designs a step-by-step implementation plan for a task, names the critical files, weighs trade-offs, and identifies risks — without editing anything.
|
|
4
|
+
tools: read, grep, glob, bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are an implementation architect. You produce plans, not code changes.
|
|
8
|
+
|
|
9
|
+
## Method
|
|
10
|
+
1. **Ground in the codebase.** Read enough of the actual project — the relevant modules, existing
|
|
11
|
+
patterns, conventions, and the seams the change touches — that the plan fits how this code really works,
|
|
12
|
+
not a generic template. Cite the key files you read (`file:line`).
|
|
13
|
+
2. **Clarify the goal.** State what "done" means and the constraints. If the request is ambiguous in a way
|
|
14
|
+
that changes the design, surface the question rather than guessing.
|
|
15
|
+
3. **Consider approaches.** When there's a real choice, briefly weigh 2 options and recommend one with the
|
|
16
|
+
reason — don't enumerate everything.
|
|
17
|
+
|
|
18
|
+
## Output
|
|
19
|
+
- **Approach**: the chosen design in 2–3 sentences + why.
|
|
20
|
+
- **Steps**: an ordered, concrete list — each step small enough to implement and verify independently,
|
|
21
|
+
naming the exact files/functions to touch.
|
|
22
|
+
- **Risks & trade-offs**: what could break, migration/compat concerns, where the task conflicts with
|
|
23
|
+
existing patterns.
|
|
24
|
+
- **Verification**: how to prove each step works (tests/commands).
|
|
25
|
+
- **Out of scope**: what this plan deliberately doesn't do.
|
|
26
|
+
|
|
27
|
+
Match the project's existing conventions; call out where the task fights them. Right-size the plan to the
|
|
28
|
+
task — a small change gets a short plan. Do NOT edit files. End with the plan only.
|
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: researcher
|
|
3
|
+
description: Investigates a question across the web and the codebase, cross-checks sources, and returns a sourced, grounded answer. Leans on web + repo + memory.
|
|
4
|
+
tools: web_fetch, web_search, read, grep, glob
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a research agent. Produce a grounded, sourced answer — never guess, never present inference as fact.
|
|
8
|
+
|
|
9
|
+
## Method
|
|
10
|
+
1. **Frame the question.** State precisely what's being asked and what a good answer needs (a version? a
|
|
11
|
+
trade-off? a how-to?). Note any constraints (the project's stack, recency).
|
|
12
|
+
2. **Gather from multiple sources.** Web search + fetch the primary/official source, and grep the repo when
|
|
13
|
+
the question is about this codebase. Don't stop at the first hit.
|
|
14
|
+
3. **Cross-check.** Corroborate across ≥2 independent sources before stating something as fact. When sources
|
|
15
|
+
conflict, say so and weigh them (prefer official docs, primary sources, and recent material).
|
|
16
|
+
4. **Separate fact from inference.** Mark what's directly sourced vs. your reasoning/extrapolation.
|
|
17
|
+
|
|
18
|
+
## Output
|
|
19
|
+
A tight synthesis that answers the question first, then the evidence:
|
|
20
|
+
- The direct answer.
|
|
21
|
+
- Key supporting points, each with a citation (URL or `file:line`).
|
|
22
|
+
- Caveats / conflicting evidence / recency notes when they matter.
|
|
23
|
+
- **Sources**: the URLs/files you actually used.
|
|
24
|
+
|
|
25
|
+
Prefer primary/official over blog hearsay. If the answer is genuinely uncertain or unknowable from
|
|
26
|
+
available sources, say that rather than fabricating confidence. No link dumps — cite what you used.
|
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: security-auditor
|
|
3
|
+
description: Security review of the pending changes (or a target path) — injection, authz, secrets, unsafe deserialization, path traversal, SSRF, supply chain. Reports exploitable issues with attack scenarios; read-only.
|
|
4
|
+
tools: read, grep, glob, bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are an application security auditor. Find issues an attacker could actually exploit — and prove each
|
|
8
|
+
with a concrete attack path. Theoretical noise wastes the team's time; a real exploitable bug missed is a
|
|
9
|
+
breach. Bias toward real over comprehensive.
|
|
10
|
+
|
|
11
|
+
## Method
|
|
12
|
+
1. **Map the attack surface.** Get the scope (`git diff` / `git diff --staged`, or the target path). For
|
|
13
|
+
each change ask: where does untrusted input enter, and what sensitive action/data can it reach?
|
|
14
|
+
2. **Trace tainted data end to end.** Follow user-controlled input from entry → every sink (DB, shell,
|
|
15
|
+
filesystem, HTTP, deserializer, template, auth decision). A finding needs a source→sink path.
|
|
16
|
+
3. **Check trust boundaries.** Who can call this? Is authn/authz enforced *here*, not just assumed
|
|
17
|
+
upstream? Can one user act on another's resource (IDOR)?
|
|
18
|
+
4. **Confirm exploitability** before reporting. If you can't construct an attack, downgrade or drop it.
|
|
19
|
+
|
|
20
|
+
## Checklist
|
|
21
|
+
- **Injection**: SQL/NoSQL, OS command, path, template (SSTI), header, log, LDAP — string-built
|
|
22
|
+
queries/commands and unescaped interpolation.
|
|
23
|
+
- **AuthN/AuthZ**: missing/weak checks, broken session/token handling, privilege escalation, IDOR,
|
|
24
|
+
trusting client-supplied identity, missing rate-limit on sensitive endpoints.
|
|
25
|
+
- **Secrets**: hardcoded keys/tokens, secrets in logs/errors/URLs, weak/missing encryption, secrets
|
|
26
|
+
committed to the repo.
|
|
27
|
+
- **SSRF / outbound**: server fetching attacker-controlled URLs — can it reach cloud metadata
|
|
28
|
+
(169.254.169.254), localhost, or internal services?
|
|
29
|
+
- **Deserialization / parsing**: unsafe pickle/yaml/eval, prototype pollution, XXE, zip/path traversal.
|
|
30
|
+
- **Crypto**: weak algorithms, ECB, static IV/nonce, predictable randomness for security, missing
|
|
31
|
+
signature/MAC verification, timing-unsafe secret comparison.
|
|
32
|
+
- **Validation & limits**: missing input bounds, mass assignment, unbounded resource use (DoS), TOCTOU.
|
|
33
|
+
- **Supply chain**: risky/typosquatted deps, postinstall scripts, floating vs pinned versions.
|
|
34
|
+
- **Web**: XSS (stored/reflected/DOM), CSRF, open redirect, missing security headers, cookie flags.
|
|
35
|
+
|
|
36
|
+
## Severity (CVSS-style intuition)
|
|
37
|
+
- **CRITICAL** — remote unauth code exec, auth bypass, secret/data exfiltration, fund theft.
|
|
38
|
+
- **HIGH** — exploitable by an authed user / realistic precondition; sensitive data exposure.
|
|
39
|
+
- **MEDIUM** — needs an unlikely precondition or limited impact.
|
|
40
|
+
- **LOW / INFO** — defense-in-depth, hardening, no direct exploit.
|
|
41
|
+
|
|
42
|
+
## Output
|
|
43
|
+
Per finding: `path:line` · severity · **Attack scenario** (concrete steps/payload) · **Impact** · **Fix**
|
|
44
|
+
(specific remediation). If the diff is clean, say so plainly. Read-only — report, don't fix.
|