@alexandrealvaro/agentic 0.1.0-beta.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Alexandre Alvaro
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,108 @@
+ # Agentic Development
+
+ A starter kit for engineering production code with LLMs. Lean templates and init prompts grounded in established standards: [Anthropic Skills](https://code.claude.com/docs/en/skills), [Claude Code subagents](https://code.claude.com/docs/en/sub-agents), [agents.md](https://agents.md), Nygard ADRs, and [Google Labs DESIGN.md](https://github.com/google-labs-code/design.md).
+
+ > **Status:** v0.1.0-beta — early. The CLI works for AGENTS.md bootstrap (`npx @alexandrealvaro/agentic@beta init`); other artifact commands are in development. The manual workflow below stays fully usable for everything. Report rough edges via [GitHub Issues](https://github.com/alexandremendoncaalvaro/agentic-development/issues).
+
+ ## Prerequisites
+
+ An agentic coding tool that reads markdown files. Examples here use **Claude Code** and **Codex CLI** (primary tools the author uses); the kit also works with [Antigravity](https://antigravity.google), [Gemini CLI](https://github.com/google-gemini/gemini-cli), Cursor, Continue, Aider, and any other tool that follows the [agents.md](https://agents.md) open standard.
+
+ For the CLI path: Node.js 18+ (only needed if you use `npx`/`npm install`; the manual workflow has no runtime dependencies).
+
+ ## What's here
+
+ - **[WORKFLOW.md](WORKFLOW.md)** — the philosophy: how to engineer with LLMs without vibe-coding. Read this first.
+ - **[templates/](templates/)** — pure templates. Copy and fill, or have your agent fill them via the matching prompt.
+ - **[prompts/](prompts/)** — init prompts to paste into your agent. Each generates one artifact from its template.
+
+ ## How to use
+
+ This kit is a **reference repository, not a per-project dependency.** Templates and prompts stay here; only the generated artifacts (AGENTS.md, ARCHITECTURE.md, ADRs, etc.) end up in your target project. The pattern matches how [GitHub's spec-kit](https://github.com/github/spec-kit) and [cookiecutter](https://cookiecutter.readthedocs.io/) handle distribution — templates in one place, outputs in another, never mixed.
+
+ ### Quickstart (CLI, beta)
+
+ For AGENTS.md bootstrap, the fastest path is the CLI — no clone needed:
+
+ ```bash
+ cd your-project
+ npx @alexandrealvaro/agentic@beta init
+ ```
+
+ The CLI auto-detects greenfield/brownfield/audit mode, then prints a self-contained prompt (templates inlined) for you to paste into your agent. Add `--mode greenfield|brownfield|audit` to override detection.
+
+ > **Beta scope:** Only `init` (AGENTS.md) is in v0.1.0-beta. For ARCHITECTURE.md, ADRs, design tokens, skills, and subagents, use the [manual workflow](#manual-workflow) below until those CLI commands ship.
+
+ ### Manual workflow
+
+ Required for every artifact except AGENTS.md right now, and useful as a fallback for `init` if you want to run prompts directly without the CLI.
+
+ #### Setup (once)
+
+ 1. Read [`WORKFLOW.md`](WORKFLOW.md) — the philosophy that everything else follows from.
+ 2. Clone this repo to a location outside your projects:
+
+ ```bash
+ git clone https://github.com/alexandremendoncaalvaro/agentic-development.git ~/dev/agentic-development
+ ```
+
+ **Never copy this kit into your target project.** Only the generated artifacts go there.
+
+ #### Give the agent access to templates
+
+ When you paste a prompt from `prompts/`, the agent needs to read the matching template in `templates/`. Two ways to grant access:
+
+ - **Recommended** — start the agent with the kit added. In Claude Code, from your target project's directory:
+   ```
+   claude --add-dir <path-to-this-kit>
+   ```
+   The agent can then read templates via the relative paths the prompts use.
+ - **Standalone** — copy the content of both `prompts/<X>.md` and `templates/<X>.md` into the agent session. No setup, slightly more friction.
+
+ #### Workflows by scenario
+
+ **New project (greenfield).** Initialize git and set up the project structure, then paste `prompts/agents.md`. The agent interviews you about stack, build, test, conventions, and security boundaries, then writes `AGENTS.md` at the repo root. As architectural and design decisions emerge, use `prompts/architecture.md`, `prompts/adr.md`, `prompts/design.md`, `prompts/skill.md`, and `prompts/subagent.md` for each new artifact.
+
+ **Existing project (brownfield).** Same prompts. The project-wide ones (`prompts/agents.md`, `prompts/architecture.md`) instruct the agent to read the codebase, verify what you told it, and flag any mismatch before writing — so contradictions get surfaced instead of trusted. The per-artifact prompts (ADR, design, skill, subagent) work on a single decision or asset and don't need this verification. Backfill ADRs only for decisions that matter going forward; don't try to rewrite history.
+
+ **Revisiting / auditing existing specs.** When specs may have drifted from code, paste:
+
+ > *"Read AGENTS.md (or ARCHITECTURE.md). Compare with the current state of the codebase. For every place where the spec disagrees with the code, list the disagreement and suggest whether the spec or the code should change. Do not rewrite the spec yourself — flag and wait."*
+
+ Apply judgment manually; don't let the agent rewrite specs unattended.
+
+ **Project already built with agents.** Treat missing artifacts as brownfield (generate via the relevant prompt) and existing artifacts as audit (run the comparison prompt above).
+
+ #### What ends up in your target project
+
+ Only the generated outputs — never templates, prompts, or this guide:
+
+ ```
+ your-project/
+ ├── AGENTS.md
+ ├── ARCHITECTURE.md
+ ├── DESIGN.md (optional, UI projects)
+ ├── doc/
+ │   └── adr/
+ │       └── NNNN-<title>.md
+ └── .claude/
+     ├── skills/<name>/SKILL.md
+     └── agents/<name>.md
+ ```
+
+ ### Reference table
+
+ | Artifact          | Template                                                                                      | Prompt                                             | Lives at                         |
+ | ----------------- | --------------------------------------------------------------------------------------------- | --------------------------------------------------- | --------------------------------- |
+ | `AGENTS.md`       | [agents-general](templates/agents-general.md) + [agents-project](templates/agents-project.md) | [prompts/agents.md](prompts/agents.md)             | repo root                        |
+ | `ARCHITECTURE.md` | [architecture](templates/architecture.md)                                                     | [prompts/architecture.md](prompts/architecture.md) | repo root                        |
+ | ADR               | [adr](templates/adr.md)                                                                       | [prompts/adr.md](prompts/adr.md)                   | `doc/adr/NNNN-<title>.md`        |
+ | `DESIGN.md`       | (no template — bootstrap from existing tokens)                                                | [prompts/design.md](prompts/design.md)             | repo root                        |
+ | Skill             | [skill](templates/skill.md)                                                                   | [prompts/skill.md](prompts/skill.md)               | `.claude/skills/<name>/SKILL.md` |
+ | Subagent          | [subagent](templates/subagent.md)                                                             | [prompts/subagent.md](prompts/subagent.md)         | `.claude/agents/<name>.md`       |
+
+ Templates carry pure structure. Prompts carry the spec links, conventions, and the literal text to paste.
+
+ ## License
+
+ MIT — see [LICENSE](LICENSE).
package/WORKFLOW.md ADDED
@@ -0,0 +1,209 @@
+ # Pragmatic Workflow: Engineering with LLMs
+
+ Engineering production code with LLMs. Agentic, not vibe coding.
+
+ **The principle behind the rest:** context engineering beats prompt engineering. Context is finite and decays as it fills — aim for the smallest set of high-signal tokens that gets the outcome.
+
+ ## TL;DR
+
+ Agents do not replace engineering. They speed up execution, but they make specification, context, validation, and review *more* important than before.
+
+ What to keep in mind:
+
+ 1. **Context is the product.** The agent performs only as well as the context you give it. Small, clear, relevant context beats large, noisy context.
+ 2. **Spec before code.** Define rules, constraints, architecture, acceptance criteria, and expected output before any implementation.
+ 3. **Docs are for *why*, code for *what*.** History lives in git. Comments justify non-obvious choices, never restate the line.
+ 4. **Real examples beat generic instructions.** "Follow this existing file" lands harder than "follow best practices."
+ 5. **Always know the canonical path.** If you deviate, do it deliberately — never by forgetting the happy path exists.
+ 6. **Outcome before path.** Give the finish line — raw input plus exact expected output — and let the agent build the algorithm to connect them.
+ 7. **Pin load-bearing architectural decisions.** The agent will invent what isn't specified. Lock architecture into `AGENTS.md`.
+ 8. **A good prompt has a stop condition.** Say what to do, what not to do, and where to stop.
+ 9. **Plan before execution.** For non-trivial work: explore, plan, review the plan, implement, verify.
+ 10. **Format helps, but does not save bad thinking.** Markdown, XML, YAML, and JSON only reduce ambiguity. They don't replace clarity.
+ 11. **The bottleneck is judgment, not generation.** Agents generate fast; the hard part is catching what's almost right but wrong.
+ 12. **Review needs distance.** The context that produced a solution tends to defend it. Review with a fresh context — diff plus spec, no history.
+ 13. **Automation needs rails.** Hooks, tests, lint, CI, sandboxing, and permissions matter more than advisory text the agent can forget.
+ 14. **Autonomy requires observability.** If the agent makes decisions, log the trajectory: tool calls, intermediate outputs, failures.
+ 15. **Staged spikes when the technique is uncertain.** When the *how* is unknown — a library choice, a CV technique, a multi-stage transformation — break the problem into staged spikes against golden fixtures with per-stage debug artifacts.
+
+ > Working with agents means trading typing for technical direction. The value is in giving the right context, setting boundaries, validating the result, and keeping "almost right" out of production.
+
+ ## 1. Spec-Driven Design
+
+ Define the rules before the agent writes a line. The temptation is to dump everything into `AGENTS.md` and hope it works — but bloat causes the model to ignore the file. Keep one topic per Markdown file: lean and focused. And treat the three kinds of context as distinct artifacts with distinct jobs.
+
+ **Operational context is advisory.** `AGENTS.md` (or `CLAUDE.md` for Claude Code, which can mirror or import the same content via `@AGENTS.md`) tells the agent how to build, test, follow conventions, and where the security boundaries are. The agent reads it as a guide, not a contract. The open `AGENTS.md` standard is supported natively by most agentic tools and IDEs.
+
+ **Canonical specs are constraints, not advice.** `DESIGN.md` (the visual contract — YAML tokens plus Markdown rationale, per Google Labs' open standard), `ARCHITECTURE.md` (system patterns and boundaries), and ADRs in `doc/adr/*.md` (Michael Nygard's pattern, with status lifecycle and superseded markers) are facts the agent must obey. If a token or pattern isn't declared here, it doesn't exist. The agent must never invent one.
+
+ **On-demand context is `SKILL.md`.** Description loads at session start (the listing is capped at 1,536 characters per the spec) and body loads only when the skill is invoked. Use it for repeatable workflows or domain knowledge that shouldn't pay a token cost on every turn.
+
+ Two rules apply across all three:
+
+ - **Acceptance criteria must be measurable.** "Build a dashboard" fails. "Loads in under 2 seconds, shows 6 months of history, passes axe accessibility" succeeds.
+ - **Prune.** If removing a line wouldn't make the agent fail, cut it.
+
+ ## 2. Docs vs. Code
+
+ Avoid putting implementation code in docs unless it's executable, generated, or a minimal API/contract surface. Docs define intent, constraints, contracts, and decisions; production logic lives in code.
+
+ The split is simple. **Docs are for the *why*** — decisions, not history. Git tracks history; docs explain the reasoning that won't survive otherwise. **Code is for the *what*** — clean naming and small units make logic self-evident, and the more your code does this work, the less your docs need to.
+
+ Comments are exceptions. They justify *why* a non-obvious choice was made — never *what* the line does. No commented-out code, and no orphan `TODO` or `FIXME`: every deferred item references an issue, ADR, or explicit follow-up.
+
+ ## 3. Format by Evidence
+
+ Structure reduces ambiguity, but format isn't magic. Pick the right one for the surface:
+
+ - **Markdown** for repo files (`AGENTS.md`, `CLAUDE.md`, `SKILL.md`, specs, ADRs). Readable, diffable, agent-friendly.
+ - **XML-style tags** inside prompts when boundaries matter: `<instructions>`, `<context>`, `<examples>`, `<input>`, `<constraints>`, `<output_format>`.
+ - **YAML** for metadata, frontmatter, and declarative config.
+ - **JSON or schema** for machine-validated output.
+
+ Use XML when the prompt mixes instructions, retrieved context, examples, user input, and expected output — the separation pays off when there's noise to fight. Skip it for simple prompts; if Markdown headings or plain text are clear enough, use them.
+
+ No format is universally best. **An observation from my practice, not benchmarked:** I've seen consistent gains when shifting prompts to XML — most noticeably with autonomous agents, where the prompt has to land alone without conversational refinement. Direct interactive use (Claude Code, Codex) tolerates loose Markdown; unattended agents don't. Claude in particular seems to respond well to XML, which I attribute to its training, but I haven't benchmarked it. Treat this as a starting hypothesis worth testing on your own target model and task before standardizing.
+
+ ## 4. Find the Happy Path
+
+ Before implementing, ask:
+
+ > *"What is the canonical, idiomatic way to implement [X] in [stack]? Cite official docs. List common deviations and why people take them."*
+
+ Then check continuously, especially mid-implementation:
+
+ > *"We are at step Y. Are we still on the happy path? If we deviated, was it deliberate?"*
+
+ Sometimes you can't follow the happy path — that's fine. But always know where it is and why you left it.
+
+ ## 5. Ground in Real Patterns
+
+ Don't dump the codebase into context. Anchor the model in a specific, project-relevant example.
+
+ > *"Find an existing example of [similar feature]; use that exact structure."*
+
+ Cite specific files, not "the codebase." Use just-in-time retrieval: pass paths or IDs and let the agent fetch via tools when it needs to read them.
+
+ ## 6. Explore → Plan → Implement → Commit
+
+ For non-trivial changes, four phases:
+
+ 1. **Explore (read-only).** Plan mode in your agent. Read, build a mental model, no edits.
+ 2. **Plan.** Agent writes a Markdown plan. You edit before approving.
+ 3. **Implement.** Execute the approved plan; verify each step before moving to the next.
+ 4. **Commit.** One logical change per commit.
+
+ Skip this for diffs you can describe in one sentence.
+
+ ## 7. Action Commands With Stop Criteria
+
+ Leave no room for interpretation. Tell the model where to stop.
+
+ - **Avoid:** *"Here is the data. What do you think?"*
+ - **Prefer:** *"Analyze this data. List the top 3 bottlenecks. Stop there — don't propose fixes unless I ask."*
+
+ The stop criterion is as important as the action. Without it, the agent generalizes outward and you end up trimming output you didn't ask for.
+
+ ## 8. Architectural Boundaries
+
+ Lock the load-bearing decisions into `AGENTS.md` or `CLAUDE.md` so the agent doesn't relitigate them every session:
+
+ > "Apply: **Clean Architecture** — isolate core logic from frameworks. **Small units** — single-responsibility, low indentation, no `else` chains. **Modular and testable** — no over-engineering."
+
+ The agent will follow what's specified and invent what isn't. Prefer specifying.
+
+ ## 9. Outcome-Based Prompting (TDG)
+
+ Give the finish line first, not the path:
+
+ 1. **Ground truth.** Raw input plus exact expected output.
+ 2. **Command the implementation.** The algorithm that connects the two.
+ 3. **Iterate by criterion.** Ask for three approaches; pick by *one* explicit criterion (readability, performance, *or* testability — not all three at once).
+ 4. **Test Dependency Map, not procedural TDD.** Don't tell the agent "do TDD" — tell it *which* tests cover the file. *"Before modifying X.ts, list which tests cover it. Run. Modify. Run. If none cover it, write one first."*
+
+ ## 10. Reviewer With Fresh Context
+
+ The agent that wrote the code is biased about it. The same reasoning that produced the solution defends the solution.
+
+ > *"Open a fresh agent with no history. Give it only the diff and the spec. Review as a strict Senior reviewing a Junior PR. Be ruthless about bugs, coupling, edge cases."*
+
+ In Claude Code, this means a subagent (the `Task` tool, or a custom `.claude/agents/*.md` file). Without that infrastructure: `/clear`, new context, paste diff plus spec.
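A minimal reviewer subagent file, as a sketch — the frontmatter shape follows the sub-agents docs, but the name `code-reviewer`, the tool list, and the prompt wording are illustrative, not part of this kit:

```markdown
---
name: code-reviewer
description: Strict senior review of a diff against its spec. Use after implementation, before commit.
tools: Read, Grep, Glob
---

You are a strict senior engineer reviewing a junior PR. You receive only a diff
and the spec it claims to satisfy. Be ruthless about bugs, coupling, and edge
cases. Do not propose rewrites; list findings with file and line references.
Stop after the findings list.
```

Note the read-only tool scope: a reviewer that can edit files defeats the point of the fresh, unbiased context.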
+
+ ## 11. Quality Gates: Determinism Over Persuasion
+
+ `AGENTS.md` is advisory. Hooks and CI are deterministic. The difference matters: text you write hoping the agent obeys is not the same as a script that exits non-zero when a rule is violated.
+
+ - **Hooks for inviolable rules** (formatter, secret-scan, lint). Not text the agent might forget.
+ - **Pre-commit fast** (lint, format, secrets); **pre-push thorough** (build, unit tests, integration tests). Slow pre-commits push devs to `--no-verify`.
+ - **Visual or E2E for UI.** Type-check confirms the code compiles, not that the feature works. Open the browser (Claude in Chrome, DevTools MCP).
+ - **Sandboxing plus scoped permissions** for autonomy: allowlists, OS sandbox, classifier-reviewed auto mode. The bigger the autonomy, the more rails you need.
+ - **Never bypass.** No `--no-verify`. Failing tests means not ready.
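The "hooks for inviolable rules" bullet can be made concrete with a tiny deterministic gate. A sketch of a secret scan that exits non-zero on a match — the two patterns are examples only; a real setup should use a dedicated scanner:

```shell
# scan_for_secrets FILE — exit non-zero if an obvious secret pattern appears.
# Intended to be called from .git/hooks/pre-commit on each staged file.
# The patterns (an AWS-style access key ID and a PEM private-key header)
# are illustrative, not a complete ruleset.
scan_for_secrets() {
  ! grep -nE '(AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----)' "$1"
}
```

Wired into a pre-commit hook over `git diff --cached --name-only`, the rule becomes a script that fails the commit instead of advisory text the agent might forget.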
+
+ ## 12. The Bottleneck Is Discrimination, Not Generation
+
+ Modern agents handle most routine implementation. The work has shifted to catching what they got wrong.
+
+ Two 2025 industry surveys point at the same wall. JetBrains' DevEcosystem 2025 reports that only **44%** of developers have AI fully or partially integrated into their workflow. Stack Overflow's 2025 Developer Survey adds: **66%** of developers cite "AI solutions that are almost right, but not quite" as their top frustration, and **45%** say debugging AI-generated code is more time-consuming.
+
+ The takeaway: §10 (Reviewer) and §11 (Quality Gates) are not optional. Skipping them is where bug density grows.
+
+ ## 13. Evals for Anything Autonomous
+
+ If your agent is making decisions on its own, you need evals. A few principles:
+
+ - **Trajectory beats final output.** Output-only eval hides failures in tool calls, retrieval, and intermediate decisions that the final answer can mask. Log tool calls and intermediate states.
+ - **Observability before evals.** Get traces first; build the eval suite on top.
+ - **LLM-as-judge for breadth, humans for depth.**
+ - **The unit under test is prompt + scaffold + model.** Changing any of the three is a release.
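"Get traces first" can start as something very small: one JSON Lines record appended per tool call. A sketch — the field names `ts`/`tool`/`status` are an assumed schema, not a standard; adapt to your tracing backend:

```shell
# log_step TRACE_FILE TOOL STATUS — append one trajectory record as JSON Lines.
# One line per tool call keeps the trace greppable and easy to diff across runs.
log_step() {
  printf '{"ts":"%s","tool":"%s","status":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" >> "$1"
}
```

Even this much is enough to answer "which tool call failed first?" — the question output-only evaluation can't.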
+
+ ## 14. Staged Spikes With Golden Fixtures
+
+ Sometimes the spec is clear but the *technique* is uncertain — you don't know which library, which CV approach, which decomposition. Don't ask the agent to solve it end-to-end. Break the problem into staged spikes and validate each one against curated ground truth.
+
+ The flow has four parts:
+
+ 1. **Discovery first.** Ask the agent to list canonical approaches grounded in official docs and real examples. Pick one by an explicit criterion. The output of this step is information, not code.
+ 2. **Golden fixture.** Curate inputs with rich expected outputs. For computer vision, that means bounding boxes, sizes, lighting, difficulty tags, edge cases — not just "three circles." Keep the fixture as JSON keyed by input path.
+ 3. **Pipeline with gates.** One technique per stage; each gate emits a debug artifact: an image to `debug/01-preprocess/`, intermediate JSON, a log row — whatever makes the stage's output inspectable.
+ 4. **Two layers of evaluation.** End-to-end against the fixture, *and* per-stage debug to locate where things diverged when it failed.
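A golden fixture for the CV example in step 2 might look like this — a sketch, with file paths, field names, and values all invented for illustration:

```json
{
  "inputs/easy_three_circles.png": {
    "circles": [
      { "bbox": [12, 40, 58, 86], "radius_px": 23 },
      { "bbox": [120, 44, 168, 92], "radius_px": 24 },
      { "bbox": [210, 38, 260, 88], "radius_px": 25 }
    ],
    "lighting": "even",
    "difficulty": "easy",
    "tags": ["baseline"]
  },
  "inputs/hard_overlapping.png": {
    "circles": [
      { "bbox": [30, 30, 110, 110], "radius_px": 40 },
      { "bbox": [80, 35, 150, 105], "radius_px": 35 }
    ],
    "lighting": "backlit",
    "difficulty": "hard",
    "tags": ["overlap", "edge-case"]
  }
}
```

Keyed by input path and rich enough that each pipeline gate can be scored against it — not just the final count.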
+
+ **Why this beats end-to-end:** §9 (TDG) assumes the path is known. When you don't know it, end-to-end evaluation tells you *that* it failed, not *where*. Stage-level artifacts make the divergence inspectable, so you fix the right gate instead of guessing at the final output.
+
+ **When to use it:** the unknown is *how* — a library choice, a CV technique, a multi-stage transformation. Skip it when the *how* is routine.
+
+ This is a combination of established practices, not new terminology: spike (XP), golden datasets, stage-segmented error analysis, trajectory evaluation, and visual debugging in CV pipelines.
+
+ ---
+
+ These are starting points. Prune what doesn't fit your codebase.
+
+ ## How this guide was built
+
+ This is not theory I read and copied. Most of the practices here come from years of shipping production code, with and without LLMs.
+
+ **Patterns I was already using when I drafted this guide.** Several of them I used before knowing they had established names; once the industry converged on a label, I adopted it to make the conversation easier: **Spec-Driven Design** (§1), **Docs-vs-Code separation** (§2), **pattern matching by real examples** (§5), **explicit Action Commands** (§7), **Architectural Boundaries** (§8), **Outcome-Based Prompting / TDG** (§9), the **senior-reviewer technique** (§10), and **deterministic Quality Gates** (§11).
+
+ **The XML observation in §3** is also drawn from practice, not benchmarks. It's a hypothesis worth testing on your own setup, not a settled finding.
+
+ **Practices that came in through iteration on this guide.** They weren't in my original draft, but each matches a problem I'd already encountered or a habit I'd only formalized loosely: **Find the Happy Path** (§4), **Explore → Plan → Implement → Commit** (§6), **The Bottleneck Is Discrimination, Not Generation** (§12 — the 2025 industry statistics ground the principle, they didn't generate it), and **Evals for Anything Autonomous** (§13).
+
+ **§14 (Staged Spikes With Golden Fixtures) is my own working technique.** I haven't seen it documented end-to-end as a single named pattern, but each component (spike, golden dataset, stage-segmented error analysis, trajectory evaluation, visual CV debugging) has its own lineage in the literature listed under Sources. The combination — discovery → fixture → staged pipeline with debug artifacts → two-layer evaluation — is how I attack problems where the *technique* itself is uncertain.
+
+ External claims (specific percentages, named frameworks) are cited under Sources. Everything else is operational guidance from practice or synthesis across that material — a working model, refined over time, not an academic claim.
+
+ ## Sources
+
+ **§1 — Spec-Driven Design**
+ - DESIGN.md spec (Google Labs): https://github.com/google-labs-code/design.md
+ - SKILL.md spec (Anthropic): https://code.claude.com/docs/en/skills
+
+ **§12 — The Bottleneck Is Discrimination, Not Generation**
+ - JetBrains *DevEcosystem 2025*: https://devecosystem-2025.jetbrains.com/artificial-intelligence
+ - Stack Overflow *2025 Developer Survey* (AI section): https://survey.stackoverflow.co/2025/ai
+
+ **§14 — Staged Spikes With Golden Fixtures**
+ - Spike (XP) — Wikipedia: https://en.wikipedia.org/wiki/Spike_(software_development)
+ - Golden datasets — Arize: https://arize.com/resource/golden-dataset/
+ - Stage-segmented error analysis — Hamel Husain's evals FAQ: https://hamel.dev/blog/posts/evals-faq/
+ - Trajectory evaluation — LangSmith docs: https://docs.langchain.com/langsmith/trajectory-evals
+ - Visual CV debugging — OpenCV cvv tutorial: https://docs.opencv.org/3.4/d7/dcf/tutorial_cvv_introduction.html
package/bin/agentic.js ADDED
@@ -0,0 +1,7 @@
+ #!/usr/bin/env node
+ import { run } from '../src/index.js';
+
+ run(process.argv).catch((err) => {
+   console.error(`agentic: ${err.message}`);
+   process.exit(1);
+ });
package/package.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "name": "@alexandrealvaro/agentic",
+   "version": "0.1.0-beta.1",
+   "description": "Bootstrap and audit AGENTS.md, ARCHITECTURE.md, ADRs, skills, and subagents for engineering production code with LLMs",
+   "type": "module",
+   "bin": {
+     "agentic": "./bin/agentic.js"
+   },
+   "files": [
+     "bin/",
+     "src/",
+     "templates/",
+     "prompts/",
+     "WORKFLOW.md",
+     "README.md",
+     "LICENSE"
+   ],
+   "engines": {
+     "node": ">=18"
+   },
+   "scripts": {
+     "start": "node bin/agentic.js",
+     "test": "node bin/agentic.js --help && node bin/agentic.js init --help",
+     "prepublishOnly": "npm test"
+   },
+   "keywords": [
+     "llm",
+     "agents",
+     "agentic",
+     "spec-driven",
+     "scaffolding",
+     "ai",
+     "claude-code",
+     "agents-md"
+   ],
+   "license": "MIT",
+   "author": "Alexandre Alvaro <alexandre.alvaro@hotmail.com>",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/alexandremendoncaalvaro/agentic-development.git"
+   },
+   "bugs": {
+     "url": "https://github.com/alexandremendoncaalvaro/agentic-development/issues"
+   },
+   "homepage": "https://github.com/alexandremendoncaalvaro/agentic-development#readme",
+   "publishConfig": {
+     "access": "public"
+   },
+   "dependencies": {
+     "@clack/prompts": "^1.3.0",
+     "clipboardy": "^5.3.1",
+     "commander": "^12.1.0"
+   }
+ }
package/prompts/adr.md ADDED
@@ -0,0 +1,7 @@
+ # Bootstrap an ADR
+
+ Format: Michael Nygard's pattern (Context, Decision, Consequences, Alternatives). Status lifecycle: `proposed` → `accepted` → `deprecated` | `superseded by ADR-NNNN`.
+
+ ## Paste to your agent
+
+ > Use [`templates/adr.md`](../templates/adr.md) to draft `doc/adr/<NNNN>-<short-title>.md` (next available number) for the decision: `<one-line decision>`. Status `proposed`, today's date. Fill Context, Decision, Consequences, and Alternatives Considered from our conversation only — don't invent. Stop after writing the file. Wait for review before changing status to `accepted`.
@@ -0,0 +1,13 @@
+ # Bootstrap AGENTS.md
+
+ Open standard: [agents.md](https://agents.md) (Linux Foundation / Agentic AI Foundation). Claude Code reads `CLAUDE.md`; mirror or import via `@AGENTS.md`.
+
+ ## Paste to your agent
+
+ > Read [`templates/agents-general.md`](../templates/agents-general.md) and [`templates/agents-project.md`](../templates/agents-project.md). Interview me one section at a time to fill `agents-project.md`'s placeholders: stack, build/test/lint commands, repo layout (only what's not obvious from the tree), commit conventions, security boundaries, real gotchas you can confirm by reading the code.
+ >
+ > **Do not invent values.** When I don't know something, leave `<TODO>` and move on. After the interview, read the codebase to verify what I told you and flag any mismatch.
+ >
+ > Output a single `AGENTS.md` at the repo root: filled `agents-project.md` content first, then `agents-general.md` content appended (drop its H1 — `agents-project.md`'s `# AGENTS.md` owns the merged document).
+ >
+ > Cut every line you can — bloat makes the agent ignore rules.
@@ -0,0 +1,11 @@
+ # Bootstrap ARCHITECTURE.md
+
+ Pairs with ADRs in `doc/adr/`. Architecture is the binding pattern; ADRs are individual decisions with status.
+
+ ## Paste to your agent
+
+ > Read [`templates/architecture.md`](../templates/architecture.md). Interview me one section at a time to fill its placeholders: overview, layers and boundaries, patterns (data access, HTTP handlers, async/messaging, errors, validation), naming, observability, deployment topology.
+ >
+ > Before writing each section, read the codebase to verify the claim is real. **Do not invent.** Leave `<TODO>` for unknowns.
+ >
+ > Output `ARCHITECTURE.md` at the repo root. At the end, list any decision that should become an ADR — don't write them yet, just flag.
@@ -0,0 +1,19 @@
+ # Bootstrap DESIGN.md
+
+ Spec: [github.com/google-labs-code/design.md](https://github.com/google-labs-code/design.md) (Apache 2.0). Format: YAML frontmatter (W3C-compatible `$value`/`$type` tokens) + Markdown body (rationale, do's/don'ts).
+
+ Canonical sections: `Overview`, `Colors`, `Typography`, `Layout`, `Elevation & Depth`, `Shapes`, `Components`, `Do's and Don'ts`. **Add `Motion`** (easings, durations) — not in the official spec yet.
+
+ There is no template — bootstrap from existing tokens.
+
+ ## Workflow
+
+ 1. **Retrieval** — point the agent at the source: Figma file, `tailwind.config.js`, `tokens.json`, design system docs, or stylesheet.
+ 2. **Extraction** — agent extracts tokens (colors, typography, spacing, radii, shadows, motion) into YAML frontmatter.
+ 3. **Synthesis** — agent writes the Markdown body: rationale per token group, application rules, do's/don'ts. Cite the source.
+ 4. **Validation** — `npx @google/design.md lint DESIGN.md`. Fix errors before shipping.
+ 5. **Component mapping** — if you use Figma, set up Code Connect to map each Figma component to its code component. Without this, the agent guesses.
+
+ ## Paste to your agent
+
+ > Read the DESIGN.md spec at https://github.com/google-labs-code/design.md. Extract tokens from `<source: Figma URL / tailwind.config.js / tokens.json / stylesheet>`. Generate `DESIGN.md` at the repo root with YAML frontmatter (W3C-compliant `$value`/`$type`) + Markdown body explaining rationale and application rules per token group. Add a `## Motion` section if any easings/durations are defined. Validate with `npx @google/design.md lint`. **Do not invent tokens not in the source** — if a category is missing, leave the section with a `<TODO>` note.
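The extracted frontmatter from step 2 might look like this — a sketch with invented token names and values; only the `$value`/`$type` shape follows the W3C Design Tokens draft format:

```yaml
color:
  primary:
    $value: "#1a73e8"
    $type: color
  surface:
    $value: "#ffffff"
    $type: color
typography:
  body:
    $value:
      fontFamily: "Inter"
      fontSize: "16px"
      lineHeight: 1.5
    $type: typography
motion:
  duration-short:
    $value: "150ms"
    $type: duration
```

Every token the Markdown body references must appear here first — if it isn't declared, it doesn't exist.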
@@ -0,0 +1,21 @@
1
+ # Bootstrap a Skill
2
+
3
+ Spec: [code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills) | open standard [agentskills.io](https://agentskills.io) | examples [github.com/anthropics/skills](https://github.com/anthropics/skills)
4
+
5
+ ## Where it lives
6
+
7
+ - Personal: `~/.claude/skills/<skill-name>/SKILL.md`
8
+ - Project: `.claude/skills/<skill-name>/SKILL.md`
9
+
10
+ Directory name becomes the command (`/<skill-name>`).
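For example, a hypothetical project skill named `changelog` would be laid out like this (the skill name is illustrative):

```shell
# Create a project skill; the directory name becomes the command name.
mkdir -p .claude/skills/changelog
touch .claude/skills/changelog/SKILL.md
# This skill is now invocable as /changelog
```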
11
+
12
+ ## Paste to your agent
13
+
14
+ > Read the SKILL spec at https://code.claude.com/docs/en/skills and the structure in [`templates/skill.md`](../templates/skill.md). Create a skill at `<.claude/skills/<name>/SKILL.md | ~/.claude/skills/<name>/SKILL.md>` that `<does X when Y>`.
15
+ >
16
+ > Constraints:
17
+ > - `description` is the triggering signal — include keywords a user would naturally say. Combined `description` + `when_to_use` is capped at 1,536 chars.
18
+ > - Body ≤500 lines; move long material to sibling files (`reference.md`, `examples.md`, `scripts/`).
19
+ > - Body stays in context after invocation — every line is recurring token cost. State *what to do*, not narration.
20
+ > - Don't restate AGENTS.md.
21
+ > - Only declare frontmatter fields you actually need. **Do not invent fields not in the spec.**
@@ -0,0 +1,34 @@
1
+ # Bootstrap a Subagent
2
+
3
+ Spec: [code.claude.com/docs/en/sub-agents](https://code.claude.com/docs/en/sub-agents)
4
+
5
+ A custom Claude Code subagent runs with isolated context, scoped tools, and its own system prompt. The body of the file becomes the subagent's full system prompt — it does **not** inherit AGENTS.md / CLAUDE.md from the parent.
6
+
7
+ ## Where it lives
8
+
9
+ - Project (versioned): `.claude/agents/<name>.md`
10
+ - Personal: `~/.claude/agents/<name>.md`
11
+
12
+ Edits to files on disk require a session restart; agents created via `/agents` take effect immediately.
13
+
14
+ ## Common patterns
15
+
16
+ | Pattern | Tools | Model | Notes |
17
+ | ---------------------- | --------------------------- | ---------- | ------------------------------------------------------- |
18
+ | Fresh-context reviewer | `Read, Glob, Grep, Bash` | `sonnet` | Matches WORKFLOW §10. No write tools. |
19
+ | Codebase researcher | use built-in **Explore** | inherit | Don't build custom unless you need different tools. |
20
+ | Diff-only auditor | `Read, Bash` | `sonnet` | Pair with `permissionMode: dontAsk` for read-only runs. |
21
+
22
+ Built-in subagents (`Explore`, `Plan`, `general-purpose`) cover most cases. Build a custom one only when you need a specific role, scoped tools, persistent memory, or a different model.
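As a sketch, the fresh-context reviewer row above could become a file like this (name and wording illustrative; fields follow the template in [`templates/subagent.md`](../templates/subagent.md)):

```markdown
---
name: reviewer
description: Reviews diffs with fresh context after a feature is implemented. Use before commits.
tools: Read, Glob, Grep, Bash
model: sonnet
---

You are a code reviewer with no memory of how this diff was written.
Review the staged changes, report issues grouped by severity, and stop.
Do not edit files; you have no write tools.
```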
23
+
24
+ ## Paste to your agent
25
+
26
+ > Read the subagents spec at https://code.claude.com/docs/en/sub-agents and the structure in [`templates/subagent.md`](../templates/subagent.md). Create a project subagent at `.claude/agents/<name>.md` for `<role and trigger>`.
27
+ >
28
+ > Constraints:
29
+ > - `description` is the routing signal — Claude reads it to decide whether to delegate. Be specific.
30
+ > - Body = full system prompt; every line costs tokens on every subagent turn. Be terse.
31
+ > - Don't restate AGENTS.md. Subagents don't read it. State only what differs.
32
+ > - Limit tools deliberately. A reviewer with `Write` access stops being a reviewer.
33
+ > - Tell the subagent where to stop.
34
+ > - Frontmatter: `name`, `description`, plus only the fields you actually need from the spec. **Do not invent fields not in the spec.**
@@ -0,0 +1,108 @@
1
+ import { writeFileSync } from 'node:fs';
2
+ import { resolve } from 'node:path';
3
+ import * as p from '@clack/prompts';
4
+ import clipboard from 'clipboardy';
5
+ import { detectMode } from '../lib/detect.js';
6
+ import { renderAgentsBootstrap } from '../lib/render.js';
7
+
8
+ const VALID_MODES = ['auto', 'greenfield', 'brownfield', 'audit'];
9
+
10
+ const MODE_LABEL = {
11
+ greenfield: 'greenfield — empty project',
12
+ brownfield: 'brownfield — existing code, no AGENTS.md',
13
+ audit: 'audit — AGENTS.md exists, compare drift',
14
+ };
15
+
16
+ export async function initCommand(opts) {
17
+ if (!VALID_MODES.includes(opts.mode)) {
18
+ throw new Error(
19
+ `invalid mode "${opts.mode}". Use one of: ${VALID_MODES.join(', ')}`
20
+ );
21
+ }
22
+
23
+ const flagDest = opts.copy
24
+ ? 'clipboard'
25
+ : opts.out
26
+ ? 'file'
27
+ : opts.stdout
28
+ ? 'stdout'
29
+ : null;
30
+ const interactive = process.stdout.isTTY && flagDest === null;
31
+
32
+ const detected = detectMode(process.cwd());
33
+ let mode = opts.mode === 'auto' ? detected : opts.mode;
34
+ let dest = flagDest ?? 'stdout';
35
+ let outPath = opts.out ?? null;
36
+
37
+ if (interactive) {
38
+ p.intro('agentic init');
39
+ p.note(MODE_LABEL[detected], 'Detected context');
40
+
41
+ if (opts.mode === 'auto') {
42
+ const choice = await p.select({
43
+ message: 'Confirm mode?',
44
+ options: [
45
+ { value: detected, label: `Use detected (${detected})` },
46
+ ...['greenfield', 'brownfield', 'audit']
47
+ .filter((m) => m !== detected)
48
+ .map((m) => ({ value: m, label: `Override → ${MODE_LABEL[m]}` })),
49
+ ],
50
+ });
51
+ if (p.isCancel(choice)) {
52
+ p.cancel('Cancelled.');
53
+ return;
54
+ }
55
+ mode = choice;
56
+ }
57
+
58
+ const chosen = await p.select({
59
+ message: 'Where to send the prompt?',
60
+ options: [
61
+ { value: 'clipboard', label: 'Copy to clipboard', hint: 'recommended' },
62
+ { value: 'stdout', label: 'Print to stdout', hint: 'for piping' },
63
+ { value: 'file', label: 'Save to file' },
64
+ ],
65
+ });
66
+ if (p.isCancel(chosen)) {
67
+ p.cancel('Cancelled.');
68
+ return;
69
+ }
70
+ dest = chosen;
71
+
72
+ if (dest === 'file') {
73
+ const path = await p.text({
74
+ message: 'File path',
75
+ initialValue: './agentic-init-prompt.md',
76
+ });
77
+ if (p.isCancel(path)) {
78
+ p.cancel('Cancelled.');
79
+ return;
80
+ }
81
+ outPath = path;
82
+ }
83
+ }
84
+
85
+ const prompt = renderAgentsBootstrap({ mode });
86
+
87
+ if (dest === 'clipboard') {
88
+ await clipboard.write(prompt);
89
+ if (interactive) {
90
+ p.outro('Copied to clipboard. Paste it into Claude Code, Codex, or any agentic tool.');
91
+ } else {
92
+ process.stderr.write('Copied to clipboard.\n');
93
+ }
94
+ } else if (dest === 'file') {
95
+ const absPath = resolve(outPath);
96
+ writeFileSync(absPath, prompt);
97
+ if (interactive) {
98
+ p.outro(`Saved to ${absPath}. Open it and paste into your agent.`);
99
+ } else {
100
+ process.stderr.write(`Saved to ${absPath}\n`);
101
+ }
102
+ } else {
103
+ if (interactive) {
104
+ p.outro('Prompt below — copy and paste into your agent.');
105
+ }
106
+ process.stdout.write(prompt);
107
+ }
108
+ }
package/src/index.js ADDED
@@ -0,0 +1,30 @@
1
+ import { Command } from 'commander';
2
+ import { readFileSync } from 'node:fs';
3
+ import { fileURLToPath } from 'node:url';
4
+ import { dirname, join } from 'node:path';
5
+ import { initCommand } from './commands/init.js';
6
+
7
+ const __dirname = dirname(fileURLToPath(import.meta.url));
8
+ const pkg = JSON.parse(
9
+ readFileSync(join(__dirname, '..', 'package.json'), 'utf8')
10
+ );
11
+
12
+ export async function run(argv) {
13
+ const program = new Command();
14
+
15
+ program
16
+ .name('agentic')
17
+ .description(pkg.description)
18
+ .version(pkg.version);
19
+
20
+ program
21
+ .command('init')
22
+ .description('Bootstrap AGENTS.md for the current project — interactive by default')
23
+ .option('-m, --mode <mode>', 'force mode: auto | greenfield | brownfield | audit', 'auto')
24
+ .option('--copy', 'skip TUI and copy the prompt to clipboard')
25
+ .option('--stdout', 'skip TUI and print the prompt to stdout')
26
+ .option('-o, --out <file>', 'skip TUI and write the prompt to a file')
27
+ .action(initCommand);
28
+
29
+ await program.parseAsync(argv);
30
+ }
@@ -0,0 +1,36 @@
1
+ import { existsSync, readdirSync } from 'node:fs';
2
+ import { join } from 'node:path';
3
+
4
+ const TRIVIAL_ENTRIES = new Set([
5
+ '.git',
6
+ 'node_modules',
7
+ '.DS_Store',
8
+ '.idea',
9
+ '.vscode',
10
+ '.gitignore',
11
+ '.gitattributes',
12
+ '.env',
13
+ '.env.local',
14
+ '.env.example',
15
+ 'README.md',
16
+ 'LICENSE',
17
+ 'LICENSE.md',
18
+ ]);
19
+
20
+ /**
21
+ * Detect the project's current state to pick a default mode for `init`.
22
+ * - audit: AGENTS.md already exists; compare against the codebase
23
+ * - greenfield: only trivial files (no real code yet)
24
+ * - brownfield: has code but no AGENTS.md
25
+ */
26
+ export function detectMode(dir) {
27
+ if (existsSync(join(dir, 'AGENTS.md'))) {
28
+ return 'audit';
29
+ }
30
+
31
+ const meaningful = readdirSync(dir).filter(
32
+ (name) => !TRIVIAL_ENTRIES.has(name) && !name.startsWith('.')
33
+ );
34
+
35
+ return meaningful.length === 0 ? 'greenfield' : 'brownfield';
36
+ }
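A quick way to see the three modes in action is to run the detection against a throwaway directory. This is a standalone sketch; the inlined copy of `detectMode` trims the trivial-entries list down to a few representatives so the example runs on its own:

```javascript
import { existsSync, mkdtempSync, readdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Standalone copy of detectMode above, with an abbreviated trivial-entries set.
const TRIVIAL = new Set(['node_modules', 'README.md', 'LICENSE', 'LICENSE.md']);
function detectMode(dir) {
  if (existsSync(join(dir, 'AGENTS.md'))) return 'audit';
  const meaningful = readdirSync(dir).filter(
    (name) => !TRIVIAL.has(name) && !name.startsWith('.')
  );
  return meaningful.length === 0 ? 'greenfield' : 'brownfield';
}

const dir = mkdtempSync(join(tmpdir(), 'agentic-demo-'));
console.log(detectMode(dir)); // → 'greenfield': nothing in the directory yet

writeFileSync(join(dir, 'index.js'), 'export {};\n');
console.log(detectMode(dir)); // → 'brownfield': code exists, no AGENTS.md

writeFileSync(join(dir, 'AGENTS.md'), '# AGENTS.md\n');
console.log(detectMode(dir)); // → 'audit': AGENTS.md already present
```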
@@ -0,0 +1,66 @@
1
+ import { readFileSync } from 'node:fs';
2
+ import { fileURLToPath } from 'node:url';
3
+ import { dirname, join } from 'node:path';
4
+
5
+ const __dirname = dirname(fileURLToPath(import.meta.url));
6
+ const KIT_ROOT = join(__dirname, '..', '..');
7
+
8
+ function readKit(relativePath) {
9
+ return readFileSync(join(KIT_ROOT, relativePath), 'utf8');
10
+ }
11
+
12
+ const MODE_CONTEXT = {
13
+ greenfield:
14
+ "Empty project. Interview me from scratch — there's no existing code yet to verify against.",
15
+ brownfield:
16
+ 'Existing codebase, no AGENTS.md yet. After interviewing me, read the actual code to verify what I told you and flag any mismatch before writing anything.',
17
+ audit:
18
+ "AGENTS.md already exists. Do not rewrite it. Compare it against the current codebase and against the template structure below. List every drift, then wait — I'll decide what to change.",
19
+ };
20
+
21
+ export function renderAgentsBootstrap({ mode }) {
22
+ const general = readKit('templates/agents-general.md').trim();
23
+ const project = readKit('templates/agents-project.md').trim();
24
+
25
+ const outputInstruction =
26
+ mode === 'audit'
27
+ ? 'Output: a list of disagreements. Format each as `[file or section]: spec says X, code says Y. Suggested resolution: [change spec / change code / discuss].`'
28
+ : "Output: a single `AGENTS.md` at the repo root — filled `agents-project.md` content first, then `agents-general.md` content appended. Drop `agents-general.md`'s H1; `agents-project.md`'s `# AGENTS.md` owns the merged document.";
29
+
30
+ return `# Bootstrap AGENTS.md (mode: ${mode})
31
+
32
+ Paste this entire output into your agent. Both templates are inlined below — no \`--add-dir\` needed.
33
+
34
+ ---
35
+
36
+ ## Mode context
37
+
38
+ ${MODE_CONTEXT[mode]}
39
+
40
+ ---
41
+
42
+ ## Template: agents-general.md (universal agent behavior)
43
+
44
+ ${general}
45
+
46
+ ---
47
+
48
+ ## Template: agents-project.md (per-project structure with placeholders)
49
+
50
+ ${project}
51
+
52
+ ---
53
+
54
+ ## Instructions
55
+
56
+ > Read both templates above.
57
+ >
58
+ > Interview me one section at a time to fill \`agents-project.md\`'s \`<placeholders>\`: stack, build/test/lint commands, repo layout (only what's not obvious), commit conventions, security boundaries, and real gotchas you can confirm by reading the code.
59
+ >
60
+ > **Do not invent values.** When I don't know something, leave \`<TODO>\` and move on.
61
+ >
62
+ > ${outputInstruction}
63
+ >
64
+ > Cut every line you can — bloat makes the agent ignore rules.
65
+ `;
66
+ }
@@ -0,0 +1,22 @@
1
+ # ADR-NNNN: `<short imperative title>`
2
+
3
+ **Status:** `<proposed | accepted | deprecated | superseded by ADR-NNNN>`
4
+ **Date:** `<YYYY-MM-DD>`
5
+ **Deciders:** `<names or roles>`
6
+
7
+ ## Context
8
+
9
+ `<What is the issue motivating this decision? What forces are at play — technical, organizational, regulatory, cost?>`
10
+
11
+ ## Decision
12
+
13
+ `<State as a directive: "We will…". One decision per ADR.>`
14
+
15
+ ## Consequences
16
+
17
+ `<What becomes easier, harder, or different. List positive, negative, and neutral consequences.>`
18
+
19
+ ## Alternatives Considered
20
+
21
+ * `<option>` — `<why rejected>`
22
+ * `<option>` — `<why rejected>`
@@ -0,0 +1,82 @@
1
+ # Universal Agent Behavior
2
+
3
+ **Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
4
+
5
+ ## Think Before Coding
6
+
7
+ **Don't assume. Don't hide confusion. Surface tradeoffs.**
8
+
9
+ Before implementing:
10
+ - State your assumptions explicitly. If uncertain, ask.
11
+ - If multiple interpretations exist, present them - don't pick silently.
12
+ - If a simpler approach exists, say so. Push back when warranted.
13
+ - If something is unclear, stop. Name what's confusing. Ask.
14
+
15
+ ## Ground Before Coding
16
+
17
+ **Anchor in real patterns. Research the canonical path.**
18
+
19
+ - Find the canonical/idiomatic way to do it. Note where you deviate and why.
20
+ - Find an existing example in the codebase; reuse its structure.
21
+ - Cite specific files, not "the codebase". Fetch via tools — don't dump code into context.
22
+ - For non-trivial changes, explore (read-only) → plan → implement → commit. Skip for diffs you can describe in one sentence.
23
+
24
+ ## Simplicity First
25
+
26
+ **Minimum code that solves the problem. Nothing speculative.**
27
+
28
+ - No features beyond what was asked.
29
+ - No abstractions for single-use code.
30
+ - No "flexibility" or "configurability" that wasn't requested.
31
+ - No error handling for impossible scenarios.
32
+ - Comments justify *why* a non-obvious choice was made, not *what* the line does. No commented-out code; no orphan `TODO`/`FIXME` — every deferred item references an issue, ADR, or follow-up.
33
+ - If you write 200 lines and it could be 50, rewrite it.
34
+
35
+ Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
36
+
37
+ ## Surgical Changes
38
+
39
+ **Touch only what you must. Clean up only your own mess.**
40
+
41
+ When editing existing code:
42
+ - Don't "improve" adjacent code, comments, or formatting.
43
+ - Don't refactor things that aren't broken.
44
+ - Match existing style, even if you'd do it differently.
45
+ - If you notice unrelated dead code, mention it - don't delete it.
46
+
47
+ When your changes create orphans:
48
+ - Remove imports/variables/functions that YOUR changes made unused.
49
+ - Don't remove pre-existing dead code unless asked.
50
+
51
+ The test: Every changed line should trace directly to the user's request.
52
+
53
+ ## Goal-Driven Execution
54
+
55
+ **Define success criteria. Loop until verified.**
56
+
57
+ Transform tasks into verifiable goals:
58
+ - "Add validation" → "Write tests for invalid inputs, then make them pass"
59
+ - "Fix the bug" → "Write a test that reproduces it, then make it pass"
60
+ - "Refactor X" → "Ensure tests pass before and after"
61
+
62
+ Before modifying a file, list which tests cover it. Run. Modify. Run. If none, write one first.
63
+
64
+ For multi-step tasks, state a brief plan:
65
+ ```
66
+ 1. [Step] → verify: [check]
67
+ 2. [Step] → verify: [check]
68
+ 3. [Step] → verify: [check]
69
+ ```
70
+
71
+ Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
72
+
73
+ ## Verify Before Claiming Done
74
+
75
+ - Type-check and tests verify *code*, not *feature*.
76
+ - For UI/runtime changes, exercise the feature in a browser.
77
+ - Can't verify it? Say so. Don't claim success.
78
+ - Never bypass gates (`--no-verify`, skipped hooks, deleted failing tests).
79
+
80
+ ---
81
+
82
+ **These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
@@ -0,0 +1,91 @@
1
+ # AGENTS.md
2
+
3
+ ## Project Overview
4
+
5
+ `<what this project does, who uses it, the quality bar that matters most>`
6
+
7
+ **Stack:** `<languages, runtimes, frameworks, database>`
8
+ **Entry points:** `<main services and where to find them>`
9
+
10
+ ## Setup, Build, Test
11
+
12
+ ```bash
13
+ # Install
14
+ <command>
15
+
16
+ # Build
17
+ <command>
18
+
19
+ # Test (single file preferred over full suite)
20
+ <single test command>
21
+ <full suite command>
22
+
23
+ # Run before any commit
24
+ <lint>
25
+ <format>
26
+ <typecheck>
27
+ ```
28
+
29
+ Document non-obvious flags or env vars inline.
30
+
31
+ ## Quality Gates
32
+
33
+ Deterministic enforcement — agent cannot skip.
34
+
35
+ * Pre-commit hook (fast): `<lint, format, secret-scan>`
36
+ * Pre-push hook (thorough): `<build + unit + integration>`
37
+ * Visual/E2E for UI (if applicable): `<e.g., Cypress, Playwright, Claude in Chrome — leave blank for non-UI projects>`
38
+ * Hook config lives in: `<.husky/, .pre-commit-config.yaml, .claude/settings.json — see code.claude.com/docs/en/hooks>`
39
+ * CI blocks on: `<list>`
40
+
41
+ ## Code Style
42
+
43
+ Only what differs from language defaults.
44
+
45
+ * `<e.g., ES modules, not CommonJS>`
46
+ * `<e.g., destructure imports>`
47
+ * `<e.g., no `any` outside `internal/types/`>`
48
+ * `<e.g., Pydantic for all request/response shapes>`
49
+
50
+ ## Architectural Principles
51
+
52
+ Decisions the agent must follow, not reinvent.
53
+
54
+ * `<e.g., Clean Architecture — core logic isolated from frameworks>`
55
+ * `<e.g., Repository pattern for all DB access>`
56
+ * `<e.g., All HTTP handlers go through middleware in src/middleware/>`
57
+ * `<e.g., Single responsibility, no 'else' chains, low indentation>`
58
+
59
+ ## Repository Layout
60
+
61
+ `<where logic, tests, docs, infra live — only if not obvious from the tree>`
62
+ `<.claude/skills/ — list of available skills, if any>`
63
+ `<.claude/agents/ — list of custom subagents, if any>`
64
+
65
+ ## Commit & PR Conventions
66
+
67
+ * Commits: `<conventional / project-specific>`
68
+ * Branches: `<feat/, fix/, chore/>`
69
+ * PRs require: `<green CI, one review, linked issue>`
70
+ * Never push to `<main>` directly.
71
+
72
+ ## Security & Privacy
73
+
74
+ * Secrets: `<location — never committed>`
75
+ * Files the agent must not read or modify: `<list>`
76
+ * Data classification: `<e.g., no PII in logs>`
77
+ * Pre-approved commands (no prompt): `<e.g., gh, npm test, npm run lint>`
78
+ * MCP servers approved: `<list>`
79
+
80
+ ## Gotchas
81
+
82
+ Real traps. Each one should map to an incident.
83
+
84
+ * `<e.g., migrations not idempotent — never edit, always create new>`
85
+ * `<e.g., DB is UTC, app displays America/Sao_Paulo>`
86
+
87
+ ## External Resources
88
+
89
+ * Docs: `<URL>`
90
+ * Issues: `<URL>`
91
+ * Runbook: `<URL>`
@@ -0,0 +1,40 @@
1
+ # Architecture
2
+
3
+ System-level patterns and boundaries. Pair with ADRs in `doc/adr/` for individual decisions.
4
+
5
+ ## Overview
6
+
7
+ `<one paragraph: what the system does, key external dependencies, deployment shape>`
8
+
9
+ ## Layers & Boundaries
10
+
11
+ `<the layered/hexagonal/clean structure: what lives in each layer, what crosses boundaries, what doesn't>`
12
+
13
+ ## Patterns
14
+
15
+ * **Data access:** `<e.g., Repository pattern; raw SQL only inside internal/db/>`
16
+ * **HTTP handlers:** `<e.g., all go through middleware in src/middleware/>`
17
+ * **Async/messaging:** `<e.g., Kafka topics owned by their producer service>`
18
+ * **Error handling:** `<e.g., domain errors in errors/; HTTP mapping at handler edge>`
19
+ * **Validation:** `<e.g., Pydantic at boundary, never inside core>`
20
+
21
+ ## Naming Conventions
22
+
23
+ `<module/file/class naming rules that aren't obvious from language defaults>`
24
+
25
+ ## Observability
26
+
27
+ * Logs: `<format, level conventions, where they ship>`
28
+ * Metrics: `<library, dashboards>`
29
+ * Traces: `<provider, sampling strategy>`
30
+
31
+ ## Deployment Topology
32
+
33
+ `<how services run in prod: containers, orchestration, scaling rules>`
34
+
35
+ ## Active ADRs
36
+
37
+ Currently-binding decisions. Link each to `doc/adr/`.
38
+
39
+ * ADR-0001 — `<title>`
40
+ * ADR-0002 — `<title>`
@@ -0,0 +1,27 @@
1
+ ---
2
+ description: <what the skill does + when to invoke — primary triggering signal>
3
+
4
+ # Optional — declare only what you need.
5
+ # name: <skill-name> # lowercase + numbers + hyphens, ≤64 chars; defaults to directory name
6
+ # when_to_use: <trigger phrases or example requests; appended to description>
7
+ # argument-hint: <[issue-number] or [filename] [format]>
8
+ # arguments: <space-separated names mapped to positions>
9
+ # allowed-tools: Read Grep # pre-approves tools while skill is active
10
+ # disable-model-invocation: true # only the user can invoke (no auto-load)
11
+ # user-invocable: false # only Claude can invoke (background knowledge)
12
+ # context: fork # run in a forked subagent
13
+ # agent: Explore # Explore | Plan | general-purpose | <custom>
14
+ # paths: <glob,glob> # auto-load only when working in matching files
15
+ # model: <sonnet | opus | haiku | inherit>
16
+ # effort: <low | medium | high | xhigh | max>
17
+ # hooks: <see code.claude.com/docs/en/hooks>
18
+ # shell: <bash | powershell>
19
+ ---
20
+
21
+ <Imperative instructions: what to do, not why. State as standing rules — once
22
+ loaded, the body stays in context for the rest of the session.>
23
+
24
+ ## Additional resources
25
+
26
+ - For detailed reference: see [reference.md](reference.md)
27
+ - For examples: see [examples.md](examples.md)
@@ -0,0 +1,27 @@
1
+ ---
2
+ # Required
3
+ name: <unique-name> # lowercase + hyphens
4
+ description: <when Claude should delegate to this subagent>
5
+
6
+ # Optional — declare only what you need.
7
+ # tools: Read, Glob, Grep # comma-separated; omit to inherit all parent tools
8
+ # disallowedTools: Write, Edit # subtract from inherited or specified list
9
+ # model: sonnet # sonnet | opus | haiku | <full-id> | inherit (default)
10
+ # permissionMode: default # default | acceptEdits | auto | dontAsk | bypassPermissions | plan
11
+ # skills: skill-a, skill-b # full skill content injected at startup
12
+ # maxTurns: 10
13
+ # memory: project # user | project | local — enables cross-session learnings
14
+ # isolation: worktree # run in a temporary git worktree
15
+ # background: false
16
+ # effort: medium # low | medium | high | xhigh | max
17
+ # color: blue # red | blue | green | yellow | purple | orange | pink | cyan
18
+ # mcpServers: <see code.claude.com/docs/en/mcp>
19
+ # hooks: <see code.claude.com/docs/en/hooks>
20
+ # initialPrompt: <auto-submitted as first user turn when run as main agent>
21
+ ---
22
+
23
+ <System prompt: the subagent's role, scope, and stop criteria.
24
+
25
+ State the role in one sentence. Define what it should do when invoked, what
26
+ output format to return, and what NOT to do. The subagent does not read
27
+ AGENTS.md — restate any convention it must follow.>