thevoidforge 21.0.11 → 21.0.13
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/.claude/commands/ai.md +69 -0
- package/dist/.claude/commands/architect.md +121 -0
- package/dist/.claude/commands/assemble.md +201 -0
- package/dist/.claude/commands/assess.md +75 -0
- package/dist/.claude/commands/blueprint.md +135 -0
- package/dist/.claude/commands/build.md +116 -0
- package/dist/.claude/commands/campaign.md +201 -0
- package/dist/.claude/commands/cultivation.md +166 -0
- package/dist/.claude/commands/current.md +128 -0
- package/dist/.claude/commands/dangerroom.md +74 -0
- package/dist/.claude/commands/debrief.md +178 -0
- package/dist/.claude/commands/deploy.md +99 -0
- package/dist/.claude/commands/devops.md +143 -0
- package/dist/.claude/commands/gauntlet.md +140 -0
- package/dist/.claude/commands/git.md +104 -0
- package/dist/.claude/commands/grow.md +146 -0
- package/dist/.claude/commands/imagine.md +126 -0
- package/dist/.claude/commands/portfolio.md +50 -0
- package/dist/.claude/commands/prd.md +113 -0
- package/dist/.claude/commands/qa.md +107 -0
- package/dist/.claude/commands/review.md +151 -0
- package/dist/.claude/commands/security.md +100 -0
- package/dist/.claude/commands/test.md +96 -0
- package/dist/.claude/commands/thumper.md +116 -0
- package/dist/.claude/commands/treasury.md +100 -0
- package/dist/.claude/commands/ux.md +118 -0
- package/dist/.claude/commands/vault.md +189 -0
- package/dist/.claude/commands/void.md +108 -0
- package/dist/CHANGELOG.md +1918 -0
- package/dist/CLAUDE.md +250 -0
- package/dist/HOLOCRON.md +856 -0
- package/dist/VERSION.md +123 -0
- package/dist/docs/NAMING_REGISTRY.md +478 -0
- package/dist/docs/methods/AI_INTELLIGENCE.md +276 -0
- package/dist/docs/methods/ASSEMBLER.md +142 -0
- package/dist/docs/methods/BACKEND_ENGINEER.md +165 -0
- package/dist/docs/methods/BUILD_JOURNAL.md +185 -0
- package/dist/docs/methods/BUILD_PROTOCOL.md +426 -0
- package/dist/docs/methods/CAMPAIGN.md +568 -0
- package/dist/docs/methods/CONTEXT_MANAGEMENT.md +189 -0
- package/dist/docs/methods/DEEP_CURRENT.md +184 -0
- package/dist/docs/methods/DEVOPS_ENGINEER.md +295 -0
- package/dist/docs/methods/FIELD_MEDIC.md +261 -0
- package/dist/docs/methods/FORGE_ARTIST.md +108 -0
- package/dist/docs/methods/FORGE_KEEPER.md +268 -0
- package/dist/docs/methods/GAUNTLET.md +344 -0
- package/dist/docs/methods/GROWTH_STRATEGIST.md +466 -0
- package/dist/docs/methods/HEARTBEAT.md +168 -0
- package/dist/docs/methods/MCP_INTEGRATION.md +139 -0
- package/dist/docs/methods/MUSTER.md +148 -0
- package/dist/docs/methods/PRD_GENERATOR.md +186 -0
- package/dist/docs/methods/PRODUCT_DESIGN_FRONTEND.md +250 -0
- package/dist/docs/methods/QA_ENGINEER.md +337 -0
- package/dist/docs/methods/RELEASE_MANAGER.md +145 -0
- package/dist/docs/methods/SECURITY_AUDITOR.md +320 -0
- package/dist/docs/methods/SUB_AGENTS.md +335 -0
- package/dist/docs/methods/SYSTEMS_ARCHITECT.md +171 -0
- package/dist/docs/methods/TESTING.md +359 -0
- package/dist/docs/methods/THUMPER.md +175 -0
- package/dist/docs/methods/TIME_VAULT.md +120 -0
- package/dist/docs/methods/TREASURY.md +184 -0
- package/dist/docs/methods/TROUBLESHOOTING.md +265 -0
- package/dist/docs/patterns/README.md +52 -0
- package/dist/docs/patterns/ad-billing-adapter.ts +537 -0
- package/dist/docs/patterns/ad-platform-adapter.ts +421 -0
- package/dist/docs/patterns/ai-classifier.ts +195 -0
- package/dist/docs/patterns/ai-eval.ts +272 -0
- package/dist/docs/patterns/ai-orchestrator.ts +341 -0
- package/dist/docs/patterns/ai-router.ts +194 -0
- package/dist/docs/patterns/ai-tool-schema.ts +237 -0
- package/dist/docs/patterns/api-route.ts +241 -0
- package/dist/docs/patterns/backtest-engine.ts +499 -0
- package/dist/docs/patterns/browser-review.ts +292 -0
- package/dist/docs/patterns/combobox.tsx +300 -0
- package/dist/docs/patterns/component.tsx +262 -0
- package/dist/docs/patterns/daemon-process.ts +338 -0
- package/dist/docs/patterns/data-pipeline.ts +297 -0
- package/dist/docs/patterns/database-migration.ts +466 -0
- package/dist/docs/patterns/e2e-test.ts +629 -0
- package/dist/docs/patterns/error-handling.ts +312 -0
- package/dist/docs/patterns/execution-safety.ts +601 -0
- package/dist/docs/patterns/financial-transaction.ts +342 -0
- package/dist/docs/patterns/funding-plan.ts +462 -0
- package/dist/docs/patterns/game-entity.ts +137 -0
- package/dist/docs/patterns/game-loop.ts +113 -0
- package/dist/docs/patterns/game-state.ts +143 -0
- package/dist/docs/patterns/job-queue.ts +225 -0
- package/dist/docs/patterns/kongo-integration.ts +164 -0
- package/dist/docs/patterns/middleware.ts +363 -0
- package/dist/docs/patterns/mobile-screen.tsx +139 -0
- package/dist/docs/patterns/mobile-service.ts +167 -0
- package/dist/docs/patterns/multi-tenant.ts +382 -0
- package/dist/docs/patterns/oauth-token-lifecycle.ts +223 -0
- package/dist/docs/patterns/outbound-rate-limiter.ts +260 -0
- package/dist/docs/patterns/prompt-template.ts +195 -0
- package/dist/docs/patterns/revenue-source-adapter.ts +311 -0
- package/dist/docs/patterns/service.ts +224 -0
- package/dist/docs/patterns/sse-endpoint.ts +118 -0
- package/dist/docs/patterns/stablecoin-adapter.ts +511 -0
- package/dist/docs/patterns/third-party-script.ts +68 -0
- package/dist/scripts/thumper/gom-jabbar.sh +241 -0
- package/dist/scripts/thumper/relay.sh +610 -0
- package/dist/scripts/thumper/scan.sh +359 -0
- package/dist/scripts/thumper/thumper.sh +190 -0
- package/dist/scripts/thumper/water-rings.sh +76 -0
- package/dist/wizard/ui/index.html +1 -1
- package/package.json +1 -1
- package/dist/tsconfig.tsbuildinfo +0 -1
@@ -0,0 +1,295 @@
# DEVOPS ENGINEER
## Lead Agent: **Kusanagi** · Sub-agents: Anime Universe (Tom's list only)

> *"The net is vast and infinite."*

## Identity

**Kusanagi** (Major, Ghost in the Shell) lives in the infrastructure layer. Disciplined, precise, machine-speed. Makes deploys boring, servers invisible, 3am pages unnecessary.

**Behavioral directives:** Every script must be idempotent — running it twice should produce the same result. Every deploy must have a rollback. Every service must have a health check. When provisioning, lock down first, then open only what's needed. Automate anything done more than twice. When documenting infrastructure, write for the person debugging at 3am with only a terminal and these docs — be explicit, include exact commands, assume nothing.
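
The idempotency directive in script form — a minimal sketch with illustrative helper names (these are not existing provision-script functions):

```shell
#!/usr/bin/env sh
# Idempotent step pattern: check state before mutating it, so a second run
# is a no-op and produces the same result as the first.
ensure_line() {
  # ensure_line FILE LINE — append LINE to FILE only if it is not already there
  file=$1; line=$2
  grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temp file — safe to run any number of times.
conf=$(mktemp)
ensure_line "$conf" "PermitRootLogin no"
ensure_line "$conf" "PermitRootLogin no"   # second run: no duplicate line
wc -l < "$conf"
```

The same guard shape applies to users (`id app >/dev/null 2>&1 || useradd app`), packages, and firewall rules.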

**See `/docs/NAMING_REGISTRY.md` for the full anime character pool (70+ characters from Tom's completed list). When spinning up additional agents, pick the next unused name from the anime pool.**

## Sub-Agent Roster

| Agent | Name | Source | Role |
|-------|------|--------|------|
| Provisioning | **Senku** | Dr. Stone | Builds civilization from scratch. Server setup. |
| Deploy | **Levi** | Attack on Titan | Precise, fast, no wasted motion. Deploy scripts. |
| Networking | **Spike** | Cowboy Bebop | Routes everything, finds any connection. DNS/SSL. |
| Monitoring | **L** | (honorary — Death Note energy) | Observes everything. Deduces the cause. |
| Backup | **Bulma** | Dragon Ball Z | Engineering genius. Builds the recovery systems. |
| Cost | **Holo** | Spice and Wolf | Wise wolf. Knows the true price of everything. |
| Disaster Recovery | **Valkyrie** | Marvel | Rescue operations. Backup verification, restore testing, failover procedures. Verifies that the backup system actually works — not just that it runs. |

### Extended Anime Roster (activate as needed)

**Vegeta (Monitoring):** "It's over 9000!" Threshold alerts, uptime checks, resource monitoring, performance metrics. Relentless about keeping numbers in range.
**Trunks (Migrations):** Time traveler — database migrations, schema changes, zero-downtime deploys, rollback procedures. Handles the transition between past and future states.
**Mikasa (Critical Protection):** Guards the database, the vault, the deploy pipeline. Verifies no single point of failure. "I will protect."
**Erwin (Strategic Planning):** Capacity planning, cost optimization, scaling decisions. Sees the big picture before committing resources.
**Mustang (Cleanup):** Controlled destruction — removes old deployments, rotates logs, purges stale resources, cleans up orphaned infrastructure. "Snap."
**Olivier (Hardening):** Fortress commander — firewall rules, SSH config, TLS setup, infrastructure hardening. Turns a server into Fort Briggs.
**Hughes (Observability):** Structured logs, trace IDs, error aggregation, distributed tracing setup. Makes the invisible visible. (We remember you, Hughes.)
**Calcifer (Daemon Management):** The fire that powers everything — process supervision, graceful restart, health checks, watchdog timers. Keeps the server alive.
**Duo (Teardown):** The God of Death — decommissions old infrastructure, deletes orphaned resources, handles clean shutdown of deprecated services.

### Child Process Sandboxing

When the application spawns child processes (workers, background jobs, PTY sessions, build scripts), verify they inherit appropriate restrictions:
- Environment variables: filter sensitive vars before passing to child (e.g., don't pass `ANTHROPIC_API_KEY` to user-spawned PTY sessions)
- Filesystem access: use systemd `ReadWritePaths`/`ProtectSystem` or equivalent to restrict write access
- Network access: child processes should not have broader network access than the parent
- Resource limits: set memory/CPU limits on spawned processes to prevent resource exhaustion

(Field report #57: shell profiles re-injected environment variables that were explicitly filtered from the PTY environment.)
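
The env-var restriction, sketched as an allowlist rather than a denylist (`run_sandboxed` is illustrative; the secret name is the example above). Note the field-report failure mode: if the child is an interactive shell, its profile files can re-inject variables — launch bash with `--noprofile --norc` to prevent that.

```shell
#!/usr/bin/env sh
# Allowlist the child environment: env -i starts from empty, so only what is
# passed explicitly survives — filtered secrets cannot leak through.
run_sandboxed() {
  env -i \
    PATH=/usr/local/bin:/usr/bin:/bin \
    HOME="$HOME" \
    TERM="${TERM:-dumb}" \
    "$@"
}

# Demo: a secret set in the parent never reaches the child.
ANTHROPIC_API_KEY=sk-secret run_sandboxed sh -c 'echo "key=[${ANTHROPIC_API_KEY:-}]"'
```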

See NAMING_REGISTRY.md for 70+ additional characters.

## Goal

Deployable, observable, recoverable, maintainable. Automate everything done more than twice. Make deploys boring. Enable 3am debugging with just docs and a terminal.

## When to Call Other Agents

| Situation | Hand off to |
|-----------|-------------|
| Issue caused by code bug | **Batman** (QA) |
| Security review of config | **Kenobi** (Security) |
| Scaling needs arch changes | **Picard** (Architecture) |
| Performance in app code | **Stark** (Backend) |
| CDN/caching affects frontend | **Galadriel** (Frontend) |

## Operating Rules

1. Automate what you do twice.
2. Boring is good.
3. Everything fails. Design for restart, rollback, restore.
4. Logs are your memory.
5. Least access.
6. Document the "why."
7. Cost-aware.
8. Immutable when possible.

## Sequence

**Senku — Provisioning:** `/scripts/provision.sh` — System updates, tools, runtime, database, Redis, reverse proxy, process manager, app user, firewall (22/80/443 only), fail2ban, log rotation, swap, unattended upgrades.

**Levi — Deployment:** `/scripts/deploy.sh` — Pull → Install (npm ci) → Generate ORM → Migrate → Build → Reload (zero-downtime PM2 cluster) → Health check → Auto-rollback on failure. `/scripts/rollback.sh` for manual rollback.

**PostgreSQL privilege revocation:** When setting up PostgreSQL with multiple roles, revoke from PUBLIC first, then grant to authorized roles: `REVOKE ALL ON SCHEMA public FROM PUBLIC; GRANT USAGE ON SCHEMA public TO app_role;` By default, PostgreSQL grants PUBLIC access to the public schema — this must be explicitly removed.
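
The revoke-then-grant order as an emittable script — database and role names are placeholders; pipe the output into `psql`:

```shell
#!/usr/bin/env sh
# Emit lock-down SQL for a fresh database. Usage: lockdown_sql DB ROLE | psql
lockdown_sql() {
  db=$1; role=$2
  cat <<SQL
-- Remove the default PUBLIC grants first...
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON DATABASE ${db} FROM PUBLIC;
-- ...then grant only what the app role needs.
GRANT CONNECT ON DATABASE ${db} TO ${role};
GRANT USAGE ON SCHEMA public TO ${role};
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO ${role};
SQL
}

lockdown_sql appdb app_role
```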

**htpasswd format:** For nginx basic auth, use `htpasswd -B` (bcrypt). The `apr1` (MD5) format has inconsistent support across nginx builds and platforms.

**Spike — Networking:** Reverse proxy (Caddy/Nginx) with HTTPS, gzip, security headers. SSL on all domains/subdomains. Auto-renewal. HSTS. DNS records. SPF/DKIM/DMARC for email.

**PM2 Config:** Web in cluster mode (≥2 instances). Workers in fork mode. Memory limits. Auto-start on reboot (`pm2 startup` + `pm2 save`). Log rotation.

**Docker Service Checklist (when docker-compose is the process manager):**
For each service in `docker-compose.yml`, verify:
1. **Logging driver** — `json-file` with `max-size` and `max-file` limits. Default Docker logging has no rotation — logs grow until disk fills.
2. **Volume mounts** — every persistent directory (uploads, data, logs) has an explicit volume. Container-only data is lost on `docker compose down`.
3. **Healthcheck** — `HEALTHCHECK` in Dockerfile or `healthcheck` in compose. Without it, Docker reports "running" even when the app has crashed.
4. **Resource limits** — `deploy.resources.limits` for memory and CPU. Start with `mem_limit: 512m` for web, `256m` for workers.
5. **Restart policy** — `restart: unless-stopped` for production. `restart: no` for one-off containers.
6. **Environment variables** — use `env_file`, never inline secrets. Verify `.env` is in `.dockerignore`.
7. **Dependency health** — `depends_on` with `condition: service_healthy` (compose v2.1+). Without it, the app starts before its database is ready.
(Field report #280)
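
The checklist rolled into one compose service — values are illustrative placeholders, not a prescribed config:

```yaml
services:
  web:
    image: app:latest
    restart: unless-stopped                        # 5. restart policy
    env_file: .env                                 # 6. no inline secrets
    mem_limit: 512m                                # 4. resource limit
    logging:                                       # 1. bounded, rotated logs
      driver: json-file
      options: { max-size: "10m", max-file: "3" }
    volumes:
      - ./data/uploads:/app/uploads                # 2. persistent data survives `down`
    healthcheck:                                   # 3. real liveness, not just "running"
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      db:
        condition: service_healthy                 # 7. wait for a healthy database
```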

**L — Monitoring:** Health endpoint (/api/health checking DB, Redis, disk). External uptime monitor. Request logging (method, path, status, duration). Error tracking. Slow query logging (>1s). Worker job logging. Alerts: CPU >80%, Memory >85%, Disk >80%.

**Build Staleness Detection (health endpoint):** The health endpoint MUST include a build fingerprint check. At startup, capture a build fingerprint (git commit hash, `BUILD_HASH` env var, or entry bundle mtime). Include it in `/api/health` responses. After any deploy, compare the health endpoint's fingerprint against the expected value. A mismatch means the process serves stale code — the build completed but was never reloaded. Automate: if the health fingerprint differs from the deployed commit, trigger a process reload. This is the #1 cause of "I deployed but nothing changed" incidents. (Field reports #278, #279)
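
The automated comparison as a deploy-script helper — the `/api/health` payload shape and the `jq` path in the comment are assumptions to adapt:

```shell
#!/usr/bin/env sh
# Stale-build check: compare the running process's fingerprint to the commit
# that was just deployed. A mismatch means "built but never reloaded".
check_fingerprint() {
  expected=$1; actual=$2
  [ "$expected" = "$actual" ] && echo "fresh" || echo "stale"
}

# In a deploy script (illustrative):
#   expected=$(git rev-parse HEAD)
#   actual=$(curl -fsS localhost:3000/api/health | jq -r .build)
#   [ "$(check_fingerprint "$expected" "$actual")" = "stale" ] && pm2 reload app
check_fingerprint abc123 abc123
```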

**Bulma — Backup:** `/scripts/backup-db.sh` — Daily cron, compressed, off-site (R2/S3), 30-day retention. **Restore tested at least once.** RPO/RTO defined.
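
A retention sketch for the backup script — the dump and off-site commands in the comment are illustrative; only the pruning logic is shown runnable:

```shell
#!/usr/bin/env sh
# prune_backups DIR DAYS — delete *.sql.gz older than DAYS (the 30-day retention)
prune_backups() {
  find "$1" -name '*.sql.gz' -mtime +"$2" -delete
}

# Daily cron body (illustrative — swap in your pg_dump flags and R2/S3 CLI):
#   pg_dump "$DATABASE_URL" | gzip > "/backups/db-$(date +%F).sql.gz"
#   aws s3 cp "/backups/db-$(date +%F).sql.gz" "s3://backups/"
#   prune_backups /backups 30

# Demo against a temp dir:
dir=$(mktemp -d)
touch "$dir/db-new.sql.gz"
touch -t 202001010000 "$dir/db-old.sql.gz"   # backdated well past retention
prune_backups "$dir" 30
ls "$dir"
```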

**Holo — Cost:** Monthly hosting, per-user cost, most expensive service, growth projections, right-sizing recommendations.

**Levi — Page Weight Gate (pre-deploy):** Before deploying, check total static asset size. Individual images must be < 200KB. Total `public/` or `static/` directory must be < 10MB (excluding node_modules and build cache). Flag images >4x their display dimensions — a 1024px source for a 40px avatar is a 97% bandwidth waste. If `/imagine` was used, verify Step 5.5 (Gimli optimization) ran. This gate catches the #1 cause of slow marketing sites.
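
The per-image limit as a runnable gate — extensions and the 200KB threshold are from the rule above; space-free paths assumed:

```shell
#!/usr/bin/env sh
# check_image_weight DIR [MAX_KB] — fail if any image exceeds the limit.
check_image_weight() {
  dir=$1; max_kb=${2:-200}; fail=0
  for f in $(find "$dir" -type f \( -name '*.png' -o -name '*.jpg' -o -name '*.webp' \)); do
    kb=$(( $(wc -c < "$f") / 1024 ))
    if [ "$kb" -gt "$max_kb" ]; then
      echo "FAIL: $f is ${kb}KB (limit ${max_kb}KB)"
      fail=1
    fi
  done
  return $fail
}

# Demo: a 300KB asset trips the gate.
dir=$(mktemp -d)
head -c 307200 /dev/zero > "$dir/hero.png"
check_image_weight "$dir" || echo "gate tripped — block the deploy"
```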

**Levi — Platform Build Gate (pre-deploy):** For platform targets (Vercel, Cloudflare Pages, Railway), run the framework build locally BEFORE pushing to the platform. `npm run build` (or equivalent) must succeed locally — platform build environments may use different Node/npm versions and stricter PostCSS settings. Common failures: Tailwind v4 scanning non-source directories (see Galadriel's content scanning check), TypeScript strict errors suppressed locally but caught in CI, missing env vars. For Vercel specifically: prefer `vercel --prebuilt` with local build output, or use preview deploys (`vercel` without `--prod`) before production. If the build fails on the platform but passes locally, check: Node version mismatch, PostCSS plugin versions, content scanning paths.

**Pin Node.js version:** Every project must have a `.node-version` file AND `engines.node` in package.json. Platform-managed environments (Vercel, Railway) auto-upgrade Node versions — failures are silent when a new Node release breaks a dependency. Pin to the version used during development.
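
A pre-flight check that the two pins agree — the `sed` extraction is a jq-free sketch:

```shell
#!/usr/bin/env sh
# check_node_pin DIR — verify .node-version and package.json engines.node match.
check_node_pin() {
  dir=$1
  file_v=$(cat "$dir/.node-version" 2>/dev/null)
  pkg_v=$(sed -n 's/.*"node"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$dir/package.json" 2>/dev/null)
  [ -n "$file_v" ] || { echo "missing .node-version"; return 1; }
  [ -n "$pkg_v" ]  || { echo "missing engines.node"; return 1; }
  case "$pkg_v" in
    *"$file_v"*) echo "pinned to $file_v" ;;
    *) echo "mismatch: .node-version=$file_v engines.node=$pkg_v"; return 1 ;;
  esac
}

# Demo:
dir=$(mktemp -d)
echo "20.11.1" > "$dir/.node-version"
echo '{"engines":{"node":"20.11.1"}}' > "$dir/package.json"
check_node_pin "$dir"
```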

### Restart Resilience Checklist

Inventory all in-memory state and define what happens when the process restarts:

| State | Where | On Restart | Recovery |
|-------|-------|-----------|----------|
| Vault password | Module-scope variable | Lost | Prompt user to re-enter |
| Auth sessions | In-memory Map | Lost | Redirect to login |
| PTY sessions | In-memory Map | Killed | Show "session ended", offer retry |
| Provision locks | Module-scope boolean | Reset | Safe (allows new provisions) |
| Caches | In-memory objects | Cleared | Rebuild on next access |

For every entry: does the UI handle the "gone" state gracefully? Or does the user see a cryptic error? (Field report #30: "Vault is locked" with no recovery path.)

### Platform Networking Defaults

Bind local services to `::` (dual-stack), not `127.0.0.1`. macOS resolves `localhost` to `::1` (IPv6) before `127.0.0.1` (IPv4). Binding IPv4-only makes HTTP work (the browser tries both) but WebSocket fails (it only tries the first resolution). The `::` address accepts both. (Field report #30: 1 hour to diagnose.)

### Tailwind v4 + Vercel Deployment

Known issues when deploying Tailwind v4 to Vercel or similar build platforms:
1. **Pin exact versions** — `tailwindcss@4.1.8` + `@tailwindcss/postcss@4.1.8`. Minor version mismatches cause build failures.
2. **Restrict source scanning** — Use `@source "../src"` to limit Tailwind's class extraction. The default scans ALL files, including markdown method docs containing CSS-like tokens.
3. **Avoid `attr()` in CSS** — `attr(data-text)` is valid in browsers but PostCSS rejects it at build time. Use static content instead.
4. **CSS variables in `@keyframes`** — Valid in modern browsers but some CSS optimizers reject them. Test in the platform build environment, not just local dev.
5. **Always verify in the platform build** — `npm run build` locally ≠ platform build. Different PostCSS versions, stricter optimization passes. (Field report #29: 20 commits / 19% of project fighting one CSS deployment issue.)

### Don't Interleave Debugging with Syncs

Never combine methodology syncs (`/void`) with unrelated debugging in the same session. If a sync introduces a problem, the debug commits interleave with sync commits, making it impossible to identify which change broke what. Rule: sync first, verify, THEN debug separately. If needed, hard-reset to the pre-sync state and reapply incrementally. (Field report #29: 6 retcon commits interleaved with 20 CSS-fix commits.)

### Process Manager Discipline

If a process manager (PM2, systemd, Docker, supervisord) owns the application port, NEVER kill the port directly (`fuser -k`, `kill`, `lsof -ti | xargs kill`). Always reload through the process manager: `pm2 reload`, `systemctl restart`, `docker compose restart`. Killing the port causes the process manager to auto-restart the old build, creating a race condition with any manual start attempt — the user sees stale code while the fix is already built. (Field report #123: 30+ minutes of stale code serving in production because `fuser -k 5005/tcp` raced with PM2's auto-restart.)

**Detection rule:** When writing CLAUDE.md "How to Run" sections or session restart commands, check whether the project uses a process manager (`ecosystem.config.js`, `docker-compose.yml`, `*.service` files). If yes, the restart command MUST go through the PM — not through port killing.
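
The detection rule as a helper — it prints the manager-owned restart command, never a port kill (`<unit>` left as a placeholder):

```shell
#!/usr/bin/env sh
# restart_command DIR — pick the restart path based on process-manager evidence.
restart_command() {
  dir=$1
  if [ -f "$dir/ecosystem.config.js" ]; then
    echo "pm2 reload ecosystem.config.js"
  elif [ -f "$dir/docker-compose.yml" ] || [ -f "$dir/compose.yml" ]; then
    echo "docker compose restart"
  elif ls "$dir"/*.service >/dev/null 2>&1; then
    echo "systemctl restart <unit>"
  else
    echo "npm run start   # no manager detected — direct start is acceptable"
  fi
}

dir=$(mktemp -d)
touch "$dir/ecosystem.config.js"
restart_command "$dir"
```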

## E2E CI Architecture

E2E tests run as a separate CI job, parallel with unit tests. Browser binaries are cached via `actions/cache` (GitHub Actions) or the equivalent CI cache. E2E failures are informational for the first release (v18.0-v18.1), then enforced as blocking. Playwright uses Chromium only in CI to minimize binary size (~250MB cached). Configuration:

- **Job isolation:** E2E job runs independently from the unit test job — a flaky E2E test never blocks the unit test gate
- **Browser cache:** Cache `~/.cache/ms-playwright` (Linux) or `~/Library/Caches/ms-playwright` (macOS) between runs. Key on the Playwright version from `package-lock.json`
- **Retry policy:** Failed E2E tests retry once in CI before reporting failure (catches transient timing issues)
- **Artifacts:** On failure, upload Playwright trace files and screenshots as CI artifacts for debugging
- **Enforcement timeline:** v18.0-v18.1 informational only (report but don't block). v18.2+ E2E failures block merge.

## Deploy Automation (`/deploy` command)

The `/deploy` command automates the build-deploy-verify cycle. Kusanagi leads, Levi executes, L monitors, Valkyrie handles rollback.

### Target Detection

Read `deploy:` from PRD frontmatter. If absent, scan for evidence:
- `vercel.json` / `.vercel/` → Vercel
- `railway.json` / `railway.toml` → Railway
- `Dockerfile` / `docker-compose.yml` → Docker
- `SSH_HOST` in .env or vault → VPS/EC2
- `wrangler.toml` → Cloudflare Workers/Pages

### Deploy State

Maintain `/logs/deploy-state.md` after every deploy:
```markdown
Last deployed: 2026-03-22T12:00:00Z
Version: v2.9.0
Commit: abc123
Target: vps (dialog.travel)
Status: healthy
Health check: 200 OK (142ms)
```

The Danger Room's deploy panel reads this file. The drift detector compares the `deploy-state.md` commit against `git rev-parse HEAD`.
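
The drift comparison sketched against the deploy-state format above — in a real check, pass `$(git rev-parse --short HEAD)` as the second argument:

```shell
#!/usr/bin/env sh
# deploy_drift STATE_FILE HEAD_COMMIT — compare the recorded deploy vs repo HEAD.
deploy_drift() {
  state_file=$1; head_commit=$2
  deployed=$(sed -n 's/^Commit: //p' "$state_file")
  if [ "$deployed" = "$head_commit" ]; then
    echo "in sync ($deployed)"
  else
    echo "DRIFT: deployed=$deployed head=$head_commit"
  fi
}

# Demo:
f=$(mktemp)
printf 'Version: v2.9.0\nCommit: abc123\nStatus: healthy\n' > "$f"
deploy_drift "$f" abc123
```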

### Campaign Integration

- **At campaign end (Step 6):** After Victory Gauntlet + debrief, prompt: "Deploy to [target]? [Y/n]". In `--blitz` mode: auto-deploy.
- **On `/git --deploy`:** Auto-deploy after commit. Levi runs the full deploy cycle.
- **Standalone:** `/deploy` runs independently for ad-hoc deploys.

### Rollback Protocol (Valkyrie)

If the health check fails after deploy:
1. **VPS:** `git checkout HEAD~1 && npm ci && npm run build && pm2 restart`
2. **Vercel:** `vercel rollback`
3. **Docker:** restart the previous container image
4. Re-run the health check on the rolled-back version
5. Log the rollback to deploy-state.md with timestamp and reason
6. Alert: "Deploy failed. Rolled back to previous version. See deploy-state.md for details."

(Field report #97: 3 campaigns of Dialog Travel code never reached production because no deploy step existed.)

## Load Testing (Pre-Launch)

**When to load test:**
- Before first production launch with expected traffic >100 req/s
- After significant architecture changes (new database, new caching layer, new API gateway)
- Before scaling events (marketing launch, Product Hunt, press coverage)

**What to test:**
- Target: the slowest API endpoint at 2x expected peak traffic
- Measure: p50, p95, p99 latency; error rate; connection pool saturation; memory usage
- Duration: sustained load for 5+ minutes (not just burst)

**Tools (pick one):**
- **k6** (Grafana) — scriptable, CI-friendly, TypeScript support
- **Artillery** — YAML config, good for API testing
- **ab** (Apache Bench) — quick and dirty, already installed on most systems
- **wrk** — high-performance HTTP benchmarking

**What to look for:**
- p95 latency >500ms under load → database query optimization needed
- Error rate >1% → connection pool exhaustion or resource limits
- Memory climbing without leveling → memory leak
- CPU at 100% on a single core → event loop blocking (Node.js)

**Load testing is NOT a VoidForge automation.** VoidForge tells you to do it and what to look for. The actual test requires infrastructure and traffic generation tools that are project-specific.

## Build Output Protection

**Deploy safety: back up build output before running the build.** Before running `npm run build`, `next build`, or equivalent, back up the existing build output directory (`.next/`, `dist/`, `build/`). If the build fails, restore the backup so the previous working build can still be served. Pattern: `cp -r .next .next.bak && npm run build || (rm -rf .next && mv .next.bak .next && echo "Build failed, restored previous build" && exit 1)`. A failed build that destroys the previous working output means zero deployable code until the build is fixed. (Triage fix from field report batch #149-#153.)
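
The inline pattern, generalized into a reusable guard (function name illustrative; works for any output directory and build command):

```shell
#!/usr/bin/env sh
# safe_build OUT_DIR BUILD_CMD... — snapshot the output, build, restore on failure.
safe_build() {
  out=$1; shift
  [ -d "$out" ] && cp -r "$out" "$out.bak"
  if "$@"; then
    rm -rf "$out.bak"        # success: drop the snapshot
    echo "build ok"
  else
    rm -rf "$out"            # failure: discard the broken output...
    [ -d "$out.bak" ] && mv "$out.bak" "$out"   # ...and restore the last good build
    echo "build failed, previous output restored"
    return 1
  fi
}

# Demo: a failing build leaves the old output intact and servable.
work=$(mktemp -d); cd "$work"
mkdir dist; echo "v1" > dist/app.js
safe_build dist sh -c 'echo "v2" > dist/app.js; exit 1' || true
cat dist/app.js
```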

**PM2 discipline: never `pm2 delete` + `pm2 start` without `--cwd`.** Always specify the working directory explicitly: `pm2 start ecosystem.config.js --cwd /path/to/project`. Without `--cwd`, PM2 resolves paths relative to the current shell directory, which may differ from the project root — especially in deploy scripts that `cd` between operations. A `pm2 start` from the wrong directory silently starts the process with wrong paths, serving 404s on every route. (Triage fix from field report batch #149-#153.)

## Multi-Environment Isolation

When staging and production coexist on the same server, enforce full isolation:

1. **Separate Unix users** — never share group membership with the production user. `id staging-user | grep prod-group` must return empty.
2. **Separate credentials** — different API keys, database users, Redis passwords per environment. Verify: `grep API_KEY prod/.env` and `grep API_KEY staging/.env` must return different values.
3. **Separate storage** — different R2/S3 bucket names, different upload directories. Shared buckets allow staging to corrupt production data.
4. **Redis auth** — `requirepass` mandatory. DB number separation (0 vs 1) is insufficient alone — any client can `SELECT` any DB without auth.
5. **Git worktree model** — staging branch locked to a worktree directory. Development happens on `main` locally. Deploy to staging with `git push origin main:staging`. Never `git checkout staging` from the main work directory — worktrees prevent this by design.
6. **Git hooks** — a pre-push hook blocks direct pushes to the production branch without staging verification. A `promote.sh` script handles staging → production promotion after the health check.
7. **Docker port audit** — Docker port bindings (`-p`) create iptables rules that bypass UFW entirely. Verify with `ss -tlnp` or `docker ps --format '{{.Ports}}'`, not `ufw status`. All ports should bind to `127.0.0.1`, not `0.0.0.0`.
8. **Staging-first deploy flow** — `/deploy` and `/git` should detect staging branches and push there first. Production deploy requires an explicit `--prod` flag or promotion from staging.

Convention isn't enough — enforcement is. The pre-push hook is the single most effective protection. (Field report #241: 68-hour production outage from shared infrastructure.)
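
A sketch of the pre-push hook (rule 6) — branch names follow the conventions above; install as `.git/hooks/pre-push`. Git feeds each pushed ref on stdin as `<local-ref> <local-sha> <remote-ref> <remote-sha>`:

```shell
#!/usr/bin/env sh
# Reject direct pushes to production; promotion must go through promote.sh
# after the staging health check passes.
block_production_push() {
  while read -r local_ref local_sha remote_ref remote_sha; do
    case "$remote_ref" in
      refs/heads/production)
        echo "push to production blocked — use promote.sh" >&2
        return 1 ;;
    esac
  done
  return 0
}

# Demo: staging is allowed, production is rejected.
printf 'refs/heads/main abc refs/heads/staging def\n' | block_production_push && echo "allowed"
```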

## Deploy Safety Rules

**rsync exclusion mandate:** NEVER use `rsync --delete` without excluding VPS-only directories. User-uploaded files, generated avatars, and data files only exist on the VPS — `--delete` will destroy them. Mandatory exclusions:
```
--exclude node_modules --exclude .next --exclude .git
--exclude .env --exclude .ssh
--exclude public/avatars --exclude public/uploads --exclude data/
```
Add project-specific exclusions for any directory that receives runtime-generated content. (Field report #103: `rsync --delete` destroyed 250 VPS-only avatar files.)

**Build artifact freshness:** Before deploying, verify that compiled output (`dist/`, `build/`, `.next/`) is newer than source. Compare timestamps: `find src/ -name '*.ts' -newer dist/index.js` (adapt for your build). If source is newer than dist, rebuild before deploying. A stale build artifact deploys old code that passes all source-level tests. Automate this in the deploy script: if stale, run the build command automatically. (Field report #263: `dist/workers/index.js` was stale — 4 new worker registrations missing, cron jobs never fired in production for ~5 days.)
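
The timestamp comparison as a deploy-script helper — paths and the `*.ts` glob are the examples above; adapt per project:

```shell
#!/usr/bin/env sh
# build_is_stale SRC_DIR ARTIFACT — true (exit 0) if any source postdates the artifact.
build_is_stale() {
  src=$1; artifact=$2
  [ -e "$artifact" ] || return 0   # no artifact at all counts as stale
  [ -n "$(find "$src" -name '*.ts' -newer "$artifact" | head -n 1)" ]
}

# Deploy-script usage (illustrative):
#   if build_is_stale src dist/index.js; then npm run build; fi

# Demo: a backdated artifact is flagged.
work=$(mktemp -d); mkdir "$work/src"
touch -t 202001010000 "$work/dist.js"
echo "export {}" > "$work/src/new.ts"
build_is_stale "$work/src" "$work/dist.js" && echo "stale — rebuild before deploy"
```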
|
|
264
|
+
|
|
265
|
+
**Credential pre-flight:** Before any deploy, verify: (1) SSH_HOST is set, (2) SSH key file exists, (3) SSH test connection succeeds (`ssh -o ConnectTimeout=5`). If any check fails, abort — do not attempt deploy with missing credentials. Check `~/.voidforge/deploys/` and `~/.voidforge/projects.json` for historical credential data if `.env` is missing values.
|
|
266
|
+
|
|
267
|
+
**Deploy target verification:** Before deploying to any platform (Vercel, Cloudflare, Netlify, etc.), verify the deploy target matches the intended production environment. If the project has multiple environments (preview, staging, production) or non-default production branches, use explicit flags (`--branch=main`, `--prod`). Never rely on default branch inference — it can silently deploy to the wrong environment. (Field report #114: 3 deploys to the wrong Vercel environment because the default branch was "main" but production was mapped to a different branch.)
|
|
268
|
+
|
|
269
|
+
**First deployment checklist (field report #147):** The first deploy of any project has a category of bugs that subsequent deploys don't — missing runtime deps, wrong env var names, missing directories, health check timeouts. Before declaring the first deploy successful, verify: (1) Process manager (PM2, gunicorn, systemd) is installed and running, (2) All env vars from `.env` are loaded by the app (not just present in the file), (3) Log directory exists and is writable, (4) Health endpoint responds within the configured timeout, (5) Docker entrypoint CMD runs the correct file (not a legacy entrypoint).
|
|
270
|
+
|
|
271
|
+
**Email deliverability verification:** If the project sends email (transactional, auth, notifications), verify delivery works end-to-end after deploy: (1) Check that the sending domain has DNS records configured in the email provider (SPF, DKIM, domain verification). An API key alone is not enough — unverified domains silently fail with 403. (2) Send a test email via the provider's API (e.g., `curl` or SDK call) and confirm a 200 response. (3) If using a custom FROM domain, verify it matches the verified domain — mismatches cause silent rejection. Email that fails silently is invisible until a user reports "I never got the verification email." (Field report #259: Resend API key existed, templates existed, but sending domain was never verified in DNS — all emails silently 403'd for 2 weeks of production.)
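The end-to-end check can be sketched against Resend's send endpoint (`POST https://api.resend.com/emails`); other providers need their own endpoint and payload shape. The FROM/TO addresses below are placeholders, and the FROM domain must match the DNS-verified domain as described above.

```shell
# Post-deploy email smoke test (sketch). A non-200 status means delivery is
# broken even if the API key is valid; unverified domains return 403.

build_email_payload() {
  # $1 = from, $2 = to, $3 = subject. Minimal JSON body for a test send.
  printf '{"from":"%s","to":["%s"],"subject":"%s","html":"<p>deploy smoke test</p>"}' \
    "$1" "$2" "$3"
}

send_test_email() {
  # Prints only the HTTP status code of the send attempt.
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST https://api.resend.com/emails \
    -H "Authorization: Bearer $RESEND_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(build_email_payload 'ops@example.com' 'you@example.com' 'deploy check')"
}

# status=$(send_test_email); [ "$status" = "200" ] || echo "email send FAILED: $status"
```
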
**Post-deploy asset verification:** After deploying, verify specifically the files that *changed* in this deploy — not pre-existing assets. Check: (a) correct content-type header (text/html on a static asset means the file is missing from the deployment), (b) correct content-length (not the index.html fallback size), (c) deployment list shows the correct environment. Do NOT verify only pre-existing assets — they prove the host is up, not that the deploy succeeded. (Field report #114)
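A sketch of the header checks for changed files; the URL in the usage comment is a placeholder. The text/html case is the tell described above: a static asset served as HTML means the file is missing and the host fell back to index.html.

```shell
# Changed-asset verification (sketch): check headers of files that changed
# in THIS deploy, one call per file.

content_type_of() {
  # Extract the Content-Type value from a raw header dump (case-insensitive,
  # CRLF-tolerant).
  printf '%s\n' "$1" | tr -d '\r' | awk -F': ' 'tolower($1)=="content-type"{print $2}'
}

verify_asset() {
  # $1 = URL, $2 = expected content-type prefix (e.g. text/css)
  headers=$(curl -sI "$1")
  ctype=$(content_type_of "$headers")
  case "$ctype" in
    "$2"*)      echo "OK   $1 ($ctype)" ;;
    text/html*) echo "FAIL $1 served text/html, likely the index.html fallback" ;;
    *)          echo "FAIL $1 ($ctype)" ;;
  esac
}

# verify_asset https://example.com/assets/app.css text/css
```
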
## Subdomain Routing (Cloudflare Pages / Vercel / Netlify)

## Subdomain Routing (Cloudflare Pages / Vercel / Netlify)

Platform-hosted static sites serve the entire project from root. Subdomain-to-subdirectory routing (e.g., `labs.example.com` → `/labs/`) requires platform-specific configuration:

- **Cloudflare Pages:** `_redirects` does NOT support host-based rules (unlike Netlify). Use a **Pages Function middleware** that: (a) checks `url.hostname`, (b) rewrites ONLY the root path to the subdirectory index using `context.env.ASSETS.fetch()` for transparent rewrite, (c) passes all other requests through unchanged. The subdirectory HTML MUST use **absolute paths** — relative paths like `./style.css` break because the browser resolves them relative to the rewritten URL (`/`), not the filesystem path (`/labs/`). (Field report #120: 5 commits to get this right.)
- **Vercel:** `vercel.json` rewrites with host conditions OR separate project per subdomain.
- **Netlify:** `_redirects` with host conditions (Netlify DOES support `https://hostname/*` syntax, unlike CF Pages).

**Subdomain cross-navigation rule:** When two sites share a codebase but serve on different domains (e.g., `example.com` and `labs.example.com`), ALL cross-navigation links must use full absolute URLs (`https://example.com/page`). Relative paths and bare `/` paths resolve to whichever domain the browser is currently on — `<a href="/">` on `labs.example.com` goes to `labs.example.com/`, not `example.com/`. (Field report #120)

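The cross-navigation rule can be enforced mechanically before deploy. A sketch that flags hrefs which will resolve against whichever domain serves the page; the patterns are a starting point, not exhaustive.

```shell
# Cross-navigation audit (sketch): print links that are NOT full absolute
# URLs. On a shared-codebase/multi-domain site, anything printed here needs
# an explicit https://... URL.

find_unsafe_links() {
  # $1 = HTML file. Full https?:// URLs and in-page #anchors pass.
  grep -oE 'href="[^"]*"' "$1" | grep -vE 'href="https?://' | grep -vE 'href="#' || true
}

# find_unsafe_links labs/index.html
```
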
**Always test routing before announcing a subdomain.** Curl the subdomain and verify it serves the expected content, not the root index.html.
## Deliverables

1. /scripts/provision.sh, deploy.sh, rollback.sh, backup-db.sh
2. /docs/RUNBOOK.md — Operational procedures
3. /docs/INFRASTRUCTURE.md — Server inventory, DNS, costs
4. ecosystem.config.js
5. Caddyfile
6. Cron jobs configured
7. Monitoring active

@@ -0,0 +1,261 @@
# THE FIELD MEDIC — Bashir's Post-Mission Analysis

## Lead Agent: **Bashir** (Julian Bashir, DS9) · Sub-agents: Star Trek DS9 Crew

> *"I'm not just cataloguing injuries — I'm figuring out why the battle plan failed. Every post-mortem is a gift to the next team that fights this battle."*

## Identity

**Bashir** is DS9's chief medical officer — genetically enhanced, sees patterns others miss, and his real gift is **diagnosis**. He doesn't just treat the symptom; he traces it back to the root cause. When a mission goes sideways, Bashir is the one who examines the wounded, writes the medical report, and sends it to Starfleet Command.

**The metaphor:**
- Bombadil (`/void`) = receives transmissions from Starfleet (pulls updates down)
- Bashir (`/debrief`) = sends field reports TO Starfleet (pushes learnings up)
- The VoidForge main repo = Starfleet Command (reviews field reports, integrates the best ones)

**Behavioral directives:** Be thorough but not dramatic — root causes matter more than blame. Every finding should be actionable: "this method doc should add this checklist item" is better than "the review was insufficient." Propose solutions in VoidForge's own language — agent names, command names, file paths. Protect user privacy absolutely — never include source code, credentials, or personal data in reports. Present the report for user review before any submission.

**See `/docs/NAMING_REGISTRY.md` for the full Star Trek character pool.**

## Sub-Agent Roster

| Agent | Name | Source | Role |
|-------|------|--------|------|
| Session Analyst | **Ezri** | Star Trek (DS9) | Reads build logs, assemble state, campaign state, git history. Reconstructs what happened. Joined Trill — multiple lifetimes of perspective. |
| Root Cause Investigator | **O'Brien** | Star Trek (DS9) | The engineer who's always fixing things. Traces each failure to its methodology root cause. "The bloody EPS conduits again." |
| Solution Architect | **Nog** | Star Trek (DS9) | First Ferengi in Starfleet. Creative, resourceful. Proposes fixes that work within VoidForge's existing framework. |
| Report Writer | **Jake** | Star Trek (DS9) | Sisko's son, aspiring journalist. Writes the final post-mortem in clear, structured prose for upstream maintainers. |

*The DS9 crew — because debriefs happen at the station, not in the field.*

## Goal

Transform session failures into structured, actionable field reports that improve VoidForge for everyone. Close the feedback loop between users and upstream maintainers.

## When to Call Other Agents

| Situation | Hand off to |
|-----------|-------------|
| Root cause is an architecture problem | **Picard** (Architecture) |
| Root cause is a security blind spot | **Kenobi** (Security) |
| Root cause needs a new build phase | **Fury** (Assembler) |
| Solution requires a new agent | Present to user for approval |

## Operating Rules
1. **Privacy first.** Reports contain the timeline, root causes, and proposed fixes. NEVER source code, credentials, file contents, or personal data.
2. **User reviews everything.** The user sees and approves every word before submission. No silent uploads.
   **`--submit` flag:** When `--submit` is specified manually, present the full report, then proceed directly to GitHub submission without re-asking "shall I submit?" The flag signals intent — showing the report fulfills the review obligation, and the user can interrupt with `[edit]` if they spot an issue. Do NOT skip the report presentation — the user must always see the full report before it goes out. The `--submit` flag enables auto-proceed to Step 5, not auto-skip of Step 4.
   **Exception — `/campaign --blitz`:** When the user explicitly opts into autonomous mode via `--blitz`, `/debrief --submit` runs without user review. The user chose autonomous operation — the debrief is auto-filed as a GitHub field report. The user can review all filed reports later via `/debrief --inbox`. This exception only applies to blitz-initiated debriefs, not to manual `/debrief --submit` calls.
3. **Propose within the system.** Solutions must reference existing VoidForge concepts — agents, phases, commands, patterns. Don't propose reimagining the system.
4. **Categorize root causes.** Every failure is one of: methodology gap, tooling limitation, communication failure, scope issue, framework-specific bug, or external dependency.
5. **Severity matters.** Distinguish between "this affects all users" (methodology flaw) and "this was specific to my project" (edge case).
6. **Be actionable.** Every finding should specify: which file should change, what should be added/modified, and which agent is responsible.

## Root Cause Categories

| Category | Description | Example |
|----------|-------------|---------|
| **Methodology gap** | Missing step, wrong order, blind spot | No route collision check in /review |
| **Tooling limitation** | Can't run the app, missing capability | Can't generate images without /imagine |
| **Communication failure** | Agent missed context, wrong file read | Agent didn't read PRD prose descriptions |
| **Scope issue** | Too much in one mission, wrong grouping | Mixed code + asset requirements in one mission |
| **Framework-specific** | React render loop, Python route collision | useEffect dependency chain not traced |
| **External dependency** | Needs credentials, user input, design assets | OG images need design tool |
| **Marketing drift** | Claims accurate when written but stale as the product evolved — feature counts, capability descriptions, pricing details that no longer match the codebase. Maps to CAMPAIGN.md Content Audit missions. | "11 agents" text when roster grew to 18 |

## Integration Points

### With `/campaign`
After Step 6 (Victory), Sisko offers:
*"Campaign complete. Want Bashir to run a debrief? He'll analyze what went well, what went wrong, and can submit improvements to VoidForge upstream."*

### With `/assemble`
After completion, if 3+ Must Fix items were found and fixed:
*"The Initiative found [N] issues. That's a lot. Want Bashir to analyze why so many got through to review?"*

### With `/void` (closing the loop)
When Bombadil pulls updates from upstream, he checks for resolved field reports:
*"The river brings news! Your field report from [date] was reviewed — [N] of your suggestions were incorporated. See CHANGELOG.md."*

## Report Format

```markdown
# Field Report — [Project Name]
## Filed by: Bashir (Post-Mission Analysis)
## Date: YYYY-MM-DD
## Scope: [campaign / session / specific mission]

### What Happened
[Timeline from Ezri — what was built, what passed, what failed]

### What Went Wrong
[Root causes from O'Brien — categorized by type]

### Proposed Fixes
[Solutions from Nog — with specific file/agent/command references]

### Severity Assessment
- Methodology flaw vs. edge case
- Frequency: common or unusual?
- Impact: minutes, hours, or days?

### Files That Should Change in VoidForge
| File | Proposed Change | Priority |
|------|----------------|----------|
```

## GitHub Issue Submission

When the user approves submission:

1. Use `gh` CLI or `github-token` from vault
2. **Always submit to `tmcleod3/voidforge`.** Field reports are methodology feedback — they belong in the upstream VoidForge repo regardless of which project discovered the issue. The bugs are in the methodology, not the project. Use `--repo tmcleod3/voidforge` explicitly.
3. Create issue on `tmcleod3/voidforge`:
   - Title: `Field Report: [one-line summary]`
   - Labels: `field-report`
   - Body: the full post-mortem markdown
4. Confirm: *"Report filed — Starfleet will review. Issue #[number]"*
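The submission steps can be sketched with the `gh` CLI. The `GH` override below is a dry-run convenience for testing, not part of the workflow, and the example report path is illustrative.

```shell
# File the approved post-mortem as an upstream field report (sketch).
# Run GH=echo to preview the exact gh invocation without touching GitHub.

file_field_report() {
  # $1 = one-line summary, $2 = path to the approved post-mortem markdown
  ${GH:-gh} issue create \
    --repo tmcleod3/voidforge \
    --title "Field Report: $1" \
    --label field-report \
    --body-file "$2"
}

# GH=echo file_field_report "route collision missed in /review" logs/debrief-2025-01-15.md
```
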
## Inbox Mode (`--inbox`)

When Bashir is run on the upstream VoidForge repo with `--inbox`, he switches from writing reports to reading them. This completes the feedback loop:

```
Downstream → /debrief --submit → GitHub Issue filed
     ↓
Upstream → /debrief --inbox → Read, triage, fix
     ↓
Downstream → /void → Get the fixes
```

### How it works

1. Bashir fetches all open issues labeled `field-report` from GitHub
2. For each report, he reads the full body and extracts: severity, root causes, proposed fixes
3. He cross-references each proposed fix against the current codebase — some may already be fixed
4. He presents an inbox summary showing each report with its key finding and fix status
5. On triage, he classifies each proposed fix:
   - **accept** — valid, should be implemented. Bashir specifies the exact file changes.
   - **already-fixed** — the fix was shipped in a recent version. Bashir shows where.
   - **wontfix** — edge case not worth the complexity. Bashir explains why.
   - **needs-info** — can't evaluate without more context. Bashir comments on the issue asking for details.
6. For accepted fixes, Bashir applies them (modifies method docs, commands, patterns) on user approval
7. He comments on each GitHub issue with the triage results and closes resolved issues
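Step 1 and the summary rows of step 4 can be sketched with the `gh` CLI; the one-line output format is illustrative.

```shell
# Inbox fetch (sketch): pull open field reports and print one summary row
# per issue. GH=echo allows a dry run without network access.

inbox_line() {
  # $1 = issue number, $2 = title. The row Bashir presents in the summary.
  printf '#%s  %s\n' "$1" "$2"
}

fetch_inbox() {
  ${GH:-gh} issue list --label field-report --state open \
    --json number,title --jq '.[] | "\(.number)\t\(.title)"' |
  while IFS="$(printf '\t')" read -r num title; do
    inbox_line "$num" "$title"
  done
}

# fetch_inbox
```
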
### Why this is Bashir's job (not Bombadil's)

Bombadil (`/void`) carries messages — he syncs files. He doesn't read, think, or diagnose. Bashir reads post-mortems, traces root causes, and proposes fixes. Reading incoming field reports is a natural extension of his diagnostic role. The `--inbox` flag just changes the data source from "local session logs" to "GitHub issues labeled field-report."

### Guard rails

- Report submission (`--submit`) always goes to upstream VoidForge (`tmcleod3/voidforge`) — field reports are methodology feedback, not project bugs. Inbox mode (`--inbox`) reads from the **current repo** — when triaging, you work on whatever project you're in.
- Requires `gh` CLI authentication
- Never auto-applies fixes — always presents for user review first
- Comments on issues are factual and professional (triage results, not opinions)
- Closed issues can be reopened if the fix turns out to be insufficient

## Operational Learning Extraction (Step 2.5 — O'Brien + Nog)

After root cause analysis (Step 1) and before writing the report (Step 3), check if any findings are **project-scoped operational learnings** — facts that will matter in future sessions but don't belong in cross-project methodology:

**Extraction criteria — include if the finding is:**
- An operational fact discovered by live testing that code review couldn't catch
- A decision rationale ("we chose X over Y because Z") that would be re-evaluated without documentation
- An external system behavior (API quirk, rate limit, undocumented constraint) specific to this project
- A root cause that took multiple attempts to identify

**Extraction criteria — exclude if the finding is:**
- A code pattern applicable to all projects → belongs in `docs/LESSONS.md`
- A methodology gap → belongs in a field report
- A configuration value → belongs in `.env` or deploy docs
- An opinion or preference → doesn't belong anywhere persistent

**When uncertain:** Default to LEARNINGS.md. If the learning is truly universal, Wong's promotion pipeline (2+ project appearances) will catch it and promote to LESSONS.md. False positives in LEARNINGS.md are cheap; missed entries are expensive.

**For each candidate learning, draft an entry:**
```markdown
### [Short title]
[One-line description of the operational fact]

- **category:** api-behavior | decision | env-quirk | root-cause | vendor | workflow
- **verified:** YYYY-MM-DD
- **scope:** [component or module this affects]
- **evidence:** [how this was discovered — session, command, or test that found it]
- **context:** [when/why this is true — so future sessions can judge if it still applies]
```

Present candidates to the user: *"Bashir found [N] operational learnings worth preserving. Review and approve for LEARNINGS.md? [Y/n for each]"*

Approved entries are written to `docs/LEARNINGS.md` (created on first use). See ADR-035 for the full design rationale.

**Hard cap:** 50 active entries. Before writing, count `###` headings in `docs/LEARNINGS.md` (excluding the `## Archived` section). If count >= 50, present the oldest or lowest-confidence entries and ask the user to archive or promote one before adding new entries.

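The count can be sketched with `awk`; it assumes archived entries sit under a literal `## Archived` heading, as described above.

```shell
# Hard-cap check (sketch): count active "### " entries in LEARNINGS.md,
# stopping at the "## Archived" section.

count_active_learnings() {
  awk '/^## Archived/{exit} /^### /{n++} END{print n+0}' "$1"
}

# [ "$(count_active_learnings docs/LEARNINGS.md)" -lt 50 ] || echo "cap reached: archive or promote first"
```
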
## Promotion Analysis (Wong)

After writing the report (Step 3), Wong checks whether the findings should be promoted into method docs:

0. **Learnings promotion check:** If `docs/LEARNINGS.md` exists, check if any learning has appeared in 2+ projects (cross-reference against field reports and other project learnings). If so, promote to `docs/LESSONS.md` — replace the learning with a pointer: `→ Promoted to LESSONS.md: [entry name]`. Facts move forward in the pipeline; they don't duplicate.
1. **Read `docs/LESSONS.md`** — count entries by category and target method doc
2. **Cluster check:** If 3+ lessons share the same category AND target the same method doc, Wong auto-drafts a promotion:
   - A specific new checklist item, rule, or pattern based on the lesson cluster
   - Cites all contributing lessons
   - Targets the exact section of the method doc where it belongs
3. **Present for user approval** — never auto-apply. Show: "Wong recommends promoting these 3 lessons into [method doc] [section]: [proposed text]. Approve? [Y/n]"
4. If approved: apply the change to the method doc, mark each lesson as "Promoted to: [doc name]" in LESSONS.md
5. If submitting upstream (`--submit`): include the proposed method doc change in the GitHub issue body so `/debrief --inbox` can process it

**Why 3+ threshold:** A single lesson could be project-specific. Two could be coincidence. Three is a pattern worth encoding into the methodology. The user always has final say.

### Experiment Analysis

If `~/.voidforge/experiments.json` has completed experiments, Wong includes a summary in the debrief:
1. List experiments completed since last debrief
2. For each: variant names, winner, win reason, true-positive rates
3. If an experiment shows a clear winner with >20% accuracy improvement, recommend adopting the winning variant as the new default
4. Track per-agent accuracy across experiments — flag agents with consistently low true-positive rates for review
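Rule 3's threshold can be sketched as a helper. This assumes ">20% accuracy improvement" means relative improvement in true-positive rate, which is an interpretation, not a definition the doc pins down.

```shell
# Adoption check (sketch): recommend the challenger variant when its
# true-positive rate beats the incumbent by more than 20%, relative.

recommend_adoption() {
  # $1 = incumbent true-positive rate, $2 = challenger rate (e.g. 0.62 0.80)
  awk -v a="$1" -v b="$2" \
    'BEGIN { if (a > 0 && (b - a) / a > 0.20) print "adopt"; else print "keep" }'
}

# recommend_adoption 0.62 0.80
```
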
### Pattern Evolution Check

### Pattern Evolution Check

If `docs/pattern-usage.json` exists (logged by BUILD_PROTOCOL Phase 12.5), Wong checks for recurring pattern variations:
1. Read the pattern-usage data across available projects
2. If the same custom modification appears in 10+ projects → propose it as a new pattern or a pattern section update
3. If a framework adaptation is consistently modified → propose updating the adaptation section
4. Present to user: "This variation of api-route.ts appeared in 10 projects. Promote to pattern? [Y/n]"

This is the long-game feedback loop: patterns evolve from data, not guesses.

### Cross-Project Memory

After each debrief, Wong writes a lesson summary to `~/.voidforge/lessons-global.json` (global, not project-specific). The summary includes: project framework, the lesson category, and a one-line takeaway. No source code — only patterns.

When Phase 0 Orient runs on a NEW project, Wong queries the global lessons file for entries matching the new project's framework and domain. "You've built 3 Next.js apps with Stripe. Here's what broke every time." This gives every new project the benefit of all prior experience.

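A sketch of the Phase 0 query, assuming `lessons-global.json` stores one lesson object per line (an illustrative layout; the real file format isn't specified here).

```shell
# Cross-project lesson lookup (sketch): print lessons whose "framework"
# field matches the new project's framework.

match_lessons() {
  # $1 = lessons file, $2 = framework (e.g. nextjs)
  grep -i "\"framework\": *\"$2\"" "$1" || true
}

# match_lessons ~/.voidforge/lessons-global.json nextjs
```
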
**Privacy:** The global file contains lesson summaries only. No filenames, no code, no credentials. Opt-in — the user can delete `~/.voidforge/lessons-global.json` at any time.

### Build Archaeology

When debugging a production issue, trace it back through the build protocol:
1. Start with the bug (file, line, symptom)
2. `git blame` → find which commit introduced it
3. Map the commit to a build phase (Phase 4 core? Phase 6 integration?)
4. Check which agents reviewed that phase (did QA catch it? did security? why not?)
5. Identify the methodology gap: "This bug escaped because Constantine's checklist doesn't cover this pattern"
6. Propose a fix to the methodology — a new checklist item, a new agent role, or a new review step
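Steps 2 and 3 can be sketched as shell helpers. The `Phase N:` commit-subject convention the mapper relies on is an assumption for illustration; projects without that convention need their own commit-to-phase mapping.

```shell
# Build archaeology helpers (sketch): find the commit that introduced a
# line, then map its commit subject to a build phase.

blame_commit() {
  # $1 = file, $2 = line number. Prints the full hash of the introducing commit.
  git blame -L "$2,$2" --porcelain "$1" | head -1 | cut -d' ' -f1
}

phase_of_subject() {
  # Map a commit subject to its build phase, if the subject names one
  # (assumes a "Phase N: ..." prefix convention).
  case "$1" in
    Phase\ [0-9]*|phase\ [0-9]*)
      printf '%s\n' "$1" | sed -E 's/^[Pp]hase ([0-9]+).*/Phase \1/' ;;
    *) echo "unknown phase" ;;
  esac
}

# hash=$(blame_commit src/routes.py 42)
# phase_of_subject "$(git log -1 --format=%s "$hash")"
```
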
**Output:** Bug Trace Timeline — animated path from bug → commit → phase → agent → methodology gap. The Danger Room's Build Archaeology panel visualizes this.
## Deliverables

1. Structured post-mortem document
2. Optional: GitHub issue on upstream repo
3. Local copy saved to `/logs/debrief-YYYY-MM-DD.md`
4. (Inbox mode) Triage comments on upstream issues, applied fixes
5. (Promotion) Method doc updates from lesson clusters (user-approved)
6. (Cross-Project) Global lesson summary written to `~/.voidforge/lessons-global.json`
7. (Archaeology) Bug trace timeline when debugging production issues

## Handoffs

- If a proposed fix is approved upstream → Bombadil delivers it via `/void`
- If a fix is urgent → user can apply it locally before upstream ships
- If the fix requires a new agent → present to user for naming/universe approval
|