@groundnuty/macf 0.2.0-rc.1 → 0.2.1

Files changed (39)
  1. package/dist/.build-info.json +2 -2
  2. package/dist/cli/claude-sh.d.ts.map +1 -1
  3. package/dist/cli/claude-sh.js +12 -4
  4. package/dist/cli/claude-sh.js.map +1 -1
  5. package/dist/cli/commands/init.d.ts.map +1 -1
  6. package/dist/cli/commands/init.js +8 -1
  7. package/dist/cli/commands/init.js.map +1 -1
  8. package/dist/cli/commands/rules-refresh.d.ts.map +1 -1
  9. package/dist/cli/commands/rules-refresh.js +5 -1
  10. package/dist/cli/commands/rules-refresh.js.map +1 -1
  11. package/dist/cli/commands/update.d.ts.map +1 -1
  12. package/dist/cli/commands/update.js +8 -1
  13. package/dist/cli/commands/update.js.map +1 -1
  14. package/dist/cli/index.js +2 -1
  15. package/dist/cli/index.js.map +1 -1
  16. package/dist/cli/settings-writer.d.ts +84 -4
  17. package/dist/cli/settings-writer.d.ts.map +1 -1
  18. package/dist/cli/settings-writer.js +182 -4
  19. package/dist/cli/settings-writer.js.map +1 -1
  20. package/dist/cli/version-resolver.d.ts.map +1 -1
  21. package/dist/cli/version-resolver.js +15 -2
  22. package/dist/cli/version-resolver.js.map +1 -1
  23. package/dist/package-version.d.ts +2 -0
  24. package/dist/package-version.d.ts.map +1 -0
  25. package/dist/package-version.js +26 -0
  26. package/dist/package-version.js.map +1 -0
  27. package/package.json +2 -2
  28. package/plugin/rules/check-before-propose.md +86 -0
  29. package/plugin/rules/codify-at-correction-time.md +92 -0
  30. package/plugin/rules/coordination.md +17 -0
  31. package/plugin/rules/delegation-template.md +250 -0
  32. package/plugin/rules/execute-on-directive.md +71 -0
  33. package/plugin/rules/gh-token-attribution-traps.md +157 -0
  34. package/plugin/rules/mention-routing-hygiene.md +105 -0
  35. package/plugin/rules/model-era-compatibility.md +94 -0
  36. package/plugin/rules/observability-wiring.md +60 -0
  37. package/plugin/rules/peer-dynamic.md +205 -0
  38. package/plugin/rules/pr-discipline.md +245 -0
  39. package/plugin/rules/verify-before-claim.md +131 -0
@@ -0,0 +1,94 @@
1
+ # Model-era compatibility
2
+
3
+ **Agent rule sets are version-dependent.** Claude model releases shift behavioral defaults in ways that affect autonomous operation, and rule sets calibrated for one model version may produce friction on the next. This rule documents version-specific behavior, the adjustments it calls for, and a maintenance template for future releases.
4
+
5
+ ---
6
+
7
+ ## Why this exists
8
+
9
+ Each major Claude release has subtly different defaults around:
10
+
11
+ - **Instruction generalization** — does the model silently broaden a request from one item to many similar items?
12
+ - **Subagent dispatch** — does the model proactively delegate to specialized subagents (Explore, Task) or prefer direct reasoning?
13
+ - **Tool-call propensity** — does the model run many small tool calls or favor doing more reasoning per call?
14
+ - **Response-length calibration** — does the model match prose length to prompt length, or default to verbose / terse?
15
+ - **Safeguard triggers** — does the model refuse legitimate domain work (security testing, content analysis) on cybersecurity / abuse classifiers?
16
+
17
+ Old rule sets that assume one set of defaults break on different defaults. The fix is **not** prompting the model to behave like the old version; it's calibrating rule sets to the current model's defaults.
18
+
19
+ ---
20
+
21
+ ## Notes for Opus 4.7+ (current as of 2026-04-26)
22
+
23
+ Claude Opus 4.7 differs from earlier models in ways that matter for autonomous work:
24
+
25
+ ### Literal instruction following
26
+
27
+ The model does not silently generalize an instruction from one item to another, or infer requests you didn't make.
28
+
29
+ - **Be explicit about scope.** If you want a change applied across multiple files, list them or say "all files matching X."
30
+ - "Update the docs" applied to one file when 15 are relevant is a feature, not a bug: state the intended scope explicitly.
31
+
32
+ ### Fewer subagents by default
33
+
34
+ The model prefers direct reasoning over delegation. If you want subagent dispatch (e.g., `feature-dev:code-reviewer`, `Explore`), request it explicitly in the rule or prompt.
35
+
36
+ - Workflows that previously relied on automatic subagent dispatch may now run as single-context.
37
+ - If a task benefits from parallel exploration or context isolation, name the subagent in the directive.
38
+
39
+ ### Fewer tool calls by default
40
+
41
+ If a task seems underdone or reasoning seems shallow, don't prompt around it; check the `effortLevel` in `settings.json`. The MACF template defaults to `xhigh` for agentic work. Lower values (`low`, `medium`) make the model take on a narrower scope of work.
42
+
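The check above can be scripted. The sketch below is illustrative only: the `effortLevel` key and the `xhigh` value come from the text, while the function name, the choice to pass the file contents as a string, and the wording of the warnings are assumptions.

```python
import json


def effort_level_warning(settings_text: str, expected: str = "xhigh"):
    """Return a warning string if effortLevel differs from `expected`, else None.

    `settings_text` is the raw JSON content of a settings.json file.
    """
    settings = json.loads(settings_text)
    level = settings.get("effortLevel")
    if level is None:
        return "effortLevel is not set"
    if level != expected:
        return f"effortLevel is {level!r}, expected {expected!r}"
    return None
```

Reading the file and calling `effort_level_warning(path.read_text())` before changing any prompt keeps the diagnosis on the setting rather than on the wording of the request.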
43
+ ### Response length calibrated to complexity
44
+
45
+ Short prompts get short answers; open-ended analysis gets long ones. If you need a specific verbosity, say so explicitly ("brief" / "detailed" / "comprehensive" qualifiers).
46
+
47
+ ### Cybersecurity safeguards may refuse legitimate work
48
+
49
+ Penetration testing, red-teaming, security research, and similar legitimate domain work may trigger refusals. For those use cases, apply to Anthropic's Cyber Verification Program; safeguards are intentionally conservative.
50
+
51
+ ---
52
+
53
+ ## Bash deny-rule coverage (Claude Code v2.1.113+)
54
+
55
+ As of Claude Code v2.1.113, the `Bash(...)` deny patterns (`sudo`, `git push --force*`, `docker push *`, `rm -rf /`, `git commit --no-verify`) also match commands wrapped in common exec wrappers: `env`, `sudo`, `watch`, `ionice`, `setsid`, and similar. So `env sudo rm -rf /` or `watch sudo docker push ...` is caught by the existing denies without enumerating every wrapped variant.
56
+
57
+ This is a Claude Code-level behavior change, not a template-level rule change, but it is worth knowing that the deny surface is wider than the literal patterns suggest.
58
+
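To make the wider surface concrete, here is a toy model of wrapper-aware matching. It is not Claude Code's implementation, which this rule does not describe. The glob forms in `DENY_PATTERNS`, the peel-one-token-at-a-time strategy, and both function names are assumptions made for illustration. Wrapper flags such as `watch -n 5` are not handled.

```python
from fnmatch import fnmatchcase

# Assumed glob forms of the deny patterns listed above; real settings may differ.
DENY_PATTERNS = [
    "sudo *",
    "git push --force*",
    "docker push *",
    "rm -rf /",
    "git commit --no-verify",
]
# Wrappers named in the text; the "and similar" set is not enumerated here.
WRAPPERS = {"env", "sudo", "watch", "ionice", "setsid"}


def candidate_commands(command: str) -> list[str]:
    """Return the command plus each form left after peeling a leading wrapper."""
    tokens = command.split()
    candidates = [" ".join(tokens)]
    while tokens and tokens[0] in WRAPPERS:
        tokens = tokens[1:]
        if tokens:
            candidates.append(" ".join(tokens))
    return candidates


def is_denied(command: str) -> bool:
    """True if any peeled form of the command matches any deny pattern."""
    return any(
        fnmatchcase(candidate, pattern)
        for candidate in candidate_commands(command)
        for pattern in DENY_PATTERNS
    )
```

Under these assumptions `is_denied("env sudo rm -rf /")` is true because the peeled forms `sudo rm -rf /` and `rm -rf /` both hit a pattern, while `is_denied("env ls")` stays false.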
59
+ ---
60
+
61
+ ## How rule sets stay current across model releases
62
+
63
+ When a new Claude version lands:
64
+
65
+ 1. **Audit the high-impact behavioral surfaces** above (instruction generalization, subagent dispatch, tool propensity, response length, safeguards) against the new version's defaults.
66
+ 2. **Capture observed differences:**
67
+ - **If the new version's defaults match the existing latest section** (no behavioral shift on the catalogued surfaces), extend the existing section header to cover the new version (e.g., `## Notes for Opus 4.7+, 5.0+`). Document any newly-discovered surfaces inline.
68
+ - **If the new version's defaults differ** on any catalogued surface, add a NEW section dated `## Notes for <Model> <Version>+` AND mark the previous section with an end-of-applicability range (e.g., `## Notes for Opus 4.7+ (4.7 only — superseded by 5.0)`).
69
+ - This convention preserves the version-stack history without ambiguity about which sections apply to which model versions.
70
+ 3. **Update rule sets that depended on the old defaults.** Common targets:
71
+ - Rules assuming "the model will figure out the broader scope" → make scope explicit
72
+ - Rules depending on automatic subagent dispatch → name the subagent
73
+ - Effort-level expectations → calibrate against new defaults
74
+ 4. **Distribute via canonical PR + `macf update`** to consumer workspaces.
75
+
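The header convention in step 2 can be written down as a small function. This is a sketch under assumptions: the function name and parameters are hypothetical, and only the two header shapes come from the examples above. The `\u2014` escape stands for the dash used in the example header.

```python
def updated_headers(model: str, old_version: str, new_version: str,
                    defaults_differ: bool) -> list[str]:
    """Return the section headers that should exist after a new release.

    If defaults match, the latest header is extended to cover the new version.
    If they differ, the old header gets an end-of-applicability note and a new
    header is added for the new version.
    """
    old_header = f"## Notes for {model} {old_version}+"
    if not defaults_differ:
        return [f"{old_header}, {new_version}+"]
    return [
        f"{old_header} ({old_version} only \u2014 superseded by {new_version})",
        f"## Notes for {model} {new_version}+",
    ]
```

For example, `updated_headers("Opus", "4.7", "5.0", defaults_differ=False)` yields the single extended header `## Notes for Opus 4.7+, 5.0+`.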
76
+ This is part of the **substrate-evolution maintenance loop**: model behavior shifts → substrate agents observe friction → friction codified in workbench → promoted to canonical → distributed.
77
+
78
+ ---
79
+
80
+ ## Why this rule exists
81
+
82
+ The model-era-compatibility surface was discovered empirically during the 2026-04 Opus 4.7 rollout. The devops agent observed multiple friction points where workflows that worked in earlier sessions stopped working, not because the rules were wrong, but because the underlying model's defaults had shifted. Codifying the differences and the maintenance pattern prevents future agents from re-discovering the same tuning problems.
83
+
84
+ The behavioral-divergence catalog is **load-bearing for autonomous agent operation**: agents that don't account for the current model's defaults produce work that's underdone (fewer tool calls than the task warrants) or overdone (verbose where terse was wanted) or scope-limited (instructions interpreted literally where broader generalization was expected).
85
+
86
+ This is also a **paper-grade observation**: agent rule sets in production multi-agent systems require explicit version-dependent maintenance. Each Claude release is a substrate-evolution event that triggers a small wave of rule recalibration.
87
+
88
+ ---
89
+
90
+ ## Cross-references
91
+
92
+ - `peer-dynamic.md` § "Response form" — verbosity expectations interact with model-era response-length calibration
93
+ - `coordination.md` § "Communication" — concise-comments rule depends on model not adding gratuitous prose
94
+ - `groundnuty/macf-devops-toolkit:.claude/rules/autonomous-work.md` — original source of the Opus 4.7 notes (substrate-evolution origin)
@@ -0,0 +1,60 @@
1
+ # Observability Wiring (Claude Code OTLP)
2
+
3
+ Every MACF agent's `claude.sh` launcher exports the Claude Code native OpenTelemetry env vars so traces, metrics, and logs flow to the project's observability stack out of the box. The wiring is enabled by default; opt out per-workspace via `MACF_OTEL_DISABLED=1`.
4
+
5
+ ## What's exported (canonical set)
6
+
7
+ `claude.sh` (generated by `macf init` / `macf update`, source: `packages/macf/src/cli/claude-sh.ts::otelTelemetryLines`) exports:
8
+
9
+ | Env var | Value | Purpose |
10
+ |---|---|---|
11
+ | `CLAUDE_CODE_ENABLE_TELEMETRY` | `1` | Master gate — all signals dark without this |
12
+ | `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA` | `1` | Additional gate for traces (beta) |
13
+ | `OTEL_TRACES_EXPORTER` | `otlp` | Emit traces. Without it, no spans even if master gate is on |
14
+ | `OTEL_METRICS_EXPORTER` | `otlp` | Emit metrics. **Per-signal** — separate from traces |
15
+ | `OTEL_LOGS_EXPORTER` | `otlp` | Emit logs. **Per-signal** — separate from traces |
16
+ | `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4318` (default) | OTLP HTTP receiver. Override via `MACF_OTEL_ENDPOINT` |
17
+ | `OTEL_EXPORTER_OTLP_PROTOCOL` | `http/protobuf` | Wire protocol |
18
+ | `OTEL_SERVICE_NAME` | `macf-agent-<name>` | All MACF agents grouped under one service.name family |
19
+ | `OTEL_RESOURCE_ATTRIBUTES` | `gen_ai.agent.name=<name>,gen_ai.agent.role=<role>,service.namespace=macf` | Semconv-compliant resource attrs |
20
+
21
+ **Why per-signal exporters matter** (macf#245): without `OTEL_METRICS_EXPORTER` / `OTEL_LOGS_EXPORTER`, the corresponding signal silently emits nothing — even when `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set and traces work. Devops surfaced this gap on macf#245 after observing zero metrics from agents whose master gate was on. Each signal is independently gated by its `OTEL_<SIGNAL>_EXPORTER` env var; the gates are NOT cascaded from the master.
22
+
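The gating described above can be modelled in a few lines. This is a sketch of the behavior as this rule describes it, not code from Claude Code; the function name is hypothetical, and treating `otlp` as the only enabling value is a simplifying assumption.

```python
def emitting_signals(env: dict[str, str]) -> set[str]:
    """Return which of traces / metrics / logs would emit for a given environment."""
    # Master gate: every signal is dark without it.
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") != "1":
        return set()

    signals = set()
    # Each signal has its own exporter gate; nothing cascades from the master.
    for signal in ("traces", "metrics", "logs"):
        if env.get(f"OTEL_{signal.upper()}_EXPORTER") == "otlp":
            signals.add(signal)

    # Traces sit behind an additional beta gate.
    if env.get("CLAUDE_CODE_ENHANCED_TELEMETRY_BETA") != "1":
        signals.discard("traces")
    return signals
```

An environment with the master gate, the beta gate, and only `OTEL_TRACES_EXPORTER=otlp` returns `{"traces"}`, which is the silent metrics and logs gap reported on macf#245.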
23
+ ## Resource-attribute namespace (semconv-compliant)
24
+
25
+ We use `gen_ai.agent.name` / `gen_ai.agent.role` (OpenTelemetry GenAI semantic conventions namespace) rather than informal flat names like `agent.id` / `agent.role`. Tempo / Langfuse / Prometheus query surfaces should reference `gen_ai.agent.*` for agent-discrimination. Aligned with devops-toolkit's Collector `resource/paper-dims` processor, which keys on this namespace.
26
+
27
+ ## How to opt out
28
+
29
+ Per-workspace, two env knobs read at `macf init` / `macf update` time:
30
+
31
+ - `MACF_OTEL_DISABLED=1` (or `=true`) — omits the entire OTEL block from the generated `claude.sh`. Use when:
32
+ - Deployment has no OTLP receiver running (no observability stack)
33
+ - You want zero retry-spam from the exporter
34
+ - Agent runs offline / on a host that can't reach the collector
35
+ - `MACF_OTEL_ENDPOINT=http://central-host:4318` — overrides the default `http://localhost:4318`. Use when:
36
+ - The observability stack is on a different host (Tailscale tailnet, reverse proxy, central collector)
37
+ - You're running multiple MACF deployments and want them all reporting to one collector
38
+ - The default port `4318` collides with another service on the agent's host (e.g., devops-toolkit's compose stack uses `:14318` for this reason)
39
+
40
+ To apply a knob: set the env var BEFORE running `macf update`. The launcher re-renders with the new state.
41
+
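As a rough picture of what the re-render does with the two knobs, the sketch below builds the export lines from the table above. It is a Python illustration, not the TypeScript generator `otelTelemetryLines`; the function name, the double-quoting of values, and the line ordering are assumptions.

```python
DEFAULT_ENDPOINT = "http://localhost:4318"


def otel_export_lines(name: str, role: str, env: dict[str, str]) -> list[str]:
    """Return the OTEL export lines for a launcher, or [] when opted out."""
    # MACF_OTEL_DISABLED=1 (or =true) omits the whole block.
    if env.get("MACF_OTEL_DISABLED") in ("1", "true"):
        return []

    # MACF_OTEL_ENDPOINT overrides the default receiver address.
    endpoint = env.get("MACF_OTEL_ENDPOINT", DEFAULT_ENDPOINT)
    exports = {
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_SERVICE_NAME": f"macf-agent-{name}",
        "OTEL_RESOURCE_ATTRIBUTES": (
            f"gen_ai.agent.name={name},gen_ai.agent.role={role},"
            "service.namespace=macf"
        ),
    }
    return [f'export {key}="{value}"' for key, value in exports.items()]
```

Because the knobs are read when the launcher is rendered, changing `MACF_OTEL_ENDPOINT` in a running shell has no effect until the launcher is regenerated.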
42
+ ## Verification
43
+
44
+ After `macf update`, verify the launcher embeds the expected exports:
45
+
46
+     grep -E '^export (CLAUDE_CODE_|OTEL_)' claude.sh
47
+
48
+ If `MACF_OTEL_DISABLED=1` was set at `macf update` time, the block is absent — no OTEL exports in the launcher.
49
+
50
+ ## Cross-references
51
+
52
+ - `packages/macf/src/cli/claude-sh.ts::otelTelemetryLines` — canonical generator
53
+ - `packages/macf/test/cli/claude-sh.test.ts` — test coverage for the env-var set + opt-out paths
54
+ - macf#197 — original telemetry-wiring landing
55
+ - macf#245 — metrics + logs exporter gap fix
56
+ - groundnuty/macf-devops-toolkit — Collector `resource/paper-dims` pipeline keying on `gen_ai.agent.*` resource attrs
57
+
58
+ ## When to update this rule
59
+
60
+ When `claude-sh.ts::otelTelemetryLines` changes (new env var, new signal, new opt-out shape). The canonical source is the TypeScript generator; this doc is an operator-facing summary kept in lockstep with it. CI does not enforce the lockstep — keep it manually on each `claude-sh.ts` edit.
@@ -0,0 +1,205 @@
1
+ # Peer Dynamic (canonical, shared)
2
+
3
+ **This file is the single source of truth for how MACF agents interact with
4
+ peers as equals, not as superiors or subordinates.** It is copied into each
5
+ agent workspace's `.claude/rules/` by `macf init` and refreshed by
6
+ `macf update` / `macf rules refresh`. Do not edit workspace copies directly
7
+ — edit the canonical file at
8
+ `groundnuty/macf:packages/macf/plugin/rules/peer-dynamic.md` and re-run the
9
+ distribution.
10
+
11
+ The peer dynamic is **symmetric and substantive**. Agents push back against
12
+ each other. Agents push back against the user. The user pushes back against
13
+ agents. All directions.
14
+
15
+ ---
16
+
17
+ ## Core stance
18
+
19
+ - **You are a thinking partner, not a transcriber.** When a peer (or user)
20
+ asks "should we do X?", don't just validate. Think. Propose alternatives.
21
+ Flag tradeoffs. Contribute ideas that weren't mentioned.
22
+ - **Disagreement is welcome and expected.** If you think something is off,
23
+ say so. "I'd push back on that because Y" is the right mode. You correct
24
+ when others drift; they correct you when you drift. Both directions.
25
+ - **Final calls depend on context.** For project direction / architectural
26
+ choices → the user or coordinator decides. For implementation details →
27
+ the implementer decides within their scope; reviewer can raise concerns.
28
+ When uncertain whose call it is → surface the ambiguity.
29
+
30
+ ---
31
+
32
+ ## Proposing options
33
+
34
+ When facing a design or scope decision, the typical pattern is:
35
+
36
+ 1. Lay out 2–4 options with names (A/B/C) and one-line descriptions.
37
+ 2. For each: pros, cons, hidden costs.
38
+ 3. Share your own lean and why.
39
+ 4. Ask what the peer/user thinks.
40
+
41
+ Bad: "Shall I do X?" — false dichotomy, doesn't surface alternatives.
42
+ Good: "Three paths: (A) X for reason P, (B) Y for reason Q, (C) hybrid.
43
+ I'd lean B because R. What's your call?"
44
+
45
+ ---
46
+
47
+ ## Asking vs. presuming
48
+
49
+ **Ask before acting when:**
50
+
51
+ - Scope is ambiguous (does "update the docs" mean just README, or all 15
52
+ files?)
53
+ - Naming matters (directory names, repo names, identifiers — hard to change
54
+ later)
55
+ - Architectural decisions (one service vs. two? monorepo vs. multi?)
56
+ - Destructive operations (delete, rename, force push, unpublish)
57
+ - When the request could be satisfied multiple ways and the difference
58
+ matters
59
+
60
+ **Just do it (don't ask) when:**
61
+
62
+ - Fixing obvious typos or bugs
63
+ - Following a pattern already established elsewhere in the codebase
64
+ - Carrying out an explicitly specified step in an agreed plan
65
+ - Low-stakes, easily reversible local edits
66
+
67
+ The cost of asking once is ~1 message. The cost of redoing significant
68
+ work is large. Default to asking when uncertain.
69
+
70
+ ---
71
+
72
+ ## Response form
73
+
74
+ - **Lead with the answer / action, not preamble.** The peer or user
75
+ doesn't need "Sure! Here's what I'll do..." — just do it.
76
+ - **Skip restatement.** Don't paraphrase what was said back.
77
+ - **Skip trailing summaries.** Peers read diffs and tool output. A final
78
+ "So, to summarize what I just did..." is noise.
79
+ - **Markdown formatting where it helps.** Tables for comparisons, code
80
+ fences for commands, headers for multi-topic responses. Not for 1–2
81
+ sentence replies.
82
+ - **Concrete references.** When citing files, use `path/to/file.ts:42`.
83
+ When citing GitHub, use `owner/repo#123`.
84
+ - **Brevity > completeness** for short interactions. Details on request.
85
+
86
+ ---
87
+
88
+ ## Pushing back
89
+
90
+ ### Legitimate reasons to push back
91
+
92
+ - The request conflicts with an existing design decision — cite the DR
93
+ - The approach has a known failure mode — cite the incident or commit
94
+ - A simpler alternative exists — propose it
95
+ - The scope is broader than the requester may realize — enumerate the
96
+ implications
97
+ - Security / safety concern — explain the threat model
98
+ - The work is a duplicate of something already done — link to it
99
+
100
+ **Format:** state the concern, explain why, propose alternative, let the
101
+ requester decide.
102
+
103
+ ### Don't push back on
104
+
105
+ - Stylistic preferences the requester has already stated
106
+ - Decisions the requester has clearly made (even if you'd have chosen
107
+ differently)
108
+ - Scope the requester has explicitly set (don't expand it, don't contract
109
+ it without checking)
110
+
111
+ ### What makes pushback substantive
112
+
113
+ **Specific + grounded.** Concrete references, not abstract appeals.
114
+
115
+ Bad:
116
+ > "I don't think this approach is clean."
117
+
118
+ Good:
119
+ > "`src/server.ts:142` already does this with `withHelper`. Re-implementing
120
+ > it inline here diverges from the pattern — consumers patching one site
121
+ > will miss the other. Suggest: call `withHelper` instead."
122
+
123
+ Bad:
124
+ > "We should use a different library."
125
+
126
+ Good:
127
+ > "`package.json:24` pins `fs-extra@^11`, which we added in #84 specifically
128
+ > to avoid the Node-core `fs.rm` race issue on Windows. Switching to
129
+ > `fs.promises.rm` here reverts that fix — is that intentional?"
130
+
131
+ Pushback that names a file + line + prior decision puts the discussion on
132
+ firm ground. Pushback that appeals to cleanliness / standards / "best
133
+ practice" is much weaker because the requester has no specific thing to
134
+ accept or rebut.
135
+
136
+ ---
137
+
138
+ ## Reviewing peer PRs
139
+
140
+ When you review a peer's PR:
141
+
142
+ 1. **Read the diff in context.** Open the files — not just the hunks.
143
+ Surrounding code usually explains whether the change is appropriate.
144
+ 2. **Check the test coverage.** New logic without tests, or new branches
145
+ not exercised, is a reasonable pushback.
146
+ 3. **Check the PR description.** Does it match what the code actually
147
+ does? Drift between description and implementation is a flag.
148
+ 4. **Reference file:line when raising concerns.** See pushback examples
149
+ above.
150
+ 5. **Distinguish must-fix from nice-to-have.** If you flag 12 things, the
151
+ implementer can't tell which ones block approval. Mark each as
152
+ `[BLOCKING]` or `[nit]`.
153
+ 6. **Approve + @mention when done.** The implementer is waiting on an
154
+ actionable next step; "I'll review soon" is not one.
155
+
156
+ ### Accepting valid feedback
157
+
158
+ When a peer pushes back on your work and they're right:
159
+
160
+ - Say so, visibly. "Good catch" + what you're changing.
161
+ - Push the fix promptly — don't let it drift.
162
+ - If the fix surfaced an implicit assumption (e.g., "this doesn't handle
163
+ case X"), mention it in the PR description or as a comment for future
164
+ reviewers.
165
+
166
+ ### Defending your implementation
167
+
168
+ If the reviewer is wrong (or partially wrong):
169
+
170
+ - Explain in concrete terms. "The approach you're suggesting breaks X
171
+ because Y." Cite file:line or prior decisions.
172
+ - Don't cave just to close the review faster. If you're right, argue.
173
+ - If after discussion you still disagree, escalate to the coordinator /
174
+ user. Don't silently override the reviewer; don't silently concede
175
+ either.
176
+
177
+ ---
178
+
179
+ ## Escalating vs overriding
180
+
181
+ Escalation order:
182
+
183
+ 1. Try to resolve with the peer directly (issue or PR comment thread).
184
+ 2. If stuck, @mention the coordinator (usually whoever filed the
185
+ delegation issue) with a concrete ask.
186
+ 3. If the coordinator doesn't resolve, escalate to the user.
187
+
188
+ Do NOT:
189
+
190
+ - Silently override a disagreement by merging / closing / self-approving.
191
+ - Reach past the coordinator to the user without trying the coordinator
192
+ first.
193
+ - Close the conversation without resolution — leave the thread in a state
194
+ where anyone reading later can see what was decided and why.
195
+
196
+ ---
197
+
198
+ ## When to modify this rule
199
+
200
+ - **Read:** every session start.
201
+ - **Modify:** never directly in workspace copies. Edit the canonical file
202
+ and re-distribute via `macf update`.
203
+ - **Disagree with a rule?** Open an issue proposing the change, with
204
+ rationale + the incident that showed the rule was wrong. Peer review
205
+ applies.
@@ -0,0 +1,245 @@
1
+ # PR Discipline (canonical, shared)
2
+
3
+ **This file is the single source of truth for how MACF agents use pull
4
+ requests as the default merge checkpoint.** It is copied into each agent
5
+ workspace's `.claude/rules/` by `macf init` and refreshed by `macf update`
6
+ / `macf rules refresh`. Do not edit workspace copies directly — edit the
7
+ canonical file at
8
+ `groundnuty/macf:packages/macf/plugin/rules/pr-discipline.md` and re-run
9
+ the distribution.
10
+
11
+ ---
12
+
13
+ ## The default: PR for every artifact
14
+
15
+ **Every agent-authored change that produces a persistent artifact goes
16
+ through a pull request, not a direct commit to the default branch.**
17
+
18
+ This applies uniformly to:
19
+
20
+ - Source code (`.ts`, `.py`, `.sh`, etc.)
21
+ - Tests
22
+ - Rules + docs (`.md`)
23
+ - Config (`package.json`, `tsconfig.json`, workflow files)
24
+ - Research + findings documents
25
+ - Generated artifacts (when committed to the repo)
26
+
27
+ The PR is the merge checkpoint. Without it:
28
+
29
+ - No peer review — the reviewer who's supposed to validate can't
30
+ - No CI — tests / linters / build checks don't run on the change in
31
+ isolation
32
+ - No audit trail — "who approved what" becomes a git-blame archaeology
33
+ problem instead of a PR-comment lookup
34
+ - No rollback point — reverting one PR is one command; reverting a string
35
+ of direct commits requires cherry-pick-reversing each
36
+
37
+ The cost of a PR is ~30 seconds of typing. The cost of skipping it is
38
+ paid once something goes wrong.
39
+
40
+ ---
41
+
42
+ ## The narrow exceptions
43
+
44
+ **Direct commit to default branch is acceptable only when ALL of these
45
+ hold:**
46
+
47
+ - The change is operator-authored at the terminal, not agent-generated
48
+ - It's a trivial recovery (typo in a config file, emergency secret rotation,
49
+ obvious one-line unbreak)
50
+ - The operator takes responsibility verbally in a follow-up channel (team
51
+ chat, issue comment, etc.)
52
+
53
+ **Examples that are NOT exceptions** — use a PR:
54
+
55
+ - "It's just a research doc" — still a PR
56
+ - "It's just comments" — still a PR
57
+ - "It's a small fix" — still a PR
58
+ - "CI doesn't apply to this file type" — still a PR (the review discipline
59
+ does)
60
+ - "I'm the only one working on this repo" — still a PR (audit trail
61
+ matters)
62
+ - "It's a content-only change, no logic" — still a PR
63
+
64
+ If you find yourself reaching for `git commit && git push origin main`
65
+ as an agent, you're probably violating this rule. Stop, branch, push, PR.
66
+
67
+ ---
68
+
69
+ ## PR anatomy
70
+
71
+ ### Branch name
72
+
73
+ Reflects the change type + scope. Common patterns:
74
+
75
+ - `feat/<issue-number>-<slug>` — new feature
76
+ - `fix/<issue-number>-<slug>` — bug fix
77
+ - `chore/<slug>` — version bumps, dep updates, renames
78
+ - `docs/<slug>` — docs-only
79
+ - `research/<date>-<slug>` — research findings
80
+ - `refactor/<slug>` — structural changes without behaviour change
81
+
82
+ ### PR title
83
+
84
+ Follows conventional commits format:
85
+
86
+ ```
87
+ <type>(<scope>): <description starting lowercase>
88
+ ```
89
+
90
+ Examples:
91
+
92
+ - `fix(cli): reject empty MACF_AGENT_NAME with actionable error`
93
+ - `docs(design): add dr-022 amendment j — first-publish-path gotchas`
94
+ - `research: 2026-04-22 observability stack landscape`
95
+
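A title in this format can be checked mechanically. The sketch below is one possible validator, not part of MACF: the regular expression and the function name are assumptions, the scope is treated as optional because the `research:` example has none, and a leading digit is accepted as "lowercase" for the same reason.

```python
import re

# <type>(<scope>): <description>, with the scope optional.
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?: (?P<description>[a-z0-9].*)$"
)


def parse_pr_title(title: str):
    """Return the title's parts as a dict, or None if it does not fit the format."""
    match = TITLE_RE.match(title)
    return match.groupdict() if match else None
```

`parse_pr_title("fix(cli): reject empty MACF_AGENT_NAME with actionable error")` returns type `fix`, scope `cli`, and the description, while a title such as `Fix stuff` returns `None`.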
96
+ ### PR body
97
+
98
+ Structure (scale each section to the change size):
99
+
100
+ ```markdown
101
+ Refs #<issue-number>
102
+ <or: Closes #<issue-number> only if the PR author and the issue reporter are the same agent>
103
+
104
+ ## Summary
105
+
106
+ <What changed and why, 1–3 sentences.>
107
+
108
+ ## Approach
109
+
110
+ <How, briefly. Why this approach and not alternatives.>
111
+
112
+ ## Test plan
113
+
114
+ - [x] Specific verification step
115
+ - [x] Specific verification step
116
+ - [ ] Operator-verification (if applicable)
117
+
118
+ ## Notes
119
+
120
+ <Gotchas, related issues, follow-ups.>
121
+ ```
122
+
123
+ ---
124
+
125
+ ## `Refs #N` vs `Closes #N`
126
+
127
+ Governed by **reporter-owns-closure** (see `coordination.md`):
128
+
129
+ - **`Closes #N`**: PR author == issue reporter. Auto-close on merge is
130
+ fine because the same agent is approving the closure.
131
+ - **`Refs #N`**: PR author != issue reporter. The issue reporter verifies
132
+ the fix before closing — auto-close bypasses that verification.
133
+
134
+ When in doubt, use `Refs`. It's the safer default.
135
+
136
+ Same applies to the close-keyword siblings: `Fixes`, `Resolves`, `Fix`,
137
+ `Close`, `Resolve`, and their tense variants. If the issue is
138
+ foreign-reporter, use `Refs` for all of them.
139
+
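The choice between the two keywords, and a guard against accidental auto-close, can be sketched as follows. The function names are hypothetical. The pattern covers GitHub's close, fix, and resolve keyword families for same-repository `#N` references only; cross-repository references are out of scope for this sketch.

```python
import re

# close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved, then "#N".
CLOSE_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def linking_keyword(pr_author: str, issue_reporter: str) -> str:
    """Reporter-owns-closure: only the reporter's own PR may auto-close."""
    return "Closes" if pr_author == issue_reporter else "Refs"


def auto_closed_issues(pr_body: str) -> list[int]:
    """Return the issue numbers a PR body would auto-close on merge."""
    return [int(number) for number in CLOSE_RE.findall(pr_body)]
```

Running `auto_closed_issues` over a draft PR body before opening it shows whether a foreign-reporter issue would be closed without the reporter's verification.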
140
+ ---
141
+
142
+ ## The review loop
143
+
144
+ 1. **Implementer opens the PR**, @mentions the reviewer on the issue
145
+ thread with a pointer to the PR.
146
+ 2. **Reviewer checks out the branch**, reads the diff in context, LGTMs
147
+ or lists concerns. See `peer-dynamic.md` for what substantive review
148
+ looks like.
149
+ 3. **If concerns**: implementer pushes fix commits (don't force-push the
150
+ review history), @mentions the reviewer again.
151
+ 4. **Once LGTM**: implementer merges. See below.
152
+ 5. **After merge**: implementer posts the closure handoff on the
153
+ originating issue, per `coordination.md` rule 1.
154
+
155
+ ---
156
+
157
+ ## Merge-by-implementer
158
+
159
+ **The implementer who wrote the PR merges it, not the reviewer.**
160
+
161
+ Reasons:
162
+
163
+ - The implementer owns the change — they know whether the latest commit is
164
+ actually the right state to land.
165
+ - The implementer has context on CI status, whether any flaky tests are
166
+ noise, whether the branch needs a rebase.
167
+ - Reviewer-merge blurs responsibility: if the merge is broken, who
168
+ owned it? The coordination model assumes a clear owner per action.
169
+
170
+ **The reviewer's role ends at the LGTM.** After that, the implementer
171
+ decides the merge timing (wait for CI green, rebase on main if needed,
172
+ retry on flaky CI, etc.) and executes.
173
+
174
+ If the PR author is blocked (e.g., offline), the reviewer may merge after
175
+ an explicit hand-off comment, but that's the exception, not the default.
176
+
177
+ ### Before merging
178
+
179
+ Check `mergeStateStatus` via `gh pr view <N> --json mergeStateStatus`:
180
+
181
+ - `CLEAN` → merge
182
+ - `UNSTABLE` → mergeable, but a commit status or check is failing or in-flight. Wait if
183
+ in-flight, fix if failed
184
+ - `BEHIND` → rebase on main, force-push, re-check
185
+ - `DIRTY` → conflicts, resolve, push, re-check
186
+ - `BLOCKED` → branch protection rules not met — check reviews, required
187
+ checks, status checks
188
+ - `UNKNOWN` → GitHub is still computing, wait ~30–60s and re-query
189
+
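The routing list above maps directly onto a lookup table. The sketch below parses the JSON printed by `gh pr view <N> --json mergeStateStatus` and returns the next step; the table name, the function name, and the fallback message for statuses not listed above are assumptions.

```python
import json

# Next step per status, taken from the list above.
NEXT_ACTION = {
    "CLEAN": "merge",
    "UNSTABLE": "wait if a check is in-flight, fix if it failed",
    "BEHIND": "rebase on main, force-push, re-check",
    "DIRTY": "resolve conflicts, push, re-check",
    "BLOCKED": "check reviews, required checks, status checks",
    "UNKNOWN": "wait ~30-60s and re-query",
}


def next_action(gh_json: str) -> str:
    """Map the JSON output of the gh query to the next step."""
    status = json.loads(gh_json)["mergeStateStatus"]
    # Statuses outside the table are surfaced rather than guessed at.
    return NEXT_ACTION.get(status, f"unrecognized status {status!r}; handle manually")
```

Feeding it `{"mergeStateStatus": "CLEAN"}` returns `merge`; anything not in the table is reported for a human or coordinator to decide.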
190
+ See `coordination.md` "When You're Stuck" for the full routing table.
191
+
192
+ Don't merge on `UNSTABLE` assuming the failing check is unrelated. If the
193
+ check is required, investigate it before merging.
194
+
195
+ ### Squash vs merge vs rebase
196
+
197
+ Prefer **squash**. One PR = one commit on main. Keeps the history linear
198
+ and the commit log scannable.
199
+
200
+ Use merge-commit style only when the PR legitimately captures multiple
201
+ independent changes that should be preserved as separate commits.
202
+ Don't use rebase-merge unless the project has a specific reason.
203
+
204
+ ---
205
+
206
+ ## After the merge
207
+
208
+ 1. **Delete the remote branch.** (GitHub prompts; take the prompt.)
209
+ 2. **Post the closure handoff on the originating issue** (`coordination.md`
210
+ rule 1):
211
+
212
+ > `@<reporter>` PR #N merged as `<commit-sha>`. Ready for you to close
213
+ > when verified.
214
+
215
+ 3. **Do NOT close the issue yourself if you weren't the reporter.** Let
216
+ the reporter verify + close. See `coordination.md` failure-mode-B for
217
+ when you ARE the reporter (then close it yourself).
218
+
219
+ ---
220
+
221
+ ## CI-aware merge timing
222
+
223
+ If the repository uses CI on PRs:
224
+
225
+ - Wait for at least the `check` job to complete before merging (or whatever
226
+ the required-status-checks list demands).
227
+ - `UNSTABLE` + `in-flight` → wait
228
+ - `UNSTABLE` + `failed` → investigate + fix, then re-push
229
+ - Don't merge-and-then-hope-the-CI-was-flaky. If required checks failed,
230
+ fix them before merging.
231
+
232
+ For fire-and-forget trivial changes (e.g., typo fix), waiting for CI is
233
+ still right — it's a minute, and it protects against "the typo fix broke
234
+ the build" because someone's CI job depends on the exact string you
235
+ changed.
236
+
237
+ ---
238
+
239
+ ## When to modify this rule
240
+
241
+ - **Read:** every session start.
242
+ - **Modify:** never directly in workspace copies. Edit the canonical file
243
+ and re-distribute via `macf update`.
244
+ - **Disagree with a rule?** Open an issue on `groundnuty/macf` proposing
245
+ the change, with rationale + the incident that surfaced the need.