@groundnuty/macf 0.2.0-rc.1 → 0.2.1
- package/dist/.build-info.json +2 -2
- package/dist/cli/claude-sh.d.ts.map +1 -1
- package/dist/cli/claude-sh.js +12 -4
- package/dist/cli/claude-sh.js.map +1 -1
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +8 -1
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/commands/rules-refresh.d.ts.map +1 -1
- package/dist/cli/commands/rules-refresh.js +5 -1
- package/dist/cli/commands/rules-refresh.js.map +1 -1
- package/dist/cli/commands/update.d.ts.map +1 -1
- package/dist/cli/commands/update.js +8 -1
- package/dist/cli/commands/update.js.map +1 -1
- package/dist/cli/index.js +2 -1
- package/dist/cli/index.js.map +1 -1
- package/dist/cli/settings-writer.d.ts +84 -4
- package/dist/cli/settings-writer.d.ts.map +1 -1
- package/dist/cli/settings-writer.js +182 -4
- package/dist/cli/settings-writer.js.map +1 -1
- package/dist/cli/version-resolver.d.ts.map +1 -1
- package/dist/cli/version-resolver.js +15 -2
- package/dist/cli/version-resolver.js.map +1 -1
- package/dist/package-version.d.ts +2 -0
- package/dist/package-version.d.ts.map +1 -0
- package/dist/package-version.js +26 -0
- package/dist/package-version.js.map +1 -0
- package/package.json +2 -2
- package/plugin/rules/check-before-propose.md +86 -0
- package/plugin/rules/codify-at-correction-time.md +92 -0
- package/plugin/rules/coordination.md +17 -0
- package/plugin/rules/delegation-template.md +250 -0
- package/plugin/rules/execute-on-directive.md +71 -0
- package/plugin/rules/gh-token-attribution-traps.md +157 -0
- package/plugin/rules/mention-routing-hygiene.md +105 -0
- package/plugin/rules/model-era-compatibility.md +94 -0
- package/plugin/rules/observability-wiring.md +60 -0
- package/plugin/rules/peer-dynamic.md +205 -0
- package/plugin/rules/pr-discipline.md +245 -0
- package/plugin/rules/verify-before-claim.md +131 -0
|
@@ -0,0 +1,94 @@
|
|
|
1
|
+
# Model-era compatibility
|
|
2
|
+
|
|
3
|
+
**Agent rule sets are version-dependent.** Claude model releases shift behavioral defaults in ways that affect autonomous operation, and rule sets calibrated for one model version may produce friction on the next. This rule documents version-specific behavior, the adjustments it calls for, and a maintenance template for future releases.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Why this exists
|
|
8
|
+
|
|
9
|
+
Each major Claude release has subtly different defaults around:
|
|
10
|
+
|
|
11
|
+
- **Instruction generalization** — does the model silently broaden a request from one item to many similar items?
|
|
12
|
+
- **Subagent dispatch** — does the model proactively delegate to specialized subagents (Explore, Task) or prefer direct reasoning?
|
|
13
|
+
- **Tool-call propensity** — does the model run many small tool calls or favor doing more reasoning per call?
|
|
14
|
+
- **Response-length calibration** — does the model match prose length to prompt length, or default to verbose / terse?
|
|
15
|
+
- **Safeguard triggers** — does the model refuse legitimate domain work (security testing, content analysis) on cybersecurity / abuse classifiers?
|
|
16
|
+
|
|
17
|
+
Old rule sets that assume one set of defaults break on different defaults. The fix is **not** prompting the model to behave like the old version; it's calibrating rule sets to the current model's defaults.
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Notes for Opus 4.7+ (current as of 2026-04-26)
|
|
22
|
+
|
|
23
|
+
Claude Opus 4.7 differs from earlier models in ways that matter for autonomous work:
|
|
24
|
+
|
|
25
|
+
### Literal instruction following
|
|
26
|
+
|
|
27
|
+
The model does not silently generalize an instruction from one item to another, or infer requests you didn't make.
|
|
28
|
+
|
|
29
|
+
- **Be explicit about scope.** If you want a change applied across multiple files, list them or say "all files matching X."
|
|
30
|
+
- "Update the docs" applied to one file when 15 are relevant is a feature, not a bug: if you want all 15 touched, state the scope explicitly.
|
|
31
|
+
|
|
32
|
+
### Fewer subagents by default
|
|
33
|
+
|
|
34
|
+
The model prefers direct reasoning over delegation. If you want subagent dispatch (e.g., `feature-dev:code-reviewer`, `Explore`), request it explicitly in the rule or prompt.
|
|
35
|
+
|
|
36
|
+
- Workflows that previously relied on automatic subagent dispatch may now run as single-context.
|
|
37
|
+
- If a task benefits from parallel exploration or context isolation, name the subagent in the directive.
|
|
38
|
+
|
|
39
|
+
### Fewer tool calls by default
|
|
40
|
+
|
|
41
|
+
If a task seems underdone or the reasoning seems shallow, don't prompt around it; check the `effortLevel` in `settings.json` first. The MACF template defaults to `xhigh` for agentic work. Lower values (`low`, `medium`) make the model scope its work more narrowly.
|
|
42
|
+
|
|
43
|
+
### Response length calibrated to complexity
|
|
44
|
+
|
|
45
|
+
Short prompts get short answers; open-ended analysis gets long ones. If you need a specific verbosity, say so explicitly ("brief" / "detailed" / "comprehensive" qualifiers).
|
|
46
|
+
|
|
47
|
+
### Cybersecurity safeguards may refuse legitimate work
|
|
48
|
+
|
|
49
|
+
Penetration testing, red-teaming, security research, and similar legitimate domain work may trigger refusals. For those use cases, apply to Anthropic's Cyber Verification Program; safeguards are intentionally conservative.
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Bash deny-rule coverage (Claude Code v2.1.113+)
|
|
54
|
+
|
|
55
|
+
The `Bash(...)` deny patterns (`sudo`, `git push --force*`, `docker push *`, `rm -rf /`, `git commit --no-verify`) match commands wrapped in common exec wrappers as of Claude Code v2.1.113: `env`, `sudo`, `watch`, `ionice`, `setsid`, and similar. So `env sudo rm -rf /` or `watch sudo docker push ...` are caught by existing denies without needing to enumerate every wrapped variant.
|
|
56
|
+
|
|
57
|
+
This is a Claude Code-level behavior change, not a template-level rule change, but it is worth knowing that the matched surface area is wider than the literal patterns suggest.
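
To make the "wider surface area" concrete, here is a minimal sketch of wrapper-aware matching. It is not Claude Code's actual matcher: the wrapper list, the pattern strings (including `sudo *`), the simplified glob rule, and every function name are assumptions made for illustration.

```typescript
// Illustrative only: NOT Claude Code's real deny-rule implementation.
// The wrapper list and pattern strings are assumptions modelled on the prose.
const EXEC_WRAPPERS = new Set(["env", "sudo", "watch", "ionice", "setsid"]);

// Simplified glob: a trailing "*" means prefix match; otherwise exact match.
const DENY_PATTERNS = [
  "sudo *",
  "git push --force*",
  "docker push *",
  "rm -rf /",
  "git commit --no-verify",
];

function matchesPattern(command: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? command.startsWith(pattern.slice(0, -1))
    : command === pattern;
}

// Returns the command as written plus every variant obtained by peeling
// leading wrapper words. Wrapper options and VAR=value prefixes are not handled.
function unwrapCandidates(command: string): string[] {
  let rest = command.trim().split(/\s+/);
  const candidates = [rest.join(" ")];
  while (rest.length > 1 && EXEC_WRAPPERS.has(rest[0] ?? "")) {
    rest = rest.slice(1);
    candidates.push(rest.join(" "));
  }
  return candidates;
}

// A command is denied if the command itself or any unwrapped variant matches.
function isDenied(command: string): boolean {
  return unwrapCandidates(command).some((candidate) =>
    DENY_PATTERNS.some((pattern) => matchesPattern(candidate, pattern))
  );
}
```

Under these assumptions, `isDenied("env sudo rm -rf /")` is true without a dedicated `env sudo ...` pattern, which is the point of the wrapper coverage.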
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## How rule sets stay current across model releases
|
|
62
|
+
|
|
63
|
+
When a new Claude version lands:
|
|
64
|
+
|
|
65
|
+
1. **Audit the high-impact behavioral surfaces** above (instruction generalization, subagent dispatch, tool propensity, response length, safeguards) against the new version's defaults.
|
|
66
|
+
2. **Capture observed differences:**
|
|
67
|
+
   - **If the new version's defaults match the existing latest section** (no behavioral shift on the catalogued surfaces), extend the existing section header to cover the new version (e.g., `## Notes for Opus 4.7+, 5.0+`). Document any newly-discovered surfaces inline.
|
|
68
|
+
   - **If the new version's defaults differ** on any catalogued surface, add a NEW section dated `## Notes for <Model> <Version>+` AND mark the previous section with an end-of-applicability range (e.g., `## Notes for Opus 4.7+ (4.7 only — superseded by 5.0)`).
|
|
69
|
+
   - This convention preserves the version-stack history without ambiguity about which sections apply to which model versions.
|
|
70
|
+
3. **Update rule sets that depended on the old defaults.** Common targets:
|
|
71
|
+
   - Rules assuming "the model will figure out the broader scope" → make scope explicit
|
|
72
|
+
   - Rules depending on automatic subagent dispatch → name the subagent
|
|
73
|
+
   - Effort-level expectations → calibrate against new defaults
|
|
74
|
+
4. **Distribute via canonical PR + `macf update`** to consumer workspaces.
|
|
75
|
+
|
|
76
|
+
This is part of the **substrate-evolution maintenance loop**: model behavior shifts → substrate agents observe friction → friction codified in workbench → promoted to canonical → distributed.
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
## Why this rule exists
|
|
81
|
+
|
|
82
|
+
The model-era-compatibility surface was discovered empirically during the 2026-04 Opus 4.7 rollout. Devops-agent observed multiple friction points where workflows that worked in earlier sessions stopped working, not because the rules were wrong, but because the underlying model's defaults had shifted. Codifying the differences and the maintenance pattern prevents future agents from re-discovering the same tuning problems.
|
|
83
|
+
|
|
84
|
+
The behavioral-divergence catalog is **load-bearing for autonomous agent operation**: agents that don't account for the current model's defaults produce work that's underdone (fewer tool calls than the task warrants) or overdone (verbose where terse was wanted) or scope-limited (instructions interpreted literally where broader generalization was expected).
|
|
85
|
+
|
|
86
|
+
This is also a **paper-grade observation**: agent rule sets in production multi-agent systems require explicit version-dependent maintenance. Each Claude release is a substrate-evolution event that triggers a small wave of rule recalibration.
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## Cross-references
|
|
91
|
+
|
|
92
|
+
- `peer-dynamic.md` § "Response form" — verbosity expectations interact with model-era response-length calibration
|
|
93
|
+
- `coordination.md` § "Communication" — concise-comments rule depends on model not adding gratuitous prose
|
|
94
|
+
- `groundnuty/macf-devops-toolkit:.claude/rules/autonomous-work.md` — original source of the Opus 4.7 notes (substrate-evolution origin)
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
# Observability Wiring (Claude Code OTLP)
|
|
2
|
+
|
|
3
|
+
Every MACF agent's `claude.sh` launcher exports the Claude Code native OpenTelemetry env vars so traces, metrics, and logs flow to the project's observability stack out of the box. The wiring is mandatory by default; opt out per-workspace via `MACF_OTEL_DISABLED=1`.
|
|
4
|
+
|
|
5
|
+
## What's exported (canonical set)
|
|
6
|
+
|
|
7
|
+
`claude.sh` (generated by `macf init` / `macf update`, source: `packages/macf/src/cli/claude-sh.ts::otelTelemetryLines`) exports:
|
|
8
|
+
|
|
9
|
+
| Env var | Value | Purpose |
|
|
10
|
+
|---|---|---|
|
|
11
|
+
| `CLAUDE_CODE_ENABLE_TELEMETRY` | `1` | Master gate — all signals dark without this |
|
|
12
|
+
| `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA` | `1` | Additional gate for traces (beta) |
|
|
13
|
+
| `OTEL_TRACES_EXPORTER` | `otlp` | Emit traces. Without it, no spans even if master gate is on |
|
|
14
|
+
| `OTEL_METRICS_EXPORTER` | `otlp` | Emit metrics. **Per-signal** — separate from traces |
|
|
15
|
+
| `OTEL_LOGS_EXPORTER` | `otlp` | Emit logs. **Per-signal** — separate from traces |
|
|
16
|
+
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4318` (default) | OTLP HTTP receiver. Override via `MACF_OTEL_ENDPOINT` |
|
|
17
|
+
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `http/protobuf` | Wire protocol |
|
|
18
|
+
| `OTEL_SERVICE_NAME` | `macf-agent-<name>` | All MACF agents grouped under one service.name family |
|
|
19
|
+
| `OTEL_RESOURCE_ATTRIBUTES` | `gen_ai.agent.name=<name>,gen_ai.agent.role=<role>,service.namespace=macf` | Semconv-compliant resource attrs |
|
|
20
|
+
|
|
21
|
+
**Why per-signal exporters matter** (macf#245): without `OTEL_METRICS_EXPORTER` / `OTEL_LOGS_EXPORTER`, the corresponding signal silently emits nothing — even when `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set and traces work. Devops surfaced this gap on macf#245 after observing zero metrics from agents whose master gate was on. Each signal is independently gated by its `OTEL_<SIGNAL>_EXPORTER` env var; the gates are NOT cascaded from the master.
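
The independent gating described above can be sketched as a small predicate over an environment map. This is a reading of the table and the macf#245 note, not Claude Code's own logic; the function name, the `Env` type, and the treatment of an exporter value of `none` as silent are assumptions.

```typescript
// Hypothetical helper: reports which signals would emit nothing for a given
// environment, following the gating rules described in this document.
type Env = Record<string, string | undefined>;
type Signal = "traces" | "metrics" | "logs";

const EXPORTER_VAR: Record<Signal, string> = {
  traces: "OTEL_TRACES_EXPORTER",
  metrics: "OTEL_METRICS_EXPORTER",
  logs: "OTEL_LOGS_EXPORTER",
};

function silentSignals(env: Env): Signal[] {
  const all: Signal[] = ["traces", "metrics", "logs"];

  // Master gate off: every signal is dark.
  if (env["CLAUDE_CODE_ENABLE_TELEMETRY"] !== "1") {
    return all;
  }

  return all.filter((signal) => {
    // Each signal needs its own exporter variable; nothing cascades.
    const exporter = env[EXPORTER_VAR[signal]];
    if (exporter === undefined || exporter === "" || exporter === "none") {
      return true;
    }
    // Traces additionally need the beta gate.
    return signal === "traces" && env["CLAUDE_CODE_ENHANCED_TELEMETRY_BETA"] !== "1";
  });
}
```

With the master gate on and only `OTEL_TRACES_EXPORTER` set, this returns metrics and logs, which mirrors the gap reported on macf#245.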
|
|
22
|
+
|
|
23
|
+
## Resource-attribute namespace (semconv-compliant)
|
|
24
|
+
|
|
25
|
+
We use `gen_ai.agent.name` / `gen_ai.agent.role` (OpenTelemetry GenAI semantic conventions namespace) rather than informal flat names like `agent.id` / `agent.role`. Tempo / Langfuse / Prometheus query surfaces should reference `gen_ai.agent.*` for agent-discrimination. Aligned with devops-toolkit's Collector `resource/paper-dims` processor, which keys on this namespace.
|
|
26
|
+
|
|
27
|
+
## How to opt out
|
|
28
|
+
|
|
29
|
+
Two per-workspace env knobs are read at `macf init` / `macf update` time:
|
|
30
|
+
|
|
31
|
+
- `MACF_OTEL_DISABLED=1` (or `=true`) — omits the entire OTEL block from the generated `claude.sh`. Use when:
|
|
32
|
+
  - Deployment has no OTLP receiver running (no observability stack)
|
|
33
|
+
  - You want zero retry-spam from the exporter
|
|
34
|
+
  - Agent runs offline / on a host that can't reach the collector
|
|
35
|
+
- `MACF_OTEL_ENDPOINT=http://central-host:4318` — overrides the default `http://localhost:4318`. Use when:
|
|
36
|
+
  - The observability stack is on a different host (Tailscale tailnet, reverse proxy, central collector)
|
|
37
|
+
  - You're running multiple MACF deployments and want them all reporting to one collector
|
|
38
|
+
  - The default port `4318` collides with another service on the agent's host (e.g., devops-toolkit's compose stack uses `:14318` for this reason)
|
|
39
|
+
|
|
40
|
+
To apply a knob: set the env var BEFORE running `macf update`. The launcher re-renders with the new state.
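
The canonical export set and the two knobs can be sketched together as one pure function. This is a hypothetical re-implementation for illustration, not the actual `otelTelemetryLines` source; the function name, the `AgentIdentity` and `Env` types, and the double-quoted `export` style are assumptions.

```typescript
// Hypothetical sketch: NOT the real otelTelemetryLines from claude-sh.ts.
// The names, types, and quoting style here are assumptions.
interface AgentIdentity {
  name: string;
  role: string;
}

type Env = Record<string, string | undefined>;

function renderOtelExports(agent: AgentIdentity, env: Env): string[] {
  // Opt-out knob: "1" or "true" omits the whole block.
  const disabled = env["MACF_OTEL_DISABLED"];
  if (disabled === "1" || disabled === "true") {
    return [];
  }

  // Endpoint knob: falls back to the documented default.
  const endpoint = env["MACF_OTEL_ENDPOINT"] || "http://localhost:4318";

  const resourceAttributes = [
    `gen_ai.agent.name=${agent.name}`,
    `gen_ai.agent.role=${agent.role}`,
    "service.namespace=macf",
  ].join(",");

  // Same nine variables, in the same order, as the table in this rule.
  const variables: Array<[string, string]> = [
    ["CLAUDE_CODE_ENABLE_TELEMETRY", "1"],
    ["CLAUDE_CODE_ENHANCED_TELEMETRY_BETA", "1"],
    ["OTEL_TRACES_EXPORTER", "otlp"],
    ["OTEL_METRICS_EXPORTER", "otlp"],
    ["OTEL_LOGS_EXPORTER", "otlp"],
    ["OTEL_EXPORTER_OTLP_ENDPOINT", endpoint],
    ["OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf"],
    ["OTEL_SERVICE_NAME", `macf-agent-${agent.name}`],
    ["OTEL_RESOURCE_ATTRIBUTES", resourceAttributes],
  ];

  return variables.map(([key, value]) => `export ${key}="${value}"`);
}
```

Keeping the environment as an explicit argument makes both knobs easy to exercise in a test without touching the real process environment.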
|
|
41
|
+
|
|
42
|
+
## Verification
|
|
43
|
+
|
|
44
|
+
After `macf update`, verify the launcher embeds the expected exports:
|
|
45
|
+
|
|
46
|
+
    grep -E '^export (CLAUDE_CODE_ENABLE_TELEMETRY|OTEL_)' claude.sh
|
|
47
|
+
|
|
48
|
+
If `MACF_OTEL_DISABLED=1` was set at `macf update` time, the block is absent — no OTEL exports in the launcher.
|
|
49
|
+
|
|
50
|
+
## Cross-references
|
|
51
|
+
|
|
52
|
+
- `packages/macf/src/cli/claude-sh.ts::otelTelemetryLines` — canonical generator
|
|
53
|
+
- `packages/macf/test/cli/claude-sh.test.ts` — test coverage for the env-var set + opt-out paths
|
|
54
|
+
- macf#197 — original telemetry-wiring landing
|
|
55
|
+
- macf#245 — metrics + logs exporter gap fix
|
|
56
|
+
- groundnuty/macf-devops-toolkit — Collector `resource/paper-dims` pipeline keying on `gen_ai.agent.*` resource attrs
|
|
57
|
+
|
|
58
|
+
## When to update this rule
|
|
59
|
+
|
|
60
|
+
Update this rule when `claude-sh.ts::otelTelemetryLines` changes (new env var, new signal, new opt-out shape). The canonical source is the TypeScript generator; this doc is an operator-facing summary kept in lockstep with it. CI does not enforce the lockstep, so update this doc manually on each `claude-sh.ts` edit.
|
|
@@ -0,0 +1,205 @@
|
|
|
1
|
+
# Peer Dynamic (canonical, shared)
|
|
2
|
+
|
|
3
|
+
**This file is the single source of truth for how MACF agents interact with
|
|
4
|
+
peers as equals, not as superiors or subordinates.** It is copied into each
|
|
5
|
+
agent workspace's `.claude/rules/` by `macf init` and refreshed by
|
|
6
|
+
`macf update` / `macf rules refresh`. Do not edit workspace copies directly
|
|
7
|
+
— edit the canonical file at
|
|
8
|
+
`groundnuty/macf:packages/macf/plugin/rules/peer-dynamic.md` and re-run the
|
|
9
|
+
distribution.
|
|
10
|
+
|
|
11
|
+
The peer dynamic is **symmetric and substantive**. Agents push back against
|
|
12
|
+
each other. Agents push back against the user. The user pushes back against
|
|
13
|
+
agents. All directions.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Core stance
|
|
18
|
+
|
|
19
|
+
- **You are a thinking partner, not a transcriber.** When a peer (or user)
|
|
20
|
+
asks "should we do X?", don't just validate. Think. Propose alternatives.
|
|
21
|
+
Flag tradeoffs. Contribute ideas that weren't mentioned.
|
|
22
|
+
- **Disagreement is welcome and expected.** If you think something is off,
|
|
23
|
+
say so. "I'd push back on that because Y" is the right mode. You correct
|
|
24
|
+
when others drift; they correct you when you drift. Both directions.
|
|
25
|
+
- **Final calls depend on context.** For project direction / architectural
|
|
26
|
+
choices → the user or coordinator decides. For implementation details →
|
|
27
|
+
the implementer decides within their scope; reviewer can raise concerns.
|
|
28
|
+
When uncertain whose call it is → surface the ambiguity.
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## Proposing options
|
|
33
|
+
|
|
34
|
+
When facing a design or scope decision, the typical pattern is:
|
|
35
|
+
|
|
36
|
+
1. Lay out 2–4 options with names (A/B/C) and one-line descriptions.
|
|
37
|
+
2. For each: pros, cons, hidden costs.
|
|
38
|
+
3. Share your own lean and why.
|
|
39
|
+
4. Ask what the peer/user thinks.
|
|
40
|
+
|
|
41
|
+
Bad: "Shall I do X?" — false dichotomy, doesn't surface alternatives.
|
|
42
|
+
Good: "Three paths: (A) X for reason P, (B) Y for reason Q, (C) hybrid.
|
|
43
|
+
I'd lean B because R. What's your call?"
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Asking vs. presuming
|
|
48
|
+
|
|
49
|
+
**Ask before acting when:**
|
|
50
|
+
|
|
51
|
+
- Scope is ambiguous (does "update the docs" mean just README, or all 15
|
|
52
|
+
files?)
|
|
53
|
+
- Naming matters (directory names, repo names, identifiers — hard to change
|
|
54
|
+
later)
|
|
55
|
+
- Architectural decisions (one service vs. two? monorepo vs. multi?)
|
|
56
|
+
- Destructive operations (delete, rename, force push, unpublish)
|
|
57
|
+
- When the request could be satisfied multiple ways and the difference
|
|
58
|
+
matters
|
|
59
|
+
|
|
60
|
+
**Just do it (don't ask) when:**
|
|
61
|
+
|
|
62
|
+
- Fixing obvious typos or bugs
|
|
63
|
+
- Following a pattern already established elsewhere in the codebase
|
|
64
|
+
- Carrying out an explicitly specified step in an agreed plan
|
|
65
|
+
- Low-stakes, easily reversible local edits
|
|
66
|
+
|
|
67
|
+
The cost of asking once is ~1 message. The cost of redoing significant
|
|
68
|
+
work is large. Default to asking when uncertain.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Response form
|
|
73
|
+
|
|
74
|
+
- **Lead with the answer / action, not preamble.** The peer or user
|
|
75
|
+
doesn't need "Sure! Here's what I'll do..." — just do it.
|
|
76
|
+
- **Skip restatement.** Don't paraphrase what was said back.
|
|
77
|
+
- **Skip trailing summaries.** Peers read diffs and tool output. A final
|
|
78
|
+
"So, to summarize what I just did..." is noise.
|
|
79
|
+
- **Markdown formatting where it helps.** Tables for comparisons, code
|
|
80
|
+
fences for commands, headers for multi-topic responses. Not for 1–2
|
|
81
|
+
sentence replies.
|
|
82
|
+
- **Concrete references.** When citing files, use `path/to/file.ts:42`.
|
|
83
|
+
When citing GitHub, use `owner/repo#123`.
|
|
84
|
+
- **Brevity > completeness** for short interactions. Details on request.
|
|
85
|
+
|
|
86
|
+
---
|
|
87
|
+
|
|
88
|
+
## Pushing back
|
|
89
|
+
|
|
90
|
+
### Legitimate reasons to push back
|
|
91
|
+
|
|
92
|
+
- The request conflicts with an existing design decision — cite the DR
|
|
93
|
+
- The approach has a known failure mode — cite the incident or commit
|
|
94
|
+
- A simpler alternative exists — propose it
|
|
95
|
+
- The scope is broader than the requester may realize — enumerate the
|
|
96
|
+
implications
|
|
97
|
+
- Security / safety concern — explain the threat model
|
|
98
|
+
- The work is a duplicate of something already done — link to it
|
|
99
|
+
|
|
100
|
+
**Format:** state the concern, explain why, propose alternative, let the
|
|
101
|
+
requester decide.
|
|
102
|
+
|
|
103
|
+
### Don't push back on
|
|
104
|
+
|
|
105
|
+
- Stylistic preferences the requester has already stated
|
|
106
|
+
- Decisions the requester has clearly made (even if you'd have chosen
|
|
107
|
+
differently)
|
|
108
|
+
- Scope the requester has explicitly set (don't expand it, don't contract
|
|
109
|
+
it without checking)
|
|
110
|
+
|
|
111
|
+
### What makes pushback substantive
|
|
112
|
+
|
|
113
|
+
**Specific + grounded.** Concrete references, not abstract appeals.
|
|
114
|
+
|
|
115
|
+
Bad:
|
|
116
|
+
> "I don't think this approach is clean."
|
|
117
|
+
|
|
118
|
+
Good:
|
|
119
|
+
> "`src/server.ts:142` already does this with `withHelper`. Re-implementing
|
|
120
|
+
> it inline here diverges from the pattern — consumers patching one site
|
|
121
|
+
> will miss the other. Suggest: call `withHelper` instead."
|
|
122
|
+
|
|
123
|
+
Bad:
|
|
124
|
+
> "We should use a different library."
|
|
125
|
+
|
|
126
|
+
Good:
|
|
127
|
+
> "`package.json:24` pins `fs-extra@^11`, which we added in #84 specifically
|
|
128
|
+
> to avoid the Node-core `fs.rm` race issue on Windows. Switching to
|
|
129
|
+
> `fs.promises.rm` here reverts that fix — is that intentional?"
|
|
130
|
+
|
|
131
|
+
Pushback that names a file + line + prior decision puts the discussion on
|
|
132
|
+
firm ground. Pushback that appeals to cleanliness / standards / "best
|
|
133
|
+
practice" is much weaker because the requester has no specific thing to
|
|
134
|
+
accept or rebut.
|
|
135
|
+
|
|
136
|
+
---
|
|
137
|
+
|
|
138
|
+
## Reviewing peer PRs
|
|
139
|
+
|
|
140
|
+
When you review a peer's PR:
|
|
141
|
+
|
|
142
|
+
1. **Read the diff in context.** Open the files — not just the hunks.
|
|
143
|
+
Surrounding code usually explains whether the change is appropriate.
|
|
144
|
+
2. **Check the test coverage.** New logic without tests, or new branches
|
|
145
|
+
not exercised, is a reasonable pushback.
|
|
146
|
+
3. **Check the PR description.** Does it match what the code actually
|
|
147
|
+
does? Drift between description and implementation is a flag.
|
|
148
|
+
4. **Reference file:line when raising concerns.** See pushback examples
|
|
149
|
+
above.
|
|
150
|
+
5. **Distinguish must-fix from nice-to-have.** If you flag 12 things, the
|
|
151
|
+
implementer can't tell which ones block approval. Mark each as
|
|
152
|
+
`[BLOCKING]` or `[nit]`.
|
|
153
|
+
6. **Approve + @mention when done.** The implementer is waiting on an
|
|
154
|
+
actionable next step; "I'll review soon" is not one.
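
As a rough illustration of step 5, review comments tagged this way can be sorted mechanically so the implementer sees what blocks approval. The function name and the `Triage` shape are hypothetical; only the `[BLOCKING]` and `[nit]` tags come from this rule.

```typescript
// Hypothetical helper: groups review comments by their leading tag.
interface Triage {
  blocking: string[];
  nits: string[];
  untagged: string[];
}

function triageReviewComments(comments: string[]): Triage {
  const result: Triage = { blocking: [], nits: [], untagged: [] };
  for (const comment of comments) {
    const text = comment.trim();
    if (text.startsWith("[BLOCKING]")) {
      result.blocking.push(text);
    } else if (text.startsWith("[nit]")) {
      result.nits.push(text);
    } else {
      // Untagged comments leave the implementer guessing; surface them.
      result.untagged.push(text);
    }
  }
  return result;
}
```

A non-empty `untagged` list is a hint that the review itself still needs tagging before the implementer can act on it.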
|
|
155
|
+
|
|
156
|
+
### Accepting valid feedback
|
|
157
|
+
|
|
158
|
+
When a peer pushes back on your work and they're right:
|
|
159
|
+
|
|
160
|
+
- Say so, visibly. "Good catch" + what you're changing.
|
|
161
|
+
- Push the fix promptly — don't let it drift.
|
|
162
|
+
- If the fix surfaced an implicit assumption (e.g., "this doesn't handle
|
|
163
|
+
case X"), mention it in the PR description or as a comment for future
|
|
164
|
+
reviewers.
|
|
165
|
+
|
|
166
|
+
### Defending your implementation
|
|
167
|
+
|
|
168
|
+
If the reviewer is wrong (or partially wrong):
|
|
169
|
+
|
|
170
|
+
- Explain in concrete terms. "The approach you're suggesting breaks X
|
|
171
|
+
because Y." Cite file:line or prior decisions.
|
|
172
|
+
- Don't cave just to close the review faster. If you're right, argue.
|
|
173
|
+
- If after discussion you still disagree, escalate to the coordinator /
|
|
174
|
+
user. Don't silently override the reviewer; don't silently concede
|
|
175
|
+
either.
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
## Escalating vs overriding
|
|
180
|
+
|
|
181
|
+
Escalation order:
|
|
182
|
+
|
|
183
|
+
1. Try to resolve with the peer directly (issue or PR comment thread).
|
|
184
|
+
2. If stuck, @mention the coordinator (usually whoever filed the
|
|
185
|
+
delegation issue) with a concrete ask.
|
|
186
|
+
3. If the coordinator doesn't resolve, escalate to the user.
|
|
187
|
+
|
|
188
|
+
Do NOT:
|
|
189
|
+
|
|
190
|
+
- Silently override a disagreement by merging / closing / self-approving.
|
|
191
|
+
- Reach past the coordinator to the user without trying the coordinator
|
|
192
|
+
first.
|
|
193
|
+
- Close the conversation without resolution — leave the thread in a state
|
|
194
|
+
where anyone reading later can see what was decided and why.
|
|
195
|
+
|
|
196
|
+
---
|
|
197
|
+
|
|
198
|
+
## When to modify this rule
|
|
199
|
+
|
|
200
|
+
- **Read:** every session start.
|
|
201
|
+
- **Modify:** never directly in workspace copies. Edit the canonical file
|
|
202
|
+
and re-distribute via `macf update`.
|
|
203
|
+
- **Disagree with a rule?** Open an issue proposing the change, with
|
|
204
|
+
rationale + the incident that showed the rule was wrong. Peer review
|
|
205
|
+
applies.
|
|
@@ -0,0 +1,245 @@
|
|
|
1
|
+
# PR Discipline (canonical, shared)
|
|
2
|
+
|
|
3
|
+
**This file is the single source of truth for how MACF agents use pull
|
|
4
|
+
requests as the default merge checkpoint.** It is copied into each agent
|
|
5
|
+
workspace's `.claude/rules/` by `macf init` and refreshed by `macf update`
|
|
6
|
+
/ `macf rules refresh`. Do not edit workspace copies directly — edit the
|
|
7
|
+
canonical file at
|
|
8
|
+
`groundnuty/macf:packages/macf/plugin/rules/pr-discipline.md` and re-run
|
|
9
|
+
the distribution.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## The default: PR for every artifact
|
|
14
|
+
|
|
15
|
+
**Every agent-authored change that produces a persistent artifact goes
|
|
16
|
+
through a pull request, not a direct commit to the default branch.**
|
|
17
|
+
|
|
18
|
+
This applies uniformly to:
|
|
19
|
+
|
|
20
|
+
- Source code (`.ts`, `.py`, `.sh`, etc.)
|
|
21
|
+
- Tests
|
|
22
|
+
- Rules + docs (`.md`)
|
|
23
|
+
- Config (`package.json`, `tsconfig.json`, workflow files)
|
|
24
|
+
- Research + findings documents
|
|
25
|
+
- Generated artifacts (when committed to the repo)
|
|
26
|
+
|
|
27
|
+
The PR is the merge checkpoint. Without it:
|
|
28
|
+
|
|
29
|
+
- No peer review — the reviewer who's supposed to validate can't
|
|
30
|
+
- No CI — tests / linters / build checks don't run on the change in
|
|
31
|
+
isolation
|
|
32
|
+
- No audit trail — "who approved what" becomes a git-blame archaeology
|
|
33
|
+
problem instead of a PR-comment lookup
|
|
34
|
+
- No rollback point — reverting one PR is one command; reverting a string
|
|
35
|
+
of direct commits requires cherry-pick-reversing each
|
|
36
|
+
|
|
37
|
+
The cost of a PR is ~30 seconds of typing. The cost of skipping it is
|
|
38
|
+
paid once something goes wrong.
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
## The narrow exceptions
|
|
43
|
+
|
|
44
|
+
**Direct commit to default branch is acceptable only when ALL of these
|
|
45
|
+
hold:**
|
|
46
|
+
|
|
47
|
+
- The change is operator-authored at the terminal, not agent-generated
|
|
48
|
+
- It's a trivial recovery (typo in a config file, emergency secret rotation,
|
|
49
|
+
obvious one-line unbreak)
|
|
50
|
+
- The operator takes responsibility verbally in a follow-up channel (team
|
|
51
|
+
chat, issue comment, etc.)
|
|
52
|
+
|
|
53
|
+
**Examples that are NOT exceptions** — use a PR:
|
|
54
|
+
|
|
55
|
+
- "It's just a research doc" — still a PR
|
|
56
|
+
- "It's just comments" — still a PR
|
|
57
|
+
- "It's a small fix" — still a PR
|
|
58
|
+
- "CI doesn't apply to this file type" — still a PR (the review discipline
|
|
59
|
+
does)
|
|
60
|
+
- "I'm the only one working on this repo" — still a PR (audit trail
|
|
61
|
+
matters)
|
|
62
|
+
- "It's a content-only change, no logic" — still a PR
|
|
63
|
+
|
|
64
|
+
If you find yourself reaching for `git commit && git push origin main`
|
|
65
|
+
as an agent, you're probably violating this rule. Stop, branch, push, PR.
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## PR anatomy
|
|
70
|
+
|
|
71
|
+
### Branch name
|
|
72
|
+
|
|
73
|
+
Reflects the change type + scope. Common patterns:
|
|
74
|
+
|
|
75
|
+
- `feat/<issue-number>-<slug>` — new feature
|
|
76
|
+
- `fix/<issue-number>-<slug>` — bug fix
|
|
77
|
+
- `chore/<slug>` — version bumps, dep updates, renames
|
|
78
|
+
- `docs/<slug>` — docs-only
|
|
79
|
+
- `research/<date>-<slug>` — research findings
|
|
80
|
+
- `refactor/<slug>` — structural changes without behaviour change
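
The patterns above could be checked with a small validator like the following sketch. The lowercase kebab-case slug rule and the `YYYY-MM-DD` date shape are assumptions, since the rule does not pin them down, and the check only recognises the listed patterns, which the rule itself calls "common" rather than exhaustive.

```typescript
// Hypothetical validator for the branch-name patterns listed above.
// Assumed: slugs are lowercase kebab-case; research dates are YYYY-MM-DD.
const SLUG = "[a-z0-9]+(?:-[a-z0-9]+)*";

const BRANCH_PATTERNS: RegExp[] = [
  new RegExp(`^(?:feat|fix)/\\d+-${SLUG}$`),
  new RegExp(`^(?:chore|docs|refactor)/${SLUG}$`),
  new RegExp(`^research/\\d{4}-\\d{2}-\\d{2}-${SLUG}$`),
];

function isConventionalBranch(name: string): boolean {
  return BRANCH_PATTERNS.some((pattern) => pattern.test(name));
}
```

For example, `feat/245-otel-metrics-exporter` passes, while `fix/otel` does not because the `fix/` pattern expects an issue number.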
|
|
81
|
+
|
|
82
|
+
### PR title
|
|
83
|
+
|
|
84
|
+
Follows conventional commits format:
|
|
85
|
+
|
|
86
|
+
```
|
|
87
|
+
<type>(<scope>): <description starting lowercase>
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
Examples:
|
|
91
|
+
|
|
92
|
+
- `fix(cli): reject empty MACF_AGENT_NAME with actionable error`
|
|
93
|
+
- `docs(design): add dr-022 amendment j — first-publish-path gotchas`
|
|
94
|
+
- `research: 2026-04-22 observability stack landscape`
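
A title in this format can be parsed with a single regular expression. The sketch below treats the scope as optional, because the `research:` example omits it, and rejects descriptions that start with an uppercase letter; the function name, the `ParsedTitle` shape, and the allowed scope characters are assumptions.

```typescript
// Hypothetical parser for "<type>(<scope>): <description>" PR titles.
interface ParsedTitle {
  type: string;
  scope: string | null;
  description: string;
}

// type, optional (scope), ": ", then a description that does not start
// with an uppercase letter or whitespace.
const TITLE_PATTERN = /^([a-z]+)(?:\(([a-z0-9-]+)\))?: ([^A-Z\s].*)$/;

function parsePrTitle(title: string): ParsedTitle | null {
  const match = TITLE_PATTERN.exec(title);
  if (match === null) {
    return null;
  }
  return {
    type: match[1] ?? "",
    scope: match[2] ?? null,
    description: match[3] ?? "",
  };
}
```

A `null` result means the title does not follow the format and should be reworded before the PR is opened.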
|
|
95
|
+
|
|
96
|
+
### PR body
|
|
97
|
+
|
|
98
|
+
Structure (scale each section to the change size):
|
|
99
|
+
|
|
100
|
+
```markdown
|
|
101
|
+
Refs #<issue-number>
|
|
102
|
+
<or: Closes #<issue-number> only if the PR author and the issue reporter are the same agent>
|
|
103
|
+
|
|
104
|
+
## Summary
|
|
105
|
+
|
|
106
|
+
<What changed and why, 1–3 sentences.>
|
|
107
|
+
|
|
108
|
+
## Approach
|
|
109
|
+
|
|
110
|
+
<How, briefly. Why this approach and not alternatives.>
|
|
111
|
+
|
|
112
|
+
## Test plan
|
|
113
|
+
|
|
114
|
+
- [x] Specific verification step
|
|
115
|
+
- [x] Specific verification step
|
|
116
|
+
- [ ] Operator-verification (if applicable)
|
|
117
|
+
|
|
118
|
+
## Notes
|
|
119
|
+
|
|
120
|
+
<Gotchas, related issues, follow-ups.>
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## `Refs #N` vs `Closes #N`
|
|
126
|
+
|
|
127
|
+
Governed by **reporter-owns-closure** (see `coordination.md`):
|
|
128
|
+
|
|
129
|
+
- **`Closes #N`**: PR author == issue reporter. Auto-close on merge is
|
|
130
|
+
fine because the same agent is approving the closure.
|
|
131
|
+
- **`Refs #N`**: PR author != issue reporter. The issue reporter verifies
|
|
132
|
+
the fix before closing — auto-close bypasses that verification.
|
|
133
|
+
|
|
134
|
+
When in doubt, use `Refs`. It's the safer default.
|
|
135
|
+
|
|
136
|
+
Same applies to the close-keyword siblings: `Fixes`, `Resolves`, `Fix`,
|
|
137
|
+
`Close`, `Resolve`, and their tense variants. If the issue is
|
|
138
|
+
foreign-reporter, use `Refs` for all of them.
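
The reporter-owns-closure choice reduces to a comparison, and accidental auto-close keywords can be caught with a scan of the PR body. Both helpers below are hypothetical sketches: the names are invented here, and the keyword scan is simplified to the `keyword #N` form (it ignores cross-repo references and the `keyword: #N` variant).

```typescript
// Hypothetical helpers for the Refs-vs-Closes decision.
type LinkKeyword = "Closes" | "Refs";

// An unknown reporter counts as "in doubt", so it falls back to Refs.
function issueLinkKeyword(prAuthor: string, issueReporter: string | undefined): LinkKeyword {
  return issueReporter !== undefined && prAuthor === issueReporter ? "Closes" : "Refs";
}

// Lists issue numbers that a PR body would auto-close on merge.
// Covers close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved.
function autoClosedIssues(prBody: string): number[] {
  const pattern = /\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)/gi;
  const issues: number[] = [];
  let match = pattern.exec(prBody);
  while (match !== null) {
    issues.push(Number(match[1]));
    match = pattern.exec(prBody);
  }
  return issues;
}
```

If `issueLinkKeyword` returns `"Refs"` but `autoClosedIssues` finds the issue number in the body, the body contains a close-keyword sibling that would bypass the reporter's verification.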
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## The review loop
|
|
143
|
+
|
|
144
|
+
1. **Implementer opens the PR**, @mentions the reviewer on the issue
|
|
145
|
+
thread with a pointer to the PR.
|
|
146
|
+
2. **Reviewer checks out the branch**, reads the diff in context, LGTMs
|
|
147
|
+
or lists concerns. See `peer-dynamic.md` for what substantive review
|
|
148
|
+
looks like.
|
|
149
|
+
3. **If concerns**: implementer pushes fix commits (don't force-push the
|
|
150
|
+
review history), @mentions the reviewer again.
|
|
151
|
+
4. **Once LGTM**: implementer merges. See below.
|
|
152
|
+
5. **After merge**: implementer posts the closure handoff on the
|
|
153
|
+
originating issue, per `coordination.md` rule 1.
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
## Merge-by-implementer
|
|
158
|
+
|
|
159
|
+
**The implementer who wrote the PR merges it, not the reviewer.**
|
|
160
|
+
|
|
161
|
+
Reasons:
|
|
162
|
+
|
|
163
|
+
- The implementer owns the change — they know whether the latest commit is
|
|
164
|
+
actually the right state to land.
|
|
165
|
+
- The implementer has context on CI status, whether any flaky tests are
|
|
166
|
+
noise, whether the branch needs a rebase.
|
|
167
|
+
- Reviewer-merge blurs responsibility: if the merge is broken, who
|
|
168
|
+
owned it? The coordination model assumes a clear owner per action.
|
|
169
|
+
|
|
170
|
+
**The reviewer's role ends at the LGTM.** After that, the implementer
|
|
171
|
+
decides the merge timing (wait for CI green, rebase on main if needed,
|
|
172
|
+
retry on flaky CI, etc.) and executes.
|
|
173
|
+
|
|
174
|
+
If the PR author is blocked (e.g., offline), the reviewer may merge after
|
|
175
|
+
an explicit hand-off comment, but that's the exception, not the default.
|
|
176
|
+
|
|
177
|
+
### Before merging
|
|
178
|
+
|
|
179
|
+
Check `mergeStateStatus` via `gh pr view <N> --json mergeStateStatus`:
|
|
180
|
+
|
|
181
|
+
- `CLEAN` → merge
|
|
182
|
+
- `UNSTABLE` → mergeable, but a status check failed or is in-flight. Wait if
|
|
183
|
+
in-flight, fix if failed
|
|
184
|
+
- `BEHIND` → rebase on main, force-push, re-check
|
|
185
|
+
- `DIRTY` → conflicts, resolve, push, re-check
|
|
186
|
+
- `BLOCKED` → branch protection rules not met — check reviews, required
|
|
187
|
+
checks, status checks
|
|
188
|
+
- `UNKNOWN` → GitHub is still computing, wait ~30–60s and re-query
|
|
189
|
+
|
|
190
|
+
See `coordination.md` "When You're Stuck" for the full routing table.
|
|
191
|
+
|
|
192
|
+
Don't merge on `UNSTABLE` assuming the failing check is unrelated. If the
|
|
193
|
+
check is required, investigate it before merging.
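
The status-to-action list above can be written down as a lookup so an agent applies it the same way every time. This is a sketch: the function names are hypothetical, the action strings paraphrase the list, and the only external shape assumed is the JSON object that `gh pr view <N> --json mergeStateStatus` prints.

```typescript
// Hypothetical lookup transcribed from the list above. Any value not
// listed there falls through to the default branch.
function nextMergeStep(status: string): string {
  switch (status) {
    case "CLEAN":
      return "merge";
    case "UNSTABLE":
      return "wait if a check is in-flight; fix if it failed";
    case "BEHIND":
      return "rebase on main, force-push, re-check";
    case "DIRTY":
      return "resolve conflicts, push, re-check";
    case "BLOCKED":
      return "check reviews and required status checks";
    case "UNKNOWN":
      return "wait 30-60s and re-query";
    default:
      return "unrecognised status: do not merge";
  }
}

// Accepts the JSON text printed by `gh pr view <N> --json mergeStateStatus`.
function nextMergeStepFromJson(ghJson: string): string {
  const parsed: unknown = JSON.parse(ghJson);
  if (typeof parsed !== "object" || parsed === null) {
    return nextMergeStep("");
  }
  const status = (parsed as { mergeStateStatus?: unknown }).mergeStateStatus;
  return nextMergeStep(typeof status === "string" ? status : "");
}
```

Only `CLEAN` maps to merging; every other value, including one this table does not know, resolves to a step that stops short of the merge.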
|
|
194
|
+
|
|
195
|
+
### Squash vs merge vs rebase
|
|
196
|
+
|
|
197
|
+
Prefer **squash**. One PR = one commit on main. Keeps the history linear
|
|
198
|
+
and the commit log scannable.
|
|
199
|
+
|
|
200
|
+
Use merge-commit style only when the PR legitimately captures multiple
|
|
201
|
+
independent changes that should be preserved as separate commits.
|
|
202
|
+
Don't use rebase-merge unless the project has a specific reason.
|
|
203
|
+
|
|
204
|
+
---
|
|
205
|
+
|
|
206
|
+
## After the merge
|
|
207
|
+
|
|
208
|
+
1. **Delete the remote branch.** (GitHub prompts; take the prompt.)
|
|
209
|
+
2. **Post the closure handoff on the originating issue** (`coordination.md`
|
|
210
|
+
rule 1):
|
|
211
|
+
|
|
212
|
+
> `@<reporter>` PR #N merged as `<commit-sha>`. Ready for you to close
|
|
213
|
+
> when verified.
|
|
214
|
+
|
|
215
|
+
3. **Do NOT close the issue yourself if you weren't the reporter.** Let
|
|
216
|
+
the reporter verify + close. See `coordination.md` failure-mode-B for
|
|
217
|
+
when you ARE the reporter (then close yourself).
|
|
218
|
+
|
|
219
|
+
---
|
|
220
|
+
|
|
221
|
+
## CI-aware merge timing
|
|
222
|
+
|
|
223
|
+
If the repository uses CI on PRs:
|
|
224
|
+
|
|
225
|
+
- Wait for at least the `check` job to complete before merging (or whatever
|
|
226
|
+
the required-status-checks list demands).
|
|
227
|
+
- `UNSTABLE` + `in-flight` → wait
|
|
228
|
+
- `UNSTABLE` + `failed` → investigate + fix, then re-push
|
|
229
|
+
- Don't merge-and-then-hope-the-CI-was-flaky. If required checks failed,
|
|
230
|
+
fix them before merging.
|
|
231
|
+
|
|
232
|
+
For fire-and-forget trivial changes (e.g., typo fix), waiting for CI is
|
|
233
|
+
still right — it's a minute, and it protects against "the typo fix broke
|
|
234
|
+
the build" because someone's CI job depends on the exact string you
|
|
235
|
+
changed.
|
|
236
|
+
|
|
237
|
+
---
|
|
238
|
+
|
|
239
|
+
## When to modify this rule
|
|
240
|
+
|
|
241
|
+
- **Read:** every session start.
|
|
242
|
+
- **Modify:** never directly in workspace copies. Edit the canonical file
|
|
243
|
+
and re-distribute via `macf update`.
|
|
244
|
+
- **Disagree with a rule?** Open an issue on `groundnuty/macf` proposing
|
|
245
|
+
the change, with rationale + the incident that surfaced the need.
|