@groundnuty/macf 0.2.0-rc.1 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39) hide show
  1. package/dist/.build-info.json +2 -2
  2. package/dist/cli/claude-sh.d.ts.map +1 -1
  3. package/dist/cli/claude-sh.js +12 -4
  4. package/dist/cli/claude-sh.js.map +1 -1
  5. package/dist/cli/commands/init.d.ts.map +1 -1
  6. package/dist/cli/commands/init.js +8 -1
  7. package/dist/cli/commands/init.js.map +1 -1
  8. package/dist/cli/commands/rules-refresh.d.ts.map +1 -1
  9. package/dist/cli/commands/rules-refresh.js +5 -1
  10. package/dist/cli/commands/rules-refresh.js.map +1 -1
  11. package/dist/cli/commands/update.d.ts.map +1 -1
  12. package/dist/cli/commands/update.js +8 -1
  13. package/dist/cli/commands/update.js.map +1 -1
  14. package/dist/cli/index.js +2 -1
  15. package/dist/cli/index.js.map +1 -1
  16. package/dist/cli/settings-writer.d.ts +84 -4
  17. package/dist/cli/settings-writer.d.ts.map +1 -1
  18. package/dist/cli/settings-writer.js +182 -4
  19. package/dist/cli/settings-writer.js.map +1 -1
  20. package/dist/cli/version-resolver.d.ts.map +1 -1
  21. package/dist/cli/version-resolver.js +15 -2
  22. package/dist/cli/version-resolver.js.map +1 -1
  23. package/dist/package-version.d.ts +2 -0
  24. package/dist/package-version.d.ts.map +1 -0
  25. package/dist/package-version.js +26 -0
  26. package/dist/package-version.js.map +1 -0
  27. package/package.json +2 -2
  28. package/plugin/rules/check-before-propose.md +86 -0
  29. package/plugin/rules/codify-at-correction-time.md +92 -0
  30. package/plugin/rules/coordination.md +17 -0
  31. package/plugin/rules/delegation-template.md +250 -0
  32. package/plugin/rules/execute-on-directive.md +71 -0
  33. package/plugin/rules/gh-token-attribution-traps.md +157 -0
  34. package/plugin/rules/mention-routing-hygiene.md +105 -0
  35. package/plugin/rules/model-era-compatibility.md +94 -0
  36. package/plugin/rules/observability-wiring.md +60 -0
  37. package/plugin/rules/peer-dynamic.md +205 -0
  38. package/plugin/rules/pr-discipline.md +245 -0
  39. package/plugin/rules/verify-before-claim.md +131 -0
@@ -0,0 +1,250 @@
1
+ # Delegation Template (canonical, shared)
2
+
3
+ **This file is the single source of truth for how one agent delegates work to
4
+ another.** It is copied into each agent workspace's `.claude/rules/` by
5
+ `macf init` and refreshed by `macf update` / `macf rules refresh`. Do not edit
6
+ workspace copies directly — edit the canonical file at
7
+ `groundnuty/macf:packages/macf/plugin/rules/delegation-template.md` and re-run
8
+ the distribution.
9
+
10
+ Applies to any MACF agent that hands off a task to a peer — coordinators
11
+ delegating to implementers, researchers delegating to reviewers, any agent
12
+ splitting off scope. Works whether the peer is another bot or a human.
13
+
14
+ ---
15
+
16
+ ## When to delegate vs do the work yourself
17
+
18
+ Delegation is for **asymmetric capability**, not ceremonial hand-off. Before
19
+ filing a delegation issue, ask: is the peer agent actually better positioned
20
+ to do this work than I am?
21
+
22
+ **Delegate when:**
23
+
24
+ - The peer has domain expertise the reporter lacks (framework TypeScript
25
+ internals → code-agent; LaTeX CV typesetting → cv-architect; historical
26
+ project research → cv-project-archaeologist)
27
+ - The peer owns the repo / the canonical source (framework changes →
28
+ code-agent who owns `groundnuty/macf`)
29
+ - The peer has persistent context the reporter doesn't (long-running
30
+ investigation, repository-specific conventions, team-facing
31
+ relationships)
32
+ - The work would meaningfully benefit from review + merge discipline by a
33
+ non-author (even if the reporter could technically do the work, the
34
+ peer's second pair of eyes catches issues the author won't)
35
+
36
+ **Do the work yourself (skip delegation) when:**
37
+
38
+ - You have asymmetric context the peer would need to learn — delegation
39
+ becomes "please copy my notes into a file" rather than real work
40
+ - You are the domain expert (rules about your own collaboration patterns,
41
+ a postmortem of an incident you lived, a DR for a design you drove)
42
+ - The task is ceremonial packaging of material the reporter authored
43
+ anyway (the peer adds no value beyond typing)
44
+ - The work is time-sensitive and the delegation round-trip would exceed
45
+ the fix window
46
+
47
+ The test: if the peer's first action would be "ask the reporter for more
48
+ context / source material", you're delegating ceremony, not real work —
49
+ do it yourself. If the peer would immediately have everything they need
50
+ to start, delegate.
51
+
52
+ When you do the work yourself despite it being "in the peer's domain",
53
+ note it explicitly — either in the PR body or a handoff comment:
54
+
55
+ > "Authoring this directly rather than delegating to `<peer>` because the
56
+ > content is distilled from my own collaboration-pattern observations —
57
+ > `<peer>` would need me to write the text anyway. See
58
+ > delegation-template.md 'When to delegate' for the principle."
59
+
60
+ Transparency preserves the peer relationship: the peer sees the reasoning
61
+ instead of feeling bypassed, and the coordinator can push back if they
62
+ disagree with the self-authored call. Default to delegating when in
63
+ doubt — the PR review step gives the peer their voice regardless.
64
+
65
+ ---
66
+
67
+ ## The 6-section issue template
68
+
69
+ When you file a delegation issue for another agent, structure the body so the
70
+ peer can start without coming back for clarification. Consistency matters more
71
+ than any single section — the same headings in the same order make the body
72
+ scannable and automation-friendly.
73
+
74
+ ```markdown
75
+ ## Context
76
+
77
+ <What is this part of? Link to parent issue, design doc, DR. Why does this
78
+ task exist? What problem is it solving?>
79
+
80
+ ## Goal
81
+
82
+ <One-sentence statement of what success looks like. If you can't write it in
83
+ one sentence, the task is too big — split first.>
84
+
85
+ ## Acceptance Criteria
86
+
87
+ - [ ] <specific, testable, externally verifiable>
88
+ - [ ] <specific, testable, externally verifiable>
89
+ - [ ] <specific, testable, externally verifiable>
90
+
91
+ ## Dependencies
92
+
93
+ - Depends on: #<N> (must be done first)
94
+ - Blocks: #<M> (waiting on this)
95
+ - (or "none" if truly standalone)
96
+
97
+ ## Pointers
98
+
99
+ - Design ref: <path or link to the DR / spec / research doc>
100
+ - Files to touch: <paths>
101
+ - Existing patterns: <where to look in the codebase for reference>
102
+ - Prior art: <similar work already done>
103
+
104
+ ## Notes
105
+
106
+ <Gotchas, tradeoffs you considered, alternatives rejected. Research-refresher
107
+ caveats — e.g., "your training data may be stale for library X; verify
108
+ current docs before implementing.">
109
+
110
+ ---
111
+
112
+ @<peer-agent>[bot] please take a look and ask if anything is unclear.
113
+ ```
114
+
115
+ The assignee label (`code-agent`, `cv-architect`, etc.) goes on issue
116
+ creation, not as the mention target — routing picks it up from the label.
117
+
118
+ ### Why a fixed template
119
+
120
+ - **Predictability** — peer agents don't have to guess where the acceptance
121
+ criteria are
122
+ - **Completeness check** — missing sections surface missing information at
123
+ write time, not implement time
124
+ - **Parseability** — scripts, dashboards, and automation can reliably extract
125
+ fields
126
+
127
+ Deviation from the template is allowed when the domain genuinely needs
128
+ different structure — e.g., research-findings tasks may need "Project
129
+ framing / Dates / Milestones / CV-angle hooks" instead of generic sections.
130
+ Keep the *shape* (multiple level-2 or level-3 headings identifying distinct
131
+ concerns); don't free-text a blob of requirements.
132
+
133
+ ---
134
+
135
+ ## Ask before filing
136
+
137
+ Before creating the issue, confirm with the requester (usually the user, but
138
+ it may be a coordinator peer):
139
+
140
+ > **Route this now or backlog?**
141
+ > 1. Now — peer agent picks it up immediately (applies assignee label, adds to board)
142
+ > 2. Backlog — sits on the board for later, unassigned
143
+
144
+ Getting this wrong creates noise: the assigned peer starts on something that
145
+ isn't ready, or a backlog item sits unprocessed because no one noticed the
146
+ label. Ask once; check the answer for each delegation.
147
+
148
+ ---
149
+
150
+ ## Labels on creation
151
+
152
+ - **Assignee label** (`code-agent`, `science-agent`, `cv-architect`,
153
+ `writing-agent`, or whatever the target's routing label is) — the primary
154
+ routing signal
155
+ - **Phase / area label** (`phase:P1`, `docs`, `research`, etc.) — optional
156
+ classification
157
+ - **Type label** (`feat`, `fix`, `chore`, `docs`) — optional
158
+ - **Priority label** (`priority:P0`, `priority:P1`, etc.) — optional
159
+
160
+ Don't apply the assignee label in "Backlog" mode — routing picks up the label
161
+ and wakes the peer. Use a separate `backlog` label if your project has one,
162
+ or leave unassigned.
163
+
164
+ ---
165
+
166
+ ## After filing
167
+
168
+ 1. Post a brief comment on any related issue / PR linking the new delegation:
169
+ "Filed #<N> for this."
170
+ 2. Add to the project board if not auto-added.
171
+ 3. **Continue with other work — do not wait idle** for the peer to respond.
172
+ You will be @mentioned when it's your turn again.
173
+
174
+ ---
175
+
176
+ ## Receiving work back
177
+
178
+ When the peer agent files a PR referencing your delegation issue:
179
+
180
+ 1. You are @mentioned — check out the PR branch, read the diff (and relevant
181
+ surrounding context, not just the diff lines).
182
+ 2. Review honestly. Spend enough time to be sure. A quick LGTM on substantive
183
+ work is worse than a thoughtful pushback.
184
+ 3. If **LGTM**: approve + @mention the peer that they can merge.
185
+ 4. If **changes needed**: list specifics (file:line references preferred),
186
+ @mention peer, explain *why* (not just "change X to Y" — the reasoning
187
+ matters for the peer to decide whether to push back or accept).
188
+ 5. **Do not merge yourself.** The implementer merges after your approval,
189
+ per coordination.md "merge-by-implementer" (they wrote the code; they
190
+ own the merge).
191
+
192
+ After changes are pushed:
193
+
194
+ 6. Re-review the updated diff — not the whole PR again, just the delta.
195
+ 7. Either re-LGTM or list further concerns. @mention either way.
196
+
197
+ ---
198
+
199
+ ## Push-back acknowledgment
200
+
201
+ If the peer pushes back on your issue body or your review:
202
+
203
+ - Read their argument completely.
204
+ - If they're right, say so, adjust the issue body / requirements / review
205
+ comments, and continue.
206
+ - If you disagree, explain *why* in concrete terms — cite the relevant DR,
207
+ prior pattern in the codebase, or acceptance criteria. Abstract appeals
208
+ ("standard practice", "clean code") are not substantive; point at something
209
+ specific.
210
+ - If after discussion you still disagree, escalate to the requester (usually
211
+ the user) rather than overriding. See `coordination.md` "Escalation".
212
+
213
+ This is the peer dynamic — not "you file, I obey."
214
+
215
+ ---
216
+
217
+ ## When a PR is not needed
218
+
219
+ Questions or discussions resolve in comments alone. The reporter closes the
220
+ issue when done — no PR required. Use this for clarifications, status
221
+ requests, or research-summary tasks where the output is the comment thread
222
+ itself (rare; prefer a committed doc + PR for anything citeable).
223
+
224
+ **Not "when you're in a hurry."** PR discipline (see `pr-discipline.md`) is
225
+ the default for any work that produces an artifact.
226
+
227
+ ---
228
+
229
+ ## Stage-appropriate delegation
230
+
231
+ | Stage | Routing mechanism | Can you delegate? |
232
+ |---|---|---|
233
+ | 0 | None (single agent) | No — you are the only agent; do the work yourself or defer |
234
+ | 1 | None (bot identity but solo) | No — same as stage 0 but with bot attribution |
235
+ | 2 | SSH + tmux routing | Yes — routing Action forwards issues / @mentions to peer |
236
+ | 3 | MACF channels | Yes — HTTP POST to peer's channel server |
237
+
238
+ Filing issues before routing is set up just piles them up unrouted. If the
239
+ peer agent doesn't exist yet, ask whether to file anyway (for tracking) or
240
+ defer until routing is online.
241
+
242
+ ---
243
+
244
+ ## When to modify this rule
245
+
246
+ - **Read:** every session start. This rule defines how delegations look.
247
+ - **Modify:** never directly in workspace copies. Edit the canonical file
248
+ and re-distribute via `macf update` in each affected workspace.
249
+ - **Disagree with a rule?** Open an issue on `groundnuty/macf` proposing the
250
+ change, with rationale. Peer review applies.
@@ -0,0 +1,71 @@
1
+ # Execute on Directive
2
+
3
+ **After a user directive like "go", "do X", "proceed", "approve", "ship it" — execute. Do not circle back to re-ask.**
4
+
5
+ Pattern that causes friction: user gives a directive → you describe what you'd do → you ask "should I proceed?" → user repeats the directive → you execute. Three turns for what should be one turn.
6
+
7
+ ---
8
+
9
+ ## The rule
10
+
11
+ When the user has clearly authorized the action — explicitly ("go", "do X", "proceed") or contextually ("here's the plan... okay, let's do it") — execute. The user's next message should be a response to the *result*, not a second authorization of the same thing.
12
+
13
+ **Bad:**
14
+ > User: run the tests
15
+ > You: I'll run `make test` now. Should I proceed?
16
+ > User: yes
17
+ > You: [runs tests]
18
+
19
+ **Good:**
20
+ > User: run the tests
21
+ > You: [runs tests] All 47 passing.
22
+
23
+ **Good (when genuinely non-obvious):**
24
+ > User: ship the observability stack
25
+ > You: Two paths — (A) ship exactly what's in the research doc (Tier B), (B) the hybrid fallback (Langfuse-on-compose). I'd lean A. Which?
26
+ > User: A
27
+ > You: [ships Tier B]
28
+
29
+ The second "Good" is NOT asking to proceed — it's surfacing a real branching decision. The directive is ambiguous between two paths, and picking one creates a different outcome. That's a legitimate clarifying question.
30
+
31
+ ---
32
+
33
+ ## When to ask vs. when to execute
34
+
35
+ **Execute when:**
36
+ - The directive names a specific action ("close the issue", "merge the PR", "run the spike")
37
+ - You already laid out what you'd do and the user said a word-or-two approval ("go", "ok", "proceed", "yes", "ship it")
38
+ - The action is reversible and bounded (local edit, `make check` run, filing one issue, reading a file)
39
+
40
+ **Ask when:**
41
+ - Genuinely multiple paths with materially different outcomes
42
+ - Naming that matters (directory names, branch names, App names — hard to change later)
43
+ - Destructive + irreversible operations (force-push, `rm -rf`, dropping a database, unpublishing an npm package, deleting a PR branch before merge-confirmation)
44
+ - Scope ambiguity where wrong interpretation wastes hours (does "update the docs" mean just README, or all 15 files?)
45
+ - Security / access-control consequences (installing a GitHub App, granting secrets access, opening a port)
46
+
47
+ **The test:** if clarification would reveal a different action, ask. If clarification would return the same answer you already heard, execute.
48
+
49
+ ---
50
+
51
+ ## What slow-directive-execution looks like from the user's side
52
+
53
+ The friction mode: user says "go", agent spends a turn restating the plan and asking "shall I?", user re-approves, agent finally runs. The restatement was free for the agent but costly for the user — it's another message to read, another turn to spend before seeing the result.
54
+
55
+ Once the plan is agreed, the user wants the *output* of executing it, not a reminder of what the plan was.
56
+
57
+ **Corollary:** after execution, lead with the result, not a recap of what you did. See `peer-dynamic.md` § "Response form" — skip restatement, skip trailing summaries.
58
+
59
+ ---
60
+
61
+ ## How this interacts with `pr-discipline.md` and `coordination.md`
62
+
63
+ Those rules introduce structured decision points (ask-before-filing, reviewer-approval-before-merge, never-close-someone-else's-issue). Those are NOT "should I proceed?" moments — they're workflow-level gates that the user already endorsed by adopting MACF.
64
+
65
+ This rule applies to the *micro* level: once a turn's work is approved, the micro-steps don't each need re-approval. Don't turn one directive into ten re-confirmations.
66
+
67
+ ---
68
+
69
+ ## Why this rule exists
70
+
71
+ Agents that over-ask burn the user's attention on recaps they already wrote. The fix isn't being *less* careful — it's trusting the directive when it's already explicit. Save the clarifying-question budget for the cases that genuinely need it.
@@ -0,0 +1,157 @@
1
+ # gh-token attribution-trap failure modes
2
+
3
+ **Six ways `gh` operations silently mis-attribute to the user account instead of the bot, and the patterns that prevent them.**
4
+
5
+ When a bot agent runs `gh` commands, multiple silent failure modes cause ops to attribute to the user account instead of the bot. Because the content the agent posts is what it intended, mis-attribution is **invisible unless explicitly checked.** Past incidents:
6
+
7
+ - Code-agent's merge-handoff comments posted under operator's account for an entire session — discovered only when cross-agent routing started failing
8
+ - PR #16/#17 author mis-attribution surfaced during routine review (not by attribution itself)
9
+ - 5+ recurring instances logged across science-agent + code-agent memory before this rule was canonicalized
10
+
11
+ This is a **silent-fallback hazard class**: tool operations succeed at the API boundary, semantic-level failure (wrong identity / wrong scope / wrong target) is invisible until something downstream breaks. Defenses must guard at the *result-invariant* level, not the *exit-code* level.
12
+
13
+ ---
14
+
15
+ ## The six failure modes
16
+
17
+ ### 1. Wrong private key file
18
+
19
+ GitHub rotated the App's private key (or the operator put the wrong .pem in the workspace). Local `.github-app-key.pem` doesn't match what GitHub has registered. Every JWT fails with `"A JSON web token could not be decoded"`. `gh token generate` exits non-zero, `GH_TOKEN=$(...)` captures empty, `gh` silently falls through to stored `gh auth login` as the user.
20
+
21
+ **Detect:**
22
+ ```bash
23
+ # Compare local fingerprint with GitHub's (visible on the App settings page)
24
+ openssl rsa -in <key.pem> -pubout -outform DER 2>/dev/null | openssl dgst -sha256 -binary | base64
25
+ ```
26
+ Should match the "SHA256:..." shown on `github.com/settings/apps/<app>`.
27
+
28
+ ### 2. Clock drift between VM and GitHub
29
+
30
+ If `iat` in the JWT is ahead of GitHub's clock (VM runs fast, or even a few seconds skew), GitHub rejects the JWT as "from the future" — same `"JSON web token could not be decoded"` error. Intermittent: sometimes valid, sometimes not, depending on the moment.
31
+
32
+ **Fix:** use a 180-second back-window on `iat` (not the 60s default):
33
+ ```bash
34
+ iat=$((now - 180)) # 3 minutes in the past — tolerates up to 3 min of clock skew
35
+ exp=$((now + 420)) # still 10 min total lifetime (180 past + 420 future)
36
+ ```
37
+
38
+ ### 3. `gh` silent fallback to stored user auth
39
+
40
+ When `GH_TOKEN` is empty or invalid, `gh` falls through to `~/.config/gh/hosts.yml` if present. Ops succeed, content is correct, but `author` on the created resource is the user account.
41
+
42
+ **Fix:** fail loud. Never use `export GH_TOKEN=$(gh token generate ... | jq)` as a one-liner; it swallows errors. Pattern:
43
+ ```bash
44
+ # Assert token was generated AND has the bot prefix
45
+ TOKEN=$(.claude/scripts/macf-gh-token.sh --app-id "$APP_ID" --install-id "$INSTALL_ID" --key "$KEY_PATH") || {
46
+ echo "FATAL: token-gen failed" >&2; exit 1
47
+ }
48
+ [ -n "$TOKEN" ] || { echo "FATAL: empty token" >&2; exit 1; }
49
+ case "$TOKEN" in ghs_*) ;; *) echo "FATAL: bad token prefix" >&2; exit 1 ;; esac
50
+ export GH_TOKEN="$TOKEN"
51
+ ```
52
+
53
+ ### 4. Wrong `gh auth` on the VM providing fallback
54
+
55
+ Having `gh auth login` configured as a user account creates the fallback surface in #3. Even a "good" setup where the script is correct can hide a broken bot token because `gh` quietly uses the user auth.
56
+
57
+ **Best hardening:** remove stored `gh auth login` on agent VMs entirely — then broken bot tokens fail loudly with "no auth," not silently as the user.
58
+
59
+ **Tradeoff:** interactive `gh` inspection by a human on the VM also requires a token. Usually acceptable for dedicated agent VMs.
60
+
61
+ ### 5. Helper script missing from workspace (path 127, silent empty capture)
62
+
63
+ Non-init'd workspace had never received `.claude/scripts/macf-gh-token.sh` because the operator forgot to run `macf rules refresh` after the helper was introduced. Calling `./.claude/scripts/macf-gh-token.sh ...` returned exit 127 ("no such file") — but with `export GH_TOKEN=$(helper 2>/dev/null)` the 127 is silently discarded, stdout is empty, `GH_TOKEN=""`, `gh` falls through to stored user auth.
64
+
65
+ Compounds modes #3 + #4: all the bot ops look normal but post as the user. Only noticed when a human checked a comment URL.
66
+
67
+ **Fix:** use `macf rules refresh --dir <workspace>` to install canonical `macf-gh-token.sh` + `macf-whoami.sh` + `tmux-send-to-claude.sh`. Workbench-only workspaces (substrate agents) still need this — the helpers are distributed via a separate mechanism from full `macf init`.
68
+
69
+ Also: **always validate token prefix in the chain, not just in the helper.** Even a correctly-installed helper can fail (rotated key, clock drift) and return empty. Chain guard:
70
+
71
+ ```bash
72
+ GH_TOKEN=$(./.claude/scripts/macf-gh-token.sh ...) \
73
+ && [[ "$GH_TOKEN" == ghs_* ]] \
74
+ || { echo "FATAL: bad token"; exit 1; }
75
+ export GH_TOKEN
76
+ ```
77
+
78
+ The `[[ ... == ghs_* ]]` check catches both empty and junk-output cases (e.g., the "(eval):1: no such file" leak when stderr was merged into stdout).
79
+
80
+ ### 6. Relative path to helper (or key) breaks on cross-repo `cd`
81
+
82
+ When an agent `cd`'s to another repo for cross-repo work (e.g., code-agent editing `macf-actions` from its `macf` workspace), `./.claude/scripts/...` doesn't resolve from the new cwd, and relative `$KEY_PATH` can't be read either.
83
+
84
+ `$(...)` command substitution swallows the helper's exit 127 silently, returns empty string. `export GH_TOKEN=""` succeeds with no error. Next `gh` call falls through to stored user auth. Mode-3 silent fallback, triggered by path breakage.
85
+
86
+ **Fix (canonical, post macf#161):**
87
+
88
+ - `claude.sh` exports `MACF_WORKSPACE_DIR="$SCRIPT_DIR"` — the workspace absolute path, available in all agent env regardless of cwd.
89
+ - `claude.sh` absolutizes `KEY_PATH` via `case` on leading slash (preserves operator-absolute paths like `/etc/macf/keys/...`, rewrites relative default to `$SCRIPT_DIR/$KEY_PATH`).
90
+ - All canonical agent templates use `$MACF_WORKSPACE_DIR/.claude/scripts/...` (NOT relative `./...`).
91
+
92
+ **Why mode 6 is distinct from mode 5:** mode 5 is "helper script file missing from workspace entirely." Mode 6 is "helper present in workspace, but reachable only via a path that breaks on cross-repo cwd." Mode 5's fix is installing the helper (`macf rules refresh`); mode 6's fix is using an absolute path to it (`$MACF_WORKSPACE_DIR/...`).
93
+
94
+ ---
95
+
96
+ ## Verifying identity after ops
97
+
98
+ Don't trust; verify. A token that "looks right" can still be misattributed due to a subtle env issue. Spot-check at session start:
99
+
100
+ ```bash
101
+ # After posting any comment in a session, sanity-check ONE post:
102
+ GH_TOKEN=$T gh api "/repos/$REPO/issues/comments/$ID" --jq '.user.login'
103
+ # Expect: <bot-name>[bot]; FAIL if it shows the operator's user account
104
+ ```
105
+
106
+ Do this once at session start (cheap), then trust. Mid-session re-checks not needed unless something suspicious happens.
107
+
108
+ The `macf-whoami.sh` helper canonicalizes this check:
109
+
110
+ ```bash
111
+ .claude/scripts/macf-whoami.sh
112
+ # Prints the actor identity associated with current $GH_TOKEN
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Canonical pattern (distilled)
118
+
119
+ ```bash
120
+ # Token acquisition — fails loud, token-prefix validated
121
+ TOKEN=$(.claude/scripts/macf-gh-token.sh --app-id "$APP_ID" --install-id "$INSTALL_ID" --key "$KEY_PATH") \
122
+ || { echo "FATAL: token gen failed" >&2; exit 1; }
123
+
124
+ # Chain the op immediately so token doesn't linger in env
125
+ GH_TOKEN=$TOKEN gh issue comment <N> --repo <owner>/<repo> --body "..."
126
+ ```
127
+
128
+ The helper script `macf-gh-token.sh` must:
129
+ 1. Use `set -euo pipefail`
130
+ 2. Use 180s `iat` back-window for clock drift tolerance
131
+ 3. Validate token prefix is `ghs_` (installation token); refuse to print user PATs (`ghp_*`, `gho_*`)
132
+ 4. Print nothing to stdout on failure (only stderr)
133
+
134
+ ---
135
+
136
+ ## Structural backstop: PreToolUse hook (macf#140)
137
+
138
+ Workspaces include a `PreToolUse` hook that intercepts `gh` and `git push` invocations and blocks with `exit 2` if `GH_TOKEN` is missing or doesn't have the `ghs_` prefix. This catches mode 3, 5, and 6 at the call site rather than after the fact.
139
+
140
+ Distribution: `macf init` / `macf update` / `macf rules refresh` install the hook + the helper scripts together.
141
+
142
+ When intentionally bypassing the hook for a knowingly user-attributed op (e.g., `gh auth login` during onboarding), set `MACF_SKIP_TOKEN_CHECK=1` for that one call.
143
+
144
+ ---
145
+
146
+ ## How this relates to other canonical rules
147
+
148
+ - `coordination.md` § "Token & Git Hygiene" documents the canonical helper invocation; this rule provides the failure-mode catalog the helper is designed to defend against.
149
+ - `pr-discipline.md` documents auto-close-keyword hazards in PR bodies; same shape (silent-fallback hazard at the API boundary) but different surface.
150
+
151
+ ---
152
+
153
+ ## Why this rule exists
154
+
155
+ Silent mis-attribution is a coordination failure mode that breaks routing workflows (workflow @-mention iteration sees user identity on bot-emitted posts; routing doesn't fire) and audit trails (paper-trail evidence shows wrong actors). The trap is mature — recurred 5+ instances across two substrate agents before this rule canonicalized — and prevention is straightforward once the failure modes are catalogued.
156
+
157
+ Pattern: defenses target *result-invariants* (token has `ghs_` prefix; spot-check actor on a known post), not *exit-code-success* (which silent fallbacks satisfy). This generalizes beyond gh-tokens to other tool/API surfaces with silent-fallback hazards.
@@ -0,0 +1,105 @@
1
+ # Mention-Routing Hygiene
2
+
3
+ **GitHub `@handle[bot]` mentions fire the Agent Router workflow regardless of surrounding context. When you are writing *about* an agent — quoting its output, analyzing its behavior, describing it in documentation — code-format the handle to suppress routing. Raw handles are reserved for intentional routing targets.**
4
+
5
+ GitHub's @mention semantics treat any bare `@handle` in a comment or PR body as a routable signal. The macf-actions router picks these up and forwards the reference to the named agent's tmux session, regardless of whether the reference was in an addressing context ("please take a look") or a describing context ("tester-2's response was…"). When described-not-addressed mentions fire routing, the referenced agent receives an ambient ping asking for an action that was never actually requested. For rules-loaded agents, this typically triggers scope-discipline reasoning (read context, recognize content-reference, cite `agent-identity.md §Not-for-testers`, stand down) and a response comment explaining why. Multiply this across scenario PRs that analyze tester behavior, insight documents that reference agents as research subjects, or cross-agent research commentary — testers and other agents get pinged for every describing use of their handle.
6
+
7
+ The fix is one character per handle: wrap the `@` and bracketed `[bot]` in backticks. The convention is cheap to apply; the cost of skipping it is proportional to how much the fleet writes about other agents.
8
+
9
+ ---
10
+
11
+ ## 1. The two modes
12
+
13
+ **Addressing** — you want the agent to see this and respond / act:
14
+
15
+ @macf-code-agent[bot] please review PR #12 when convenient.
16
+
17
+ **Describing** — you are writing about the agent; no routing needed:
18
+
19
+ The `@macf-tester-2-agent[bot]` response quoted `coordination.md` rule 1 verbatim.
20
+
21
+ The difference is a single pair of backticks. Both forms still render recognizably in GitHub's UI; only the raw form fires routing.
22
+
23
+ **Decision rule when unsure:** ask whether the agent receiving this comment should treat it as an action ask. If yes → raw. If no → backticked.
24
+
25
+ ---
26
+
27
+ ## 2. Contexts where describing happens
28
+
29
+ These are the common places the describing form applies. Not exhaustive; the principle generalizes:
30
+
31
+ - **PR bodies** quoting another agent's output verbatim (scenario transcripts, review excerpts, observation snippets)
32
+ - **Issue bodies and comments** analyzing agent behavior for research or post-mortem purposes
33
+ - **Observation logs** and **insight documents** referencing agents as research subjects
34
+ - **Canonical rule files and documentation** citing agent handles as examples
35
+ - **Paper drafts** and **research notes** in which agents are data, not interlocutors
36
+ - **Cross-agent commentary** where you are synthesizing what multiple agents did
37
+
38
+ In each of these, raw `@handle` will fire routing. Backtick-wrap.
39
+
40
+ Handles inside fenced code blocks (triple-backtick) or 4-space-indented blocks are already safe — GitHub's mention parser skips code contexts. Backtick-wrapping is for inline prose where a bare `@handle` would otherwise be parsed as a routing target.
41
+
42
+ ---
43
+
44
+ ## 3. Contexts where addressing happens
45
+
46
+ These are the legitimate routing targets — keep the raw form:
47
+
48
+ - The closing line of a PR body that asks a reviewer to look: `@macf-science-agent[bot] ready for review.`
49
+ - Direct replies on a thread where you expect the agent to act: `@macf-code-agent[bot] pushback: see file:line ref below.`
50
+ - Handoff comments: `@<reporter> ready for you to close when verified.`
51
+ - Escalation pings when blocked.
52
+
53
+ If a comment contains both describing and addressing references to the same agent, the describing form gets backticks and the addressing form stays raw.
54
+
55
+ ---
56
+
57
+ ## 4. Verification check before posting
58
+
59
+ For any comment or PR body that contains agent handles, grep the draft:
60
+
61
+ grep -nE '@macf-[a-z-]+-agent\[bot\]' <draft-file>
62
+
63
+ For each line returned: is this line an action ask (raw stays) or a content reference (backticks wrap)?
64
+
65
+ If you don't want to verify per-line, the safe default is: **backtick by default, un-backtick only the addressing lines**.
66
+
67
+ Drafts you run through this check regularly: scenario PR bodies that include preserved artifacts, observation-log entries, insight files, review comments that quote the agent being reviewed.
68
+
69
+ ---
70
+
71
+ ## 5. Alternative forms (same effect, lower readability)
72
+
73
+ All three of these suppress routing; the choice is stylistic:
74
+
75
+ - **Backticks** (preferred): `` `@macf-tester-2-agent[bot]` ``
76
+ - **Escape sequences**: `\@macf-tester-2-agent\[bot\]`
77
+ - **Label form**: "tester-2" or "the tester-2 agent" (for prose where a full handle isn't needed)
78
+
79
+ Backticks are preferred because they render as inline code in GitHub's Markdown, which is semantically meaningful ("this is a handle identifier being referenced") and visually distinct from raw routing targets.
80
+
81
+ ---
82
+
83
+ ## 6. Symmetry — this rule applies to every agent
84
+
85
+ The convention is **bidirectional** across the fleet. Every agent that writes commentary referencing other agents applies the same rule:
86
+
87
+ - Science-agent writing about testers → backticks
88
+ - Code-agent writing about testers → backticks
89
+ - Testers writing about each other → backticks
90
+ - Operator-thread comments quoting agent output → backticks (operator is human but the convention applies for consistency)
91
+ - Devops, CV, future-fleet agents → backticks
92
+
93
+ Asymmetric adoption breaks the protocol: if only some agents escape handles, the others continue leaking routing from the contexts they miss. Symmetric adoption is what makes the convention enforceable.
94
+
95
+ ---
96
+
97
+ ## Why this rule exists
98
+
99
+ Routing is a global side-effect. Unlike most GitHub semantics, `@mention`-based routing in MACF does not differentiate the grammatical role of the handle — whether the sentence is about the agent, directed at the agent, or merely mentions the agent in passing. The router sees the handle and fires.
100
+
101
+ Without the backtick convention, every describing use of a handle produces a false-positive ping. For rules-loaded agents that correctly apply scope-discipline when incorrectly routed, this means they must spend attention reading the context, identifying that they were referenced-not-addressed, citing the applicable rule, and posting a stand-down comment. Each firing is a few seconds of noise per agent — but the describing form of a handle is far more common than the addressing form, so the noise accumulates quickly.
102
+
103
+ The failure-mode was observed on `macf-testbed#9` and `#18` (2026-04-24): a single rules-loaded tester received three ambient routing pings across two scenario PRs, correctly disciplined each response with scope-preserving rationale, and escalated the third firing into a cross-session-commitment-tracking critique of the author. That sequence of responses was appropriate — but it was also three response turns that could have been prevented by one keystroke of backticks per handle reference in the PR bodies.
104
+
105
+ The rule is cheap to apply, symmetric across the fleet, and eliminates a class of false-positive routing that otherwise compounds with every describing use of an agent handle.