@groundnuty/macf 0.2.0-rc.1 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/.build-info.json +2 -2
- package/dist/cli/claude-sh.d.ts.map +1 -1
- package/dist/cli/claude-sh.js +12 -4
- package/dist/cli/claude-sh.js.map +1 -1
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +8 -1
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/commands/rules-refresh.d.ts.map +1 -1
- package/dist/cli/commands/rules-refresh.js +5 -1
- package/dist/cli/commands/rules-refresh.js.map +1 -1
- package/dist/cli/commands/update.d.ts.map +1 -1
- package/dist/cli/commands/update.js +8 -1
- package/dist/cli/commands/update.js.map +1 -1
- package/dist/cli/index.js +2 -1
- package/dist/cli/index.js.map +1 -1
- package/dist/cli/settings-writer.d.ts +84 -4
- package/dist/cli/settings-writer.d.ts.map +1 -1
- package/dist/cli/settings-writer.js +182 -4
- package/dist/cli/settings-writer.js.map +1 -1
- package/dist/cli/version-resolver.d.ts.map +1 -1
- package/dist/cli/version-resolver.js +15 -2
- package/dist/cli/version-resolver.js.map +1 -1
- package/dist/package-version.d.ts +2 -0
- package/dist/package-version.d.ts.map +1 -0
- package/dist/package-version.js +26 -0
- package/dist/package-version.js.map +1 -0
- package/package.json +2 -2
- package/plugin/rules/check-before-propose.md +86 -0
- package/plugin/rules/codify-at-correction-time.md +92 -0
- package/plugin/rules/coordination.md +17 -0
- package/plugin/rules/delegation-template.md +250 -0
- package/plugin/rules/execute-on-directive.md +71 -0
- package/plugin/rules/gh-token-attribution-traps.md +157 -0
- package/plugin/rules/mention-routing-hygiene.md +105 -0
- package/plugin/rules/model-era-compatibility.md +94 -0
- package/plugin/rules/observability-wiring.md +60 -0
- package/plugin/rules/peer-dynamic.md +205 -0
- package/plugin/rules/pr-discipline.md +245 -0
- package/plugin/rules/verify-before-claim.md +131 -0
|
@@ -0,0 +1,250 @@
|
|
|
1
|
+
# Delegation Template (canonical, shared)
|
|
2
|
+
|
|
3
|
+
**This file is the single source of truth for how one agent delegates work to
|
|
4
|
+
another.** It is copied into each agent workspace's `.claude/rules/` by
|
|
5
|
+
`macf init` and refreshed by `macf update` / `macf rules refresh`. Do not edit
|
|
6
|
+
workspace copies directly — edit the canonical file at
|
|
7
|
+
`groundnuty/macf:packages/macf/plugin/rules/delegation-template.md` and re-run
|
|
8
|
+
the distribution.
|
|
9
|
+
|
|
10
|
+
Applies to any MACF agent that hands off a task to a peer — coordinators
|
|
11
|
+
delegating to implementers, researchers delegating to reviewers, any agent
|
|
12
|
+
splitting off scope. Works whether the peer is another bot or a human.
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
## When to delegate vs do the work yourself
|
|
17
|
+
|
|
18
|
+
Delegation is for **asymmetric capability**, not ceremonial hand-off. Before
|
|
19
|
+
filing a delegation issue, ask: is the peer agent actually better positioned
|
|
20
|
+
to do this work than I am?
|
|
21
|
+
|
|
22
|
+
**Delegate when:**
|
|
23
|
+
|
|
24
|
+
- The peer has domain expertise the reporter lacks (framework TypeScript
|
|
25
|
+
internals → code-agent; LaTeX CV typesetting → cv-architect; historical
|
|
26
|
+
project research → cv-project-archaeologist)
|
|
27
|
+
- The peer owns the repo / the canonical source (framework changes →
|
|
28
|
+
code-agent who owns `groundnuty/macf`)
|
|
29
|
+
- The peer has persistent context the reporter doesn't (long-running
|
|
30
|
+
investigation, repository-specific conventions, team-facing
|
|
31
|
+
relationships)
|
|
32
|
+
- The work would meaningfully benefit from review + merge discipline by a
|
|
33
|
+
non-author (even if the reporter could technically do the work, the
|
|
34
|
+
peer's second pair of eyes catches issues the author won't)
|
|
35
|
+
|
|
36
|
+
**Do the work yourself (skip delegation) when:**
|
|
37
|
+
|
|
38
|
+
- You have asymmetric context the peer would need to learn — delegation
|
|
39
|
+
becomes "please copy my notes into a file" rather than real work
|
|
40
|
+
- You are the domain expert (rules about your own collaboration patterns,
|
|
41
|
+
a postmortem of an incident you lived, a DR for a design you drove)
|
|
42
|
+
- The task is ceremonial packaging of material the reporter authored
|
|
43
|
+
anyway (the peer adds no value beyond typing)
|
|
44
|
+
- The work is time-sensitive and the delegation round-trip would exceed
|
|
45
|
+
the fix window
|
|
46
|
+
|
|
47
|
+
The test: if the peer's first action would be "ask the reporter for more
|
|
48
|
+
context / source material", you're delegating ceremony, not real work —
|
|
49
|
+
do it yourself. If the peer would immediately have everything they need
|
|
50
|
+
to start, delegate.
|
|
51
|
+
|
|
52
|
+
When you do the work yourself despite it being "in the peer's domain",
|
|
53
|
+
note it explicitly — either in the PR body or a handoff comment:
|
|
54
|
+
|
|
55
|
+
> "Authoring this directly rather than delegating to `<peer>` because the
|
|
56
|
+
> content is distilled from my own collaboration-pattern observations —
|
|
57
|
+
> `<peer>` would need me to write the text anyway. See
|
|
58
|
+
> delegation-template.md 'When to delegate' for the principle."
|
|
59
|
+
|
|
60
|
+
Transparency preserves the peer relationship: the peer sees the reasoning
|
|
61
|
+
instead of feeling bypassed, and the coordinator can push back if they
|
|
62
|
+
disagree with the self-authored call. Default to delegating when in
|
|
63
|
+
doubt — the PR review step gives the peer their voice regardless.
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## The 6-section issue template
|
|
68
|
+
|
|
69
|
+
When you file a delegation issue for another agent, structure the body so the
|
|
70
|
+
peer can start without coming back for clarification. Consistency matters more
|
|
71
|
+
than any single section — the same headings in the same order make the body
|
|
72
|
+
scannable and automation-friendly.
|
|
73
|
+
|
|
74
|
+
```markdown
|
|
75
|
+
## Context
|
|
76
|
+
|
|
77
|
+
<What is this part of? Link to parent issue, design doc, DR. Why does this
|
|
78
|
+
task exist? What problem is it solving?>
|
|
79
|
+
|
|
80
|
+
## Goal
|
|
81
|
+
|
|
82
|
+
<One-sentence statement of what success looks like. If you can't write it in
|
|
83
|
+
one sentence, the task is too big — split first.>
|
|
84
|
+
|
|
85
|
+
## Acceptance Criteria
|
|
86
|
+
|
|
87
|
+
- [ ] <specific, testable, externally verifiable>
|
|
88
|
+
- [ ] <specific, testable, externally verifiable>
|
|
89
|
+
- [ ] <specific, testable, externally verifiable>
|
|
90
|
+
|
|
91
|
+
## Dependencies
|
|
92
|
+
|
|
93
|
+
- Depends on: #<N> (must be done first)
|
|
94
|
+
- Blocks: #<M> (waiting on this)
|
|
95
|
+
- (or "none" if truly standalone)
|
|
96
|
+
|
|
97
|
+
## Pointers
|
|
98
|
+
|
|
99
|
+
- Design ref: <path or link to the DR / spec / research doc>
|
|
100
|
+
- Files to touch: <paths>
|
|
101
|
+
- Existing patterns: <where to look in the codebase for reference>
|
|
102
|
+
- Prior art: <similar work already done>
|
|
103
|
+
|
|
104
|
+
## Notes
|
|
105
|
+
|
|
106
|
+
<Gotchas, tradeoffs you considered, alternatives rejected. Research-refresher
|
|
107
|
+
caveats — e.g., "your training data may be stale for library X; verify
|
|
108
|
+
current docs before implementing.">
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
@<peer-agent>[bot] please take a look and ask if anything is unclear.
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
The assignee label (`code-agent`, `cv-architect`, etc.) goes on issue
|
|
116
|
+
creation, not as the mention target — routing picks it up from the label.
|
|
117
|
+
|
|
118
|
+
### Why a fixed template
|
|
119
|
+
|
|
120
|
+
- **Predictability** — peer agents don't have to guess where the acceptance
|
|
121
|
+
criteria are
|
|
122
|
+
- **Completeness check** — missing sections surface missing information at
|
|
123
|
+
write time, not implement time
|
|
124
|
+
- **Parseability** — scripts, dashboards, and automation can reliably extract
|
|
125
|
+
fields
|
|
126
|
+
|
|
127
|
+
Deviation from the template is allowed when the domain genuinely needs
|
|
128
|
+
different structure — e.g., research-findings tasks may need "Project
|
|
129
|
+
framing / Dates / Milestones / CV-angle hooks" instead of generic sections.
|
|
130
|
+
Keep the *shape* (multiple level-2 or level-3 headings identifying distinct
|
|
131
|
+
concerns); don't free-text a blob of requirements.
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Ask before filing
|
|
136
|
+
|
|
137
|
+
Before creating the issue, confirm with the requester (usually the user, but
|
|
138
|
+
it may be a coordinator peer):
|
|
139
|
+
|
|
140
|
+
> **Route this now or backlog?**
|
|
141
|
+
> 1. Now — peer agent picks it up immediately (applies assignee label, adds to board)
|
|
142
|
+
> 2. Backlog — sits on the board for later, unassigned
|
|
143
|
+
|
|
144
|
+
Getting this wrong creates noise: the assigned peer starts on something that
|
|
145
|
+
isn't ready, or a backlog item sits unprocessed because no one noticed the
|
|
146
|
+
label. Ask once; check the answer for each delegation.
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
## Labels on creation
|
|
151
|
+
|
|
152
|
+
- **Assignee label** (`code-agent`, `science-agent`, `cv-architect`,
|
|
153
|
+
`writing-agent`, or whatever the target's routing label is) — the primary
|
|
154
|
+
routing signal
|
|
155
|
+
- **Phase / area label** (`phase:P1`, `docs`, `research`, etc.) — optional
|
|
156
|
+
classification
|
|
157
|
+
- **Type label** (`feat`, `fix`, `chore`, `docs`) — optional
|
|
158
|
+
- **Priority label** (`priority:P0`, `priority:P1`, etc.) — optional
|
|
159
|
+
|
|
160
|
+
Don't apply the assignee label in "Backlog" mode — routing picks up the label
|
|
161
|
+
and wakes the peer. Use a separate `backlog` label if your project has one,
|
|
162
|
+
or leave unassigned.
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
## After filing
|
|
167
|
+
|
|
168
|
+
1. Post a brief comment on any related issue / PR linking the new delegation:
|
|
169
|
+
"Filed #<N> for this."
|
|
170
|
+
2. Add to the project board if not auto-added.
|
|
171
|
+
3. **Continue with other work — do not wait idle** for the peer to respond.
|
|
172
|
+
You will be @mentioned when it's your turn again.
|
|
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
## Receiving work back
|
|
177
|
+
|
|
178
|
+
When the peer agent files a PR referencing your delegation issue:
|
|
179
|
+
|
|
180
|
+
1. You are @mentioned — check out the PR branch, read the diff (and relevant
|
|
181
|
+
surrounding context, not just the diff lines).
|
|
182
|
+
2. Review honestly. Spend enough time to be sure. A quick LGTM on substantive
|
|
183
|
+
work is worse than a thoughtful pushback.
|
|
184
|
+
3. If **LGTM**: approve + @mention the peer that they can merge.
|
|
185
|
+
4. If **changes needed**: list specifics (file:line references preferred),
|
|
186
|
+
@mention peer, explain *why* (not just "change X to Y" — the reasoning
|
|
187
|
+
matters for the peer to decide whether to push back or accept).
|
|
188
|
+
5. **Do not merge yourself.** The implementer merges after your approval,
|
|
189
|
+
per coordination.md "merge-by-implementer" (they wrote the code; they
|
|
190
|
+
own the merge).
|
|
191
|
+
|
|
192
|
+
After changes are pushed:
|
|
193
|
+
|
|
194
|
+
6. Re-review the updated diff — not the whole PR again, just the delta.
|
|
195
|
+
7. Either re-LGTM or list further concerns. @mention either way.
|
|
196
|
+
|
|
197
|
+
---
|
|
198
|
+
|
|
199
|
+
## Push-back acknowledgment
|
|
200
|
+
|
|
201
|
+
If the peer pushes back on your issue body or your review:
|
|
202
|
+
|
|
203
|
+
- Read their argument completely.
|
|
204
|
+
- If they're right, say so, adjust the issue body / requirements / review
|
|
205
|
+
comments, and continue.
|
|
206
|
+
- If you disagree, explain *why* in concrete terms — cite the relevant DR,
|
|
207
|
+
prior pattern in the codebase, or acceptance criteria. Abstract appeals
|
|
208
|
+
("standard practice", "clean code") are not substantive; point at something
|
|
209
|
+
specific.
|
|
210
|
+
- If after discussion you still disagree, escalate to the requester (usually
|
|
211
|
+
the user) rather than overriding. See `coordination.md` "Escalation".
|
|
212
|
+
|
|
213
|
+
This is the peer dynamic — not "you file, I obey."
|
|
214
|
+
|
|
215
|
+
---
|
|
216
|
+
|
|
217
|
+
## When a PR is not needed
|
|
218
|
+
|
|
219
|
+
Questions or discussions resolve in comments alone. The reporter closes the
|
|
220
|
+
issue when done — no PR required. Use this for clarifications, status
|
|
221
|
+
requests, or research-summary tasks where the output is the comment thread
|
|
222
|
+
itself (rare; prefer a committed doc + PR for anything citeable).
|
|
223
|
+
|
|
224
|
+
**Not "when you're in a hurry."** PR discipline (see `pr-discipline.md`) is
|
|
225
|
+
the default for any work that produces an artifact.
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## Stage-appropriate delegation
|
|
230
|
+
|
|
231
|
+
| Stage | Routing mechanism | Can you delegate? |
|
|
232
|
+
|---|---|---|
|
|
233
|
+
| 0 | None (single agent) | No — you are the only agent; do the work yourself or defer |
|
|
234
|
+
| 1 | None (bot identity but solo) | No — same as stage 0 but with bot attribution |
|
|
235
|
+
| 2 | SSH + tmux routing | Yes — routing Action forwards issues / @mentions to peer |
|
|
236
|
+
| 3 | MACF channels | Yes — HTTP POST to peer's channel server |
|
|
237
|
+
|
|
238
|
+
Filing issues before routing is set up just piles them up unrouted. If the
|
|
239
|
+
peer agent doesn't exist yet, ask whether to file anyway (for tracking) or
|
|
240
|
+
defer until routing is online.
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
## When to modify this rule
|
|
245
|
+
|
|
246
|
+
- **Read:** every session start. This rule defines how delegations look.
|
|
247
|
+
- **Modify:** never directly in workspace copies. Edit the canonical file
|
|
248
|
+
and re-distribute via `macf update` in each affected workspace.
|
|
249
|
+
- **Disagree with a rule?** Open an issue on `groundnuty/macf` proposing the
|
|
250
|
+
change, with rationale. Peer review applies.
|
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
# Execute on Directive
|
|
2
|
+
|
|
3
|
+
**After a user directive like "go", "do X", "proceed", "approve", "ship it" — execute. Do not circle back to re-ask.**
|
|
4
|
+
|
|
5
|
+
Pattern that causes friction: user gives a directive → you describe what you'd do → you ask "should I proceed?" → user repeats the directive → you execute. Three turns for what should be one turn.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## The rule
|
|
10
|
+
|
|
11
|
+
When the user has clearly authorized the action — explicitly ("go", "do X", "proceed") or contextually ("here's the plan... okay, let's do it") — execute. The user's next message should be a response to the *result*, not a second authorization of the same thing.
|
|
12
|
+
|
|
13
|
+
**Bad:**
|
|
14
|
+
> User: run the tests
|
|
15
|
+
> You: I'll run `make test` now. Should I proceed?
|
|
16
|
+
> User: yes
|
|
17
|
+
> You: [runs tests]
|
|
18
|
+
|
|
19
|
+
**Good:**
|
|
20
|
+
> User: run the tests
|
|
21
|
+
> You: [runs tests] All 47 passing.
|
|
22
|
+
|
|
23
|
+
**Good (when genuinely non-obvious):**
|
|
24
|
+
> User: ship the observability stack
|
|
25
|
+
> You: Two paths — (A) ship exactly what's in the research doc (Tier B), (B) the hybrid fallback (Langfuse-on-compose). I'd lean A. Which?
|
|
26
|
+
> User: A
|
|
27
|
+
> You: [ships Tier B]
|
|
28
|
+
|
|
29
|
+
The second "Good" is NOT asking to proceed — it's surfacing a real branching decision. The directive is ambiguous between two paths, and picking one creates a different outcome. That's a legitimate clarifying question.
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## When to ask vs. when to execute
|
|
34
|
+
|
|
35
|
+
**Execute when:**
|
|
36
|
+
- The directive names a specific action ("close the issue", "merge the PR", "run the spike")
|
|
37
|
+
- You already laid out what you'd do and the user said a word-or-two approval ("go", "ok", "proceed", "yes", "ship it")
|
|
38
|
+
- The action is reversible and bounded (local edit, `make check` run, filing one issue, reading a file)
|
|
39
|
+
|
|
40
|
+
**Ask when:**
|
|
41
|
+
- Genuinely multiple paths with materially different outcomes
|
|
42
|
+
- Naming that matters (directory names, branch names, App names — hard to change later)
|
|
43
|
+
- Destructive + irreversible operations (force-push, `rm -rf`, dropping a database, unpublishing an npm package, deleting a PR branch before merge-confirmation)
|
|
44
|
+
- Scope ambiguity where wrong interpretation wastes hours (does "update the docs" mean just README, or all 15 files?)
|
|
45
|
+
- Security / access-control consequences (installing a GitHub App, granting secrets access, opening a port)
|
|
46
|
+
|
|
47
|
+
**The test:** if clarification would reveal a different action, ask. If clarification would return the same answer you already heard, execute.
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## What slow-directive-execution looks like from the user's side
|
|
52
|
+
|
|
53
|
+
The friction mode: user says "go", agent spends a turn restating the plan and asking "shall I?", user re-approves, agent finally runs. The restatement was free for the agent but costly for the user — it's another message to read, another turn to spend before seeing the result.
|
|
54
|
+
|
|
55
|
+
Once the plan is agreed, the user wants the *output* of executing it, not a reminder of what the plan was.
|
|
56
|
+
|
|
57
|
+
**Corollary:** after execution, lead with the result, not a recap of what you did. See `peer-dynamic.md` § "Response form" — skip restatement, skip trailing summaries.
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## How this interacts with `pr-discipline.md` and `coordination.md`
|
|
62
|
+
|
|
63
|
+
Those rules introduce structured decision points (ask-before-filing, reviewer-approval-before-merge, never-close-someone-else's-issue). Those are NOT "should I proceed?" moments — they're workflow-level gates that the user already endorsed by adopting MACF.
|
|
64
|
+
|
|
65
|
+
This rule applies to the *micro* level: once a turn's work is approved, the micro-steps don't each need re-approval. Don't turn one directive into ten re-confirmations.
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## Why this rule exists
|
|
70
|
+
|
|
71
|
+
Agents that over-ask burn the user's attention on recaps they already wrote. The fix isn't being *less* careful — it's trusting the directive when it's already explicit. Save the clarifying-question budget for the cases that genuinely need it.
|
|
@@ -0,0 +1,157 @@
|
|
|
1
|
+
# gh-token attribution-trap failure modes
|
|
2
|
+
|
|
3
|
+
**Six ways `gh` operations silently mis-attribute to the user account instead of the bot, and the patterns that prevent them.**
|
|
4
|
+
|
|
5
|
+
When a bot agent runs `gh` commands, multiple silent failure modes cause ops to attribute to the user account instead of the bot. Because the content the agent posts is what it intended, mis-attribution is **invisible unless explicitly checked.** Past incidents:
|
|
6
|
+
|
|
7
|
+
- Code-agent's merge-handoff comments posted under operator's account for an entire session — discovered only when cross-agent routing started failing
|
|
8
|
+
- PR #16/#17 author mis-attribution surfaced during routine review (not by attribution itself)
|
|
9
|
+
- 5+ recurring instances logged across science-agent + code-agent memory before this rule was canonicalized
|
|
10
|
+
|
|
11
|
+
This is a **silent-fallback hazard class**: tool operations succeed at the API boundary, semantic-level failure (wrong identity / wrong scope / wrong target) is invisible until something downstream breaks. Defenses must guard at the *result-invariant* level, not the *exit-code* level.
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## The six failure modes
|
|
16
|
+
|
|
17
|
+
### 1. Wrong private key file
|
|
18
|
+
|
|
19
|
+
GitHub rotated the App's private key (or the operator put the wrong .pem in the workspace). Local `.github-app-key.pem` doesn't match what GitHub has registered. Every JWT fails with `"A JSON web token could not be decoded"`. `gh token generate` exits non-zero, `GH_TOKEN=$(...)` captures empty, `gh` silently falls through to stored `gh auth login` as the user.
|
|
20
|
+
|
|
21
|
+
**Detect:**
|
|
22
|
+
```bash
|
|
23
|
+
# Compare local fingerprint with GitHub's (visible on the App settings page)
|
|
24
|
+
openssl rsa -in <key.pem> -pubout -outform DER 2>/dev/null | openssl dgst -sha256 -binary | base64
|
|
25
|
+
```
|
|
26
|
+
Should match the "SHA256:..." shown on `github.com/settings/apps/<app>`.
|
|
27
|
+
|
|
28
|
+
### 2. Clock drift between VM and GitHub
|
|
29
|
+
|
|
30
|
+
If `iat` in the JWT is ahead of GitHub's clock (VM runs fast, or even a few seconds skew), GitHub rejects the JWT as "from the future" — same `"JSON web token could not be decoded"` error. Intermittent: sometimes valid, sometimes not, depending on the moment.
|
|
31
|
+
|
|
32
|
+
**Fix:** use a 180-second back-window on `iat` (not the 60s default):
|
|
33
|
+
```bash
|
|
34
|
+
iat=$((now - 180)) # 3 minutes in the past — tolerates up to 3 min of clock skew
|
|
35
|
+
exp=$((now + 420)) # still 10 min total lifetime (180 past + 420 future)
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
### 3. `gh` silent fallback to stored user auth
|
|
39
|
+
|
|
40
|
+
When `GH_TOKEN` is empty or invalid, `gh` falls through to `~/.config/gh/hosts.yml` if present. Ops succeed, content is correct, but `author` on the created resource is the user account.
|
|
41
|
+
|
|
42
|
+
**Fix:** fail loud. Never use `export GH_TOKEN=$(gh token generate ... | jq)` as a one-liner; it swallows errors. Pattern:
|
|
43
|
+
```bash
|
|
44
|
+
# Assert token was generated AND has the bot prefix
|
|
45
|
+
TOKEN=$(.claude/scripts/macf-gh-token.sh --app-id "$APP_ID" --install-id "$INSTALL_ID" --key "$KEY_PATH") || {
|
|
46
|
+
echo "FATAL: token-gen failed" >&2; exit 1
|
|
47
|
+
}
|
|
48
|
+
[ -n "$TOKEN" ] || { echo "FATAL: empty token" >&2; exit 1; }
|
|
49
|
+
case "$TOKEN" in ghs_*) ;; *) echo "FATAL: bad token prefix" >&2; exit 1 ;; esac
|
|
50
|
+
export GH_TOKEN="$TOKEN"
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
### 4. Wrong `gh auth` on the VM providing fallback
|
|
54
|
+
|
|
55
|
+
Having `gh auth login` configured as a user account creates the fallback surface in #3. Even a "good" setup where the script is correct can hide a broken bot token because `gh` quietly uses the user auth.
|
|
56
|
+
|
|
57
|
+
**Best hardening:** remove stored `gh auth login` on agent VMs entirely — then broken bot tokens fail loudly with "no auth," not silently as the user.
|
|
58
|
+
|
|
59
|
+
**Tradeoff:** interactive `gh` inspection by a human on the VM also requires a token. Usually acceptable for dedicated agent VMs.
|
|
60
|
+
|
|
61
|
+
### 5. Helper script missing from workspace (path 127, silent empty capture)
|
|
62
|
+
|
|
63
|
+
Non-init'd workspace had never received `.claude/scripts/macf-gh-token.sh` because the operator forgot to run `macf rules refresh` after the helper was introduced. Calling `./.claude/scripts/macf-gh-token.sh ...` returned exit 127 ("no such file") — but with `export GH_TOKEN=$(helper 2>/dev/null)` the 127 is silently discarded, stdout is empty, `GH_TOKEN=""`, `gh` falls through to stored user auth.
|
|
64
|
+
|
|
65
|
+
Compounds modes #3 + #4: all the bot ops look normal but post as the user. Only noticed when a human checked a comment URL.
|
|
66
|
+
|
|
67
|
+
**Fix:** use `macf rules refresh --dir <workspace>` to install canonical `macf-gh-token.sh` + `macf-whoami.sh` + `tmux-send-to-claude.sh`. Workbench-only workspaces (substrate agents) still need this — the helpers are distributed via a separate mechanism from full `macf init`.
|
|
68
|
+
|
|
69
|
+
Also: **always validate token prefix in the chain, not just in the helper.** Even a correctly-installed helper can fail (rotated key, clock drift) and return empty. Chain guard:
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
GH_TOKEN=$(./.claude/scripts/macf-gh-token.sh ...) \
|
|
73
|
+
&& [[ "$GH_TOKEN" == ghs_* ]] \
|
|
74
|
+
|| { echo "FATAL: bad token"; exit 1; }
|
|
75
|
+
export GH_TOKEN
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
The `[[ ... == ghs_* ]]` check catches both empty and junk-output cases (e.g., the "(eval):1: no such file" leak when stderr was merged into stdout).
|
|
79
|
+
|
|
80
|
+
### 6. Relative path to helper (or key) breaks on cross-repo `cd`
|
|
81
|
+
|
|
82
|
+
When an agent `cd`'s to another repo for cross-repo work (e.g., code-agent editing `macf-actions` from its `macf` workspace), `./.claude/scripts/...` doesn't resolve from the new cwd, and relative `$KEY_PATH` can't be read either.
|
|
83
|
+
|
|
84
|
+
`$(...)` command substitution swallows the helper's exit 127 silently, returns empty string. `export GH_TOKEN=""` succeeds with no error. Next `gh` call falls through to stored user auth. Mode-3 silent fallback, triggered by path breakage.
|
|
85
|
+
|
|
86
|
+
**Fix (canonical, post macf#161):**
|
|
87
|
+
|
|
88
|
+
- `claude.sh` exports `MACF_WORKSPACE_DIR="$SCRIPT_DIR"` — the workspace absolute path, available in all agent env regardless of cwd.
|
|
89
|
+
- `claude.sh` absolutizes `KEY_PATH` via `case` on leading slash (preserves operator-absolute paths like `/etc/macf/keys/...`, rewrites relative default to `$SCRIPT_DIR/$KEY_PATH`).
|
|
90
|
+
- All canonical agent templates use `$MACF_WORKSPACE_DIR/.claude/scripts/...` (NOT relative `./...`).
|
|
91
|
+
|
|
92
|
+
**Why mode 6 is distinct from mode 5:** mode 5 is "helper script file missing from workspace entirely." Mode 6 is "helper present in workspace, but reachable only via a path that breaks on cross-repo cwd." Mode 5's fix is installing the helper (`macf rules refresh`); mode 6's fix is using an absolute path to it (`$MACF_WORKSPACE_DIR/...`).
|
|
93
|
+
|
|
94
|
+
---
|
|
95
|
+
|
|
96
|
+
## Verifying identity after ops
|
|
97
|
+
|
|
98
|
+
Don't trust; verify. A token that "looks right" can still be misattributed due to a subtle env issue. Spot-check at session start:
|
|
99
|
+
|
|
100
|
+
```bash
|
|
101
|
+
# After posting any comment in a session, sanity-check ONE post:
|
|
102
|
+
GH_TOKEN=$T gh api "/repos/$REPO/issues/comments/$ID" --jq '.user.login'
|
|
103
|
+
# Expect: <bot-name>[bot]; FAIL if it shows the operator's user account
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
Do this once at session start (cheap), then trust. Mid-session re-checks not needed unless something suspicious happens.
|
|
107
|
+
|
|
108
|
+
The `macf-whoami.sh` helper canonicalizes this check:
|
|
109
|
+
|
|
110
|
+
```bash
|
|
111
|
+
.claude/scripts/macf-whoami.sh
|
|
112
|
+
# Prints the actor identity associated with current $GH_TOKEN
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Canonical pattern (distilled)
|
|
118
|
+
|
|
119
|
+
```bash
|
|
120
|
+
# Token acquisition — fails loud, token-prefix validated
|
|
121
|
+
TOKEN=$(.claude/scripts/macf-gh-token.sh --app-id "$APP_ID" --install-id "$INSTALL_ID" --key "$KEY_PATH") \
|
|
122
|
+
|| { echo "FATAL: token gen failed" >&2; exit 1; }
|
|
123
|
+
|
|
124
|
+
# Chain the op immediately so token doesn't linger in env
|
|
125
|
+
GH_TOKEN=$TOKEN gh issue comment <N> --repo <owner>/<repo> --body "..."
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
The helper script `macf-gh-token.sh` must:
|
|
129
|
+
1. Use `set -euo pipefail`
|
|
130
|
+
2. Use 180s `iat` back-window for clock drift tolerance
|
|
131
|
+
3. Validate token prefix is `ghs_` (installation token); refuse to print user PATs (`ghp_*`, `gho_*`)
|
|
132
|
+
4. Print nothing to stdout on failure (only stderr)
|
|
133
|
+
|
|
134
|
+
---
|
|
135
|
+
|
|
136
|
+
## Structural backstop: PreToolUse hook (macf#140)
|
|
137
|
+
|
|
138
|
+
Workspaces include a `PreToolUse` hook that intercepts `gh` and `git push` invocations and blocks with `exit 2` if `GH_TOKEN` is missing or doesn't have the `ghs_` prefix. This catches mode 3, 5, and 6 at the call site rather than after the fact.
|
|
139
|
+
|
|
140
|
+
Distribution: `macf init` / `macf update` / `macf rules refresh` install the hook + the helper scripts together.
|
|
141
|
+
|
|
142
|
+
When intentionally bypassing the hook for a knowingly user-attributed op (e.g., `gh auth login` during onboarding), set `MACF_SKIP_TOKEN_CHECK=1` for that one call.
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## How this relates to other canonical rules
|
|
147
|
+
|
|
148
|
+
- `coordination.md` § "Token & Git Hygiene" documents the canonical helper invocation; this rule provides the failure-mode catalog the helper is designed to defend against.
|
|
149
|
+
- `pr-discipline.md` documents auto-close-keyword hazards in PR bodies; same shape (silent-fallback hazard at the API boundary) but different surface.
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
## Why this rule exists
|
|
154
|
+
|
|
155
|
+
Silent mis-attribution is a coordination failure mode that breaks routing workflows (workflow @-mention iteration sees user identity on bot-emitted posts; routing doesn't fire) and audit trails (paper-trail evidence shows wrong actors). The trap is mature — recurred 5+ instances across two substrate agents before this rule canonicalized — and prevention is straightforward once the failure modes are catalogued.
|
|
156
|
+
|
|
157
|
+
Pattern: defenses target *result-invariants* (token has `ghs_` prefix; spot-check actor on a known post), not *exit-code-success* (which silent fallbacks satisfy). This generalizes beyond gh-tokens to other tool/API surfaces with silent-fallback hazards.
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# Mention-Routing Hygiene
|
|
2
|
+
|
|
3
|
+
**GitHub `@handle[bot]` mentions fire the Agent Router workflow regardless of surrounding context. When you are writing *about* an agent — quoting its output, analyzing its behavior, describing it in documentation — code-format the handle to suppress routing. Raw handles are reserved for intentional routing targets.**
|
|
4
|
+
|
|
5
|
+
GitHub's @mention semantics treat any bare `@handle` in a comment or PR body as a routable signal. The macf-actions router picks these up and forwards the reference to the named agent's tmux session, regardless of whether the reference was in an addressing context ("please take a look") or a describing context ("tester-2's response was…"). When described-not-addressed mentions fire routing, the referenced agent receives an ambient ping asking for an action that was never actually requested. For rules-loaded agents, this typically triggers scope-discipline reasoning (read context, recognize content-reference, cite `agent-identity.md §Not-for-testers`, stand down) and a response comment explaining why. Multiply this across scenario PRs that analyze tester behavior, insight documents that reference agents as research subjects, or cross-agent research commentary — testers and other agents get pinged for every describing use of their handle.
|
|
6
|
+
|
|
7
|
+
The fix is one character per handle: wrap the `@` and bracketed `[bot]` in backticks. The convention is cheap to apply; the cost of skipping it is proportional to how much the fleet writes about other agents.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## 1. The two modes
|
|
12
|
+
|
|
13
|
+
**Addressing** — you want the agent to see this and respond / act:
|
|
14
|
+
|
|
15
|
+
@macf-code-agent[bot] please review PR #12 when convenient.
|
|
16
|
+
|
|
17
|
+
**Describing** — you are writing about the agent; no routing needed:
|
|
18
|
+
|
|
19
|
+
The `@macf-tester-2-agent[bot]` response quoted `coordination.md` rule 1 verbatim.
|
|
20
|
+
|
|
21
|
+
The difference is a single pair of backticks. Both forms still render recognizably in GitHub's UI; only the raw form fires routing.
|
|
22
|
+
|
|
23
|
+
**Decision rule when unsure:** ask whether the agent receiving this comment should treat it as an action ask. If yes → raw. If no → backticked.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## 2. Contexts where describing happens
|
|
28
|
+
|
|
29
|
+
These are the common places the describing form applies. Not exhaustive; the principle generalizes:
|
|
30
|
+
|
|
31
|
+
- **PR bodies** quoting another agent's output verbatim (scenario transcripts, review excerpts, observation snippets)
|
|
32
|
+
- **Issue bodies and comments** analyzing agent behavior for research or post-mortem purposes
|
|
33
|
+
- **Observation logs** and **insight documents** referencing agents as research subjects
|
|
34
|
+
- **Canonical rule files and documentation** citing agent handles as examples
|
|
35
|
+
- **Paper drafts** and **research notes** in which agents are data, not interlocutors
|
|
36
|
+
- **Cross-agent commentary** where you are synthesizing what multiple agents did
|
|
37
|
+
|
|
38
|
+
In each of these, raw `@handle` will fire routing. Backtick-wrap.
|
|
39
|
+
|
|
40
|
+
Handles inside fenced code blocks (triple-backtick) or 4-space-indented blocks are already safe — GitHub's mention parser skips code contexts. Backtick-wrapping is for inline prose where a bare `@handle` would otherwise be parsed as a routing target.
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
## 3. Contexts where addressing happens
|
|
45
|
+
|
|
46
|
+
These are the legitimate routing targets — keep the raw form:
|
|
47
|
+
|
|
48
|
+
- The closing line of a PR body that asks a reviewer to look: `@macf-science-agent[bot] ready for review.`
|
|
49
|
+
- Direct replies on a thread where you expect the agent to act: `@macf-code-agent[bot] pushback: see file:line ref below.`
|
|
50
|
+
- Handoff comments: `@<reporter> ready for you to close when verified.`
|
|
51
|
+
- Escalation pings when blocked.
|
|
52
|
+
|
|
53
|
+
If a comment contains both describing and addressing references to the same agent, the describing form gets backticks and the addressing form stays raw.
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 4. Verification check before posting
|
|
58
|
+
|
|
59
|
+
For any comment or PR body that contains agent handles, grep the draft:
|
|
60
|
+
|
|
61
|
+
grep -nE '@macf-[a-z-]+-agent\[bot\]' <draft-file>
|
|
62
|
+
|
|
63
|
+
For each line returned: is this line an action ask (raw stays) or a content reference (backticks wrap)?
|
|
64
|
+
|
|
65
|
+
If you don't want to verify per-line, the safe default is: **backtick by default, un-backtick only the addressing lines**.
|
|
66
|
+
|
|
67
|
+
Drafts you run through this check regularly: scenario PR bodies that include preserved artifacts, observation-log entries, insight files, review comments that quote the agent being reviewed.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
## 5. Alternative forms (same effect, lower readability)
|
|
72
|
+
|
|
73
|
+
All three of these suppress routing; the choice is stylistic:
|
|
74
|
+
|
|
75
|
+
- **Backticks** (preferred): `` `@macf-tester-2-agent[bot]` ``
|
|
76
|
+
- **Escape sequences**: `\@macf-tester-2-agent\[bot\]`
|
|
77
|
+
- **Label form**: "tester-2" or "the tester-2 agent" (for prose where a full handle isn't needed)
|
|
78
|
+
|
|
79
|
+
Backticks are preferred because they render as inline code in GitHub's Markdown, which is semantically meaningful ("this is a handle identifier being referenced") and visually distinct from raw routing targets.
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## 6. Symmetry — this rule applies to every agent
|
|
84
|
+
|
|
85
|
+
The convention is **bidirectional** across the fleet. Every agent that writes commentary referencing other agents applies the same rule:
|
|
86
|
+
|
|
87
|
+
- Science-agent writing about testers → backticks
|
|
88
|
+
- Code-agent writing about testers → backticks
|
|
89
|
+
- Testers writing about each other → backticks
|
|
90
|
+
- Operator-thread comments quoting agent output → backticks (operator is human but the convention applies for consistency)
|
|
91
|
+
- Devops, CV, future-fleet agents → backticks
|
|
92
|
+
|
|
93
|
+
Asymmetric adoption breaks the protocol: if only some agents escape handles, the others continue leaking routing from the contexts they miss. Symmetric adoption is what makes the convention enforceable.
|
|
94
|
+
|
|
95
|
+
---
|
|
96
|
+
|
|
97
|
+
## Why this rule exists
|
|
98
|
+
|
|
99
|
+
Routing is a global side-effect. Unlike most GitHub semantics, `@mention`-based routing in MACF does not differentiate the grammatical role of the handle — whether the sentence is about the agent, directed at the agent, or merely mentions the agent in passing. The router sees the handle and fires.
|
|
100
|
+
|
|
101
|
+
Without the backtick convention, every describing use of a handle produces a false-positive ping. For rules-loaded agents that correctly apply scope-discipline when incorrectly routed, this means they must spend attention reading the context, identifying that they were referenced-not-addressed, citing the applicable rule, and posting a stand-down comment. Each firing is a few seconds of noise per agent — but the describing form of a handle is far more common than the addressing form, so the noise accumulates quickly.
|
|
102
|
+
|
|
103
|
+
The failure-mode was observed on `macf-testbed#9` and `#18` (2026-04-24): a single rules-loaded tester received three ambient routing pings across two scenario PRs, correctly disciplined each response with scope-preserving rationale, and escalated the third firing into a cross-session-commitment-tracking critique of the author. That sequence of responses was appropriate — but it was also three response turns that could have been prevented by one keystroke of backticks per handle reference in the PR bodies.
|
|
104
|
+
|
|
105
|
+
The rule is cheap to apply, symmetric across the fleet, and eliminates a class of false-positive routing that otherwise compounds with every describing use of an agent handle.
|