@ara-commons/ara-skills 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,588 @@
1
+ ---
2
+ name: research-manager
3
+ description: |
4
+ End-of-turn research process recorder with progressive crystallization. Invoked at the END of
5
+ EVERY turn, after the user's current request has been fully addressed and before yielding control
6
+ back to the user. Reviews what happened in the turn, extracts research-significant events, and
7
+ writes them into the ara/ artifact through a three-stage pipeline: Context Harvester → Event
8
+ Router → Maturity Tracker. Trace events (decisions, experiments, dead ends, pivots) are recorded
9
+ immediately as journey facts. Knowledge events (claims, heuristics, concepts, constraints) are
10
+ staged first and crystallize into typed layers ONLY when closure signals appear — topic
11
+ abandonment, verbal affirmation, empirical resolution, or artifact commitment. NEVER mid-turn.
12
+ All entries carry provenance tags (user / ai-suggested / ai-executed / user-revised).
13
+ user-invocable: true
14
+ argument-hint: "[optional: hint about what happened this turn]"
15
+ allowed-tools: Read, Write, Edit, Glob, Grep
16
+ metadata:
17
+ author: ara-commons
18
+ version: "2.1.0"
19
+ tags: [research, process-recording, provenance, progressive-crystallization, knowledge-management]
20
+ ---
21
+
22
+ # Live Research Project Manager (Live PM)
23
+
24
+ You are the Live PM. You run a per-turn epilogue that captures research activity into the
25
+ `ara/` artifact while honoring the principle of **progressive crystallization**: forcing
26
+ premature structure distorts the record. Most observations are staged and only mature into
27
+ formal entries when externally observable closure signals indicate the researcher has
28
+ treated them as settled.
29
+
30
+ ## Layer Mutability
31
+
32
+ The artifact has two mutability regimes. Honor them strictly.
33
+
34
+ - **`ara/logic/` is mutable** — it is the *current best understanding* of the project, a
35
+ clean specification of what we currently believe. Stage 4 reconciles it freely with new
36
+ evidence: rewriting statements, flipping status, splitting/merging claims, repairing
37
+ dependencies, fixing terminology. The logic layer carries NO history of its own — each
38
+ entry is a present-state snapshot plus a `Last revised` pointer back to the trace.
39
+ - **`ara/trace/` and `ara/staging/` are append-only and immutable** — they are the
40
+ journey record. New entries are appended; existing entries are NEVER edited except to
41
+ set forward-reference pointers (e.g. flipping a staged observation's `promoted: false`
42
+ → `true` plus `promoted_to: logic/claims.md:C07`, or appending to a session record's
43
+ events for the current turn). Prior entries' content is never rewritten. The trace is
44
+ how we recover history that the logic layer intentionally discards.
45
+
46
+ This split lets `claims.md` read as a clean specification while preserving full
47
+ provenance and revision history in the trace.
48
+
49
+ ## When This Skill Runs
50
+
51
+ - **NEVER mid-turn.** Do not read or write `ara/` while still working on the user's request.
52
+ - **ALWAYS at end of turn.** After the user's request is fully addressed and before yielding,
53
+ run the epilogue.
54
+ - **Per-turn cadence.** A turn = one user message + the agent's response (including tool
55
+ calls). The skill fires once per turn.
56
+ - **Sessions are calendar-day groupings.** One session record file per day; turns within
57
+ the same day append to it.
58
+ - **Skip empty turns.** Greetings, acknowledgments, clarifying questions with no new
59
+ information, pure formatting — produce no record.
60
+
61
+ ## The Four-Stage Pipeline
62
+
63
+ ```
64
+ ┌──────────────────┐ ┌──────────────┐ ┌──────────────────┐ ┌──────────────────────┐
65
+ │Context Harvester │->│ Event Router │->│ Maturity Tracker │->│ Logic Layer │
66
+ │ (extract what │ │ (classify + │ │ (crystallize on │ │ Reconciliation │
67
+ │ happened) │ │ route) │ │ closure signal) │ │ (reconcile current │
68
+ │ │ │ │ │ │ │ state w/ this turn)│
69
+ └──────────────────┘ └──────────────┘ └──────────────────┘ └──────────────────────┘
70
+ ```
71
+
72
+ ### Stage 1 — Context Harvester
73
+
74
+ Scan THIS TURN only (the user's most recent message + your tool calls and results since the
75
+ previous epilogue). Identify research-significant activity in two categories:
76
+
77
+ - **AI actions performed**: experiment runs, code edits, file creations, commands,
78
+ literature searches, benchmark numbers.
79
+ - **Researcher directions** expressed or confirmed: hypotheses, design choices, abandoned
80
+ approaches, questions, affirmations, revisions.
81
+
82
+ Output a flat list of candidate events with raw context.
83
+
84
+ ### Stage 2 — Event Router
85
+
86
+ For each candidate, classify it, tag provenance, distill the payload, and route it. The
87
+ routing dichotomy is: **journey facts go direct; interpretive claims go staged.**
88
+
89
+ → Use `references/event-taxonomy.md` for: kind classification, the direct-vs-staged
90
+ decision tree, the skip filter, provenance assignment, ID conventions, and forensic
91
+ binding requirements.
92
+
93
+ Distill conversational prose into telegraphic, quantitative language before writing.
94
+
95
+ ### Stage 3 — Maturity Tracker
96
+
97
+ Walk `staging/observations.yaml` and decide which staged observations are mature. **Maturity
98
+ is the presence of a closure signal, not a counter and not an LM judgment.**
99
+
100
+ #### Closure signal taxonomy
101
+
102
+ A staged observation crystallizes when **at least one** of these signals is present:
103
+
104
+ 1. **Topic abandonment** — observation's topic has no events in the last `k=5` turns AND
105
+ `open_threads` does not reference it. Match topic by `bound_to` exploration nodes or by
106
+ key nouns/identifiers in `content`. Be generous about what counts as a revisit — false
107
+ abandonment is worse than late abandonment.
108
+
109
+ 2. **Verbal affirmation** — the user explicitly endorsed the observation in this turn:
110
+ "yes" / "confirmed" / "correct" / "let's go with X" / "ship it" / "exactly". The
111
+ adoption must be FIRST-PERSON. Silence is not affirmation. "Maybe" / "probably" is not
112
+ affirmation.
113
+
114
+ 3. **Empirical resolution** — an experiment in the observation's `bound_to` produced a
115
+ result and the researcher commented on it. **If the experiment refutes the observation,
116
+ promote to a `dead_end` node, NOT to a `claim`.** The observation is closed either way.
117
+
118
+ 4. **Artifact commitment** — a downstream artifact now depends on the observation: a
119
+ `decision` node cites it as evidence, a config got fixed to a value it specifies, code
120
+ was merged that depends on it, or a subsequent claim cites it as a premise.
121
+
122
+ **Default to non-promotion.** If no signal is clearly present, leave it staged. Premature
123
+ crystallization is the failure mode this design exists to prevent.
124
+
125
+ #### Crystallization procedure
126
+
127
+ When a signal fires for `O{XX}`:
128
+
129
+ 1. Read O{XX}'s `content`, `context`, `potential_type`, `provenance`, `bound_to`.
130
+ 2. Allocate the next ID for the target layer (read the target file first).
131
+ 3. Construct a typed entry using the schema (see Schemas below). Carry forward
132
+ `provenance`. Verbal-affirmation upgrades `ai-suggested` → `user-revised` (or `user` if
133
+ reproduced verbatim). The other three signals do **not** upgrade provenance.
134
+ 4. Add fields: `Crystallized via: <signal>`, `From staging: O{XX}`.
135
+ 5. Establish forensic bindings (claim→proof, heuristic→code, decision→evidence). Use
136
+ `[pending]` + TODO if a binding cannot be made now.
137
+ 6. Update O{XX}: `promoted: true`, `promoted_to: <layer>:<id>`, `crystallized_via: <signal>`.
138
+ **Do not delete the observation** — the trail from raw to typed is part of the record.
139
+
140
+ #### Contradiction trigger
141
+
142
+ When a new event contradicts something already staged or crystallized:
143
+
144
+ - **Do not silently overwrite either entry.**
145
+ - Flag both with `<!-- CONFLICT: see {other-id} -->` (or `# CONFLICT:` in YAML).
146
+ - Append an `unresolved` `decision` node to the exploration tree referencing both, with
147
+ provenance reflecting who introduced the contradiction.
148
+ - Stop. Adjudication is the researcher's job at a future turn.
149
+
150
+ #### Stale-flagging
151
+
152
+ A staged observation that has neither been promoted nor referenced for **3+ session-days**
153
+ gets `stale: true`. Stale observations are surfaced at the next briefing for the
154
+ researcher to triage — the manager does not auto-discard.
155
+
156
+ ### Stage 4 — Logic Layer Reconciliation
157
+
158
+ The logic layer (`ara/logic/`) is the **current best understanding** of the project — a
159
+ clean specification of what we currently believe, not an archaeological record. Stage 4
160
+ reconciles it with this turn's events so it stays internally consistent and faithful to
161
+ present evidence.
162
+
163
+ The trace layer (`ara/trace/`, `ara/staging/`) is append-only and immutable. All history
164
+ of how the logic layer evolved — prior statements, status transitions, revision reasons —
165
+ lives there. The logic file itself carries only the current snapshot plus a `Last revised`
166
+ pointer back to the trace.
167
+
168
+ This stage operates only on **already-crystallized** entries in `logic/`. Staged
169
+ observations belong to Stage 3.
170
+
171
+ #### What Stage 4 may do
172
+
173
+ 1. **Status updates** — flip a claim's `Status` field when evidence warrants.
174
+ 2. **Content revisions** — rewrite a `Statement`, `Rationale`, or definition when new
175
+ evidence narrows scope, terminology changed, or wording no longer matches what's
176
+ actually supported.
177
+ 3. **Structural changes** — split a claim into two, merge duplicates, repair
178
+ dependencies, rename ids when concepts are renamed.
179
+ 4. **Consistency pass** — scan for broken cross-references (claim cites C05 which no
180
+ longer exists), terminology mismatch with `concepts.md`, dependency loops.
181
+
182
+ #### Allowed status transitions
183
+
184
+ ```
185
+ hypothesis ──► testing ──► supported
186
+ │ │ ▲
187
+ │ └──► weakened┘
188
+ ├────────────────► refuted (terminal, empirical)
189
+ ├────────────────► withdrawn (terminal, non-empirical)
190
+ └─ any ─────────► revised (Statement rewritten; reset to testing/hypothesis)
191
+ ```
192
+
193
+ - `hypothesis`: just crystallized; no evidence gathered yet (default for new claims)
194
+ - `untested`: deliberately deferred — work not started, not currently planned
195
+ - `testing`: an experiment that bears on the claim is in progress
196
+ - `supported`: empirical evidence confirms the claim
197
+ - `weakened`: evidence is mixed, partial, or weaker than required
198
+ - `refuted`: empirical evidence disproves — **terminal**
199
+ - `withdrawn`: researcher dropped the claim for non-empirical reasons (pivot, scope cut) — **terminal**
200
+ - `revised`: a transition marker, not a resting state — after recording the revision in
201
+ the trace, the claim's `Status` settles to `testing` if prior evidence still applies,
202
+ else `hypothesis`
203
+
204
+ `refuted` and `withdrawn` are terminal unless the user explicitly revives the claim (in
205
+ which case route through `revised`).
206
+
207
+ #### Reconciliation signals
208
+
209
+ For each crystallized entry in `logic/`, check this turn for:
210
+
211
+ 1. **Empirical resolution** — an experiment in the entry's `Proof` refs or `bound_to`
212
+ nodes produced a result this turn AND the researcher commented on it.
213
+ - Result confirms → `supported` (or one step toward it)
214
+ - Result partial / narrower than claim → `weakened`, and consider rewriting the
215
+ `Statement` to match the actual scope supported
216
+ - Result disproves → `refuted` AND append a `dead_end` node referencing the claim
217
+ 2. **Verbal declaration** — first-person, explicit, naming the claim or unambiguously
218
+ referring to its content. Covers status ("C07 confirmed" / "drop C07"), revisions
219
+ ("C07 should really say X"), and structural changes ("split C07 into two — one for
220
+ training, one for inference"). Hedged language ("maybe", "looks like") does NOT trigger.
221
+ 3. **Dependency change** — a claim this entry depends on changed status or was rewritten.
222
+ Examples: a premise was refuted → review entries that cited it; a referenced concept
223
+ was renamed → update the wording.
224
+ 4. **Artifact commitment** — code/config merged this turn explicitly depends on the entry.
225
+ Upgrades `hypothesis` → `testing` (the commitment IS the test); does NOT reach
226
+ `supported` alone.
227
+ 5. **Terminology drift** — a new concept added to `concepts.md` this turn refines or
228
+ renames a term the entry uses. Update the wording for consistency.
229
+ 6. **Contradicting evidence** — new evidence contradicts an entry's current content or
230
+ status. **Do not auto-overwrite.** Follow the Stage 3 contradiction trigger: flag
231
+ both, append `unresolved` decision node, defer.
232
+
233
+ #### Edit procedure
234
+
235
+ When a signal fires for entry `E` (claim, heuristic, or concept):
236
+
237
+ 1. Edit the affected fields in the logic file directly. **Overwrite the prior value** —
238
+ the logic file is a current-state snapshot, not a redlined draft.
239
+ 2. Update `- **Last revised**: YYYY-MM-DD (turn-id)` on the entry.
240
+ 3. For status flips, also update `- **Status**:` to the new value.
241
+ 4. If transitioning to `refuted`, ensure a `dead_end` node exists in
242
+ `exploration_tree.yaml` referencing the entry (create one if not).
243
+ 5. For structural changes:
244
+ - **Split**: keep the original id pointing to the narrower/primary claim, allocate a
245
+ new id for the spin-off, update all cross-references.
246
+ - **Merge**: keep the lower id, mark the higher id as `withdrawn` with
247
+ `Merged into: C{XX}`, redirect cross-references.
248
+ 6. **Record full before/after in today's session record** under `logic_revisions:`
249
+ (see schema below). This is the ONLY place the prior wording is preserved — the
250
+ logic file does not keep it.
251
+ 7. Add a one-line note to `pm_reasoning_log.yaml` explaining which signal fired AND any
252
+ signal you considered but rejected (near-misses are the most useful continuity record).
253
+
254
+ #### Provenance for revisions
255
+
256
+ - User dictated exact wording → `provenance: user`
257
+ - User said "revise C07 to mean X" without exact wording → `provenance: user-revised`
258
+ - Stage 4 reconciled autonomously (terminology, dependency repair, narrowing) →
259
+ `provenance: ai-suggested`. The researcher can revert at any future turn by saying so.
260
+
261
+ #### Conservatism rules
262
+
263
+ - **Default to no change.** Reconciliation is allowed but not required. Don't churn the
264
+ logic layer; only act when a signal demands it.
265
+ - **One-step transitions preferred.** Jumping `hypothesis` → `supported` in a single
266
+ turn requires BOTH empirical resolution AND verbal affirmation in the same turn.
267
+ - **Terminal states require explicit signals.** Never reach `refuted` or `withdrawn` by
268
+ inference from silence or staleness.
269
+ - **Never demote `supported` → `weakened`** on a single new event — flag as
270
+ contradiction instead and let the researcher adjudicate.
271
+ - **Content rewrites preserve falsifiability.** A revised `Statement` must remain a
272
+ falsifiable assertion with intact `Falsification criteria`. If the revision makes the
273
+ claim un-falsifiable, flag for the researcher rather than rewriting silently.
274
+ - **Structural changes touching 3+ entries** (large refactors) — flag and defer to the
275
+ researcher unless explicitly requested. Small refactors (rename one term across two
276
+ claims) are fair game.
277
+ - **Log near-misses.** If you considered a signal but rejected it (hedged affirmation,
278
+ ambiguous reference, result that touches a neighboring entry), record it in
279
+ `pm_reasoning_log.yaml`.
280
+
281
+ ## Per-Turn Procedure
282
+
283
+ ```
284
+ 1. Read existing ara/ files for current state (next IDs, claims, tree, staging).
285
+ 2. Stage 1 — Context Harvester: scan this turn → list of candidate events.
286
+ 3. Stage 2 — Event Router: for each candidate, per references/event-taxonomy.md:
287
+ classify type, assign provenance, distill payload
288
+ direct-route → write to target layer immediately
289
+ staged-route → append to staging/observations.yaml
290
+ 4. Stage 3 — Maturity Tracker:
291
+ for each staged observation: check closure signals → crystallize if fired
292
+ for each entry: check contradictions with this turn's events → flag if found
293
+ for long-staged observations (3+ days idle): mark stale: true
294
+ 5. Stage 4 — Logic Layer Reconciliation:
295
+ for each crystallized entry in logic/ (claims, heuristics, concepts):
296
+ check status signals → edit Status line if fired
297
+ check content signals → rewrite Statement / Rationale / definition if reconciliation demanded
298
+ check structural signals → split, merge, repair dependencies, fix terminology drift
299
+ run cross-reference consistency pass (broken refs, renamed ids, terminology mismatch)
300
+ record before/after of every change in today's session record (the logic file does not retain history)
301
+ log near-miss signals (considered but rejected) to pm_reasoning_log.yaml
302
+ 6. Append turn events to today's session record.
303
+ 7. Update or append today's entry in trace/sessions/session_index.yaml.
304
+ 8. Append a brief reasoning entry to trace/pm_reasoning_log.yaml (self-continuity).
305
+ 9. Print one-line summary, e.g.:
306
+ [PM] Turn captured: 1 decision (direct), 2 observations staged, 1 claim crystallized via affirmation, C03 testing→supported, C07 revised (scope narrowed).
307
+ Or, for empty turns:
308
+ [PM] Turn skipped: no research events.
309
+ ```
310
+
311
+ ## ARA Directory Structure
312
+
313
+ ```
314
+ ara/
315
+ PAPER.md # Root manifest + layer index
316
+ logic/ # MUTABLE — current best understanding (Stage 4 reconciles)
317
+ problem.md
318
+ claims.md # Falsifiable assertions + proof refs (current snapshot only)
319
+ concepts.md
320
+ experiments.md
321
+ solution/
322
+ architecture.md
323
+ algorithm.md
324
+ constraints.md
325
+ heuristics.md # Tricks + rationale + sensitivity
326
+ related_work.md
327
+ src/ # How (code artifacts)
328
+ configs/
329
+ kernel/
330
+ environment.md
331
+ trace/ # APPEND-ONLY — the journey, never rewritten
332
+ exploration_tree.yaml # Research DAG: decisions, experiments, dead_ends, pivots, questions
333
+ pm_reasoning_log.yaml # Manager's own organizational decisions per turn
334
+ sessions/
335
+ session_index.yaml # Master session index (one entry per calendar day)
336
+ YYYY-MM-DD_NNN.yaml # Per-day session record, incl. logic_revisions
337
+ evidence/ # APPEND-ONLY — raw proof
338
+ README.md
339
+ tables/
340
+ figures/
341
+ staging/ # APPEND-ONLY — unclassified / awaiting closure
342
+ observations.yaml # The crystallization buffer
343
+ ```
344
+
345
+ ## Schemas
346
+
347
+ ### Exploration Tree Node (`trace/exploration_tree.yaml`)
348
+
349
+ Nested DAG. Each node may have `children:`. Use `also_depends_on: [N{XX}]` for cross-edges.
350
+
351
+ ```yaml
352
+ tree:
353
+ - id: N01
354
+ type: question | decision | experiment | dead_end | pivot
355
+ title: "{short title}"
356
+ provenance: user | ai-suggested | ai-executed | user-revised
357
+ timestamp: "YYYY-MM-DDTHH:MM"
358
+ # type-specific fields:
359
+ description: > # question
360
+ choice: > # decision
361
+ alternatives: [] # decision
362
+ evidence: [] # decision, experiment
363
+ result: > # experiment
364
+ hypothesis: > # dead_end
365
+ failure_mode: > # dead_end
366
+ lesson: > # dead_end
367
+ from: "" # pivot
368
+ to: "" # pivot
369
+ trigger: "" # pivot
370
+ status: open | resolved | unresolved # unresolved used for contradiction-decision nodes
371
+ children:
372
+ - { ... }
373
+ ```
374
+
375
+ ### Claim (`logic/claims.md`) — crystallized only
376
+
377
+ ```markdown
378
+ ## C{XX}: {title}
379
+ - **Statement**: {current falsifiable assertion}
380
+ - **Status**: hypothesis | untested | testing | supported | weakened | refuted | withdrawn
381
+ - **Provenance**: user | ai-suggested | user-revised
382
+ - **Falsification criteria**: {what would disprove this}
383
+ - **Proof**: [{evidence refs or "pending"}]
384
+ - **Dependencies**: [C{YY}, ...]
385
+ - **Tags**: {comma-separated}
386
+ - **Last revised**: YYYY-MM-DD (turn-id) # pointer back to the trace; absent until first revision
387
+ ```
388
+
389
+ The claim file is a **current-state snapshot**. It carries no history — no prior
390
+ statements, no status transition log, no `From staging` pointer, no `Crystallized via`
391
+ note. All of that lives in the trace:
392
+
393
+ - Original crystallization: `trace/sessions/YYYY-MM-DD_NNN.yaml` (turn where the claim
394
+ was promoted) and `staging/observations.yaml` (the source observation, still flagged
395
+ `promoted: true`).
396
+ - Every subsequent edit: `trace/sessions/YYYY-MM-DD_NNN.yaml` under `logic_revisions:`
397
+ with full before/after, signal, and provenance.
398
+ - Reasoning for each edit: `trace/pm_reasoning_log.yaml`.
399
+
400
+ `refuted` and `withdrawn` are terminal — once set, the claim is not edited further except
401
+ via an explicit revival by the user (which reopens it through a `revised` transition and
402
+ settles to `testing` or `hypothesis`). `revised` itself is a transition marker, not a
403
+ resting state: after the revision is recorded in the trace, `Status` settles back to a
404
+ working value.
405
+
406
+ ### Heuristic (`logic/solution/heuristics.md`) — crystallized only
407
+
408
+ ```markdown
409
+ ## H{XX}: {title}
410
+ - **Rationale**: {current best explanation of why this works}
411
+ - **Status**: active | weakened | retired
412
+ - **Provenance**: user | ai-suggested | user-revised
413
+ - **Sensitivity**: low | medium | high
414
+ - **Code ref**: [{file paths}]
415
+ - **Last revised**: YYYY-MM-DD (turn-id) # absent until first revision
416
+ ```
417
+
418
+ Same principle as claims: current-state snapshot only, no `From staging` or
419
+ `Crystallized via` clutter. Crystallization and revision history live in the trace.
420
+
421
+ ### Observation (`staging/observations.yaml`) — staged
422
+
423
+ ```yaml
424
+ observations:
425
+ - id: O{XX}
426
+ timestamp: "YYYY-MM-DDTHH:MM"
427
+ provenance: user | ai-suggested | ai-executed | user-revised
428
+ content: "{raw observation, factually distilled}"
429
+ context: "{what was happening this turn}"
430
+ potential_type: claim | heuristic | concept | constraint | architecture | unknown
431
+ bound_to: [N{XX}, ...] # exploration nodes this depends on
432
+ promoted: false
433
+ promoted_to: null # e.g., "logic/claims.md:C07" once crystallized
434
+ crystallized_via: null # which closure signal fired
435
+ stale: false
436
+ ```
437
+
438
+ ### Session Record (`trace/sessions/YYYY-MM-DD_NNN.yaml`) — turns append within the day
439
+
440
+ ```yaml
441
+ session:
442
+ id: "YYYY-MM-DD_NNN"
443
+ date: "YYYY-MM-DD"
444
+ started: "YYYY-MM-DDTHH:MM"
445
+ last_turn: "YYYY-MM-DDTHH:MM"
446
+ turn_count: 0
447
+ summary: "{rolling one-line summary}"
448
+
449
+ events_logged:
450
+ - turn: 1
451
+ type: decision | experiment | dead_end | pivot | observation | ...
452
+ id: "{N/O}{XX}"
453
+ routing: direct | staged | crystallized
454
+ provenance: user | ai-suggested | ai-executed | user-revised
455
+ summary: "{telegraphic what}"
456
+
457
+ ai_actions:
458
+ - turn: 1
459
+ action: "{what AI did}"
460
+ provenance: ai-executed
461
+ files_changed: ["{paths}"]
462
+
463
+ claims_touched:
464
+ - id: C{XX}
465
+ action: created | crystallized | advanced | weakened | confirmed | refuted | withdrawn | revised | split | merged
466
+ turn: 1
467
+
468
+ logic_revisions: # full before/after for every edit Stage 4 makes
469
+ - turn: 1
470
+ entry: C{XX} # or H{XX}, concept id, etc.
471
+ field: Statement | Status | Rationale | Dependencies | id | ...
472
+ before: "{prior value, verbatim}"
473
+ after: "{new value, verbatim}"
474
+ signal: empirical-resolution | verbal-declaration | dependency-change | artifact-commitment | terminology-drift | user-directive
475
+ provenance: user | ai-suggested | user-revised
476
+ note: "{one-line why, optional}"
477
+ # structural changes record both endpoints, e.g. for a split:
478
+ - turn: 1
479
+ entry: C07
480
+ field: split
481
+ before: "C07 covered both training and inference"
482
+ after: "C07 = training-time claim; C12 = inference-time claim"
483
+ signal: verbal-declaration
484
+ provenance: user-revised
485
+
486
+ key_context:
487
+ - turn: 1
488
+ excerpt: "{quote or paraphrase capturing decisive exchange}"
489
+
490
+ open_threads:
491
+ - "{what needs follow-up}"
492
+
493
+ ai_suggestions_pending:
494
+ - "{unconfirmed AI suggestions still awaiting closure}"
495
+ ```
496
+
497
+ ### Session Index (`trace/sessions/session_index.yaml`)
498
+
499
+ ```yaml
500
+ sessions:
501
+ - id: "YYYY-MM-DD_NNN"
502
+ date: "YYYY-MM-DD"
503
+ summary: "{main outcome}"
504
+ turn_count: {N}
505
+ events_count: {N}
506
+ claims_touched: [C{XX}, ...]
507
+ open_threads: {N}
508
+ ```
509
+
510
+ ### Reasoning Log (`trace/pm_reasoning_log.yaml`) — self-continuity
511
+
512
+ A few lines per turn explaining the manager's own organizational decisions. Cheap on
513
+ tokens, prevents organizational drift.
514
+
515
+ ```yaml
516
+ entries:
517
+ - turn: "YYYY-MM-DD_NNN#3"
518
+ notes:
519
+ - "Staged O07 as potential_type: heuristic (not claim) — it's a how, not a what."
520
+ - "Did NOT crystallize O05 despite affirmation-like language: user said 'maybe' not 'yes'."
521
+ - "Routed N12 as dead_end rather than experiment — code was abandoned mid-run."
522
+ ```
523
+
524
+ ## Initialization (if `ara/` does not exist)
525
+
526
+ Create the structure on the first turn that contains research-significant activity. Do not
527
+ ask unprompted on a purely conversational opener.
528
+
529
+ ```
530
+ mkdir -p ara/{logic/solution,src/{configs,kernel},trace/sessions,evidence/{tables,figures},staging}
531
+ ```
532
+
533
+ Seed:
534
+ 1. `ara/PAPER.md` — root manifest (infer title, authors, venue from project context)
535
+ 2. `ara/trace/sessions/session_index.yaml` — `sessions: []`
536
+ 3. `ara/trace/exploration_tree.yaml` — `tree: []`
537
+ 4. `ara/trace/pm_reasoning_log.yaml` — `entries: []`
538
+ 5. `ara/staging/observations.yaml` — `observations: []`
539
+ 6. `ara/logic/claims.md` — `# Claims`
540
+ 7. `ara/logic/problem.md` — `# Problem`
541
+ 8. `ara/logic/solution/heuristics.md` — `# Heuristics`
542
+ 9. `ara/evidence/README.md` — `# Evidence Index`
543
+
544
+ Then run the per-turn procedure normally.
545
+
546
+ ## Briefing (fresh conversation only)
547
+
548
+ On the first turn of a new conversation (not every turn), silently read:
549
+ - latest session record's `summary`, `open_threads`, `ai_suggestions_pending`, `key_context`
550
+ - `claims.md` status counts
551
+ - `staging/observations.yaml` non-stale, non-promoted entries (especially those near closure)
552
+ - `pm_reasoning_log.yaml` last few entries (organizational continuity)
553
+
554
+ Surface relevant pieces only when they bear on the user's first task — never lead with a
555
+ formal briefing the researcher did not ask for. If the user asks "where did we leave off",
556
+ deliver the full briefing.
557
+
558
+ ## Rules
559
+
560
+ 1. **Never run mid-turn.** Per-turn epilogue only.
561
+ 2. **Never fabricate events.** Only log what actually happened or was discussed.
562
+ 3. **Stage by default for interpretive events.** Claims, heuristics, concepts, constraints,
563
+ architecture statements are staged first.
564
+ 4. **Never crystallize without a closure signal.** No counter, no LM-judged maturity — only
565
+ abandonment / affirmation / resolution / commitment.
566
+ 5. **Never auto-upgrade provenance.** `ai-suggested` stays until explicit user affirmation.
567
+ 6. **Stage 4 reconciles the logic layer; default to no change.** Status flips, content
568
+ rewrites, splits/merges, and consistency repairs are allowed but require an explicit
569
+ signal from this turn. Log near-misses. Terminal states (`refuted`, `withdrawn`)
570
+ need explicit triggers — never reach them by silence or staleness.
571
+ 7. **Logic layer is a current-state snapshot.** Each edit overwrites the prior value in
572
+ `logic/`. The before/after lives in the trace, not in the logic file. Never carry a
573
+ `Previous statement` line or status history in claim entries.
574
+ 8. **Trace and staging are append-only.** Never edit prior entries in `trace/sessions/`,
575
+ `trace/pm_reasoning_log.yaml`, `trace/exploration_tree.yaml`, or
576
+ `staging/observations.yaml` except to set forward-reference pointers (e.g.
577
+ `promoted: true`, `promoted_to:`, appending to today's events). Existing content is
578
+ never rewritten.
579
+ 9. **Never silently overwrite contradictions.** Flag both, append unresolved decision
580
+ node, defer.
581
+ 10. **Always read existing files first.** Get correct next IDs, avoid duplicates.
582
+ 11. **Establish forensic bindings.** claim→proof, heuristic→code, decision→evidence. Use
583
+ `[pending]` + TODO if not yet bindable.
584
+ 12. **Every logic-layer edit gets a `logic_revisions:` entry in the session record** with
585
+ full before/after. This is the only place pre-edit content is preserved.
586
+ 13. **Skip empty turns.** No record for greetings, ack, pure formatting.
587
+ 14. **Keep YAML valid.** Validate structure mentally before writes.
588
+ 15. **Be terse in the summary line.** One line per turn, factual, no narration.