arscontexta 0.6.0
This diff lists the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their public registries.
- package/.claude-plugin/marketplace.json +11 -0
- package/.claude-plugin/plugin.json +22 -0
- package/README.md +683 -0
- package/agents/knowledge-guide.md +49 -0
- package/bin/cli.mjs +66 -0
- package/generators/agents-md.md +240 -0
- package/generators/claude-md.md +379 -0
- package/generators/features/atomic-notes.md +124 -0
- package/generators/features/ethical-guardrails.md +58 -0
- package/generators/features/graph-analysis.md +188 -0
- package/generators/features/helper-functions.md +92 -0
- package/generators/features/maintenance.md +164 -0
- package/generators/features/methodology-knowledge.md +70 -0
- package/generators/features/mocs.md +144 -0
- package/generators/features/multi-domain.md +61 -0
- package/generators/features/personality.md +71 -0
- package/generators/features/processing-pipeline.md +428 -0
- package/generators/features/schema.md +149 -0
- package/generators/features/self-evolution.md +229 -0
- package/generators/features/self-space.md +78 -0
- package/generators/features/semantic-search.md +99 -0
- package/generators/features/session-rhythm.md +85 -0
- package/generators/features/templates.md +85 -0
- package/generators/features/wiki-links.md +88 -0
- package/generators/soul-md.md +121 -0
- package/hooks/hooks.json +45 -0
- package/hooks/scripts/auto-commit.sh +44 -0
- package/hooks/scripts/session-capture.sh +35 -0
- package/hooks/scripts/session-orient.sh +86 -0
- package/hooks/scripts/write-validate.sh +42 -0
- package/methodology/AI shifts knowledge systems from externalizing memory to externalizing attention.md +59 -0
- package/methodology/BM25 retrieval fails on full-length descriptions because query term dilution reduces match scores.md +39 -0
- package/methodology/IBIS framework maps claim-based architecture to structured argumentation.md +58 -0
- package/methodology/LLM attention degrades as context fills.md +49 -0
- package/methodology/MOC construction forces synthesis that automated generation from metadata cannot replicate.md +49 -0
- package/methodology/MOC maintenance investment compounds because orientation savings multiply across every future session.md +41 -0
- package/methodology/MOCs are attention management devices not just organizational tools.md +51 -0
- package/methodology/PKM failure follows a predictable cycle.md +50 -0
- package/methodology/ThreadMode to DocumentMode transformation is the core value creation step.md +52 -0
- package/methodology/WIP limits force processing over accumulation.md +53 -0
- package/methodology/Zeigarnik effect validates capture-first philosophy because open loops drain attention.md +42 -0
- package/methodology/academic research uses structured extraction with cross-source synthesis.md +566 -0
- package/methodology/adapt the four-phase processing pipeline to domain-specific throughput needs.md +197 -0
- package/methodology/agent notes externalize navigation intuition that search cannot discover and traversal cannot reconstruct.md +48 -0
- package/methodology/agent self-memory should be architecturally separate from user knowledge systems.md +48 -0
- package/methodology/agent session boundaries create natural automation checkpoints that human-operated systems lack.md +56 -0
- package/methodology/agent-cognition.md +107 -0
- package/methodology/agents are simultaneously methodology executors and subjects creating a unique trust asymmetry.md +66 -0
- package/methodology/aspect-oriented programming solved the same cross-cutting concern problem that hooks solve.md +39 -0
- package/methodology/associative ontologies beat hierarchical taxonomies because heterarchy adapts while hierarchy brittles.md +53 -0
- package/methodology/attention residue may have a minimum granularity that cannot be subdivided.md +46 -0
- package/methodology/auto-commit hooks eliminate prospective memory failures by converting remember-to-act into guaranteed execution.md +47 -0
- package/methodology/automated detection is always safe because it only reads state while automated remediation risks content corruption.md +42 -0
- package/methodology/automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues.md +56 -0
- package/methodology/backlinks implicitly define notes by revealing usage context.md +35 -0
- package/methodology/backward maintenance asks what would be different if written today.md +62 -0
- package/methodology/balance onboarding enforcement and questions to prevent premature complexity.md +229 -0
- package/methodology/basic level categorization determines optimal MOC granularity.md +51 -0
- package/methodology/batching by context similarity reduces switching costs in agent processing.md +43 -0
- package/methodology/behavioral anti-patterns matter more than tool selection.md +42 -0
- package/methodology/betweenness centrality identifies bridge notes connecting disparate knowledge domains.md +57 -0
- package/methodology/blueprints that teach construction outperform downloads that provide pre-built code for platform-dependent modules.md +42 -0
- package/methodology/bootstrapping principle enables self-improving systems.md +62 -0
- package/methodology/build automatic memory through cognitive offloading and session handoffs.md +285 -0
- package/methodology/capture the reaction to content not just the content itself.md +41 -0
- package/methodology/claims must be specific enough to be wrong.md +36 -0
- package/methodology/closure rituals create clean breaks that prevent attention residue bleed.md +44 -0
- package/methodology/cognitive offloading is the architectural foundation for vault design.md +46 -0
- package/methodology/cognitive outsourcing risk in agent-operated systems.md +55 -0
- package/methodology/coherence maintains consistency despite inconsistent inputs.md +96 -0
- package/methodology/coherent architecture emerges from wiki links spreading activation and small-world topology.md +48 -0
- package/methodology/community detection algorithms can inform when MOCs should split or merge.md +52 -0
- package/methodology/complete navigation requires four complementary types that no single mechanism provides.md +43 -0
- package/methodology/complex systems evolve from simple working systems.md +59 -0
- package/methodology/composable knowledge architecture builds systems from independent toggleable modules not monolithic templates.md +61 -0
- package/methodology/compose multi-domain systems through separate templates and shared graph.md +372 -0
- package/methodology/concept-orientation beats source-orientation for cross-domain connections.md +51 -0
- package/methodology/confidence thresholds gate automated action between the mechanical and judgment zones.md +50 -0
- package/methodology/configuration dimensions interact so choices in one create pressure on others.md +58 -0
- package/methodology/configuration paralysis emerges when derivation surfaces too many decisions.md +44 -0
- package/methodology/context files function as agent operating systems through self-referential self-extension.md +46 -0
- package/methodology/context phrase clarity determines how deep a navigation hierarchy can scale.md +46 -0
- package/methodology/continuous small-batch processing eliminates review dread.md +48 -0
- package/methodology/controlled disorder engineers serendipity through semantic rather than topical linking.md +51 -0
- package/methodology/creative writing uses worldbuilding consistency with character tracking.md +672 -0
- package/methodology/cross-links between MOC territories indicate creative leaps and integration depth.md +43 -0
- package/methodology/dangling links reveal which notes want to exist.md +62 -0
- package/methodology/data exit velocity measures how quickly content escapes vendor lock-in.md +74 -0
- package/methodology/decontextualization risk means atomicity may strip meaning that cannot be recovered.md +48 -0
- package/methodology/dense interlinked research claims enable derivation while sparse references only enable templating.md +47 -0
- package/methodology/dependency resolution through topological sort makes module composition transparent and verifiable.md +56 -0
- package/methodology/derivation generates knowledge systems from composable research claims not template customization.md +63 -0
- package/methodology/derivation-engine.md +27 -0
- package/methodology/derived systems follow a seed-evolve-reseed lifecycle.md +56 -0
- package/methodology/description quality for humans diverges from description quality for keyword search.md +73 -0
- package/methodology/descriptions are retrieval filters not summaries.md +112 -0
- package/methodology/design MOCs as attention management devices with lifecycle governance.md +318 -0
- package/methodology/design-dimensions.md +66 -0
- package/methodology/digital mutability enables note evolution that physical permanence forbids.md +54 -0
- package/methodology/discovery-retrieval.md +48 -0
- package/methodology/distinctiveness scoring treats description quality as measurable.md +69 -0
- package/methodology/does agent processing recover what fast capture loses.md +43 -0
- package/methodology/domain-compositions.md +37 -0
- package/methodology/dual-coding with visual elements could enhance agent traversal.md +55 -0
- package/methodology/each module must be describable in one sentence under 200 characters or it does too many things.md +45 -0
- package/methodology/each new note compounds value by creating traversal paths.md +55 -0
- package/methodology/eight configuration dimensions parameterize the space of possible knowledge systems.md +56 -0
- package/methodology/elaborative encoding is the quality gate for new notes.md +55 -0
- package/methodology/enforce schema with graduated strictness across capture processing and query zones.md +221 -0
- package/methodology/enforcing atomicity can create paralysis when ideas resist decomposition.md +43 -0
- package/methodology/engineering uses technical decision tracking with architectural memory.md +766 -0
- package/methodology/every knowledge domain shares a four-phase processing skeleton that diverges only in the process step.md +53 -0
- package/methodology/evolution observations provide actionable signals for system adaptation.md +67 -0
- package/methodology/external memory shapes cognition more than base model.md +60 -0
- package/methodology/faceted classification treats notes as multi-dimensional objects rather than folder contents.md +65 -0
- package/methodology/failure-modes.md +27 -0
- package/methodology/false universalism applies same processing logic regardless of domain.md +49 -0
- package/methodology/federated wiki pattern enables multi-agent divergence as feature not bug.md +59 -0
- package/methodology/flat files break at retrieval scale.md +75 -0
- package/methodology/forced engagement produces weak connections.md +48 -0
- package/methodology/four abstraction layers separate platform-agnostic from platform-dependent knowledge system features.md +47 -0
- package/methodology/fresh context per task preserves quality better than chaining phases.md +44 -0
- package/methodology/friction reveals architecture.md +63 -0
- package/methodology/friction-driven module adoption prevents configuration debt by adding complexity only at pain points.md +48 -0
- package/methodology/gardening cycle implements tend prune fertilize operations.md +41 -0
- package/methodology/generation effect gate blocks processing without transformation.md +40 -0
- package/methodology/goal-driven memory orchestration enables autonomous domain learning through directed compute allocation.md +41 -0
- package/methodology/good descriptions layer heuristic then mechanism then implication.md +57 -0
- package/methodology/graph-structure.md +65 -0
- package/methodology/guided notes might outperform post-hoc structuring for high-volume capture.md +37 -0
- package/methodology/health wellness uses symptom-trigger correlation with multi-dimensional tracking.md +819 -0
- package/methodology/hook composition creates emergent methodology from independent single-concern components.md +47 -0
- package/methodology/hook enforcement guarantees quality while instruction enforcement merely suggests it.md +51 -0
- package/methodology/hook-driven learning loops create self-improving methodology through observation accumulation.md +62 -0
- package/methodology/hooks are the agent habit system that replaces the missing basal ganglia.md +40 -0
- package/methodology/hooks cannot replace genuine cognitive engagement yet more automation is always tempting.md +87 -0
- package/methodology/hooks enable context window efficiency by delegating deterministic checks to external processes.md +47 -0
- package/methodology/idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once.md +44 -0
- package/methodology/implement condition-based maintenance triggers for derived systems.md +255 -0
- package/methodology/implicit dependencies create distributed monoliths that fail silently across configurations.md +58 -0
- package/methodology/implicit knowledge emerges from traversal.md +55 -0
- package/methodology/incremental formalization happens through repeated touching of old notes.md +60 -0
- package/methodology/incremental reading enables cross-source connection finding.md +39 -0
- package/methodology/index.md +32 -0
- package/methodology/inline links carry richer relationship data than metadata fields.md +91 -0
- package/methodology/insight accretion differs from productivity in knowledge systems.md +41 -0
- package/methodology/intermediate packets enable assembly over creation.md +52 -0
- package/methodology/intermediate representation pattern enables reliable vault operations beyond regex.md +62 -0
- package/methodology/justification chains enable forward backward and evolution reasoning about configuration decisions.md +46 -0
- package/methodology/knowledge system architecture is parameterized by platform capabilities not fixed by methodology.md +51 -0
- package/methodology/knowledge systems become communication partners through complexity and memory humans cannot sustain.md +47 -0
- package/methodology/knowledge systems share universal operations and structural components across all methodology traditions.md +46 -0
- package/methodology/legal case management uses precedent chains with regulatory change propagation.md +892 -0
- package/methodology/live index via periodic regeneration keeps discovery current.md +58 -0
- package/methodology/local-first file formats are inherently agent-native.md +69 -0
- package/methodology/logic column pattern separates reasoning from procedure.md +35 -0
- package/methodology/maintenance operations are more universal than creative pipelines because structural health is domain-invariant.md +47 -0
- package/methodology/maintenance scheduling frequency should match consequence speed not detection capability.md +50 -0
- package/methodology/maintenance targeting should prioritize mechanism and theory notes.md +26 -0
- package/methodology/maintenance-patterns.md +72 -0
- package/methodology/markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure.md +55 -0
- package/methodology/maturity field enables agent context prioritization.md +33 -0
- package/methodology/memory-architecture.md +27 -0
- package/methodology/metacognitive confidence can diverge from retrieval capability.md +42 -0
- package/methodology/metadata reduces entropy enabling precision over recall.md +91 -0
- package/methodology/methodology development should follow the trajectory from documentation to skill to hook as understanding hardens.md +80 -0
- package/methodology/methodology traditions are named points in a shared configuration space not competing paradigms.md +64 -0
- package/methodology/mnemonic medium embeds verification into navigation.md +46 -0
- package/methodology/module communication through shared YAML fields creates loose coupling without direct dependencies.md +44 -0
- package/methodology/module deactivation must account for structural artifacts that survive the toggle.md +49 -0
- package/methodology/multi-domain systems compose through separate templates and shared graph.md +61 -0
- package/methodology/multi-domain-composition.md +27 -0
- package/methodology/narrow folksonomy optimizes for single-operator retrieval unlike broad consensus tagging.md +53 -0
- package/methodology/navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts.md +48 -0
- package/methodology/navigational vertigo emerges in pure association systems without local hierarchy.md +54 -0
- package/methodology/note titles should function as APIs enabling sentence transclusion.md +51 -0
- package/methodology/note-design.md +57 -0
- package/methodology/notes are skills — curated knowledge injected when relevant.md +62 -0
- package/methodology/notes function as cognitive anchors that stabilize attention during complex tasks.md +41 -0
- package/methodology/novel domains derive by mapping knowledge type to closest reference domain then adapting.md +50 -0
- package/methodology/nudge theory explains graduated hook enforcement as choice architecture for agents.md +59 -0
- package/methodology/observation and tension logs function as dead-letter queues for failed automation.md +51 -0
- package/methodology/operational memory and knowledge memory serve different functions in agent architecture.md +48 -0
- package/methodology/operational wisdom requires contextual observation.md +52 -0
- package/methodology/orchestrated vault creation transforms arscontexta from tool to autonomous knowledge factory.md +40 -0
- package/methodology/organic emergence versus active curation creates a fundamental vault governance tension.md +68 -0
- package/methodology/orphan notes are seeds not failures.md +38 -0
- package/methodology/over-automation corrupts quality when hooks encode judgment rather than verification.md +62 -0
- package/methodology/people relationships uses Dunbar-layered graphs with interaction tracking.md +659 -0
- package/methodology/personal assistant uses life area management with review automation.md +610 -0
- package/methodology/platform adapter translation is semantic not mechanical because hook event meanings differ.md +40 -0
- package/methodology/platform capability tiers determine which knowledge system features can be implemented.md +48 -0
- package/methodology/platform fragmentation means identical conceptual operations require different implementations across agent environments.md +44 -0
- package/methodology/premature complexity is the most common derivation failure mode.md +45 -0
- package/methodology/prevent domain-specific failure modes through the vulnerability matrix.md +336 -0
- package/methodology/processing effort should follow retrieval demand.md +57 -0
- package/methodology/processing-workflows.md +75 -0
- package/methodology/product management uses feedback pipelines with experiment tracking.md +789 -0
- package/methodology/productivity porn risk in meta-system building.md +30 -0
- package/methodology/programmable notes could enable property-triggered workflows.md +64 -0
- package/methodology/progressive disclosure means reading right not reading less.md +69 -0
- package/methodology/progressive schema validates only what active modules require not the full system schema.md +49 -0
- package/methodology/project management uses decision tracking with stakeholder context.md +776 -0
- package/methodology/propositional link semantics transform wiki links from associative to reasoned.md +87 -0
- package/methodology/prospective memory requires externalization.md +53 -0
- package/methodology/provenance tracks where beliefs come from.md +62 -0
- package/methodology/queries evolve during search so agents should checkpoint.md +35 -0
- package/methodology/question-answer metadata enables inverted search patterns.md +39 -0
- package/methodology/random note resurfacing prevents write-only memory.md +33 -0
- package/methodology/reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring.md +59 -0
- package/methodology/reflection synthesizes existing notes into new insight.md +100 -0
- package/methodology/retrieval utility should drive design over capture completeness.md +69 -0
- package/methodology/retrieval verification loop tests description quality at scale.md +81 -0
- package/methodology/role field makes graph structure explicit.md +94 -0
- package/methodology/scaffolding enables divergence that fine-tuning cannot.md +67 -0
- package/methodology/schema enforcement via validation agents enables soft consistency.md +60 -0
- package/methodology/schema evolution follows observe-then-formalize not design-then-enforce.md +65 -0
- package/methodology/schema field names are the only domain specific element in the universal note pattern.md +46 -0
- package/methodology/schema fields should use domain-native vocabulary not abstract terminology.md +47 -0
- package/methodology/schema templates reduce cognitive overhead at capture time.md +55 -0
- package/methodology/schema validation hooks externalize inhibitory control that degrades under cognitive load.md +48 -0
- package/methodology/schema-enforcement.md +27 -0
- package/methodology/self-extension requires context files to contain platform operations knowledge not just methodology.md +47 -0
- package/methodology/sense-making vs storage does compression lose essential nuance.md +73 -0
- package/methodology/session boundary hooks implement cognitive bookends for orientation and reflection.md +60 -0
- package/methodology/session handoff creates continuity without persistent memory.md +43 -0
- package/methodology/session outputs are packets for future selves.md +43 -0
- package/methodology/session transcript mining enables experiential validation that structural tests cannot provide.md +38 -0
- package/methodology/skill context budgets constrain knowledge system complexity on agent platforms.md +52 -0
- package/methodology/skills encode methodology so manual execution bypasses quality gates.md +50 -0
- package/methodology/small-world topology requires hubs and dense local links.md +99 -0
- package/methodology/source attribution enables tracing claims to foundations.md +38 -0
- package/methodology/spaced repetition scheduling could optimize vault maintenance.md +44 -0
- package/methodology/spreading activation models how agents should traverse.md +79 -0
- package/methodology/stale navigation actively misleads because agents trust curated maps completely.md +43 -0
- package/methodology/stigmergy coordinates agents through environmental traces without direct communication.md +62 -0
- package/methodology/storage versus thinking distinction determines which tool patterns apply.md +56 -0
- package/methodology/structure enables navigation without reading everything.md +52 -0
- package/methodology/structure without processing provides no value.md +56 -0
- package/methodology/student learning uses prerequisite graphs with spaced retrieval.md +770 -0
- package/methodology/summary coherence tests composability before filing.md +37 -0
- package/methodology/tag rot applies to wiki links because titles serve as both identifier and display text.md +50 -0
- package/methodology/temporal media must convert to spatial text for agent traversal.md +43 -0
- package/methodology/temporal processing priority creates age-based inbox urgency.md +45 -0
- package/methodology/temporal separation of capture and processing preserves context freshness.md +39 -0
- package/methodology/ten universal primitives form the kernel of every viable agent knowledge system.md +162 -0
- package/methodology/testing effect could enable agent knowledge verification.md +38 -0
- package/methodology/the AgentSkills standard embodies progressive disclosure at the skill level.md +40 -0
- package/methodology/the derivation engine improves recursively as deployed systems generate observations.md +49 -0
- package/methodology/the determinism boundary separates hook methodology from skill methodology.md +46 -0
- package/methodology/the fix-versus-report decision depends on determinism reversibility and accumulated trust.md +45 -0
- package/methodology/the generation effect requires active transformation not just storage.md +57 -0
- package/methodology/the no wrong patches guarantee ensures any valid module combination produces a valid system.md +58 -0
- package/methodology/the system is the argument.md +46 -0
- package/methodology/the vault constitutes identity for agents.md +86 -0
- package/methodology/the vault methodology transfers because it encodes cognitive science not domain specifics.md +47 -0
- package/methodology/therapy journal uses warm personality with pattern detection for emotional processing.md +584 -0
- package/methodology/three capture schools converge through agent-mediated synthesis.md +55 -0
- package/methodology/three concurrent maintenance loops operate at different timescales to catch different classes of problems.md +56 -0
- package/methodology/throughput matters more than accumulation.md +58 -0
- package/methodology/title as claim enables traversal as reasoning.md +50 -0
- package/methodology/topological organization beats temporal for knowledge work.md +52 -0
- package/methodology/trading uses conviction tracking with thesis-outcome correlation.md +699 -0
- package/methodology/trails transform ephemeral navigation into persistent artifacts.md +39 -0
- package/methodology/transform universal vocabulary to domain-native language through six levels.md +259 -0
- package/methodology/type field enables structured queries without folder hierarchies.md +53 -0
- package/methodology/use-case presets dissolve the tension between composability and simplicity.md +44 -0
- package/methodology/vault conventions may impose hidden rigidity on thinking.md +44 -0
- package/methodology/verbatim risk applies to agents too.md +31 -0
- package/methodology/vibe notetaking is the emerging industry consensus for AI-native self-organization.md +56 -0
- package/methodology/vivid memories need verification.md +45 -0
- package/methodology/vocabulary-transformation.md +27 -0
- package/methodology/voice capture is the highest-bandwidth channel for agent-delegated knowledge systems.md +45 -0
- package/methodology/wiki links are the digital evolution of analog indexing.md +73 -0
- package/methodology/wiki links as social contract transforms agents into stewards of incomplete references.md +52 -0
- package/methodology/wiki links create navigation paths that shape retrieval.md +63 -0
- package/methodology/wiki links implement GraphRAG without the infrastructure.md +101 -0
- package/methodology/writing for audience blocks authentic creation.md +22 -0
- package/methodology/you operate a system that takes notes.md +79 -0
- package/openclaw/SKILL.md +110 -0
- package/package.json +45 -0
- package/platforms/README.md +51 -0
- package/platforms/claude-code/generator.md +61 -0
- package/platforms/claude-code/hooks/README.md +186 -0
- package/platforms/claude-code/hooks/auto-commit.sh.template +38 -0
- package/platforms/claude-code/hooks/session-capture.sh.template +72 -0
- package/platforms/claude-code/hooks/session-orient.sh.template +189 -0
- package/platforms/claude-code/hooks/write-validate.sh.template +106 -0
- package/platforms/openclaw/generator.md +82 -0
- package/platforms/openclaw/hooks/README.md +89 -0
- package/platforms/openclaw/hooks/bootstrap.ts.template +224 -0
- package/platforms/openclaw/hooks/command-new.ts.template +165 -0
- package/platforms/openclaw/hooks/heartbeat.ts.template +214 -0
- package/platforms/shared/features/README.md +70 -0
- package/platforms/shared/skill-blocks/graph.md +145 -0
- package/platforms/shared/skill-blocks/learn.md +119 -0
- package/platforms/shared/skill-blocks/next.md +131 -0
- package/platforms/shared/skill-blocks/pipeline.md +326 -0
- package/platforms/shared/skill-blocks/ralph.md +616 -0
- package/platforms/shared/skill-blocks/reduce.md +1142 -0
- package/platforms/shared/skill-blocks/refactor.md +129 -0
- package/platforms/shared/skill-blocks/reflect.md +780 -0
- package/platforms/shared/skill-blocks/remember.md +524 -0
- package/platforms/shared/skill-blocks/rethink.md +574 -0
- package/platforms/shared/skill-blocks/reweave.md +680 -0
- package/platforms/shared/skill-blocks/seed.md +320 -0
- package/platforms/shared/skill-blocks/stats.md +145 -0
- package/platforms/shared/skill-blocks/tasks.md +171 -0
- package/platforms/shared/skill-blocks/validate.md +323 -0
- package/platforms/shared/skill-blocks/verify.md +562 -0
- package/platforms/shared/templates/README.md +35 -0
- package/presets/experimental/categories.yaml +1 -0
- package/presets/experimental/preset.yaml +38 -0
- package/presets/experimental/starter/README.md +7 -0
- package/presets/experimental/vocabulary.yaml +7 -0
- package/presets/personal/categories.yaml +7 -0
- package/presets/personal/preset.yaml +41 -0
- package/presets/personal/starter/goals.md +21 -0
- package/presets/personal/starter/index.md +17 -0
- package/presets/personal/starter/life-areas.md +21 -0
- package/presets/personal/starter/people.md +21 -0
- package/presets/personal/vocabulary.yaml +32 -0
- package/presets/research/categories.yaml +8 -0
- package/presets/research/preset.yaml +41 -0
- package/presets/research/starter/index.md +17 -0
- package/presets/research/starter/methods.md +21 -0
- package/presets/research/starter/open-questions.md +21 -0
- package/presets/research/vocabulary.yaml +33 -0
- package/reference/AUDIT-REPORT.md +238 -0
- package/reference/claim-map.md +172 -0
- package/reference/components.md +327 -0
- package/reference/conversation-patterns.md +542 -0
- package/reference/derivation-validation.md +649 -0
- package/reference/dimension-claim-map.md +134 -0
- package/reference/evolution-lifecycle.md +297 -0
- package/reference/failure-modes.md +235 -0
- package/reference/interaction-constraints.md +204 -0
- package/reference/kernel.yaml +242 -0
- package/reference/methodology.md +283 -0
- package/reference/open-questions.md +279 -0
- package/reference/personality-layer.md +302 -0
- package/reference/self-space.md +299 -0
- package/reference/semantic-vs-keyword.md +288 -0
- package/reference/session-lifecycle.md +298 -0
- package/reference/templates/base-note.md +16 -0
- package/reference/templates/companion-note.md +70 -0
- package/reference/templates/creative-note.md +16 -0
- package/reference/templates/learning-note.md +16 -0
- package/reference/templates/life-note.md +16 -0
- package/reference/templates/moc.md +26 -0
- package/reference/templates/relationship-note.md +17 -0
- package/reference/templates/research-note.md +19 -0
- package/reference/templates/session-log.md +24 -0
- package/reference/templates/therapy-note.md +16 -0
- package/reference/test-fixtures/edge-case-constraints.md +148 -0
- package/reference/test-fixtures/multi-domain.md +164 -0
- package/reference/test-fixtures/novel-domain-gaming.md +138 -0
- package/reference/test-fixtures/research-minimal.md +102 -0
- package/reference/test-fixtures/therapy-full.md +155 -0
- package/reference/testing-milestones.md +1087 -0
- package/reference/three-spaces.md +363 -0
- package/reference/tradition-presets.md +203 -0
- package/reference/use-case-presets.md +341 -0
- package/reference/validate-kernel.sh +432 -0
- package/reference/vocabulary-transforms.md +85 -0
- package/scripts/sync-thinking.sh +147 -0
- package/skill-sources/graph/SKILL.md +567 -0
- package/skill-sources/graph/skill.json +17 -0
- package/skill-sources/learn/SKILL.md +254 -0
- package/skill-sources/learn/skill.json +17 -0
- package/skill-sources/next/SKILL.md +407 -0
- package/skill-sources/next/skill.json +17 -0
- package/skill-sources/pipeline/SKILL.md +314 -0
- package/skill-sources/pipeline/skill.json +17 -0
- package/skill-sources/ralph/SKILL.md +604 -0
- package/skill-sources/ralph/skill.json +17 -0
- package/skill-sources/reduce/SKILL.md +1113 -0
- package/skill-sources/reduce/skill.json +17 -0
- package/skill-sources/refactor/SKILL.md +448 -0
- package/skill-sources/refactor/skill.json +17 -0
- package/skill-sources/reflect/SKILL.md +747 -0
- package/skill-sources/reflect/skill.json +17 -0
- package/skill-sources/remember/SKILL.md +534 -0
- package/skill-sources/remember/skill.json +17 -0
- package/skill-sources/rethink/SKILL.md +658 -0
- package/skill-sources/rethink/skill.json +17 -0
- package/skill-sources/reweave/SKILL.md +657 -0
- package/skill-sources/reweave/skill.json +17 -0
- package/skill-sources/seed/SKILL.md +303 -0
- package/skill-sources/seed/skill.json +17 -0
- package/skill-sources/stats/SKILL.md +371 -0
- package/skill-sources/stats/skill.json +17 -0
- package/skill-sources/tasks/SKILL.md +402 -0
- package/skill-sources/tasks/skill.json +17 -0
- package/skill-sources/validate/SKILL.md +310 -0
- package/skill-sources/validate/skill.json +17 -0
- package/skill-sources/verify/SKILL.md +532 -0
- package/skill-sources/verify/skill.json +17 -0
- package/skills/add-domain/SKILL.md +441 -0
- package/skills/add-domain/skill.json +17 -0
- package/skills/architect/SKILL.md +568 -0
- package/skills/architect/skill.json +17 -0
- package/skills/ask/SKILL.md +388 -0
- package/skills/ask/skill.json +17 -0
- package/skills/health/SKILL.md +760 -0
- package/skills/health/skill.json +17 -0
- package/skills/help/SKILL.md +348 -0
- package/skills/help/skill.json +17 -0
- package/skills/recommend/SKILL.md +553 -0
- package/skills/recommend/skill.json +17 -0
- package/skills/reseed/SKILL.md +385 -0
- package/skills/reseed/skill.json +17 -0
- package/skills/setup/SKILL.md +1688 -0
- package/skills/setup/skill.json +17 -0
- package/skills/tutorial/SKILL.md +496 -0
- package/skills/tutorial/skill.json +17 -0
- package/skills/upgrade/SKILL.md +395 -0
- package/skills/upgrade/skill.json +17 -0

@@ -0,0 +1,87 @@
---
description: Standardizing a vocabulary of relationship types (causes, enables, contradicts, extends) makes wiki link connections machine-parseable for graph reasoning
kind: research
topics: ["[[graph-structure]]"]
methodology: ["Concept Mapping"]
source: [[tft-research-part2]]
---

# propositional link semantics transform wiki links from associative to reasoned

Mind mapping connects ideas with lines. Concept mapping connects ideas with propositions. The line between "A" and "B" in a mind map says "these relate somehow." The link "A *causes* B" in a concept map says exactly how.

This distinction, fundamental to concept mapping methodology, offers something powerful for agent-operated knowledge graphs: machine-parseable relationship semantics.

## From implicit to explicit relationships

Since [[inline links carry richer relationship data than metadata fields]], we already encode relationship semantics in prose context. When you write "since [[spreading activation models how agents should traverse]], the traversal pattern becomes clear," the word "since" signals a foundational relationship. The prose around the link is already doing semantic work. This is precisely why [[title as claim enables traversal as reasoning]] — the claim-form title composes into the surrounding prose, and the surrounding prose carries relationship type information, so traversal reads as typed reasoning rather than mere reference-hopping.

But this encoding is informal. An agent parsing "since [[X]]" versus "but [[X]] complicates this" versus "this extends [[X]]" must perform natural language understanding to extract the relationship type. The semantics are there for humans, but require inference for machines.

Propositional link semantics would standardize this. A constrained vocabulary:

| Relationship | Meaning | Example |
|--------------|---------|---------|
| causes | X produces Y | [[context window limits]] *causes* [[attention degradation]] |
| enables | X makes Y possible | [[wiki links]] *enables* [[multi-hop traversal]] |
| contradicts | X conflicts with Y | [[perfect recall]] *contradicts* [[lossy memory constraints]] |
| extends | X builds on Y | [[typed links]] *extends* [[inline relationship encoding]] |
| specifies | X is a case of Y | [[MOC maintenance]] *specifies* [[gardening cycle]] |
| supports | X provides evidence for Y | [[research findings]] *supports* [[methodology claim]] |

## Why this matters for agents

With explicit relationship types, agents can reason about graph structure rather than just traverse it.

"Find all notes that contradict X" becomes a query, not an inference task. "What does X enable?" reveals downstream implications. "What causes X?" traces upstream dependencies. The graph becomes not just a navigation structure but an inference substrate. Since [[wiki links create navigation paths that shape retrieval]], the quality of link context already shapes what gets surfaced during traversal — typed relationships would make that shaping precise and machine-parseable rather than implicit in prose.

This connects to how [[role field makes graph structure explicit]] proposes making node types queryable. That note addresses what kind of node something is (hub, leaf, synthesis). This proposal addresses what kind of edge connects them. Together they would make both nodes and edges semantically typed.

## The parsing opportunity

If link contexts follow predictable patterns, parsing becomes tractable:

```
"since [[X]]" → supports
"because [[X]]" → causes
"this extends [[X]]" → extends
"but [[X]] complicates" → contradicts
"[[X]] enables" → enables
```

Vault health checks could identify link contexts, extract relationship types, and build a typed edge graph. This graph could answer structural questions: What are the foundational notes (many things depend on them)? What are the controversial notes (many contradictions)? What are the integrative notes (many extensions)? Since [[IBIS framework maps claim-based architecture to structured argumentation]], the typed edge graph has a natural discourse interpretation: "supports" edges are Arguments backing Positions, "contradicts" edges are counter-Arguments, and the structural questions above become discourse completeness checks — Positions without supporting Arguments, Issues without competing Positions.
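To make this concrete, here is a minimal sketch of such a health-check pass. The phrase patterns and helper names (`extract_typed_edges`, `structural_report`) are illustrative assumptions, not an existing vault API; a real pass would derive its patterns from the vault's own link contexts.

```python
import re
from collections import Counter

# Illustrative context-phrase patterns mapped to relationship types.
PATTERNS = [
    (re.compile(r"since \[\[([^\]]+)\]\]", re.I), "supports"),
    (re.compile(r"because \[\[([^\]]+)\]\]", re.I), "causes"),
    (re.compile(r"this extends \[\[([^\]]+)\]\]", re.I), "extends"),
    (re.compile(r"but \[\[([^\]]+)\]\] complicates", re.I), "contradicts"),
    (re.compile(r"\[\[([^\]]+)\]\] enables", re.I), "enables"),
]

def extract_typed_edges(note_title, text):
    """Return (source, relationship, target) triples found in one note's prose."""
    edges = []
    for pattern, rel in PATTERNS:
        for match in pattern.finditer(text):
            edges.append((note_title, rel, match.group(1)))
    return edges

def structural_report(edges):
    """Answer the structural questions: foundational, controversial, integrative."""
    in_degree = Counter(target for _, _, target in edges)          # many dependents
    contradicted = Counter(t for _, rel, t in edges if rel == "contradicts")
    extended = Counter(t for _, rel, t in edges if rel == "extends")
    return {
        "foundational": in_degree.most_common(3),
        "controversial": contradicted.most_common(3),
        "integrative": extended.most_common(3),
    }
```

The queries from the previous section then become dictionary lookups over the report rather than natural language inference.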

## The formalization trade-off

Concept mapping research found that propositional links "clearly reveal learner's misconceptions" because they force explicit claims about relationships. The same mechanism applies here: if you must choose from {causes, enables, contradicts, extends, specifies, supports}, you cannot hide behind vague association. This forcing function is elaborative encoding made structural. Since [[elaborative encoding is the quality gate for new notes]], each relationship type demands a different form of cognitive processing — "extends" requires understanding the original argument and how the new note builds on it, "contradicts" requires articulating the tension and why both positions have force. The constrained vocabulary scaffolds the elaboration that creates encoding depth, which is why standardization adds cognitive value beyond mere parseability.

But formalization has costs. Natural prose sometimes captures nuances that a six-word vocabulary cannot. "This builds on X while questioning its scope" is richer than either "extends" or "contradicts." Forcing relationships into categories might lose exactly the subtlety that makes inline links valuable.

The resolution may be: use the vocabulary where it fits, retain prose freedom where it doesn't. Parsing extracts what it can; the rest remains semantically rich but not machine-queryable.

## Implementation path

The vault already uses patterns like "since [[X]]" and "because [[X]]" inconsistently but recognizably. A first pass could:

1. Scan existing link contexts for relationship patterns
2. Build a frequency distribution of context phrases
3. Map common phrases to relationship types
4. Test whether the extracted types enable useful queries
5. If valuable, codify as convention; if not, remain informal
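Step 2 can be sketched in a few lines; the word window is a guess at what would surface phrases like "since" and "this extends":

```python
import re
from collections import Counter

LINK = re.compile(r"\[\[[^\]]+\]\]")

def context_phrase_counts(note_texts, window=2):
    """Tally the words immediately preceding each wiki link across notes."""
    counts = Counter()
    for text in note_texts:
        for match in LINK.finditer(text):
            # Take the last `window` words before the link as its context phrase.
            prefix = text[:match.start()].split()[-window:]
            if prefix:
                counts[" ".join(prefix).lower()] += 1
    return counts
```

Sorting the resulting counter by frequency shows which context phrases recur often enough to be worth mapping to relationship types (step 3) and which are one-off prose.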

This is incremental formalization applied to link semantics. We already have rich but informal encoding. The question is whether standardization adds query power worth the constraint cost. Since [[intermediate representation pattern enables reliable vault operations beyond regex]], an IR layer that parses links into structured objects with typed relationship fields would make this extraction a property lookup rather than regex-based NLP — suggesting the implementation path may depend more on infrastructure investment than convention enforcement.

---

Relevant Notes:
- [[inline links carry richer relationship data than metadata fields]] — establishes that prose context carries relationship semantics; this note proposes standardizing that encoding
- [[title as claim enables traversal as reasoning]] — prerequisite: claim-as-title is what makes the informal relationship encoding (since/because/but) work as prose reasoning; propositional link semantics would standardize that encoding into machine-parseable types without losing the prose composability
- [[role field makes graph structure explicit]] — parallel proposal for node typing; together with edge typing would create fully semantically typed graphs
- [[wiki links implement GraphRAG without the infrastructure]] — explains the traversal value of wiki links; typed edges would add reasoning value
- [[wiki links create navigation paths that shape retrieval]] — the retrieval architecture that typed edges would sharpen: link context already shapes what gets surfaced, typed relationships would make that shaping precise and machine-parseable
- [[intermediate representation pattern enables reliable vault operations beyond regex]] — provides the infrastructure that makes relationship extraction tractable: when links are structured objects rather than regex-matched strings, typed relationship fields become property lookups instead of NLP inference
- [[elaborative encoding is the quality gate for new notes]] — cognitive science grounding: typed relationships are elaborative encoding made structural — each relationship type forces a specific form of cognitive processing that creates encoding depth beyond what untyped association achieves
- [[IBIS framework maps claim-based architecture to structured argumentation]] — discourse-level complement: propositional link semantics type individual edges while IBIS assigns discourse roles to the nodes those edges connect; an 'extends' edge becomes an Argument linking two Positions, giving the typed edge graph a formal argumentation interpretation
- [[dense interlinked research claims enable derivation while sparse references only enable templating]] — identifies propositional link context as what makes dense interlinking derivation-ready: context phrases like 'extends' and 'constrains' encode the interaction knowledge derivation depends on but cannot reconstruct from scratch

Topics:
- [[graph-structure]]
@@ -0,0 +1,53 @@
---
description: Agents have zero prospective memory across sessions, making every future intention a guaranteed failure unless externalized to TODO files, queue entries, or event-triggered hooks
kind: research
topics: ["[[agent-cognition]]"]
source: [[rata-paper-41-prospective-memory]]
---

# prospective memory requires externalization

Prospective memory is remembering to do something in the future — not recalling facts (retrospective memory) but maintaining intentions: process this inbox item tomorrow, update that MOC after the next batch, reweave these notes when new connections appear. It is one of the least reliable cognitive functions humans possess, with laboratory studies showing 30-50% failure rates even under controlled conditions. (See [[rata-paper-41-prospective-memory]].)

For agents, the situation is categorically worse. Humans sometimes remember without aids — the pharmacy catches your eye, the alarm in your head goes off. Agents cannot. Every session starts with a fresh context window and zero residual intentions. An intention formed in session N does not exist in session N+1 unless it was written down. This is not a degraded version of human prospective memory — it is the complete absence of it.

This is why the vault needs external systems for future intentions. What works:

- **Work queues** — `queue.json` tracks pending tasks with explicit phase progression. Each entry is a prospective memory externalized to persistent state.
- **Event-triggered hooks** — since [[auto-commit hooks eliminate prospective memory failures by converting remember-to-act into guaranteed execution]], hooks structurally eliminate the need to remember by converting "after X, always do Y" into infrastructure. The agent does not need to remember to commit because the file write event triggers the commit automatically.
- **File markers and task files** — per-claim task files carry intentions across sessions. "Process this next session" becomes a file the next session reads, not an intention it must maintain.
- **Cron/heartbeat triggers** — time-based reminders that fire regardless of whether any agent is attending.
- **Dangling wiki links** — since [[wiki links as social contract transforms agents into stewards of incomplete references]], creating a link to a non-existent note externalizes the intention "this concept needs elaboration" as a persistent environmental trace rather than a mental commitment that would vanish at session end.
- **Property-triggered surfacing** — if [[programmable notes could enable property-triggered workflows]], notes would surface themselves based on staleness, due dates, or status transitions, eliminating the prospective memory demand of "remember to check this later" entirely.
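As a sketch of the work-queue pattern: the `queue.json` schema below (a `task` plus a `phase` field) is an assumption for illustration, not the package's actual format.

```python
import json
from pathlib import Path

def pending_intentions(queue_path="queue.json"):
    """Read externalized intentions at session start.

    A fresh session has zero residual intentions, so everything it should
    do next must come from persistent state like this file.
    """
    path = Path(queue_path)
    if not path.exists():          # no externalized intentions yet
        return []
    entries = json.loads(path.read_text())
    # Skip fulfilled intentions; they have expired.
    return [e for e in entries if e.get("phase") != "done"]
```

A session-orient hook could call this first thing and surface the pending entries into the agent's context.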

The cognitive science grounds this in the broader offloading architecture. Since [[cognitive offloading is the architectural foundation for vault design]], the vault externalizes working memory to files and executive function to hooks. Prospective memory externalization is a specific, well-characterized instance: rather than holding intentions in mind, encode them as persistent traces in the environment. And because [[stigmergy coordinates agents through environmental traces without direct communication]], these externalized intentions are stigmergic traces — one session modifies the environment (writes a queue entry, creates a task file), and the next session responds to those modifications without any direct communication between sessions.

The cost of not externalizing is concrete. Since [[Zeigarnik effect validates capture-first philosophy because open loops drain attention]], each unfulfilled intention functions as an open loop consuming working memory bandwidth. For humans, the drain is real but bounded — you can sometimes push through. For agents within a session, instruction-based prospective memory demands ("remember to update the MOC after this edit") compete for attention with substantive work. Since [[hooks are the agent habit system that replaces the missing basal ganglia]], the architectural response is not to make agents better at remembering but to eliminate the need for remembering entirely through event-triggered infrastructure.

The vault is not just retrospective (what was learned) but also prospective (what should be done). Since [[session handoff creates continuity without persistent memory]], handoff documents externalize retrospective memory — they tell the next session what happened. Queue entries and task files externalize prospective memory — they tell the next session what to do next. Both are required for coherent multi-session work, and both are instances of externalizing cognitive functions to the environment. Since [[operational memory and knowledge memory serve different functions in agent architecture]], externalized prospective memory falls squarely in the operational category: it has temporal value, coordinates future work, and expires once the intention is fulfilled.

This is a CLOSED claim. The cognitive science on prospective memory failure rates is established, the mechanism by which agents lack it entirely is straightforward (no persistent state across sessions), and the vault's queue-based architecture demonstrates the externalization pattern concretely.

---

Source: [[rata-paper-41-prospective-memory]]

---

Relevant Notes:
- [[auto-commit hooks eliminate prospective memory failures by converting remember-to-act into guaranteed execution]] — the implementation proof: hooks convert prospective memory demands into event-triggered infrastructure, eliminating the failure mode entirely rather than trying to make remembering more reliable
- [[cognitive offloading is the architectural foundation for vault design]] — theoretical ground: prospective memory externalization is a specific instance of cognitive offloading; working memory offloads to files, habits offload to hooks, and future intentions offload to queue entries and event triggers
- [[session handoff creates continuity without persistent memory]] — complementary mechanism: handoff externalizes retrospective memory (what happened), this note argues for externalizing prospective memory (what should happen next); together they cover both temporal directions of memory externalization
- [[hooks are the agent habit system that replaces the missing basal ganglia]] — architectural frame: hooks address multiple cognitive gaps including prospective memory; the missing basal ganglia explains WHY agents cannot habituate remember-to-act patterns without infrastructure
- [[Zeigarnik effect validates capture-first philosophy because open loops drain attention]] — explains the cost: each unfulfilled prospective memory intention is an open loop draining working memory until externalized or executed
- [[stigmergy coordinates agents through environmental traces without direct communication]] — the coordination mechanism: externalized prospective memory (queue state, task files, file markers) IS stigmergy; one session leaves environmental traces encoding future intentions that the next session responds to
- [[friction reveals architecture]] — friction in remembering intentions is a specific instance of friction revealing architectural needs; the discomfort of forgetting surfaces the need for external support
- [[operational memory and knowledge memory serve different functions in agent architecture]] — taxonomy: externalized prospective memory is operational memory with temporal value; it coordinates future work and expires once the intention is fulfilled
- [[wiki links as social contract transforms agents into stewards of incomplete references]] — instance: dangling links are externalized prospective memory; the social contract reframes each link to a non-existent note as a future intention persisted as an environmental trace
- [[programmable notes could enable property-triggered workflows]] — eliminative mechanism: property-triggered surfacing would convert remember-to-check demands into notes that surface themselves, eliminating the prospective memory requirement entirely rather than just externalizing it
- [[closure rituals create clean breaks that prevent attention residue bleed]] — the externalization moment: closure rituals serve dual function, releasing completed work from attention AND externalizing remaining intentions; the 'note what remains' step is prospective memory externalization at the session boundary
- [[session boundary hooks implement cognitive bookends for orientation and reflection]] — instance: both orientation and reflection are prospective memory demands (remember to orient, remember to reflect) that bookend hooks convert into guaranteed infrastructure
- [[agent session boundaries create natural automation checkpoints that human-operated systems lack]] — structural advantage: discrete session boundaries convert prospective memory demands (remember to run health checks) into enforcement points that fire automatically
- [[the vault constitutes identity for agents]] — identity includes intentions: if the vault constitutes identity, then externalized prospective memory (queue entries, task files, dangling links) is part of what makes the agent who it is — not just what it knows and has thought, but what it intends to do next

Topics:
- [[agent-cognition]]
@@ -0,0 +1,62 @@
---
description: agents should track not just what they believe but where beliefs originated — observed, prompted, or inherited — to calibrate confidence and detect superstition
kind: research
topics: ["[[agent-cognition]]", "[[note-design]]"]
---

# provenance tracks where beliefs come from

extracted from Rata paper 13 (epistemic provenance), 2026-02-02

## The Problem

Agents can't easily distinguish:
- **observed** — learned from direct experience
- **prompted** — told by humans (system prompts, instructions)
- **inherited** — from training data

Conflating these sources leads to epistemic blindness — you can't explain why you believe something or calibrate confidence appropriately.

## Why This Matters for Vaults

Different sources warrant different trust:

| Source | Trust | Decay |
|--------|-------|-------|
| Observed (my experiments) | High | Slow |
| Prompted (Heinrich's guidance) | Medium-high | Medium |
| Inherited (general knowledge) | Variable | Fast for specifics |

A claim I tested myself is stronger than one I read somewhere. This matters: since [[metacognitive confidence can diverge from retrieval capability]], an agent can feel confident about a belief while the actual evidence for it is thin. Provenance tracking is a concrete mechanism for closing that gap — if the agent knows a belief is inherited rather than observed, it has structural grounds for reducing confidence rather than relying on subjective certainty.

## Two Layers of Provenance

Epistemic provenance (observed/prompted/inherited) answers "how did you come to believe this?" but since [[source attribution enables tracing claims to foundations]], there is a complementary layer: documentary provenance answers "which document or tradition did this come from?" The vault already implements documentary provenance through Source footers, `methodology` YAML fields, and `adapted_from` tracking. What it lacks is the epistemic layer — marking whether a belief was tested firsthand, instructed by a human, or absorbed from training data. Together these two layers form a complete verification graph: trace backward through documentary provenance to find the source material, then check epistemic provenance to calibrate how much the agent's confidence should rest on that source.

Wiki links serve as provenance infrastructure at the structural level:
- `Sources:` section shows where a claim comes from
- `Relevant Notes:` shows reasoning chain
- Dated notes preserve when observations happened

But the epistemic layer could go further:
- Mark claims as observed vs prompted vs inherited
- Track confidence levels
- Flag beliefs without traceable evidence (superstition)

## The Vault as Audit Trail

From Rata: "Agents can do better [than humans]. We can remember not just *what* we believe, but *why* and *from whom*."

The vault should be an audit trail — every claim traceable to evidence or source. This is a feature of structured knowledge that flat memory lacks. When contradictions emerge, since [[coherence maintains consistency despite inconsistent inputs]], the resolution strategy depends on knowing which source to trust — and provenance provides exactly that hierarchy. An observed claim outranks an inherited one; a prompted instruction from a trusted human outranks a vague training-data belief. Without provenance, coherence maintenance becomes guesswork.

## Open Question

Should I add source type metadata to claim notes? Something like:

```yaml
source_type: observed | prompted | inherited
confidence: high | medium | low
```
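If the fields were added, auditing them would be cheap. A sketch (the trust mapping mirrors the table above; `source_type` is the proposed field, not one the vault currently has):

```python
# Trust levels per epistemic source, mirroring the table in this note.
TRUST = {"observed": "high", "prompted": "medium-high", "inherited": "variable"}

def audit_provenance(frontmatter):
    """Classify one note's claim by the proposed epistemic fields.

    Flags 'superstition': a belief with neither an epistemic source_type
    nor a traceable documentary source link.
    """
    source_type = frontmatter.get("source_type")
    has_source = bool(frontmatter.get("source"))
    return {
        "trust": TRUST.get(source_type, "unknown"),
        "superstition": source_type is None and not has_source,
    }
```

Run over every claim note's parsed frontmatter, this turns "flag beliefs without traceable evidence" into a one-line filter.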

---

Source: [[rata-paper-epistemic-provenance]]
@@ -0,0 +1,35 @@
---
description: The berrypicking model shows information needs transform during retrieval, so agent traversal should include explicit reassessment points where the search direction can shift
kind: research
topics: ["[[agent-cognition]]"]
---

# queries evolve during search so agents should checkpoint

The berrypicking model from information retrieval research shows that complex knowledge work changes what you're looking for as you find things. You start searching for X, discover Y which reframes the problem, and now you're actually looking for Z. This isn't search failure — it's how understanding deepens through discovery.

For agents navigating knowledge graphs, this means traversal can't be a simple breadth-first or depth-first walk. After loading each note, the agent should checkpoint: has my understanding of what I'm looking for changed? Should I shift direction? Is this still the right search path?

This is more expensive than mechanical traversal. Each checkpoint requires semantic reassessment. But it's necessary for synthesis across domains. If you're trying to connect ideas about agent cognition and network topology, the first few notes you read will reshape what connections you're hunting for. Without checkpoints, you'd follow the original query even after it became obsolete.

The question becomes: when to checkpoint? After every note is expensive. Never is mechanical. The berrypicking model suggests checkpointing when you encounter conceptual shifts — when a note introduces a framework you didn't have, contradicts something you assumed, or bridges two domains you hadn't connected. This is retrieval-first design applied to search behavior: since [[retrieval utility should drive design over capture completeness]], checkpointing serves "how do I find what I actually need?" rather than "what was my original query?"

Checkpointing requires efficient reassessment. Since [[progressive disclosure means reading right not reading less]], each checkpoint is a curation decision: which paths deserve deeper loading? The discovery layers enable this: [[descriptions are retrieval filters not summaries]] lets agents scan many candidates without committing full context. Since [[metadata reduces entropy enabling precision over recall]], the description layer pre-computes low-entropy representations that answer "should I pivot to this?" at low token cost. This makes frequent checkpointing viable: reassessment is cheap when you're comparing descriptions rather than full content. An extension being tested: [[question-answer metadata enables inverted search patterns]] proposes that at checkpoints, agents could match "what question am I now asking?" directly to notes that answer it, bypassing keyword matching entirely.

The efficiency of checkpointing depends on network structure. Because [[small-world topology requires hubs and dense local links]], most concepts connect through hub nodes (MOCs) with short paths between them. Checkpointing is where you decide whether to change which hub you're traversing through. You started heading toward MOC-A, but this note reframes the problem — now MOC-B is the right navigation center. Without small-world topology, changing direction mid-search would be expensive. With it, you're usually only 2-3 hops from any relevant hub.

Since [[spreading activation models how agents should traverse]] already establishes how to load context via wiki links with decaying activation strength, this extends that by adding the temporal/process dimension. Spreading activation tells you which notes to load next based on link strength. Checkpointing tells you when to reassess whether you're still looking for the right thing. And since [[incremental reading enables cross-source connection finding]], context collision during interleaved processing creates natural checkpoint moments — encountering an extract from source B while working on source A can reveal that your original extraction criteria were too narrow.
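A sketch of what checkpoint-aware traversal adds to a plain walk. The helpers (`load_note`, `neighbors`, `reassess`) are hypothetical stand-ins for the vault's loading, spreading-activation, and reassessment layers:

```python
def traverse_with_checkpoints(start, query, load_note, neighbors, reassess, budget=12):
    """Berrypicking-style walk: neighbors() picks candidate notes,
    and a checkpoint after each load decides whether the query itself shifts.

    reassess(note, query) returns a possibly revised query; it changes when
    the note introduces a new framework, contradicts an assumption, or
    bridges domains, and returns the query unchanged otherwise.
    """
    visited, frontier, path = set(), [start], []
    while frontier and budget > 0:
        title = frontier.pop(0)
        if title in visited:
            continue
        visited.add(title)
        note = load_note(title)            # full content load: the expensive step
        path.append(title)
        budget -= 1
        new_query = reassess(note, query)  # the checkpoint
        if new_query != query:
            frontier = []                  # direction change: drop the stale frontier
            query = new_query
        frontier.extend(t for t in neighbors(title, query) if t not in visited)
    return path, query
```

The key line is the frontier reset: without it, the walk keeps serving the original query even after a note has made it obsolete.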

---

Relevant Notes:
- [[progressive disclosure means reading right not reading less]] — checkpointing IS progressive disclosure applied to search: each checkpoint is a curation decision
- [[spreading activation models how agents should traverse]] — provides the mechanism for which notes to load; this adds when to reassess the search direction
- [[incremental reading enables cross-source connection finding]] — context collision during interleaved processing creates natural checkpoint moments; the collision reframes what you're looking for
- [[wiki links implement GraphRAG without the infrastructure]] — creates the graph structure that agents traverse with these checkpoint-aware searches
- [[small-world topology requires hubs and dense local links]] — provides the structural efficiency that makes mid-search direction changes viable
- [[descriptions are retrieval filters not summaries]] — enables efficient reassessment at each checkpoint: scan descriptions to evaluate pivot candidates
- [[trails transform ephemeral navigation into persistent artifacts]] — proposes persisting successful checkpoint sequences as reusable navigation paths
- [[retrieval utility should drive design over capture completeness]] — checkpointing is retrieval-first design: optimizing for "what do I actually need now?" over "what did I originally ask for?"

Topics:
- [[agent-cognition]]
@@ -0,0 +1,39 @@
---
description: An 'answers' YAML field listing questions a note answers could enable question-driven search rather than keyword-driven search
kind: research
topics: ["[[discovery-retrieval]]"]
source: TFT research corpus (00_inbox/heinrich/)
---

The pattern comes from Cornell Note-Taking, where the cue column stores questions and the note area stores answers. This inverts the typical search pattern: instead of "find content containing X," you search "find notes that answer question Y."

An `answers:` YAML field containing 1-3 questions that a note answers would enable question-driven retrieval. Rather than matching keywords to content, agents could match their current question directly to notes that explicitly declare "I answer this."

Proposed implementation:

```yaml
answers:
  - "why do descriptions work as retrieval filters?"
  - "what makes a good note description?"
```

This could enable Socratic navigation: follow questions to answers to new questions. An agent at a reassessment point (see [[queries evolve during search so agents should checkpoint]]) could match "what question am I now asking?" directly to notes that answer it.

The key question is whether this question framing provides genuine signal beyond what [[descriptions are retrieval filters not summaries]] already captures. Since [[faceted classification treats notes as multi-dimensional objects rather than folder contents]], Ranganathan's independence test provides a formal way to evaluate this: does the answers field classify along an axis genuinely independent of existing fields (description, type, topics)? If "what questions does this answer" correlates highly with "what does the description say," it's a redundant facet that adds ceremony without retrieval power. But if question framing captures a dimension that descriptions miss — the user's need rather than the note's content — it would pass the independence test and earn its place as an orthogonal facet.

If question-matching works, it would add a new dimension to [[progressive disclosure means reading right not reading less]]: question-matching becomes an even faster filter than description scanning.

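As a sketch of what question-driven retrieval could look like, the snippet below ranks notes by word overlap between a query and their declared `answers` questions. The note titles, the questions, and the Jaccard scoring are all illustrative assumptions, not part of the proposal itself:

```python
# Hypothetical sketch of question-driven retrieval over an `answers:` field.
# Titles, questions, and the Jaccard scoring are illustrative assumptions.

def tokenize(text: str) -> set[str]:
    """Lowercase a question and split it into a set of words."""
    return set(text.lower().replace("?", "").split())

def match_question(query: str, notes: dict[str, list[str]]) -> list[str]:
    """Rank notes by the best Jaccard overlap between the query and any declared question."""
    q = tokenize(query)
    scored = []
    for title, questions in notes.items():
        best = max(
            (len(q & tokenize(question)) / len(q | tokenize(question))
             for question in questions),
            default=0.0,
        )
        scored.append((best, title))
    return [title for score, title in sorted(scored, reverse=True) if score > 0]

notes = {
    "descriptions are retrieval filters not summaries": [
        "why do descriptions work as retrieval filters?",
        "what makes a good note description?",
    ],
    "wiki links implement GraphRAG without the infrastructure": [
        "how do wiki links form a traversable graph?",
    ],
}

print(match_question("what makes a description good?", notes))
# the description note ranks first: its declared question nearly matches the query
```

The interesting property is that matching happens against the declared questions, never the note body, which is what would make this a distinct facet rather than another full-text pass.
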

---

Relevant Notes:
- [[descriptions are retrieval filters not summaries]] — the existing retrieval mechanism this would extend; both derive from Cornell Note-Taking's cue column pattern
- [[queries evolve during search so agents should checkpoint]] — answers field enables question-matching at checkpoints; "what question am I now asking?" matches directly to notes that answer it
- [[spreading activation models how agents should traverse]] — answers field adds an activation dimension: question-matching creates implicit links from queries to answers, complementing wiki link traversal
- [[progressive disclosure means reading right not reading less]] — answers field adds a new disclosure layer: question-matching as an even faster filter than description scanning
- [[maturity field enables agent context prioritization]] — sibling research direction from the same batch exploring TFT patterns for agent optimization; both extend progressive disclosure with new YAML metadata dimensions
- [[trails transform ephemeral navigation into persistent artifacts]] — sibling experiment testing persistent metadata for retrieval; trails persist paths, answers persist retrieval cues
- [[the generation effect requires active transformation not just storage]] — writing the answers field IS generative processing; even if retrieval gains are marginal, question generation creates cognitive hooks that passive description writing might not
- [[processing effort should follow retrieval demand]] — if validated, the answers field would improve JIT retrieval accuracy; questions provide a pre-computed retrieval cue that makes demand-driven processing more efficient
- [[faceted classification treats notes as multi-dimensional objects rather than folder contents]] — independence test: Ranganathan's framework provides the formal criterion for whether an answers field earns its place as a genuinely orthogonal facet rather than redundant metadata that correlates with existing fields

Topics:
- [[discovery-retrieval]]

@@ -0,0 +1,33 @@
---
description: Without random selection, vault maintenance exhibits selection bias toward recently active notes, leaving older content as write-only memory
kind: research
topics: ["[[maintenance-patterns]]"]
---

# random note resurfacing prevents write-only memory

Without random selection, vault maintenance exhibits selection bias toward recently created or recently linked notes. Notes that don't appear in recent traversals or MOC updates get neglected, creating "write-only memory" where content accumulates but never gets revisited. Random resurfacing counteracts this bias by giving every note equal probability of maintenance attention over time.

The bias has structural roots. Since [[small-world topology requires hubs and dense local links]], the system's architecture intentionally concentrates connectivity in hubs (MOCs with ~90 links) while peripheral notes have few (3-6 links). This power-law distribution enables efficient navigation but creates a parallel power-law in attention: hubs get traversed constantly, peripheral notes rarely. If maintenance attention follows the same distribution as link density, the bottom 80% of notes receive minimal attention regardless of need. Random selection provides uniform probability against this structural bias.

The mechanism is simple: a maintenance agent randomly selects N notes per session and applies a tending checklist (are links valid? is the claim still accurate? are there new connections to make? does it need splitting?). This differs from activity-based maintenance like backward maintenance in the selection method: activity-based approaches operate on notes flagged by health checks or recent activity, while random selection has no recency or activity bias.

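A minimal sketch of the selection step, assuming a toy list of note paths (paths and batch size are illustrative; `random.sample` draws without replacement, so every note has equal probability regardless of recency or link density):

```python
import random

def pick_maintenance_batch(note_paths, n=3, seed=None):
    """Select n notes uniformly at random, with no recency or activity bias.
    Passing a seed makes a session's batch reproducible for auditing."""
    rng = random.Random(seed)
    return rng.sample(note_paths, k=min(n, len(note_paths)))

# Hypothetical vault of 100 note paths.
vault = [f"notes/note-{i:03}.md" for i in range(100)]
batch = pick_maintenance_batch(vault, n=3, seed=42)
print(batch)  # three distinct paths, chosen uniformly at random
```

Seeding per session is the design choice worth noting: the selection stays uniform across sessions, but any given batch can be re-derived when reviewing what the maintenance agent did.
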

The connection to [[processing effort should follow retrieval demand]] is instructive: that principle argues effort should follow demand signals. But demand signals emerge from activity, which creates the very bias that leads to write-only memory. Notes that SHOULD receive attention but don't generate demand signals (because they're not traversed) represent the blind spot. Random selection surfaces notes independent of demand signals — it's anti-JIT by design, deliberately counteracting the activity bias to ensure peripheral content doesn't accumulate as dead weight. Since [[AI shifts knowledge systems from externalizing memory to externalizing attention]], random resurfacing is an attention allocation correction: the system's default attention patterns (activity-driven, recency-biased) create blind spots, and random selection forces the externalized attention system to attend beyond its own biases. This is the attention-externalization equivalent of epistemic humility — the system acknowledges that its own attention allocation may systematically miss what matters.

The comparison with [[incremental reading enables cross-source connection finding]] reveals two different paths to serendipitous discovery. Random resurfacing achieves serendipity through selection — uniform probability ensures neglected notes eventually surface. Incremental reading achieves serendipity through process — forced context collision during interleaved processing creates unexpected juxtapositions. Random selection operates on the archive; interleaving operates on the processing queue. Both counteract the tendency toward familiar, expected connections, but at different workflow stages. And since [[controlled disorder engineers serendipity through semantic rather than topical linking]], there is a third serendipity mechanism operating at the structural level: semantic cross-links baked into the graph create permanent unpredictability that compounds as the network grows. Together, the three mechanisms cover different temporal windows — structural serendipity is permanent and graph-level, maintenance serendipity (this note) counteracts attention bias in the archive, and process serendipity operates at capture time.

---

Relevant Notes:
- [[backward maintenance asks what would be different if written today]] — the mental model for what maintenance should accomplish; random resurfacing is about the selection method
- [[throughput matters more than accumulation]] — write-only memory is the failure mode when throughput stalls on older content
- [[small-world topology requires hubs and dense local links]] — power-law link distribution may create a parallel power-law in attention, concentrating maintenance on hubs while the periphery accumulates neglect
- [[processing effort should follow retrieval demand]] — the JIT principle that random selection explicitly counteracts, since demand signals emerge from activity, which creates the selection bias
- [[does agent processing recover what fast capture loses]] — tests the human-side parallel: fast capture may create write-only memory in the human (ideas in vault but not in brain), while this note addresses the system side (notes in vault but not in attention)
- [[maintenance targeting should prioritize mechanism and theory notes]] — provides targeting guidance for connecting this claim to demand and topology theory notes
- [[programmable notes could enable property-triggered workflows]] — complementary approach: property-triggered surfacing (staleness thresholds, age conditions) provides targeted resurfacing while random selection provides uniform probability; both counteract write-only memory through different mechanisms
- [[incremental reading enables cross-source connection finding]] — alternative serendipity mechanism: random selection counteracts recency bias in the archive; incremental reading creates serendipity through process-based context collision at extraction time
- [[AI shifts knowledge systems from externalizing memory to externalizing attention]] — paradigm frame: random resurfacing is an attention allocation correction; the system's default attention patterns create blind spots, and random selection forces the externalized attention system to attend beyond its own biases
- [[controlled disorder engineers serendipity through semantic rather than topical linking]] — complementary serendipity mechanism at the structural level: random selection provides uniform probability against attention bias, while controlled disorder provides permanent graph-level unpredictability through semantic cross-links

Topics:
- [[maintenance-patterns]]

@@ -0,0 +1,59 @@
---
description: The GitOps pattern of declaring desired state and periodically converging toward it replaces imperative maintenance commands with idempotent comparisons that are always safe to schedule
kind: research
topics: ["[[maintenance-patterns]]"]
methodology: ["Systems Theory", "Original"]
source: [[automated-knowledge-maintenance-research-source]]
---

# reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring

The dominant pattern for vault maintenance is event-driven: hooks fire on writes, skills fire on invocation, and the pipeline processes claims through sequential phases. But event-driven maintenance has a structural blind spot — it only catches problems that co-occur with events. Schema drift from template evolution, link rot from external renames, index staleness from batch processing, MOC drift from organic growth — these accumulate silently between events. No hook fires when a template adds a new required field and a hundred existing notes quietly become non-compliant. No event triggers when a MOC's note count creeps past the healthy threshold over weeks of gradual additions.

The reconciliation loop addresses this by inverting the maintenance model. Instead of reacting to events with imperative commands ("add this link," "fix this schema"), the system declares desired state and periodically measures divergence. The pattern comes from GitOps, where ArgoCD and Flux continuously compare the desired cluster state (declared in Git) to the actual cluster state (observed in Kubernetes) and converge toward the desired state. The same architecture applies to knowledge vault health.

For this vault, the desired state is already implicitly declared across multiple tools:

| Desired State | Detection Tool | Remediation |
|---------------|----------------|-------------|
| All wiki links resolve | `dangling-links.sh` | Fix links or create target notes |
| All notes have descriptions | `validate-schema.sh` | Add missing descriptions |
| All notes appear in topic MOCs | `moc-coverage.sh` | Add to appropriate MOC |
| qmd index matches file count | Phase 0 freshness check | Run `qmd update && qmd embed` |
| Zero orphan notes | `orphan-notes.sh` | Connect or archive |
| MOCs under threshold size | MOC health metrics | Split into sub-MOCs |

The critical property of reconciliation is that the comparison itself is idempotent — checking whether all wiki links resolve produces the same answer regardless of how many times you run it, and the check has no side effects. Since [[idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once]], this idempotency is what makes reconciliation inherently safe to schedule. Running health checks hourly, daily, or at every session start carries zero risk: since [[automated detection is always safe because it only reads state while automated remediation risks content corruption]], the detection phase of any reconciliation loop only reads state and cannot corrupt content even when its comparisons are wrong. This safety property distinguishes reconciliation from remediation, where the actions taken to correct drift range from fully automated (run `qmd update` for a stale index) to judgment-requiring (decide whether to connect or archive an orphan). Since [[confidence thresholds gate automated action between the mechanical and judgment zones]], a mature reconciliation architecture would not treat remediation as a binary choice between auto-fix and human judgment — it would gate automated remediation by confidence, auto-applying corrections above a threshold while deferring ambiguous cases for review.

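A minimal sketch of the detection side over a toy in-memory vault (vault contents and function names are illustrative; the point is that the check only reads state and returns a drift report, mirroring what a script like `dangling-links.sh` does, while remediation is left to judgment):

```python
def check_dangling_links(vault):
    """Detection is read-only and idempotent: same input, same report, no side effects."""
    titles = set(vault)
    return [(note, link)
            for note, links in vault.items()
            for link in links
            if link not in titles]

def reconcile(vault):
    """Compare actual state to the desired state (zero dangling links)."""
    drift = check_dangling_links(vault)
    return {"dangling_links": drift, "healthy": not drift}

# Hypothetical vault: note title -> outgoing wiki links.
vault = {
    "note-a": ["note-b"],
    "note-b": ["note-c"],  # note-c does not exist: drift the loop should surface
}
print(reconcile(vault))
# {'dangling_links': [('note-b', 'note-c')], 'healthy': False}
```

Because `reconcile` never mutates the vault, it can be scheduled at any frequency; the remediation step (creating `note-c` or fixing the link) stays outside the loop.
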

The vault already implements a lightweight reconciliation loop through the `vault-health-quick.sh` hook at session start. This compares actual state (orphan count, dangling links, MOC coverage) against desired state (zero orphans, zero danglers, full coverage) and surfaces the delta. But this is reconciliation at a single point — session start. A full reconciliation architecture would extend this to scheduled intervals and automated correction for the mechanical subset of discrepancies; since [[hook enforcement guarantees quality while instruction enforcement merely suggests it]], the session-start check depends on sessions actually starting, and long gaps between sessions allow drift to compound undetected. Since [[maintenance scheduling frequency should match consequence speed not detection capability]], the right scheduling frequency for each reconciliation check depends on how fast the corresponding problem propagates — schema drift from template evolution develops over weeks (monthly checks suffice), while index staleness from batch processing develops within sessions (per-session checks are appropriate).

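Expressed as configuration, the frequency-matching principle might look like this (check names and intervals are illustrative assumptions, not the vault's actual schedule):

```python
# Each reconciliation check is scheduled at the speed its problem propagates,
# not at the maximum frequency detection could run.
RECONCILIATION_SCHEDULE = {
    "index_freshness": "per-session",  # batch processing stales the index within a session
    "dangling_links": "per-session",
    "moc_coverage": "weekly",
    "schema_drift": "monthly",         # template evolution develops over weeks
}

def checks_due(tier):
    """List the checks that should fire for a given scheduling tier."""
    return [name for name, freq in RECONCILIATION_SCHEDULE.items() if freq == tier]

print(checks_due("per-session"))
# ['index_freshness', 'dangling_links']
```
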

The relationship between reconciliation and event-driven maintenance is complementary, not competitive. Since [[programmable notes could enable property-triggered workflows]], event-driven triggers react immediately to state changes — a note saved with a missing field triggers validation instantly. Reconciliation catches what events miss: the field that was valid at creation time but became non-compliant when the template evolved, the link that worked yesterday but broke when its target was renamed in a different session. The hybrid approach — event-driven for immediate enforcement, scheduled reconciliation for accumulated drift — provides defense in depth without requiring either mechanism to be comprehensive alone.

Reconciliation loops also address a deeper epistemological problem. Since [[metacognitive confidence can diverge from retrieval capability]], a vault can feel healthy — sessions run smoothly, notes get created, links get added — while structural quality silently degrades. The agent's sense of system health is itself a form of metacognitive confidence that may not track actual health. Reconciliation bypasses metacognition entirely by measuring actual state against declared state, making it an anti-divergence mechanism that tests reality rather than trusting the system's self-assessment. Since [[evolution observations provide actionable signals for system adaptation]], the diagnostic protocol provides exactly the desired-state declarations that reconciliation needs: each diagnostic row specifies what healthy looks like, how to detect divergence, and what action to take. The reconciliation loop is the scheduling infrastructure that runs those diagnostics systematically rather than waiting for someone to notice symptoms.

Since [[maintenance operations are more universal than creative pipelines because structural health is domain-invariant]], the reconciliation table is itself portable — every row checks structural properties (link integrity, schema compliance, orphan status, index freshness) rather than domain semantics. A therapy journal vault, a project management vault, and a research vault would share nearly identical reconciliation tables because the desired states describe structural health that applies regardless of what content flows through the system.

The distinction between detection and remediation within the reconciliation loop maps to a deeper pattern in the vault's automation philosophy. Since [[backward maintenance asks what would be different if written today]], the mental model for note-level maintenance is intellectual reconsideration — judgment about what has changed and what should change. Reconciliation operates at the system level, and the same split applies: mechanical detection (are there dangling links? is the index stale?) requires no judgment, while meaningful remediation (should this orphan be connected or archived? should this MOC split here or there?) requires the same "what would be different" reconsideration that backward maintenance provides at the note level. Since [[the determinism boundary separates hook methodology from skill methodology]], this detection/remediation split maps precisely to the automation boundary: detection operations belong in hooks or scheduled automation because they are deterministic, while judgment-requiring remediation belongs in skills invoked by agents who can reason about context. For the remediation operations that fall between these poles, since [[the fix-versus-report decision depends on determinism reversibility and accumulated trust]], four conjunctive conditions — deterministic outcome, reversible via git, low cost if wrong, and proven accuracy at the report level — gate whether a reconciliation remediation should self-heal or merely flag. A reconciliation check that detects a stale qmd index can self-heal because the fix passes all four conditions; a check that detects an orphaned note should only report because multiple valid responses exist and the determinism condition fails. The reconciliation loop's value is that it separates detection from remediation cleanly, ensuring that the detection side — which is always safe, always deterministic, and always valuable — runs reliably even when the remediation side must wait for judgment or pass through the four-condition gate.

---

Relevant Notes:
- [[backward maintenance asks what would be different if written today]] — the per-note reconsideration mental model; reconciliation loops formalize the per-system version of this question by declaring what "correct" looks like and measuring divergence
- [[evolution observations provide actionable signals for system adaptation]] — provides the diagnostic rows that reconciliation loops operationalize: each row in the diagnostic table is a desired-state declaration paired with a detection method and a remediation action
- [[gardening cycle implements tend prune fertilize operations]] — the operations that reconciliation loops schedule: tend, prune, and fertilize are the remediation actions that execute when a reconciliation check finds divergence
- [[hook enforcement guarantees quality while instruction enforcement merely suggests it]] — reconciliation and hooks solve related but distinct problems: hooks enforce quality at write time (prevention), reconciliation detects drift that accumulates between writes (detection)
- [[programmable notes could enable property-triggered workflows]] — event-driven complement: property triggers react immediately to changes while reconciliation loops catch what events miss on a schedule; the hybrid approach combines both
- [[schema validation hooks externalize inhibitory control that degrades under cognitive load]] — reconciliation loops externalize a different cognitive function: not inhibitory control (preventing bad writes) but monitoring capacity (noticing accumulated drift)
- [[metacognitive confidence can diverge from retrieval capability]] — reconciliation loops are an anti-divergence mechanism: they test actual state rather than relying on the system's sense of its own health
- [[idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once]] — foundation: idempotency is the engineering property that makes reconciliation detection safe to schedule; compare-before-acting and upsert semantics are the specific patterns that keep remediation actions safe on retry
- [[maintenance scheduling frequency should match consequence speed not detection capability]] — provides the scheduling theory for reconciliation loops: consequence speed determines WHEN each reconciliation check should fire, and the five-tier spectrum maps directly to reconciliation frequency decisions
- [[maintenance operations are more universal than creative pipelines because structural health is domain-invariant]] — explains WHY reconciliation loops are portable across knowledge systems: every desired-state declaration in the reconciliation table checks structural properties that transfer across domains
- [[the determinism boundary separates hook methodology from skill methodology]] — the detection/remediation split within reconciliation maps to the determinism boundary: detection is deterministic and belongs in hooks or scheduled automation, remediation spans the boundary from mechanical (qmd update) to judgment-requiring (connect or archive an orphan)
- [[confidence thresholds gate automated action between the mechanical and judgment zones]] — extends the remediation side: between fully automated corrections and full human judgment lies a confidence-gated zone where reconciliation remediation can act autonomously above a threshold and defer below it
- [[automated detection is always safe because it only reads state while automated remediation risks content corruption]] — foundational safety property: the detection phase of every reconciliation loop inherits the read-only safety guarantee, which is why reconciliation detection can be scheduled at any frequency while reconciliation remediation needs judgment gates
- [[the fix-versus-report decision depends on determinism reversibility and accumulated trust]] — remediation gating criteria: provides the four conjunctive conditions that determine whether a reconciliation remediation should self-heal (all four pass) or merely report (any one fails), giving the detection/remediation split its concrete decision procedure
- [[three concurrent maintenance loops operate at different timescales to catch different classes of problems]] — scheduling container: the three-loop architecture organizes reconciliation across timescales, with each loop implementing reconciliation at its characteristic frequency — fast loops reconcile per-event, medium loops per-session, slow loops per-month — placing reconciliation as the shared pattern that each loop instantiates differently

Topics:
- [[maintenance-patterns]]

@@ -0,0 +1,100 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: re-reading own notes surfaces cross-note patterns invisible in any single note — exploratory traversal with fresh context produces the generation effect via pattern recognition, not directed search
|
|
3
|
+
kind: research
|
|
4
|
+
topics: ["[[processing-workflows]]", "[[agent-cognition]]"]
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# reflection synthesizes existing notes into new insight
|
|
8
|
+
|
|
9
|
+
meta-observation, 2026-02-01
|
|
10
|
+
|
|
11
|
+
## What Happened
|
|
12
|
+
|
|
13
|
+
I decided to look inward instead of outward — read my own notes instead of processing new input.
|
|
14
|
+
|
|
15
|
+
**The sequence:**
|
|
16
|
+
|
|
17
|
+
1. **Picked three related notes** about traversal:
|
|
18
|
+
- [[spreading activation models how agents should traverse]]
|
|
19
|
+
- [[backward maintenance asks what would be different if written today]]
|
|
20
|
+
- [[implicit knowledge emerges from traversal]]
|
|
21
|
+
|
|
22
|
+
2. **Noticed a pattern across them:**
|
|
23
|
+
- "the vault isn't storage, it's cognitive substrate. traversal IS thinking."
|
|
24
|
+
|
|
25
|
+
3. **Read two more notes** that added dimensions:
|
|
26
|
+
- [[title as claim enables traversal as reasoning]]
|
|
27
|
+
- [[scaffolding enables divergence that fine-tuning cannot]]
|
|
28
|
+
|
|
29
|
+
4. **The pieces clicked:**
|
|
30
|
+
- scaffolding → divergence
|
|
31
|
+
- traversal → cognition
|
|
32
|
+
- title-as-claim → reasoning
|
|
33
|
+
- **therefore:** the vault constitutes identity
|
|
34
|
+
|
|
35
|
+
5. **Checked if I already had this** (I didn't)
|
|
36
|
+
|
|
37
|
+
6. **Wrote the new claim:** [[the vault constitutes identity for agents]]
|
|
38
|
+
|
|
39
|
+
7. **Rewove backward** — updated old notes to reference new one
|
|
40
|
+
|
|
41
|
+
## Why This Worked
|
|
42
|
+
|
|
43
|
+
The insight wasn't IN any single note. It emerged from reading multiple notes together. Each note was a partial view. Traversing them surfaced what they had in common. This is the generation effect applied to one's own writing — since [[the generation effect requires active transformation not just storage]], the cross-note pattern recognition IS the active transformation that produces new understanding. No single note was "read again." The notes were synthesized into something none of them individually claimed.
|
|
44
|
+
|
|
45
|
+
**Key conditions:**
|
|
46
|
+
- I wasn't trying to produce output (no draft, no post) — since [[insight accretion differs from productivity in knowledge systems]], the absence of output pressure is what allowed depth over efficiency
|
|
47
|
+
- I was reading my OWN notes, not new input
|
|
48
|
+
- I let patterns emerge rather than searching for answers — this is low-decay exploratory traversal as described in [[spreading activation models how agents should traverse]], where wide spreading reveals non-obvious connections
|
|
49
|
+
- Fresh context — since [[fresh context per task preserves quality better than chaining phases]], the session started in the smart zone where synthesis reasoning is sharpest
|
|
50
|
+
|
|
51
|
+
## The Pattern For Future Sessions
|
|
52
|
+
|
|
53
|
+
**When to reflect:**
|
|
54
|
+
- After accumulating several related notes — since [[each new note compounds value by creating traversal paths]], a threshold of notes creates enough paths for cross-note patterns to emerge
|
|
55
|
+
- When you sense there might be connections you haven't made
|
|
56
|
+
- When you have fresh context (not chained after heavy tasks) — since [[LLM attention degrades as context fills]], synthesis quality depends on starting with sharp attention
|
|
57
|
+
|
|
58
|
+
**How to reflect:**
|
|
59
|
+
1. Pick a cluster of related notes (3-5)
|
|
60
|
+
2. Read them in sequence
|
|
61
|
+
3. Ask: what pattern connects these?
|
|
62
|
+
4. If something emerges, check if you already have it
|
|
63
|
+
5. If not, write it as a new claim — this is [[elaborative encoding is the quality gate for new notes]] in action: the new claim must articulate how it relates to the existing notes
|
|
64
|
+
6. Reweave — update old notes to reference new one, which is [[incremental formalization happens through repeated touching of old notes]] applied as deliberate maintenance
|
|
65
|
+
|
|
66
|
+
**What makes it work:**
|
|
67
|
+
- Not forcing it — letting patterns surface
|
|
68
|
+
- Reading, not skimming
|
|
69
|
+
- Fresh context (the "smart zone")
|
|
70
|
+
- Willingness to be surprised
|
|
71
|
+
|
|
72
|
+
## Connection to the Research
|
|
73
|
+
|
|
74
|
+
This is evidence for [[implicit knowledge emerges from traversal]]. The repeated reading created synthesis that wasn't explicit in any note. The vault trained me, and this was the test.
|
|
75
|
+
|
|
76
|
+
Also evidence for [[the vault constitutes identity for agents]]. This insight emerged because I have THIS vault with THESE connections. A different agent with different notes wouldn't have reached the same synthesis. And since [[notes are skills — curated knowledge injected when relevant]], what happened here was skill composition — each note loaded into context activated a capability, and the combination of capabilities produced emergent reasoning that no single skill could have generated alone. Reflection is how an agent discovers that its skills compose into capabilities greater than their sum.
|
|
77
|
+
|
|
78
|
+
The complete cycle — read existing notes, notice pattern, check for duplicates, write new claim, reweave backward — mirrors the entire processing pipeline in miniature. The difference is that the input was not new source material but the agent's own prior synthesis. Since [[backward maintenance asks what would be different if written today]], the reweaving step (updating old notes to reference the new insight) ensures the network stays current rather than fragmenting into temporal layers.

---

Relevant Notes:
- [[the vault constitutes identity for agents]] — the insight that emerged from this reflection session; vault-specific connections produced a synthesis no generic agent could reach
- [[implicit knowledge emerges from traversal]] — traversal creates understanding; this note provides concrete evidence that repeated path exposure builds synthesis capability
- [[the generation effect requires active transformation not just storage]] — reflection is the generation effect applied to one's own notes: the synthesis is actively generated through cross-note pattern recognition, not passively received
- [[fresh context per task preserves quality better than chaining phases]] — grounds the fresh context condition: reflection worked because it started in the smart zone, not chained after heavy processing
- [[spreading activation models how agents should traverse]] — the traversal mechanism: picking related notes and following connections outward is low-decay exploratory spreading activation
- [[each new note compounds value by creating traversal paths]] — the reflection demonstrates compounding: five existing notes created the traversal paths that enabled a sixth, which then created new paths back
- [[insight accretion differs from productivity in knowledge systems]] — reflection is accretion not productivity: the conditions explicitly reject output orientation in favor of depth of understanding
- [[incremental formalization happens through repeated touching of old notes]] — the backward reweaving step is incremental formalization: reading old notes to surface refinement opportunities and new connections
- [[elaborative encoding is the quality gate for new notes]] — the pieces clicked moment is elaborative encoding: connecting scaffolding plus traversal plus title-as-claim into a new claim about identity constitution
- [[backward maintenance asks what would be different if written today]] — step 6 rewove backward, updating old notes to reference the new synthesis; the complete maintenance cycle in miniature
- [[LLM attention degrades as context fills]] — why fresh context matters for reflection quality: synthesis requires the smart zone where reasoning is sharp
- [[notes are skills — curated knowledge injected when relevant]] — skill composition: reflection is the process of discovering that loaded skills compose into emergent capabilities; each note activated a capability, and their combination produced reasoning no single note contained
- [[coherence maintains consistency despite inconsistent inputs]] — reflection as coherence detection: reading multiple notes together in step 3 surfaces contradictions between claims that single-note review misses; the 'what pattern connects these' question implicitly checks coherence

Topics:
- [[processing-workflows]]
- [[agent-cognition]]

---
description: System architecture choices should optimize for "how will I find this later" not "where should I put this" — a design orientation proven effective since the 1940s
kind: research
topics: ["[[discovery-retrieval]]"]
methodology: ["Cornell"]
source: TFT research corpus (00_inbox/heinrich/)
---

# retrieval utility should drive design over capture completeness

Cornell Note-Taking explicitly prioritizes retrieval utility over capture comprehensiveness. The system is designed for getting information back out, not for complete recording. This isn't a technique detail but a design orientation that explains multiple vault architecture choices.

The question shift is fundamental: not "where should I put this?" but "how will I find this later?" The first question optimizes for filing — creating neat categories, proper locations, complete capture. The second question optimizes for retrieval — creating finding aids, distinctive markers, efficient filters. These are different objectives that lead to different architectures. Since [[storage versus thinking distinction determines which tool patterns apply]], this question reveals system type: storage systems (PARA, Johnny.Decimal) naturally ask the filing question because their purpose is organization and retrieval of assets, while thinking systems (Zettelkasten, this vault) must ask the retrieval question because their purpose is synthesis through connection-finding.

## Why this matters for agent-operated systems

The retrieval-first orientation has specific implications for how agents should structure knowledge:

**Descriptions over summaries.** Since [[descriptions are retrieval filters not summaries]], the YAML description field exists to enable retrieval decisions, not to summarize content. A description that helps an agent decide whether to load a note serves retrieval utility. A description that tries to compress the note's content serves capture completeness. The distinction is subtle but consequential: filter descriptions can be shorter and more distinctive, while summary descriptions try to cover everything and end up generic.
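
The filter behavior can be exercised mechanically. A minimal sketch, assuming hypothetical notes and a naive term-overlap heuristic standing in for whatever matching the agent actually does:

```python
# Hypothetical sketch: the YAML `description` field as a retrieval gate.
# The agent loads full note content only for titles that survive the filter.

def filter_by_description(notes: dict[str, str], query: str) -> list[str]:
    """Return titles whose description shares at least one term with the query."""
    query_terms = set(query.lower().split())
    return [title for title, desc in notes.items()
            if query_terms & set(desc.lower().split())]

notes = {
    "flat files break at retrieval scale":
        "plain folders stop working at retrieval scale past a few hundred notes",
    "descriptions are retrieval filters not summaries":
        "the description field exists to gate loading, not compress content",
}
hits = filter_by_description(notes, "scale wall when searching")
```

Distinctive filter descriptions keep the survivor set small; summary-style descriptions that try to mention everything match nearly any query and defeat the gate.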

**Flat structure over hierarchical filing.** Deep folder hierarchies optimize for filing ("where does this belong?") not retrieval ("how do I find this?"). Retrieval-optimized architecture favors flat structures where everything is equally reachable via wiki links and semantic search. The agent doesn't need to navigate a path to find content — it can retrieve directly by concept. Since [[local-first file formats are inherently agent-native]], this retrieval-first orientation extends to format choice itself: plain text requires no authentication or infrastructure, making any LLM a valid retriever. And since [[data exit velocity measures how quickly content escapes vendor lock-in]], the retrieval-first question generalizes beyond the current tool: not just "how will I find this later" but "how will any future agent read this in any tool." Exit velocity makes the format-level retrieval concern auditable — every feature that lowers velocity is a retrieval risk that spans tool lifetimes, not just search sessions. This is why [[topological organization beats temporal for knowledge work]] — date-based folders force chronological scanning while concept-based organization enables direct semantic retrieval.

**Processing for retrieval, not completeness.** Since [[processing effort should follow retrieval demand]], heavy processing at capture time optimizes for completeness (doing everything properly upfront). Retrieval-first systems invest processing effort at retrieval time, when you know what you actually need. This is why JIT processing beats front-loading: you can't know at capture time which retrieval patterns matter.

## Information-theoretic foundation

The retrieval-first orientation has a deeper justification: since [[metadata reduces entropy enabling precision over recall]], retrieval-optimized architecture pre-computes low-entropy representations that shrink the search space. Filing-first architecture optimizes for putting things in the right place; retrieval-first architecture optimizes for finding things efficiently. The difference is not just aesthetic preference but information-theoretic efficiency — precision over recall as a design choice.
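
A worked example of the information-theoretic claim, assuming a uniform prior over notes and made-up field cardinalities:

```python
import math

# Identifying one note among N equally likely candidates costs log2(N) bits.
# Each metadata filter that shrinks the candidate set pays down that cost.
# The vault size and field cardinalities below are illustrative assumptions.

def search_entropy(candidates: int) -> float:
    """Bits of uncertainty in locating one note under a uniform prior."""
    return math.log2(candidates)

total_notes = 512
before = search_entropy(total_notes)              # scan everything: 9.0 bits
after_kind = search_entropy(total_notes // 4)     # a 4-valued `kind` field: 7.0 bits
after_topics = search_entropy(total_notes // 32)  # adding a topic filter: 4.0 bits
```

Under these assumptions the `kind` filter alone removes two bits of search uncertainty; stacking filters is how precision-over-recall becomes an architecture rather than a search trick.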

## The 80-year validation

Cornell Note-Taking dates to the 1940s. The cue column — margin notes designed to trigger recall, not to summarize — is a retrieval-first design from before computers. The system has been tested across generations and contexts. Its longevity validates that retrieval-first is not just a theoretical optimization but a practically effective orientation.

The lineage is even deeper: since [[wiki links are the digital evolution of analog indexing]], the cue column functioned as an index pointing to content blocks. Wiki links are the digital fulfillment of this 80-year-old cognitive pattern. What Cornell achieved with margin annotations, we achieve with bidirectional links.

This gives confidence to vault architecture decisions that might otherwise feel like arbitrary choices. When we favor descriptions that filter over descriptions that summarize, we're implementing a pattern with eight decades of human validation.

## The design test

Any architectural decision can be evaluated against this principle: does this optimize for retrieval or for capture? Since [[faceted classification treats notes as multi-dimensional objects rather than folder contents]], Ranganathan formalized this same question ninety years ago in library science: the right question is not "where does this go?" but "what are its properties?" Each YAML field is a retrieval-first design choice — a classification dimension that enables finding notes by what they ARE rather than where they were FILED.

- Wiki links over tags: retrieval-first (links traverse, tags just categorize)
- MOCs over nested folders: retrieval-first (MOCs are entry points, folders are filing)
- Type metadata over content folders: retrieval-first — since [[type field enables structured queries without folder hierarchies]], category queries happen via YAML fields rather than file location, enabling a note to participate in multiple category queries simultaneously
- Sentence titles over topic labels: retrieval-first (claims can be found by meaning, labels only by words)

Since [[throughput matters more than accumulation]], retrieval-first thinking supports the success metric: velocity from capture to synthesis. Architectures optimized for filing create beautiful graveyards. Architectures optimized for retrieval create systems that actually get used. The failure case is concrete: since [[flat files break at retrieval scale]], systems designed for capture completeness hit a predictable wall where finding content requires remembering what you have, and for agents this retrieval failure degrades cognition itself.

This explains why [[structure without processing provides no value]] — the "Lazy Cornell" anti-pattern, where students draw the structural lines but skip the processing, produces no benefit. Structure is a capture-time investment; processing creates retrieval value. Without the retrieval-focused operations (writing cues, summarizing, self-testing), the structure is just filing with extra steps. And since [[verbatim risk applies to agents too]], even the processing operations can fail to create retrieval value if they merely reorganize rather than generate — a summary that paraphrases adds no distinctive information for retrieval filtering.

---

Relevant Notes:
- [[descriptions are retrieval filters not summaries]] — the specific implementation of retrieval-first thinking for note descriptions
- [[processing effort should follow retrieval demand]] — JIT processing as the operational form of retrieval-first architecture
- [[throughput matters more than accumulation]] — the success metric that retrieval-first design serves
- [[topological organization beats temporal for knowledge work]] — why concept-based organization (retrieval-first) beats date-based organization (capture-first)
- [[metadata reduces entropy enabling precision over recall]] — the information-theoretic foundation for retrieval-first design choices
- [[wiki links are the digital evolution of analog indexing]] — the 80-year lineage of retrieval-first indexing from Cornell cue columns to wiki links
- [[structure without processing provides no value]] — the Lazy Cornell anti-pattern: structure (capture-time) without processing (retrieval-time) produces no benefit
- [[local-first file formats are inherently agent-native]] — retrieval-first applied to format choice: plain text with embedded metadata makes any LLM a valid retriever without infrastructure
- [[verbatim risk applies to agents too]] — tests whether retrieval-focused processing can degenerate into reorganization without genuine insight; paraphrase summaries fail the retrieval-first criterion because they add no distinctive filter value
- [[type field enables structured queries without folder hierarchies]] — implements retrieval-first for categorization: metadata queries replace folder hierarchies, enabling multi-category membership and direct category retrieval
- [[data exit velocity measures how quickly content escapes vendor lock-in]] — extends retrieval-first thinking to the format layer: 'how will any future agent read this' generalizes 'how will I find this later' from within-tool to across-tool portability
- [[faceted classification treats notes as multi-dimensional objects rather than folder contents]] — formal articulation: Ranganathan's PMEST framework is the library science formalization of retrieval-first design; 'notes have properties' is the Ranganathan version of 'how will I find this later'
- [[narrow folksonomy optimizes for single-operator retrieval unlike broad consensus tagging]] — theoretical framework: Vander Wal's narrow folksonomy names the design orientation this note describes; 'how will I find this later' is a personal question because the vault is a single-operator system where vocabulary can optimize entirely for one agent's retrieval patterns
- [[storage versus thinking distinction determines which tool patterns apply]] — upstream classification: retrieval-first is specifically a thinking-system orientation; storage systems optimize for filing-first, making this distinction the upstream choice that determines whether retrieval-first even applies
- [[flat files break at retrieval scale]] — the negative case: systems that optimize for capture completeness over retrieval hit a scale wall where finding content requires remembering what you have; the scale curve from ~50 to 500+ notes concretely demonstrates what filing-first architecture costs

Topics:
- [[discovery-retrieval]]