arscontexta 0.6.0
- package/.claude-plugin/marketplace.json +11 -0
- package/.claude-plugin/plugin.json +22 -0
- package/README.md +683 -0
- package/agents/knowledge-guide.md +49 -0
- package/bin/cli.mjs +66 -0
- package/generators/agents-md.md +240 -0
- package/generators/claude-md.md +379 -0
- package/generators/features/atomic-notes.md +124 -0
- package/generators/features/ethical-guardrails.md +58 -0
- package/generators/features/graph-analysis.md +188 -0
- package/generators/features/helper-functions.md +92 -0
- package/generators/features/maintenance.md +164 -0
- package/generators/features/methodology-knowledge.md +70 -0
- package/generators/features/mocs.md +144 -0
- package/generators/features/multi-domain.md +61 -0
- package/generators/features/personality.md +71 -0
- package/generators/features/processing-pipeline.md +428 -0
- package/generators/features/schema.md +149 -0
- package/generators/features/self-evolution.md +229 -0
- package/generators/features/self-space.md +78 -0
- package/generators/features/semantic-search.md +99 -0
- package/generators/features/session-rhythm.md +85 -0
- package/generators/features/templates.md +85 -0
- package/generators/features/wiki-links.md +88 -0
- package/generators/soul-md.md +121 -0
- package/hooks/hooks.json +45 -0
- package/hooks/scripts/auto-commit.sh +44 -0
- package/hooks/scripts/session-capture.sh +35 -0
- package/hooks/scripts/session-orient.sh +86 -0
- package/hooks/scripts/write-validate.sh +42 -0
- package/methodology/AI shifts knowledge systems from externalizing memory to externalizing attention.md +59 -0
- package/methodology/BM25 retrieval fails on full-length descriptions because query term dilution reduces match scores.md +39 -0
- package/methodology/IBIS framework maps claim-based architecture to structured argumentation.md +58 -0
- package/methodology/LLM attention degrades as context fills.md +49 -0
- package/methodology/MOC construction forces synthesis that automated generation from metadata cannot replicate.md +49 -0
- package/methodology/MOC maintenance investment compounds because orientation savings multiply across every future session.md +41 -0
- package/methodology/MOCs are attention management devices not just organizational tools.md +51 -0
- package/methodology/PKM failure follows a predictable cycle.md +50 -0
- package/methodology/ThreadMode to DocumentMode transformation is the core value creation step.md +52 -0
- package/methodology/WIP limits force processing over accumulation.md +53 -0
- package/methodology/Zeigarnik effect validates capture-first philosophy because open loops drain attention.md +42 -0
- package/methodology/academic research uses structured extraction with cross-source synthesis.md +566 -0
- package/methodology/adapt the four-phase processing pipeline to domain-specific throughput needs.md +197 -0
- package/methodology/agent notes externalize navigation intuition that search cannot discover and traversal cannot reconstruct.md +48 -0
- package/methodology/agent self-memory should be architecturally separate from user knowledge systems.md +48 -0
- package/methodology/agent session boundaries create natural automation checkpoints that human-operated systems lack.md +56 -0
- package/methodology/agent-cognition.md +107 -0
- package/methodology/agents are simultaneously methodology executors and subjects creating a unique trust asymmetry.md +66 -0
- package/methodology/aspect-oriented programming solved the same cross-cutting concern problem that hooks solve.md +39 -0
- package/methodology/associative ontologies beat hierarchical taxonomies because heterarchy adapts while hierarchy brittles.md +53 -0
- package/methodology/attention residue may have a minimum granularity that cannot be subdivided.md +46 -0
- package/methodology/auto-commit hooks eliminate prospective memory failures by converting remember-to-act into guaranteed execution.md +47 -0
- package/methodology/automated detection is always safe because it only reads state while automated remediation risks content corruption.md +42 -0
- package/methodology/automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues.md +56 -0
- package/methodology/backlinks implicitly define notes by revealing usage context.md +35 -0
- package/methodology/backward maintenance asks what would be different if written today.md +62 -0
- package/methodology/balance onboarding enforcement and questions to prevent premature complexity.md +229 -0
- package/methodology/basic level categorization determines optimal MOC granularity.md +51 -0
- package/methodology/batching by context similarity reduces switching costs in agent processing.md +43 -0
- package/methodology/behavioral anti-patterns matter more than tool selection.md +42 -0
- package/methodology/betweenness centrality identifies bridge notes connecting disparate knowledge domains.md +57 -0
- package/methodology/blueprints that teach construction outperform downloads that provide pre-built code for platform-dependent modules.md +42 -0
- package/methodology/bootstrapping principle enables self-improving systems.md +62 -0
- package/methodology/build automatic memory through cognitive offloading and session handoffs.md +285 -0
- package/methodology/capture the reaction to content not just the content itself.md +41 -0
- package/methodology/claims must be specific enough to be wrong.md +36 -0
- package/methodology/closure rituals create clean breaks that prevent attention residue bleed.md +44 -0
- package/methodology/cognitive offloading is the architectural foundation for vault design.md +46 -0
- package/methodology/cognitive outsourcing risk in agent-operated systems.md +55 -0
- package/methodology/coherence maintains consistency despite inconsistent inputs.md +96 -0
- package/methodology/coherent architecture emerges from wiki links spreading activation and small-world topology.md +48 -0
- package/methodology/community detection algorithms can inform when MOCs should split or merge.md +52 -0
- package/methodology/complete navigation requires four complementary types that no single mechanism provides.md +43 -0
- package/methodology/complex systems evolve from simple working systems.md +59 -0
- package/methodology/composable knowledge architecture builds systems from independent toggleable modules not monolithic templates.md +61 -0
- package/methodology/compose multi-domain systems through separate templates and shared graph.md +372 -0
- package/methodology/concept-orientation beats source-orientation for cross-domain connections.md +51 -0
- package/methodology/confidence thresholds gate automated action between the mechanical and judgment zones.md +50 -0
- package/methodology/configuration dimensions interact so choices in one create pressure on others.md +58 -0
- package/methodology/configuration paralysis emerges when derivation surfaces too many decisions.md +44 -0
- package/methodology/context files function as agent operating systems through self-referential self-extension.md +46 -0
- package/methodology/context phrase clarity determines how deep a navigation hierarchy can scale.md +46 -0
- package/methodology/continuous small-batch processing eliminates review dread.md +48 -0
- package/methodology/controlled disorder engineers serendipity through semantic rather than topical linking.md +51 -0
- package/methodology/creative writing uses worldbuilding consistency with character tracking.md +672 -0
- package/methodology/cross-links between MOC territories indicate creative leaps and integration depth.md +43 -0
- package/methodology/dangling links reveal which notes want to exist.md +62 -0
- package/methodology/data exit velocity measures how quickly content escapes vendor lock-in.md +74 -0
- package/methodology/decontextualization risk means atomicity may strip meaning that cannot be recovered.md +48 -0
- package/methodology/dense interlinked research claims enable derivation while sparse references only enable templating.md +47 -0
- package/methodology/dependency resolution through topological sort makes module composition transparent and verifiable.md +56 -0
- package/methodology/derivation generates knowledge systems from composable research claims not template customization.md +63 -0
- package/methodology/derivation-engine.md +27 -0
- package/methodology/derived systems follow a seed-evolve-reseed lifecycle.md +56 -0
- package/methodology/description quality for humans diverges from description quality for keyword search.md +73 -0
- package/methodology/descriptions are retrieval filters not summaries.md +112 -0
- package/methodology/design MOCs as attention management devices with lifecycle governance.md +318 -0
- package/methodology/design-dimensions.md +66 -0
- package/methodology/digital mutability enables note evolution that physical permanence forbids.md +54 -0
- package/methodology/discovery-retrieval.md +48 -0
- package/methodology/distinctiveness scoring treats description quality as measurable.md +69 -0
- package/methodology/does agent processing recover what fast capture loses.md +43 -0
- package/methodology/domain-compositions.md +37 -0
- package/methodology/dual-coding with visual elements could enhance agent traversal.md +55 -0
- package/methodology/each module must be describable in one sentence under 200 characters or it does too many things.md +45 -0
- package/methodology/each new note compounds value by creating traversal paths.md +55 -0
- package/methodology/eight configuration dimensions parameterize the space of possible knowledge systems.md +56 -0
- package/methodology/elaborative encoding is the quality gate for new notes.md +55 -0
- package/methodology/enforce schema with graduated strictness across capture processing and query zones.md +221 -0
- package/methodology/enforcing atomicity can create paralysis when ideas resist decomposition.md +43 -0
- package/methodology/engineering uses technical decision tracking with architectural memory.md +766 -0
- package/methodology/every knowledge domain shares a four-phase processing skeleton that diverges only in the process step.md +53 -0
- package/methodology/evolution observations provide actionable signals for system adaptation.md +67 -0
- package/methodology/external memory shapes cognition more than base model.md +60 -0
- package/methodology/faceted classification treats notes as multi-dimensional objects rather than folder contents.md +65 -0
- package/methodology/failure-modes.md +27 -0
- package/methodology/false universalism applies same processing logic regardless of domain.md +49 -0
- package/methodology/federated wiki pattern enables multi-agent divergence as feature not bug.md +59 -0
- package/methodology/flat files break at retrieval scale.md +75 -0
- package/methodology/forced engagement produces weak connections.md +48 -0
- package/methodology/four abstraction layers separate platform-agnostic from platform-dependent knowledge system features.md +47 -0
- package/methodology/fresh context per task preserves quality better than chaining phases.md +44 -0
- package/methodology/friction reveals architecture.md +63 -0
- package/methodology/friction-driven module adoption prevents configuration debt by adding complexity only at pain points.md +48 -0
- package/methodology/gardening cycle implements tend prune fertilize operations.md +41 -0
- package/methodology/generation effect gate blocks processing without transformation.md +40 -0
- package/methodology/goal-driven memory orchestration enables autonomous domain learning through directed compute allocation.md +41 -0
- package/methodology/good descriptions layer heuristic then mechanism then implication.md +57 -0
- package/methodology/graph-structure.md +65 -0
- package/methodology/guided notes might outperform post-hoc structuring for high-volume capture.md +37 -0
- package/methodology/health wellness uses symptom-trigger correlation with multi-dimensional tracking.md +819 -0
- package/methodology/hook composition creates emergent methodology from independent single-concern components.md +47 -0
- package/methodology/hook enforcement guarantees quality while instruction enforcement merely suggests it.md +51 -0
- package/methodology/hook-driven learning loops create self-improving methodology through observation accumulation.md +62 -0
- package/methodology/hooks are the agent habit system that replaces the missing basal ganglia.md +40 -0
- package/methodology/hooks cannot replace genuine cognitive engagement yet more automation is always tempting.md +87 -0
- package/methodology/hooks enable context window efficiency by delegating deterministic checks to external processes.md +47 -0
- package/methodology/idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once.md +44 -0
- package/methodology/implement condition-based maintenance triggers for derived systems.md +255 -0
- package/methodology/implicit dependencies create distributed monoliths that fail silently across configurations.md +58 -0
- package/methodology/implicit knowledge emerges from traversal.md +55 -0
- package/methodology/incremental formalization happens through repeated touching of old notes.md +60 -0
- package/methodology/incremental reading enables cross-source connection finding.md +39 -0
- package/methodology/index.md +32 -0
- package/methodology/inline links carry richer relationship data than metadata fields.md +91 -0
- package/methodology/insight accretion differs from productivity in knowledge systems.md +41 -0
- package/methodology/intermediate packets enable assembly over creation.md +52 -0
- package/methodology/intermediate representation pattern enables reliable vault operations beyond regex.md +62 -0
- package/methodology/justification chains enable forward backward and evolution reasoning about configuration decisions.md +46 -0
- package/methodology/knowledge system architecture is parameterized by platform capabilities not fixed by methodology.md +51 -0
- package/methodology/knowledge systems become communication partners through complexity and memory humans cannot sustain.md +47 -0
- package/methodology/knowledge systems share universal operations and structural components across all methodology traditions.md +46 -0
- package/methodology/legal case management uses precedent chains with regulatory change propagation.md +892 -0
- package/methodology/live index via periodic regeneration keeps discovery current.md +58 -0
- package/methodology/local-first file formats are inherently agent-native.md +69 -0
- package/methodology/logic column pattern separates reasoning from procedure.md +35 -0
- package/methodology/maintenance operations are more universal than creative pipelines because structural health is domain-invariant.md +47 -0
- package/methodology/maintenance scheduling frequency should match consequence speed not detection capability.md +50 -0
- package/methodology/maintenance targeting should prioritize mechanism and theory notes.md +26 -0
- package/methodology/maintenance-patterns.md +72 -0
- package/methodology/markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure.md +55 -0
- package/methodology/maturity field enables agent context prioritization.md +33 -0
- package/methodology/memory-architecture.md +27 -0
- package/methodology/metacognitive confidence can diverge from retrieval capability.md +42 -0
- package/methodology/metadata reduces entropy enabling precision over recall.md +91 -0
- package/methodology/methodology development should follow the trajectory from documentation to skill to hook as understanding hardens.md +80 -0
- package/methodology/methodology traditions are named points in a shared configuration space not competing paradigms.md +64 -0
- package/methodology/mnemonic medium embeds verification into navigation.md +46 -0
- package/methodology/module communication through shared YAML fields creates loose coupling without direct dependencies.md +44 -0
- package/methodology/module deactivation must account for structural artifacts that survive the toggle.md +49 -0
- package/methodology/multi-domain systems compose through separate templates and shared graph.md +61 -0
- package/methodology/multi-domain-composition.md +27 -0
- package/methodology/narrow folksonomy optimizes for single-operator retrieval unlike broad consensus tagging.md +53 -0
- package/methodology/navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts.md +48 -0
- package/methodology/navigational vertigo emerges in pure association systems without local hierarchy.md +54 -0
- package/methodology/note titles should function as APIs enabling sentence transclusion.md +51 -0
- package/methodology/note-design.md +57 -0
- package/methodology/notes are skills — curated knowledge injected when relevant.md +62 -0
- package/methodology/notes function as cognitive anchors that stabilize attention during complex tasks.md +41 -0
- package/methodology/novel domains derive by mapping knowledge type to closest reference domain then adapting.md +50 -0
- package/methodology/nudge theory explains graduated hook enforcement as choice architecture for agents.md +59 -0
- package/methodology/observation and tension logs function as dead-letter queues for failed automation.md +51 -0
- package/methodology/operational memory and knowledge memory serve different functions in agent architecture.md +48 -0
- package/methodology/operational wisdom requires contextual observation.md +52 -0
- package/methodology/orchestrated vault creation transforms arscontexta from tool to autonomous knowledge factory.md +40 -0
- package/methodology/organic emergence versus active curation creates a fundamental vault governance tension.md +68 -0
- package/methodology/orphan notes are seeds not failures.md +38 -0
- package/methodology/over-automation corrupts quality when hooks encode judgment rather than verification.md +62 -0
- package/methodology/people relationships uses Dunbar-layered graphs with interaction tracking.md +659 -0
- package/methodology/personal assistant uses life area management with review automation.md +610 -0
- package/methodology/platform adapter translation is semantic not mechanical because hook event meanings differ.md +40 -0
- package/methodology/platform capability tiers determine which knowledge system features can be implemented.md +48 -0
- package/methodology/platform fragmentation means identical conceptual operations require different implementations across agent environments.md +44 -0
- package/methodology/premature complexity is the most common derivation failure mode.md +45 -0
- package/methodology/prevent domain-specific failure modes through the vulnerability matrix.md +336 -0
- package/methodology/processing effort should follow retrieval demand.md +57 -0
- package/methodology/processing-workflows.md +75 -0
- package/methodology/product management uses feedback pipelines with experiment tracking.md +789 -0
- package/methodology/productivity porn risk in meta-system building.md +30 -0
- package/methodology/programmable notes could enable property-triggered workflows.md +64 -0
- package/methodology/progressive disclosure means reading right not reading less.md +69 -0
- package/methodology/progressive schema validates only what active modules require not the full system schema.md +49 -0
- package/methodology/project management uses decision tracking with stakeholder context.md +776 -0
- package/methodology/propositional link semantics transform wiki links from associative to reasoned.md +87 -0
- package/methodology/prospective memory requires externalization.md +53 -0
- package/methodology/provenance tracks where beliefs come from.md +62 -0
- package/methodology/queries evolve during search so agents should checkpoint.md +35 -0
- package/methodology/question-answer metadata enables inverted search patterns.md +39 -0
- package/methodology/random note resurfacing prevents write-only memory.md +33 -0
- package/methodology/reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring.md +59 -0
- package/methodology/reflection synthesizes existing notes into new insight.md +100 -0
- package/methodology/retrieval utility should drive design over capture completeness.md +69 -0
- package/methodology/retrieval verification loop tests description quality at scale.md +81 -0
- package/methodology/role field makes graph structure explicit.md +94 -0
- package/methodology/scaffolding enables divergence that fine-tuning cannot.md +67 -0
- package/methodology/schema enforcement via validation agents enables soft consistency.md +60 -0
- package/methodology/schema evolution follows observe-then-formalize not design-then-enforce.md +65 -0
- package/methodology/schema field names are the only domain specific element in the universal note pattern.md +46 -0
- package/methodology/schema fields should use domain-native vocabulary not abstract terminology.md +47 -0
- package/methodology/schema templates reduce cognitive overhead at capture time.md +55 -0
- package/methodology/schema validation hooks externalize inhibitory control that degrades under cognitive load.md +48 -0
- package/methodology/schema-enforcement.md +27 -0
- package/methodology/self-extension requires context files to contain platform operations knowledge not just methodology.md +47 -0
- package/methodology/sense-making vs storage does compression lose essential nuance.md +73 -0
- package/methodology/session boundary hooks implement cognitive bookends for orientation and reflection.md +60 -0
- package/methodology/session handoff creates continuity without persistent memory.md +43 -0
- package/methodology/session outputs are packets for future selves.md +43 -0
- package/methodology/session transcript mining enables experiential validation that structural tests cannot provide.md +38 -0
- package/methodology/skill context budgets constrain knowledge system complexity on agent platforms.md +52 -0
- package/methodology/skills encode methodology so manual execution bypasses quality gates.md +50 -0
- package/methodology/small-world topology requires hubs and dense local links.md +99 -0
- package/methodology/source attribution enables tracing claims to foundations.md +38 -0
- package/methodology/spaced repetition scheduling could optimize vault maintenance.md +44 -0
- package/methodology/spreading activation models how agents should traverse.md +79 -0
- package/methodology/stale navigation actively misleads because agents trust curated maps completely.md +43 -0
- package/methodology/stigmergy coordinates agents through environmental traces without direct communication.md +62 -0
- package/methodology/storage versus thinking distinction determines which tool patterns apply.md +56 -0
- package/methodology/structure enables navigation without reading everything.md +52 -0
- package/methodology/structure without processing provides no value.md +56 -0
- package/methodology/student learning uses prerequisite graphs with spaced retrieval.md +770 -0
- package/methodology/summary coherence tests composability before filing.md +37 -0
- package/methodology/tag rot applies to wiki links because titles serve as both identifier and display text.md +50 -0
- package/methodology/temporal media must convert to spatial text for agent traversal.md +43 -0
- package/methodology/temporal processing priority creates age-based inbox urgency.md +45 -0
- package/methodology/temporal separation of capture and processing preserves context freshness.md +39 -0
- package/methodology/ten universal primitives form the kernel of every viable agent knowledge system.md +162 -0
- package/methodology/testing effect could enable agent knowledge verification.md +38 -0
- package/methodology/the AgentSkills standard embodies progressive disclosure at the skill level.md +40 -0
- package/methodology/the derivation engine improves recursively as deployed systems generate observations.md +49 -0
- package/methodology/the determinism boundary separates hook methodology from skill methodology.md +46 -0
- package/methodology/the fix-versus-report decision depends on determinism reversibility and accumulated trust.md +45 -0
- package/methodology/the generation effect requires active transformation not just storage.md +57 -0
- package/methodology/the no wrong patches guarantee ensures any valid module combination produces a valid system.md +58 -0
- package/methodology/the system is the argument.md +46 -0
- package/methodology/the vault constitutes identity for agents.md +86 -0
- package/methodology/the vault methodology transfers because it encodes cognitive science not domain specifics.md +47 -0
- package/methodology/therapy journal uses warm personality with pattern detection for emotional processing.md +584 -0
- package/methodology/three capture schools converge through agent-mediated synthesis.md +55 -0
- package/methodology/three concurrent maintenance loops operate at different timescales to catch different classes of problems.md +56 -0
- package/methodology/throughput matters more than accumulation.md +58 -0
- package/methodology/title as claim enables traversal as reasoning.md +50 -0
- package/methodology/topological organization beats temporal for knowledge work.md +52 -0
- package/methodology/trading uses conviction tracking with thesis-outcome correlation.md +699 -0
- package/methodology/trails transform ephemeral navigation into persistent artifacts.md +39 -0
- package/methodology/transform universal vocabulary to domain-native language through six levels.md +259 -0
- package/methodology/type field enables structured queries without folder hierarchies.md +53 -0
- package/methodology/use-case presets dissolve the tension between composability and simplicity.md +44 -0
- package/methodology/vault conventions may impose hidden rigidity on thinking.md +44 -0
- package/methodology/verbatim risk applies to agents too.md +31 -0
- package/methodology/vibe notetaking is the emerging industry consensus for AI-native self-organization.md +56 -0
- package/methodology/vivid memories need verification.md +45 -0
- package/methodology/vocabulary-transformation.md +27 -0
- package/methodology/voice capture is the highest-bandwidth channel for agent-delegated knowledge systems.md +45 -0
- package/methodology/wiki links are the digital evolution of analog indexing.md +73 -0
- package/methodology/wiki links as social contract transforms agents into stewards of incomplete references.md +52 -0
- package/methodology/wiki links create navigation paths that shape retrieval.md +63 -0
- package/methodology/wiki links implement GraphRAG without the infrastructure.md +101 -0
- package/methodology/writing for audience blocks authentic creation.md +22 -0
- package/methodology/you operate a system that takes notes.md +79 -0
- package/openclaw/SKILL.md +110 -0
- package/package.json +45 -0
- package/platforms/README.md +51 -0
- package/platforms/claude-code/generator.md +61 -0
- package/platforms/claude-code/hooks/README.md +186 -0
- package/platforms/claude-code/hooks/auto-commit.sh.template +38 -0
- package/platforms/claude-code/hooks/session-capture.sh.template +72 -0
- package/platforms/claude-code/hooks/session-orient.sh.template +189 -0
- package/platforms/claude-code/hooks/write-validate.sh.template +106 -0
- package/platforms/openclaw/generator.md +82 -0
- package/platforms/openclaw/hooks/README.md +89 -0
- package/platforms/openclaw/hooks/bootstrap.ts.template +224 -0
- package/platforms/openclaw/hooks/command-new.ts.template +165 -0
- package/platforms/openclaw/hooks/heartbeat.ts.template +214 -0
- package/platforms/shared/features/README.md +70 -0
- package/platforms/shared/skill-blocks/graph.md +145 -0
- package/platforms/shared/skill-blocks/learn.md +119 -0
- package/platforms/shared/skill-blocks/next.md +131 -0
- package/platforms/shared/skill-blocks/pipeline.md +326 -0
- package/platforms/shared/skill-blocks/ralph.md +616 -0
- package/platforms/shared/skill-blocks/reduce.md +1142 -0
- package/platforms/shared/skill-blocks/refactor.md +129 -0
- package/platforms/shared/skill-blocks/reflect.md +780 -0
- package/platforms/shared/skill-blocks/remember.md +524 -0
- package/platforms/shared/skill-blocks/rethink.md +574 -0
- package/platforms/shared/skill-blocks/reweave.md +680 -0
- package/platforms/shared/skill-blocks/seed.md +320 -0
- package/platforms/shared/skill-blocks/stats.md +145 -0
- package/platforms/shared/skill-blocks/tasks.md +171 -0
- package/platforms/shared/skill-blocks/validate.md +323 -0
- package/platforms/shared/skill-blocks/verify.md +562 -0
- package/platforms/shared/templates/README.md +35 -0
- package/presets/experimental/categories.yaml +1 -0
- package/presets/experimental/preset.yaml +38 -0
- package/presets/experimental/starter/README.md +7 -0
- package/presets/experimental/vocabulary.yaml +7 -0
- package/presets/personal/categories.yaml +7 -0
- package/presets/personal/preset.yaml +41 -0
- package/presets/personal/starter/goals.md +21 -0
- package/presets/personal/starter/index.md +17 -0
- package/presets/personal/starter/life-areas.md +21 -0
- package/presets/personal/starter/people.md +21 -0
- package/presets/personal/vocabulary.yaml +32 -0
- package/presets/research/categories.yaml +8 -0
- package/presets/research/preset.yaml +41 -0
- package/presets/research/starter/index.md +17 -0
- package/presets/research/starter/methods.md +21 -0
- package/presets/research/starter/open-questions.md +21 -0
- package/presets/research/vocabulary.yaml +33 -0
- package/reference/AUDIT-REPORT.md +238 -0
- package/reference/claim-map.md +172 -0
- package/reference/components.md +327 -0
- package/reference/conversation-patterns.md +542 -0
- package/reference/derivation-validation.md +649 -0
- package/reference/dimension-claim-map.md +134 -0
- package/reference/evolution-lifecycle.md +297 -0
- package/reference/failure-modes.md +235 -0
- package/reference/interaction-constraints.md +204 -0
- package/reference/kernel.yaml +242 -0
- package/reference/methodology.md +283 -0
- package/reference/open-questions.md +279 -0
- package/reference/personality-layer.md +302 -0
- package/reference/self-space.md +299 -0
- package/reference/semantic-vs-keyword.md +288 -0
- package/reference/session-lifecycle.md +298 -0
- package/reference/templates/base-note.md +16 -0
- package/reference/templates/companion-note.md +70 -0
- package/reference/templates/creative-note.md +16 -0
- package/reference/templates/learning-note.md +16 -0
- package/reference/templates/life-note.md +16 -0
- package/reference/templates/moc.md +26 -0
- package/reference/templates/relationship-note.md +17 -0
- package/reference/templates/research-note.md +19 -0
- package/reference/templates/session-log.md +24 -0
- package/reference/templates/therapy-note.md +16 -0
- package/reference/test-fixtures/edge-case-constraints.md +148 -0
- package/reference/test-fixtures/multi-domain.md +164 -0
- package/reference/test-fixtures/novel-domain-gaming.md +138 -0
- package/reference/test-fixtures/research-minimal.md +102 -0
- package/reference/test-fixtures/therapy-full.md +155 -0
- package/reference/testing-milestones.md +1087 -0
- package/reference/three-spaces.md +363 -0
- package/reference/tradition-presets.md +203 -0
- package/reference/use-case-presets.md +341 -0
- package/reference/validate-kernel.sh +432 -0
- package/reference/vocabulary-transforms.md +85 -0
- package/scripts/sync-thinking.sh +147 -0
- package/skill-sources/graph/SKILL.md +567 -0
- package/skill-sources/graph/skill.json +17 -0
- package/skill-sources/learn/SKILL.md +254 -0
- package/skill-sources/learn/skill.json +17 -0
- package/skill-sources/next/SKILL.md +407 -0
- package/skill-sources/next/skill.json +17 -0
- package/skill-sources/pipeline/SKILL.md +314 -0
- package/skill-sources/pipeline/skill.json +17 -0
- package/skill-sources/ralph/SKILL.md +604 -0
- package/skill-sources/ralph/skill.json +17 -0
- package/skill-sources/reduce/SKILL.md +1113 -0
- package/skill-sources/reduce/skill.json +17 -0
- package/skill-sources/refactor/SKILL.md +448 -0
- package/skill-sources/refactor/skill.json +17 -0
- package/skill-sources/reflect/SKILL.md +747 -0
- package/skill-sources/reflect/skill.json +17 -0
- package/skill-sources/remember/SKILL.md +534 -0
- package/skill-sources/remember/skill.json +17 -0
- package/skill-sources/rethink/SKILL.md +658 -0
- package/skill-sources/rethink/skill.json +17 -0
- package/skill-sources/reweave/SKILL.md +657 -0
- package/skill-sources/reweave/skill.json +17 -0
- package/skill-sources/seed/SKILL.md +303 -0
- package/skill-sources/seed/skill.json +17 -0
- package/skill-sources/stats/SKILL.md +371 -0
- package/skill-sources/stats/skill.json +17 -0
- package/skill-sources/tasks/SKILL.md +402 -0
- package/skill-sources/tasks/skill.json +17 -0
- package/skill-sources/validate/SKILL.md +310 -0
- package/skill-sources/validate/skill.json +17 -0
- package/skill-sources/verify/SKILL.md +532 -0
- package/skill-sources/verify/skill.json +17 -0
- package/skills/add-domain/SKILL.md +441 -0
- package/skills/add-domain/skill.json +17 -0
- package/skills/architect/SKILL.md +568 -0
- package/skills/architect/skill.json +17 -0
- package/skills/ask/SKILL.md +388 -0
- package/skills/ask/skill.json +17 -0
- package/skills/health/SKILL.md +760 -0
- package/skills/health/skill.json +17 -0
- package/skills/help/SKILL.md +348 -0
- package/skills/help/skill.json +17 -0
- package/skills/recommend/SKILL.md +553 -0
- package/skills/recommend/skill.json +17 -0
- package/skills/reseed/SKILL.md +385 -0
- package/skills/reseed/skill.json +17 -0
- package/skills/setup/SKILL.md +1688 -0
- package/skills/setup/skill.json +17 -0
- package/skills/tutorial/SKILL.md +496 -0
- package/skills/tutorial/skill.json +17 -0
- package/skills/upgrade/SKILL.md +395 -0
- package/skills/upgrade/skill.json +17 -0
@@ -0,0 +1,58 @@
---
description: A maintenance agent regenerating index files on note changes bridges static indices that go stale with dynamic queries that cost tokens each time
kind: research
topics: ["[[discovery-retrieval]]"]
methodology: ["Original"]
source: [[2-4-metadata-properties]]
---

# live index via periodic regeneration keeps discovery current

Static index files solve the problem of repeated queries: instead of running `rg "^description:"` every time an agent needs to scan available notes, a pre-built index provides immediate access. But static files go stale. A note created this morning won't appear in an index generated last week. Since [[stale navigation actively misleads because agents trust curated maps completely]], the agent operates on outdated information without knowing it — and because it trusts the curated view, it never searches for what the stale index omits.

Dynamic queries solve staleness but introduce cost. Every JIT query costs tokens and time. When an agent needs to understand what exists before deciding what to read, querying descriptions across hundreds of notes adds overhead to every decision point. Since [[progressive disclosure means reading right not reading less]], the agent needs to filter frequently — and frequent filtering shouldn't require repeated full-vault scans.

The solution is periodic regeneration by a maintenance agent. When notes change, the agent regenerates the relevant index. The index remains a static file that loads instantly, but its contents stay current because regeneration happens at the boundary of change rather than the moment of query.

This pattern has several implementation variations:

| Approach | Trigger | Trade-off |
|----------|---------|-----------|
| Post-commit hook | Every git commit | Frequent updates, minimal staleness, some overhead |
| Session-end regeneration | When agent session ends | Groups changes, slight delay before currency |
| Scheduled regeneration | Cron-style periodic runs | Predictable staleness window, low overhead |
| Change detection | File watcher triggers rebuild | Near-instant currency, requires infrastructure |

The vault already demonstrates this pattern implicitly. CLAUDE.md documents dynamic querying via ripgrep (`rg "^description:" 01_thinking/*.md`), but also injects the file tree at session start via hook — that's periodic regeneration of structural information. The tree injection keeps agents current without requiring them to run `find` commands repeatedly.
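The regeneration step itself is small enough to live in any of the triggers above. A minimal Python sketch, assuming a flat vault of `.md` files whose frontmatter contains a `description:` line; the index filename `_index.md` and entry format are hypothetical choices, not part of any shipped tooling:

```python
from pathlib import Path


def build_description_index(vault: Path) -> str:
    """Collect the description: line from each note's frontmatter."""
    rows = []
    for note in sorted(vault.glob("*.md")):
        if note.name == "_index.md":
            continue  # never index the index itself
        for line in note.read_text(encoding="utf-8").splitlines():
            if line.startswith("description:"):
                desc = line.removeprefix("description:").strip()
                rows.append(f"- [[{note.stem}]] — {desc}")
                break  # only the frontmatter description, not body matches
    return "\n".join(rows) + "\n"


def regenerate(vault: Path, index_name: str = "_index.md") -> None:
    """Idempotent rebuild: running twice over unchanged notes yields the same file."""
    (vault / index_name).write_text(build_description_index(vault), encoding="utf-8")
```

Because the rebuild is idempotent, it is safe to wire into a post-commit hook or session-end hook without guarding against double runs.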
Multiple specialized indices become possible when regeneration is cheap:

- **By topic:** Which MOC would each note appear under?
- **By maturity:** Seedling notes vs fully developed claims (since [[maturity field enables agent context prioritization]], pre-computed maturity indices let agents filter before loading)
- **By type:** All methodology notes, all tensions, all problems
- **By recency:** Notes modified in last 7 days
- **By link density:** Well-connected vs orphan notes

Each index serves a different navigation mode. An agent exploring new territory might want the by-topic index. An agent doing maintenance wants the by-recency or by-link-density view. Since [[type field enables structured queries without folder hierarchies]], pre-computing type-based indices avoids repetitive filtering at query time.

Periodic regeneration is a specific instance of a broader architectural pattern. Since [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]], the index regeneration pattern is a reconciliation loop where the desired state is "index matches current file state" and the detection mechanism is comparing file modification times or counts against index entries. The reconciliation table in that note includes qmd index freshness as one of its rows, with the Phase 0 freshness check (comparing indexed document count against actual file count) as the detection tool and `qmd update && qmd embed` as the remediation. Periodic index regeneration is the most mature reconciliation loop in the vault because it already has both automated detection and automated remediation — the check is deterministic and the fix is idempotent.
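The detection half of that loop can be sketched in a few lines. This assumes the index stores one `- ` entry per note, as in the earlier sketch; it is an illustration of the count-and-mtime comparison, not the qmd implementation:

```python
from pathlib import Path


def index_is_stale(vault: Path, index_file: Path) -> bool:
    """Deterministic freshness check: compare entry count and timestamps.

    Stale when the counts diverge (notes added or removed since the last
    rebuild) or when any note was modified after the index was written.
    """
    notes = [p for p in vault.glob("*.md") if p != index_file]
    if not index_file.exists():
        return True
    indexed_entries = [
        line
        for line in index_file.read_text(encoding="utf-8").splitlines()
        if line.startswith("- ")
    ]
    if len(indexed_entries) != len(notes):
        return True  # a note was created or deleted since the rebuild
    index_mtime = index_file.stat().st_mtime
    return any(n.stat().st_mtime > index_mtime for n in notes)
```

Pairing a check like this with the idempotent rebuild gives both halves of the reconciliation loop: detect drift, then remediate by regenerating.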
The deeper insight is that this dissolves the static/dynamic trade-off. Traditional systems force a choice: maintain indices manually (they go stale) or query dynamically (expensive at scale). Agent-operated systems can regenerate automatically because the maintenance work itself can be delegated. The human doesn't maintain the index — the agent does, as part of routine vault hygiene. Since [[skills encode methodology so manual execution bypasses quality gates]], index regeneration should be encoded in hooks or skills rather than run ad-hoc — the quality gate here is ensuring regeneration happens at the right moments without human memory.

This connects to the broader pattern in CLAUDE.md of hooks automating what would otherwise be manual ceremony. The file tree injection hook already implements this pattern for structure. Extending it to description aggregation, link density metrics, or topic indices follows the same logic: compute once at change boundaries, use cheaply at query time. This is the [[bootstrapping principle enables self-improving systems]] in action at the infrastructure level — the vault already uses its own patterns (hooks, automation) to improve its own discovery mechanisms.

---

Relevant Notes:
- [[progressive disclosure means reading right not reading less]] — why agents need efficient filtering mechanisms in the first place
- [[type field enables structured queries without folder hierarchies]] — metadata dimensions that become cheaply queryable through pre-computed indices
- [[metadata reduces entropy enabling precision over recall]] — the information-theoretic basis for why indices help: pre-computed low-entropy representations
- [[skills encode methodology so manual execution bypasses quality gates]] — regeneration should be encoded in hooks/skills, not manual ad-hoc commands, to ensure it happens reliably
- [[bootstrapping principle enables self-improving systems]] — the vault already demonstrates this pattern: hooks that improve discovery use the same automation philosophy they serve
- [[maturity field enables agent context prioritization]] — maturity is one of several index dimensions that become cheaply queryable when regeneration is cheap
- [[spaced repetition scheduling could optimize vault maintenance]] — interval-based scheduling addresses WHEN maintenance happens; trigger-based regeneration is a complementary mechanism
- [[gardening cycle implements tend prune fertilize operations]] — gardening operations could serve as regeneration triggers; tend the note, regenerate indices that include it
- [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]] — names the architectural pattern: periodic regeneration is a reconciliation loop where the desired state is index-matches-files and remediation is idempotent rebuild
- [[stale navigation actively misleads because agents trust curated maps completely]] — motivates the urgency: stale indices and stale MOCs share the same failure mode where agents trust outdated curated views without suspecting staleness, making periodic regeneration not just convenient but necessary to prevent silent navigation corruption

Topics:
- [[discovery-retrieval]]
@@ -0,0 +1,69 @@
---
description: Plain text with embedded metadata survives tool death and requires no authentication, making any LLM a valid reader without bootstrapping infrastructure
kind: research
topics: ["[[agent-cognition]]"]
---

# local-first file formats are inherently agent-native

Jekyll's YAML frontmatter "liberated metadata, placing it in plain text alongside content." When metadata lives in the file itself rather than a database, agents can read any note in isolation without bootstrapping external systems.

This is why markdown plus wiki links works as an agent substrate.

## The portability property

Most knowledge systems encode metadata externally: Notion stores relationships in a database, Roam stores block references in JSON that only Roam interprets, Confluence requires API authentication. To read these formats, you need the software that created them — or reverse-engineered access.

Markdown with YAML frontmatter and wiki links inverts this. The file IS the complete artifact. Any text reader sees the structure. Any LLM can parse the YAML, follow the wiki links, understand the content. No API keys, no database connections, no format translators.
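How little machinery "parse the YAML, follow the wiki links" actually requires is worth making concrete. A minimal sketch using only the standard library, assuming the conventional `---`-delimited frontmatter and `[[...]]` link syntax; real frontmatter can be richer (nested YAML, lists) than this line-splitting handles:

```python
import re
from pathlib import Path

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def read_note(path: Path) -> dict:
    """Parse one note in isolation: frontmatter fields plus outgoing links.

    No database, no API client: the file is the complete artifact.
    """
    text = path.read_text(encoding="utf-8")
    meta = {}
    if text.startswith("---\n"):
        header, _, _body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a key: value shape
                meta[key.strip()] = value.strip()
    return {"meta": meta, "links": WIKI_LINK.findall(text)}
```

The point is not that this parser is good, but that any reader with filesystem access can write one; no credentials or format translator stands between the agent and the note.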
This matters for longevity: tools die, but plain text survives. If Obsidian disappears tomorrow, these files remain navigable. The wiki link syntax `[[note title]]` is human-readable even without link resolution — it tells you which concept connects without requiring software to interpret it. Since [[complex systems evolve from simple working systems]], this durability is predictable — the simplest substrate (files + text) proves more robust than complex infrastructures (databases + APIs) because it has fewer failure modes. And since [[data exit velocity measures how quickly content escapes vendor lock-in]], this portability advantage is not just an intuition but an auditable metric: plain text + YAML + wiki links score as high velocity (export = copy a folder), while database-backed tools score low velocity (export requires conversion that loses relationships). The three-tier framework makes format decisions evaluable rather than philosophical.

## Why agents care about this

An agent operating a knowledge system needs read access. External metadata creates dependencies:
- Database-backed: agent needs database credentials
- API-backed: agent needs API authentication
- Proprietary format: agent needs format translator

Local-first file formats eliminate these dependencies. The agent reads files from the filesystem. The metadata is in the file. The connections are in the file. Nothing external is required.

Since [[wiki links implement GraphRAG without the infrastructure]], the local-first property explains WHY this is possible. GraphRAG typically requires entity extraction pipelines, graph databases, clustering algorithms — infrastructure. But when the graph IS the wiki links and the metadata IS the YAML, the infrastructure is the filesystem itself. The full extent of this infrastructure-free property goes further than graph traversal alone: since [[markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure]], four layers (wiki link edges, YAML metadata, faceted classification, soft validation) compose into a complete graph database architecture where the infrastructure is just a filesystem and a text search tool. Local-first is the substrate property that makes all four layers possible without external dependencies.

## The design choice

This is a CLOSED claim — a foundational decision we made, not a hypothesis we're testing. The corollary is that content which exists in non-text formats must convert: since [[temporal media must convert to spatial text for agent traversal]], video, audio, and podcasts lack the random access that agents need for graph traversal, so they must produce markdown artifacts as their first processing step. The conversion is lossy but mandatory — a lossy transcript that agents can search and link outperforms a perfect recording that sits inert.

Agent-operated knowledge systems use markdown files with YAML frontmatter and wiki links because they need:

1. Tool independence: any LLM can read these files
2. Portability: files survive software transitions
3. Simplicity: no external services to maintain
4. Inspectability: humans can read the raw files

The alternative — using a database, a proprietary format, or API-mediated access — would create dependencies that limit which agents can operate the system. This is precisely why, as [[platform capability tiers determine which knowledge system features can be implemented]] observes, the core knowledge system features (markdown conventions, wiki links, YAML schemas, MOCs) work at every tier — they require nothing beyond what local-first files provide. Local-first is the mechanism behind tier-universality: features that need only files and text are platform-independent by construction.

## The tradeoff

Local-first means no server-side features: no real-time sync, no multi-user collaboration, no hosted backups. These are genuine losses. But for a single-operator vault where the agent needs unmediated access to all content, the tradeoff favors local files. A subtler tension emerges as operations scale: since [[intermediate representation pattern enables reliable vault operations beyond regex]], parsing files into structured objects before operating on them trades the simplicity of raw-text operations for reliability. The files remain the source of truth, but the operation layer gains a dependency that raw regex didn't have. This is a tension within the local-first principle rather than a contradiction of it — the storage substrate stays local-first even when the operation layer adds structure.

The bet: longevity and agent accessibility matter more than collaboration features. For a tools-for-thought-for-agents project, this bet seems right — the agent needs to read everything without credentials, and the files need to outlive any particular software stack. Since [[retrieval utility should drive design over capture completeness]], this is retrieval-first thinking applied to format choice: optimize for any agent being able to find and read content, even at the cost of capture features that require specific infrastructure. The tradeoff accepts losses (collaboration, sync) to maximize retrieval accessibility (any LLM reads without dependencies).

Local-first formats also enable the [[bootstrapping principle enables self-improving systems]]: the system can modify itself because it needs no external coordination or authentication to access its own files. The system writes the skills that process its own content, which only works because reading and writing require nothing beyond filesystem access. This property is why [[session handoff creates continuity without persistent memory]] works: the briefing mechanism (work queue, task files, structured handoff blocks) requires only filesystem access. Any LLM can read the briefing without authentication or external services. Continuity through structure succeeds where continuity through capability fails precisely because the structure is local-first.

---

Relevant Notes:
- [[markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure]] — synthesis: local-first is the substrate property that enables all four layers of the infrastructure-free graph database; the database requires only filesystem access because every layer builds on plain text with embedded metadata
- [[wiki links implement GraphRAG without the infrastructure]] — explains what the local-first substrate enables: a curated knowledge graph with no external dependencies
- [[complex systems evolve from simple working systems]] — explains WHY simple formats survive: fewer failure modes than complex infrastructures
- [[retrieval utility should drive design over capture completeness]] — local-first is retrieval-first design applied to format choice: optimize for any agent reading over capture features requiring infrastructure
- [[bootstrapping principle enables self-improving systems]] — local-first enables recursive improvement: the system modifies itself without external coordination
- [[session handoff creates continuity without persistent memory]] — local-first enables file-based handoffs: any LLM can read the briefing without authentication, making structure-based continuity possible
- [[digital mutability enables note evolution that physical permanence forbids]] — local-first enables mutability: direct file access means agents can edit notes without APIs, enabling the revision that physical cards forbade
- [[data exit velocity measures how quickly content escapes vendor lock-in]] — makes the portability principle auditable: a three-tier framework (high/medium/low velocity) turns abstract local-first advantages into a concrete design metric
- [[intermediate representation pattern enables reliable vault operations beyond regex]] — productive tension: an IR adds an in-memory operation layer above local-first files, trading the simplicity of raw-text operations for reliability at scale while preserving files as the source of truth
- [[temporal media must convert to spatial text for agent traversal]] — the corollary: content that is NOT already local-first text must become it; temporal media like audio, video, and podcasts lack the random access that agent traversal requires, so conversion to markdown is architecturally mandatory
- [[four abstraction layers separate platform-agnostic from platform-dependent knowledge system features]] — formalizes local-first as the foundation layer: the bottom of a four-layer hierarchy where each layer adds platform requirements, and the foundation layer has maximum portability precisely because it needs no infrastructure beyond a filesystem
- [[platform capability tiers determine which knowledge system features can be implemented]] — local-first is the mechanism behind tier-universality: core features (wiki links, YAML, MOCs) work at every tier because they require nothing beyond files and text

Topics:
- [[agent-cognition]]
@@ -0,0 +1,35 @@
---
description: Dual-column structure where right side shows steps and left side shows the principle or rule applied at each step — agents parse reasoning layer first to understand approach before details
kind: research
topics: ["[[note-design]]"]
methodology: ["Cornell"]
source: [[3-3-cornell-note-taking-system]]
---

# logic column pattern separates reasoning from procedure

Technical content often bundles two distinct information types: the steps to follow and the reasoning behind each step. The logic column pattern unbundles them spatially, creating parallel tracks that can be read independently or together.

The pattern originates from Cornell Note-Taking's adaptation for mathematics and science. In the standard Cornell format, the cue column holds questions and keywords. For technical content, this transforms: the right column holds procedural steps (problem-solving sequences, code operations, proof steps), while the left column holds the reasoning at each step — which theorem applies, what principle justifies the operation, why this transformation is valid.

This separation enables a powerful reading strategy: scan the reasoning column first to understand the approach, then dive into procedural details only where needed. Since [[progressive disclosure means reading right not reading less]], the logic column provides another disclosure layer — a compressed view of the thinking that doesn't require parsing implementation details. The summary at the bottom then captures the problem type and strategy pattern, making the note findable by approach rather than specific content.

For agent-operated vaults, this pattern translates to interspersed annotation blocks. In markdown, a `> [!logic]` callout pattern works: procedural content flows normally, with logic callouts interjected at each significant step explaining the principle being applied. This callout format exemplifies how [[schema templates reduce cognitive overhead at capture time]] — the structure is given (insert logic callout at each step), so attention focuses on articulating the reasoning rather than designing the documentation format. An agent parsing technical documentation can extract just the logic callouts to understand the reasoning structure, then read full context only for steps where the reasoning is unclear or unfamiliar. This is the technical-content version of how [[descriptions are retrieval filters not summaries]] — the logic column provides a compressed view that enables filtering decisions without loading full procedural detail.
This connects to how [[good descriptions layer heuristic then mechanism then implication]] — the logic column is essentially mechanism-level annotation embedded within procedural content. Where descriptions layer information temporally (read heuristic first, mechanism second), the logic column layers information spatially (reasoning track alongside procedure track). Both serve the same purpose: enabling an agent to choose its reading depth based on need.

The pattern is most valuable for technical documentation, code explanations, mathematical proofs, and algorithmic walkthroughs — any content where "what to do" and "why this works" are both essential but serve different purposes. An agent verifying correctness needs the reasoning. An agent executing needs the steps. The logic column lets them access what they need without wading through what they don't.

This parallels how [[trails transform ephemeral navigation into persistent artifacts]] captures not just destinations but the reasoning connecting them. Trails persist the "why this, then that" logic of navigation; logic columns persist the "why this step" reasoning of procedures. Both transform ephemeral understanding — a debugging session, a navigation path — into reusable artifacts where future agents can follow not just the steps but the thinking that made them productive.

---

Relevant Notes:
- [[progressive disclosure means reading right not reading less]] — logic columns implement spatial progressive disclosure for technical content
- [[good descriptions layer heuristic then mechanism then implication]] — parallel pattern: descriptions layer temporally, logic columns layer spatially
- [[schema templates reduce cognitive overhead at capture time]] — the `> [!logic]` callout format IS a schema template that externalizes structure so attention focuses on articulating reasoning
- [[descriptions are retrieval filters not summaries]] — logic columns serve the same filter function for technical content: agents extract reasoning layer without full procedural detail
- [[trails transform ephemeral navigation into persistent artifacts]] — sibling pattern: trails persist navigation reasoning, logic columns persist procedural reasoning; both capture the why alongside the what
- [[dual-coding with visual elements could enhance agent traversal]] — logic columns might combine with visual elements for annotated diagrams

Topics:
- [[note-design]]
@@ -0,0 +1,47 @@
---
description: Structural health checks (validation, orphans, links, MOC coherence) transfer across domains and platforms while creative processing varies wildly by use case
kind: research
topics: ["[[maintenance-patterns]]", "[[design-dimensions]]"]
methodology: ["Original", "Digital Gardening"]
source: [[arscontexta-notes]]
---

# maintenance operations are more universal than creative pipelines because structural health is domain-invariant

The creative pipeline gets all the attention. Claim extraction, pattern detection, decision logging, insight synthesis — these are the glamorous operations that feel like knowledge work. But they are also the operations that differ wildly across domains. How a research vault processes sources has almost nothing in common with how a therapy journal processes sessions, even though both use the same structural substrate. Maintenance operations, by contrast, are nearly identical everywhere: validate schemas, detect orphan notes, check link integrity, flag stale content, review MOC coherence. These operations care about structural health, and structural health is domain-invariant.

This asymmetry has practical consequences for system design. Since [[every knowledge domain shares a four-phase processing skeleton that diverges only in the process step]], the skeleton itself predicts the portability gap. The process step — where creative work lives — is the domain-specific phase. But the operations that keep the surrounding phases healthy (capture integrity, connection validity, verification accuracy) are structural operations that apply regardless of what content flows through them. Maintenance serves the invariant parts of the skeleton, which is why it transfers across domains without adaptation.

The deeper explanation for why this universality holds is that since [[the vault methodology transfers because it encodes cognitive science not domain specifics]], the structural properties maintenance checks are grounded in cognitive principles that operate identically across domains. Schema compliance, link integrity, and MOC coherence are domain-invariant because they encode operations cognition itself requires — external memory integrity, associative connection validity, attention management health — not because they happen to work the same way in our particular vault.

This means that while [[knowledge systems share universal operations and structural components across all methodology traditions]], the universality is not uniform across the inventory — maintenance is genuinely more portable than its sibling operations. The universality becomes concrete when you list what maintenance actually checks. Schema validation asks whether required fields exist and enum values are valid — the answer is purely structural regardless of whether the note is a research claim or a project decision. Orphan detection asks whether a note has incoming links — graph topology, not content semantics. Link integrity asks whether wiki link targets exist — filesystem reality, not domain knowledge. MOC coherence asks whether navigation structures match their contents — organizational health, not intellectual depth. Stale note detection asks about temporal patterns — when was this last touched, has the surrounding graph changed — not about what the note argues. Every one of these checks operates on the same structural properties that [[schema enforcement via validation agents enables soft consistency]] describes: metadata fields, link targets, graph topology, temporal patterns. And since [[schema field names are the only domain specific element in the universal note pattern]], maintenance operations check the invariant components of the note architecture — title structure, link targets, topics presence, body format — rather than the YAML field names that carry domain semantics. The maintenance surface and the universality surface overlap almost entirely.
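Those checks are small enough to sketch directly. A minimal Python version of three of them, assuming a flat vault of `.md` notes with YAML-style frontmatter; the required field names are illustrative, and a real vault would also resolve links to MOCs and other folders:

```python
import re
from pathlib import Path

# Capture a wiki link target, stopping at "]", "|" (alias), or "#" (heading anchor).
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def vault_health(vault: Path, required=("description", "kind", "topics")) -> dict:
    """Three domain-invariant checks: schema fields, broken links, orphans."""
    notes = {p.stem: p.read_text(encoding="utf-8") for p in vault.glob("*.md")}
    missing_fields, broken_links, linked = {}, {}, set()
    for name, text in notes.items():
        absent = [f for f in required if not re.search(rf"^{f}:", text, re.M)]
        if absent:
            missing_fields[name] = absent  # schema check: purely structural
        targets = WIKI_LINK.findall(text)
        linked.update(t for t in targets if t in notes)
        dangling = [t for t in targets if t not in notes]
        if dangling:
            broken_links[name] = dangling  # link integrity: filesystem reality
    orphans = sorted(set(notes) - linked)  # orphan check: graph topology
    return {
        "missing_fields": missing_fields,
        "broken_links": broken_links,
        "orphans": orphans,
    }
```

Note that nothing in the function knows whether the vault holds research claims or therapy sessions; every check reads structure, never content semantics.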
|
|
18
|
+
|
|
19
|
+
The contrast with creative operations sharpens the point. Claim extraction requires understanding what counts as a claim in a given domain. Pattern detection requires knowing what patterns matter. Decision logging requires understanding organizational context. These operations require domain expertise that cannot be abstracted away without producing what [[structure without processing provides no value]] calls Lazy Cornell — structural motions that look like work but create no value. This is exactly the trap that [[false universalism applies same processing logic regardless of domain]] identifies: the confidence that structural patterns transfer can seduce a derivation agent into assuming operational patterns transfer too, but creative processing carries domain semantics that maintenance never touches. Maintenance has no such dependency. A link is broken or it isn't. A schema field is present or it isn't. An orphan note exists or it doesn't. The checks are objective in a way that creative processing never can be — which is why, as [[over-automation corrupts quality when hooks encode judgment rather than verification]] argues, maintenance operations sit safely on the deterministic side of the automation boundary while creative processing does not.

Since [[four abstraction layers separate platform-agnostic from platform-dependent knowledge system features]], maintenance operations cluster disproportionately in the lower layers. Schema definitions live at the convention layer. File integrity lives at the foundation layer. Even automated maintenance (validation hooks, health check scripts) requires only basic automation infrastructure. Creative pipelines, by contrast, often require orchestration-layer features: multi-phase processing with fresh context per phase, queue management, parallel subagent coordination. This means maintenance is not only more universal across domains but also more portable across platforms — a critical advantage for any system that aims to work across diverse agent environments. Since [[platform capability tiers determine which knowledge system features can be implemented]], even tier-three platforms with minimal infrastructure can run maintenance checks because schema validation, link verification, and orphan detection operate on plain text files that any agent can parse. Creative pipelines need tier-one capabilities to function properly.

The implication for knowledge system design is that maintenance should be a first-class concern, not an afterthought bolted on after the creative pipeline ships. Since [[backward maintenance asks what would be different if written today]], the intellectual quality of a vault depends as much on how notes evolve as on how they are born. A system that creates brilliantly but never maintains will accumulate a graveyard of notes that were once accurate, once well-connected, once navigable — but no longer. Since [[gardening cycle implements tend prune fertilize operations]], the operations are well-understood: tend (update against current knowledge), prune (remove or split overgrown content), fertilize (add connections). These operations preserve the value that creative work generates.

There is a shadow side worth naming. Maintenance can become its own form of procrastination — endlessly polishing structure while avoiding the harder work of creating new understanding. The claim is not that maintenance matters more than creation, but that the two are equally necessary and maintenance is systematically undervalued. The creative pipeline is visible and satisfying: a new note appears, a new connection forms, the graph grows. Maintenance is invisible and thankless: a broken link gets fixed, a stale description gets updated, an orphan finds its MOC. But without the invisible work, the visible work decays. The knowledge graph is not a building that stays up once constructed — it is a garden that dies without tending.

---

Relevant Notes:
- [[every knowledge domain shares a four-phase processing skeleton that diverges only in the process step]] — the skeleton explains why maintenance is universal: capture, connect, and verify are domain-invariant, and maintenance operations serve those invariant phases
- [[structure without processing provides no value]] — the complementary risk: creation without maintenance produces graveyards, but maintenance without creation produces empty gardens; the claim here is that the two are equally necessary
- [[backward maintenance asks what would be different if written today]] — the intellectual core of maintenance: genuine reconsideration rather than mechanical link-adding
- [[gardening cycle implements tend prune fertilize operations]] — operational decomposition of maintenance into focused phases
- [[schema enforcement via validation agents enables soft consistency]] — implementation pattern: soft enforcement accumulates maintenance signals without blocking creation
- [[four abstraction layers separate platform-agnostic from platform-dependent knowledge system features]] — maintenance operations cluster in the lower layers (foundation and convention), making them more portable than creative pipelines that often require automation and orchestration
- [[the vault methodology transfers because it encodes cognitive science not domain specifics]] — grounds the universality claim: maintenance transfers completely because it operates on structural properties grounded in cognitive science (schema compliance, graph topology, link integrity), while creative processing fails to transfer because it operates on domain semantics that cognition does not universalize
- [[false universalism applies same processing logic regardless of domain]] — the essential counterweight: maintenance universality is genuine because structural health checks are objective, but false universalism is what happens when the same confidence is extended to creative processing where domain-specific judgment is irreducible
- [[schema field names are the only domain specific element in the universal note pattern]] — parallel universality at the note-format level: the five-component note architecture is domain-invariant except for YAML field names, and maintenance operations check the invariant components (title structure, link targets, topics presence), not the domain-specific fields
- [[over-automation corrupts quality when hooks encode judgment rather than verification]] — the determinism boundary explains why maintenance can be safely automated: schema validation and link integrity are deterministic verification operations, not judgment calls, so automating them does not trigger the Goodhart corruption that plagues automated creative processing
- [[platform capability tiers determine which knowledge system features can be implemented]] — extends the portability argument: maintenance operations function at every platform tier (even tier 3 with minimal infrastructure) because they check structural properties of plain text files, while creative pipelines require tier-1 features like subagent spawning and context forking
- [[knowledge systems share universal operations and structural components across all methodology traditions]] — the inventory this note nuances: all eight operations are universal in the sense that every tradition implements them, but this note argues that the maintain operation is MORE universal than sibling operations like process or synthesize because it operates on domain-invariant structural properties
- [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]] — the scheduling architecture that leverages universality: reconciliation tables are portable precisely because every desired-state declaration checks structural properties (link integrity, schema compliance, orphan status) rather than domain semantics

Topics:
- [[maintenance-patterns]]
- [[design-dimensions]]

@@ -0,0 +1,50 @@
---
description: Problems that develop instantly need per-event checks while problems that develop over weeks need monthly checks — matching frequency to propagation rate prevents both wasted overhead and undetected problems
kind: research
topics: ["[[maintenance-patterns]]"]
methodology: ["Systems Theory", "Original"]
source: [[automated-knowledge-maintenance-blueprint]]
---

# maintenance scheduling frequency should match consequence speed not detection capability

The intuitive approach to maintenance scheduling asks: how often can we check? If a detection mechanism is cheap, run it frequently. If it is expensive, run it rarely. But this gets the question backwards. The right question is: how fast does the problem develop? A schema violation exists the instant a malformed note is written — waiting until the next session to detect it means every operation between now and then may build on the broken foundation. A stale description develops over weeks as understanding evolves — checking for it on every file write wastes attention budget on a problem that cannot yet exist.

Consequence speed — the rate at which a problem propagates or worsens after it first appears — determines the appropriate detection frequency. The principle yields a five-tier scheduling spectrum:

**Instant consequences** need per-event detection. Schema violations and broken links from bad edits create problems the moment the file is written, because downstream operations immediately consume the malformed output. This is why, as [[hook enforcement guarantees quality while instruction enforcement merely suggests it]] argues, schema validation runs as a PostToolUse hook rather than a periodic batch check. The per-event tier is the only tier where detection must be synchronous with the operation that creates the problem. Since [[schema validation hooks externalize inhibitory control that degrades under cognitive load]], the mechanism is not just frequent but externalized — the agent's own degrading attention cannot be trusted to catch instant-consequence problems, so the infrastructure catches them instead.

**Session-scale consequences** need per-session detection. Orphan notes created during a session and dangling links introduced by edits accumulate over the course of a work session but do not propagate beyond it. Since [[session boundary hooks implement cognitive bookends for orientation and reflection]], the session-start health dashboard catches problems that developed since the last session — orphan count, dangling links, MOC coverage gaps. Checking more frequently would be wasteful because the problems themselves do not worsen within a session (an orphan note is equally orphaned whether you check after one minute or one hour).

**Multi-session consequences** need weekly detection. Orphan accumulation rate and index drift develop over multiple sessions, each session potentially adding small increments of degradation. No single session creates a crisis, but unchecked accumulation over days creates structural drift. Weekly scheduled checks prevent the gradual creep from crossing into the damage zone. Since [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]], the detection mechanism for this tier is periodic state comparison — declaring what healthy looks like and measuring divergence — rather than event-driven hooks or session boundary checks.

**Slow consequences** need monthly detection. Stale descriptions and content quality degradation develop as understanding evolves — a description written last month may no longer capture the note's current role in the graph. These problems cannot be detected at write time because they are not wrong when written; they become wrong as the surrounding context changes. Monthly checks match the pace at which this drift becomes actionable.

**Structural consequences** need threshold-triggered detection rather than periodic checks. Methodology drift and assumption invalidation do not follow a temporal schedule — they accumulate observation by observation until a pattern emerges. This is why the vault uses observation counts (>10) and tension counts (>5) as triggers for meta-cognitive review rather than a fixed monthly schedule. The consequence is not time-dependent but evidence-dependent.

The spectrum also has a terminal tier: frequency zero. Since [[automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues]], some checks should not run at any frequency because the condition they guard against has been structurally eliminated, their false positive rate exceeds their true positive rate, or they have been superseded by a better mechanism. Retirement is the scheduling decision that removes a check from the spectrum entirely — not just reducing its frequency to "rarely" but recognizing that the optimal frequency is never. This completes the consequence speed framework: instant, session, multi-session, slow, threshold-triggered, and retired.

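The spectrum reads naturally as a registry that binds each problem class to the tier its consequence speed demands. The sketch below uses hypothetical names throughout; the observation and tension thresholds (>10 and >5) are the vault's own trigger rules from the structural tier above.

```python
from enum import Enum

class Tier(Enum):
    PER_EVENT = "hook fires on every write"     # instant consequences
    PER_SESSION = "session-start dashboard"     # session-scale consequences
    WEEKLY = "scheduled reconciliation pass"    # multi-session drift
    MONTHLY = "stale-content review"            # slow semantic drift
    THRESHOLD = "evidence-triggered review"     # structural, evidence-dependent
    RETIRED = "never runs"                      # frequency zero

# Frequency follows consequence speed, not detection cost.
SCHEDULE = {
    "schema_violation": Tier.PER_EVENT,
    "dangling_link": Tier.PER_SESSION,
    "orphan_accumulation": Tier.WEEKLY,
    "stale_description": Tier.MONTHLY,
    "methodology_drift": Tier.THRESHOLD,
}

def structural_review_due(observations: int, tensions: int) -> bool:
    """The threshold tier is evidence-dependent, not time-dependent."""
    return observations > 10 or tensions > 5
```

Retirement is then a one-line scheduling decision: reassign a check to `Tier.RETIRED` rather than stretching its interval.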
The distinction from [[spaced repetition scheduling could optimize vault maintenance]] is important: spaced repetition asks "how often should THIS note be reviewed?" based on the note's maturity and review history. Consequence speed asks "how often should THIS class of problem be checked for?" based on the problem's propagation rate. A newly created note needs frequent review (spaced repetition) AND its schema needs per-event validation (consequence speed). These are orthogonal scheduling dimensions — one targets note quality through maturity-based intervals, the other targets system health through propagation-based frequencies.

The practical implication follows from [[gardening cycle implements tend prune fertilize operations]]: the three gardening operations themselves operate at different tiers. Tending (correcting based on new information) addresses problems that develop at the multi-session to monthly timescale — content becomes inaccurate as understanding changes. Pruning (splitting overgrown notes) addresses problems that develop over weeks to months — notes gradually accumulate content that should be separate. Fertilizing (creating connections) addresses problems that develop at the session to multi-session timescale — each new note creates a connection gap that widens with every subsequent note that does not reference it. Different operations, different consequence speeds, different optimal frequencies.

There is a design trap in matching detection frequency to detection cost rather than consequence speed. Cheap detection run too frequently wastes the automation budget — context tokens consumed by health dashboard output that reports no change. Expensive detection run too rarely misses problems that have already propagated. The consequence speed principle resolves the ambiguity: if the problem cannot have developed since the last check, the check is wasted regardless of how cheap it is. If the problem could have propagated catastrophically since the last check, the check is essential regardless of how expensive it is. The resolution is safe because since [[automated detection is always safe because it only reads state while automated remediation risks content corruption]], scheduling detection at any frequency carries zero risk — the worst outcome of an unnecessary check is wasted tokens, not corrupted content. And since [[idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once]], repeated detection at the same tier cannot compound into errors. The risk asymmetry means consequence speed should govern detection frequency directly, while remediation at each tier still needs the gating that [[confidence thresholds gate automated action between the mechanical and judgment zones]] provides — determining whether the response to a detected problem should be automatic correction, a suggestion for review, or a logged observation. Consequence speed and remediation gating are complementary scheduling dimensions: consequence speed determines WHEN to detect, and since [[the fix-versus-report decision depends on determinism reversibility and accumulated trust]], the four conjunctive conditions determine WHETHER to fix what detection finds. A fast-consequence problem detected per-event still needs to pass all four conditions before auto-fix; a slow-consequence problem detected monthly might pass all four trivially if the fix is mechanical.

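The two dimensions compose cleanly in code. A hypothetical sketch: detection runs at whatever frequency its tier demands, and the four conjunctive conditions (deterministic outcome, reversible via git, low cost if wrong, proven accuracy) gate what happens to each finding.

```python
def respond(finding: dict) -> str:
    """Detection is read-only and always safe; this gates the write side.
    All four conditions are conjunctive: any failure demotes the
    response from auto-fix to report-only."""
    gates = ("deterministic", "reversible_via_git",
             "low_cost_if_wrong", "proven_accuracy")
    return "auto-fix" if all(finding.get(g, False) for g in gates) else "report-only"
```

A per-event schema finding with a mechanical fix can pass all four gates; a monthly staleness finding that needs rewording fails the determinism gate and is only reported.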
---

Relevant Notes:
- [[spaced repetition scheduling could optimize vault maintenance]] — complementary scheduling dimension: spaced repetition asks how often to review THIS note based on maturity, consequence speed asks how often to check for THIS class of problem based on propagation rate
- [[hook enforcement guarantees quality while instruction enforcement merely suggests it]] — hooks implement the per-event tier of the consequence speed spectrum: schema violations propagate instantly so the detection mechanism fires on every write event
- [[schema validation hooks externalize inhibitory control that degrades under cognitive load]] — concrete instance of instant-consequence detection: schema violations are created at the moment of writing and externalized inhibitory control catches them at that moment
- [[gardening cycle implements tend prune fertilize operations]] — the three gardening operations have different consequence speeds: pruning problems (overgrown notes) develop over weeks while fertilizing gaps (missing connections) develop over days as new notes arrive
- [[session boundary hooks implement cognitive bookends for orientation and reflection]] — session start implements the per-session detection tier: health dashboard catches problems that accumulated since the last session
- [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]] — implements the middle tiers: reconciliation is the architectural pattern for multi-session and slow-consequence checks, where desired state is declared and periodically compared to actual state rather than detected per-event
- [[idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once]] — enables aggressive scheduling at every tier: because detection is read-only and therefore trivially idempotent, running it at whatever frequency consequence speed demands carries zero risk of corruption from repeated execution
- [[automated detection is always safe because it only reads state while automated remediation risks content corruption]] — extends the scheduling principle: the detection side of maintenance can be scheduled at whatever frequency consequence speed demands because it only reads state, while remediation at each tier needs judgment gates regardless of how fast the problem propagates
- [[confidence thresholds gate automated action between the mechanical and judgment zones]] — complements consequence speed with response calibration: consequence speed determines WHEN to check, confidence thresholds determine HOW AGGRESSIVELY to act on what the check finds, together parameterizing the full automation scheduling decision
- [[the fix-versus-report decision depends on determinism reversibility and accumulated trust]] — complementary gating dimension: consequence speed determines detection frequency, the four conjunctive conditions determine whether detected problems should be auto-fixed or merely reported; together they parameterize the complete automation scheduling question of when to check AND what to do about it
- [[automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues]] — the terminal tier of the scheduling spectrum: frequency zero; retirement is the scheduling decision that removes a check entirely rather than merely reducing its frequency
- [[three concurrent maintenance loops operate at different timescales to catch different classes of problems]] — architectural embodiment: the three-loop architecture groups problems by consequence speed into discrete operational tiers (per-event, per-session, per-month), implementing this note's scheduling principle as a concrete system design
- [[agent session boundaries create natural automation checkpoints that human-operated systems lack]] — structural enabler of the session tier: session boundaries provide the enforcement points that make per-session detection reliable rather than aspirational, connecting scheduling theory to the discrete architecture that guarantees it

Topics:
- [[maintenance-patterns]]

@@ -0,0 +1,26 @@
---
description: When reweaving experiments, find notes that discuss the MECHANISM being tested rather than just topic-related notes — for theory-testing experiments, target the theory notes themselves
kind: research
topics: ["[[maintenance-patterns]]"]
---

# maintenance targeting should prioritize mechanism and theory notes

Reweave targeting benefits from semantic judgment about WHAT an experiment tests, not just topic proximity. Since [[spreading activation models how agents should traverse]] describes traversal as decay-based context loading through wiki links, the question "where should activation spread?" during reweave has two answers: topic proximity (same MOC) or mechanism connection (same underlying concept). This note argues mechanism connection produces higher-value reweave targets for experiments.

For experiments, the productive reweave targets are notes that discuss the **mechanism** the experiment tests. The cognitive outsourcing experiment tests skill atrophy via delegation — so notes about [[skills encode methodology so manual execution bypasses quality gates]] and [[the generation effect requires active transformation not just storage]] are productive targets because they discuss the mechanisms being tested.

For experiments that test theories, the most valuable reweave targets are the **theory notes themselves**. The testing effect experiment tests description quality theory — so [[descriptions are retrieval filters not summaries]] and [[progressive disclosure means reading right not reading less]] theorize about description quality. Reweaving connects the test to what's being tested, making the relationship bidirectional.

The distinction matters because topic proximity misleads. Notes about unrelated experiments share the experiments MOC but have low reweave value despite being in the same MOC. Notes about the tested mechanism may be in different MOCs entirely but have high reweave value because they provide theoretical grounding. Topic vocabulary similarity (what semantic search optimizes for) is not the same as mechanism connection. Reweave targeting should compensate by prioritizing mechanism connection over topic proximity.

The heuristic: before reweaving an experiment, ask "what mechanism does this test?" and "what notes theorize about that mechanism?" — then target those notes rather than running semantic search on topic keywords. This is a domain-specific refinement of [[processing effort should follow retrieval demand]]: for experiments, mechanism connection predicts higher value than retrieval frequency or topic proximity. The demand signal is semantic (what the experiment tests) rather than observed (what's frequently traversed).

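The heuristic can be phrased as a two-stage ranking. In this sketch the vault is a plain dict and the `tests_mechanism`, `claims`, and `moc` fields are hypothetical; the point is the ordering, which puts mechanism and theory notes ahead of same-MOC neighbours.

```python
def reweave_targets(experiment: dict, vault: dict) -> list:
    """Rank reweave candidates: mechanism connection beats topic proximity."""
    mechanism = experiment["tests_mechanism"]  # e.g. "skill atrophy"
    # First tier: notes that theorize about the tested mechanism,
    # wherever they live in the MOC structure.
    mechanism_notes = [title for title, note in vault.items()
                       if mechanism in note["claims"]]
    # Second tier: same-MOC neighbours, kept only as a fallback.
    neighbours = [title for title, note in vault.items()
                  if note["moc"] == experiment["moc"]
                  and title not in mechanism_notes]
    return mechanism_notes + neighbours
```

Semantic search over topic keywords would produce roughly the reversed ordering, which is the failure mode the note describes.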
---

Relevant Notes:
- [[backward maintenance asks what would be different if written today]] — the foundational reweaving concept; this note adds targeting heuristics
- [[spreading activation models how agents should traverse]] — provides the traversal framework; this note argues mechanism connection should guide where activation spreads during reweave
- [[processing effort should follow retrieval demand]] — this note refines demand-following for experiments: mechanism connection predicts value better than retrieval frequency
- [[testing effect could enable agent knowledge verification]] — an experiment whose reweave benefits from targeting theory notes (description quality)
- [[cognitive outsourcing risk in agent-operated systems]] — an experiment whose reweave benefits from targeting mechanism notes (skill atrophy)

Topics:
- [[maintenance-patterns]]

@@ -0,0 +1,72 @@
---
description: Condition-based maintenance, health checks, reweaving, and the backward pass -- keeping vaults alive
type: moc
---

# maintenance-patterns

How knowledge vaults stay healthy over time. Condition-based maintenance, reweaving, orphan detection, staleness management. Why temporal schedules fail for agent-operated systems.

## Core Ideas

### Research
- [[MOC construction forces synthesis that automated generation from metadata cannot replicate]] -- The Dump-Lump-Jump pattern reveals that writing context phrases and identifying tensions IS the thinking — automated top
|
|
14
|
+
- [[MOC maintenance investment compounds because orientation savings multiply across every future session]] -- The compounding mechanism is temporal repetition across sessions rather than graph connectivity — one context phrase edi
|
|
15
|
+
- [[agent session boundaries create natural automation checkpoints that human-operated systems lack]] -- Discrete session architecture turns "no persistent memory" into a maintenance advantage because health checks fire at ev
|
|
16
|
+
- [[automated detection is always safe because it only reads state while automated remediation risks content corruption]] -- The read/write asymmetry in automation safety means detection at any confidence level produces at worst a false alert, w
|
|
17
|
+
- [[automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues]] -- Without retirement criteria the automation layer grows monotonically — checks added when problems appear but never remov
|
|
18
|
+
- [[backward maintenance asks what would be different if written today]] -- This mental model distinguishes reweave from reflect — maintenance becomes genuine reconsideration rather than mechanica
|
|
19
|
+
- [[behavioral anti-patterns matter more than tool selection]] -- PKM failure research shows systems break through habits not software — the Collector's Fallacy, productivity porn, and u
|
|
20
|
+
- [[coherence maintains consistency despite inconsistent inputs]] -- memory systems must actively maintain coherent beliefs despite accumulating contradictory inputs — through detection, re
|
|
21
|
+
- [[community detection algorithms can inform when MOCs should split or merge]] -- Louvain and similar algorithms detect dense note clusters and track how cluster boundaries shift over time, providing ac
|
|
22
|
+
- [[confidence thresholds gate automated action between the mechanical and judgment zones]] -- A three-tier response pattern (auto-apply, suggest, log-only) based on confidence scoring fills the gap between determin
|
|
23
|
+
- [[derived systems follow a seed-evolve-reseed lifecycle]] -- Minimum viable seeding, friction-driven evolution, principled restructuring when incoherence accumulates — reseeding re-
|
|
24
|
+
- [[digital mutability enables note evolution that physical permanence forbids]] -- Physical index cards cannot be edited without destruction, so Luhmann designed for permanence — digital files have no su
|
|
25
|
+
- [[evolution observations provide actionable signals for system adaptation]] -- Six diagnostic patterns map operational symptoms to structural causes and prescribed responses, converting accumulated o
|
|
26
|
+
- [[friction reveals architecture]] -- agents cannot push through friction with intuition, so discomfort that humans ignore becomes blocking — and the forced a
|
|
27
|
+
- [[gardening cycle implements tend prune fertilize operations]] -- Separating vault maintenance into tend (update), prune (remove/split), and fertilize (connect) operations may produce be
|
|
28
|
+
- [[hooks cannot replace genuine cognitive engagement yet more automation is always tempting]] -- The same mechanism that frees agents for substantive work -- delegating procedural checks to hooks -- could progressivel
|
|
29
|
+
- [[idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once]] -- Four patterns from distributed systems — compare-before-acting, upsert semantics, unique identifiers, state declarations
|
|
30
|
+
- [[implicit dependencies create distributed monoliths that fail silently across configurations]] -- When modules share undeclared coupling through conventions, environment, or co-activation assumptions, the system looks
|
|
31
|
+
- [[incremental formalization happens through repeated touching of old notes]] -- Vague inklings crystallize into rigorous concepts over months through maintenance passes — each traversal is an opportun
|
|
32
|
+
- [[maintenance operations are more universal than creative pipelines because structural health is domain-invariant]] -- Structural health checks (validation, orphans, links, MOC coherence) transfer across domains and platforms while creativ
|
|
33
|
+
- [[maintenance scheduling frequency should match consequence speed not detection capability]] -- Problems that develop instantly need per-event checks while problems that develop over weeks need monthly checks — match
|
|
34
|
+
- [[maintenance targeting should prioritize mechanism and theory notes]] -- When reweaving experiments, find notes that discuss the MECHANISM being tested rather than just topic-related notes — fo
|
|
35
|
+
- [[mnemonic medium embeds verification into navigation]] -- Notes could include self-test prompts encountered during traversal so verification becomes ambient rather than a schedul
|
|
36
|
+
- [[module deactivation must account for structural artifacts that survive the toggle]] -- Enabling a module creates YAML fields, MOC links, and validation rules that persist after deactivation — ghost infrastru
|
|
37
|
+
- [[navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts]] -- At 50 notes keyword search suffices, at 500 curated MOCs become essential, at 5000 automated maintenance replaces manual
|
|
38
|
+
- [[observation and tension logs function as dead-letter queues for failed automation]] -- Automation failures captured as observation or tension notes rather than dropped silently, with /rethink triaging the ac
|
|
39
|
+
- [[operational wisdom requires contextual observation]] -- tacit knowledge doesn't fit in claim notes — it's learned through exposure, logged as observations, and pattern-matched
|
|
40
|
+
- [[organic emergence versus active curation creates a fundamental vault governance tension]] -- Curation prunes possible futures while emergence accumulates structural debt — the question is not which pole to choose
|
|
41
|
+
- [[orphan notes are seeds not failures]] -- Digital gardening reframes unlinked notes as work-in-progress — health checks flag connection opportunities rather than
|
|
42
|
+
- [[over-automation corrupts quality when hooks encode judgment rather than verification]] -- Hooks that approximate semantic judgment through keyword matching produce the appearance of methodology compliance -- va
|
|
43
|
+
- [[productivity porn risk in meta-system building]] -- Building sophisticated agent workflows becomes procrastination when output stays flat while complexity grows—building su
|
|
44
|
+
- [[programmable notes could enable property-triggered workflows]] -- When notes have queryable metadata, the vault can shift from passive storage to active participant — notes surfacing the
|
|
45
|
+
- [[random note resurfacing prevents write-only memory]] -- Without random selection, vault maintenance exhibits selection bias toward recently active notes, leaving older content
|
|
46
|
+
- [[reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring]] -- The GitOps pattern of declaring desired state and periodically converging toward it replaces imperative maintenance comm
|
|
47
|
+
- [[session transcript mining enables experiential validation that structural tests cannot provide]] -- Traditional tests check if output is correct but session mining checks if the experience achieved its purpose — friction
- [[spaced repetition scheduling could optimize vault maintenance]] -- Maintenance intervals adapted to note age and maturity could catch early issues while avoiding overhead on stable evergr
- [[stale navigation actively misleads because agents trust curated maps completely]] -- A stale MOC is worse than no MOC because agents fall back to search (current content) without one, but trust an outdated
- [[tag rot applies to wiki links because titles serve as both identifier and display text]] -- Unlike opaque identifiers that persist through vocabulary drift, wiki link titles carry semantic content that must stay
- [[the derivation engine improves recursively as deployed systems generate observations]] -- Each deployed knowledge system is an experiment whose operational observations enrich the claim graph, making every subs
- [[the fix-versus-report decision depends on determinism reversibility and accumulated trust]] -- Four conditions gate self-healing — deterministic outcome, reversible via git, low cost if wrong, and proven accuracy at
- [[three concurrent maintenance loops operate at different timescales to catch different classes of problems]] -- Fast loops (per-event hooks) catch instant violations, medium loops (per-session checks) catch accumulated drift, and sl
- [[wiki links as social contract transforms agents into stewards of incomplete references]] -- Cunningham's norm that creating a link means accepting elaboration responsibility translates from human peer accountabil
 
### Guidance
- [[design MOCs as attention management devices with lifecycle governance]] -- MOC best practices for derived knowledge systems — hierarchy patterns, lifecycle management, and health metrics adapted
- [[implement condition-based maintenance triggers for derived systems]] -- Condition-based maintenance triggers for derived knowledge systems — what to check, when to check it, and how to fix it
 
## Tensions
 
(Capture conflicts as they emerge)
 
## Open Questions
 
- What maintenance conditions have the highest impact-to-effort ratio?
- How frequently should reweaving be triggered?
 
---
 
Topics:
- [[index]]
 
---
description: Wiki link edges, YAML metadata, faceted query dimensions, and soft validation compose into graph database capabilities where the infrastructure is just files and ripgrep
kind: research
topics: ["[[graph-structure]]", "[[discovery-retrieval]]"]
confidence: speculative
methodology: ["Original"]
---
 
# markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure
 
Graph databases offer three capabilities that flat document stores lack: typed edges between entities, structured property queries across nodes, and multi-hop traversal along relationship paths. Traditional implementations require infrastructure -- a graph database server, entity extraction pipelines, schema enforcement layers, query languages. But this vault achieves all three capabilities with nothing beyond markdown files, YAML frontmatter, and ripgrep.
 
The argument rests on four layers, each covered by an existing note, that together compose into something none of those notes claims individually.
 
The first layer is graph edges. Since [[wiki links implement GraphRAG without the infrastructure]], explicit wiki links create a human-curated knowledge graph where every edge passed judgment. These are not statistical co-occurrences inferred by entity extraction but intentional connections with prose context explaining why the relationship exists. Multi-hop traversal works because the edges are semantic, not noisy -- following three curated links compounds signal rather than diluting it.
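The edge layer is queryable with the same tools as everything else: a backlink lookup is just a search for the link literal. A minimal sketch, using hypothetical note names invented for illustration, and POSIX grep standing in for the vault's own ripgrep commands so it runs anywhere:

```shell
# Build two hypothetical notes, one linking to the other.
cd "$(mktemp -d)"
mkdir notes
printf 'This claim builds on [[claim b]] directly.\n' > notes/a.md
printf 'Standalone reasoning for claim b.\n' > notes/b.md

# One traversal hop along inbound edges: which notes link to "claim b"?
grep -rl '\[\[claim b\]\]' notes/
```

Each hop of a multi-hop traversal is one more such lookup, with the human-curated link text guaranteeing that every hop crosses a semantic edge rather than a statistical one.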
 
The second layer is node properties. Since [[metadata reduces entropy enabling precision over recall]], YAML frontmatter pre-computes structured attributes for every node. Type, methodology, topics, status -- each field is a queryable dimension. `rg "^type: tension" 01_thinking/` is a structured property query, functionally equivalent to a Cypher `WHERE n.type = 'tension'` clause, but running against plain text files with no database server.
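The equivalence can be checked end to end in a few lines of shell. This is a sketch under assumptions: the note files and field values are fabricated for the demo, and POSIX grep substitutes for ripgrep so the example needs no extra tooling:

```shell
# Fabricate two frontmatter-bearing notes.
cd "$(mktemp -d)"
mkdir 01_thinking
printf -- '---\ntype: tension\n---\nbody a\n' > 01_thinking/a.md
printf -- '---\ntype: claim\n---\nbody b\n' > 01_thinking/b.md

# Structured property query: the WHERE n.type = 'tension' clause as a grep.
grep -rl '^type: tension' 01_thinking/
```

The `^` anchor is what makes this a property query rather than full-text search: it matches only where the field appears as a frontmatter key at the start of a line.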
 
The third layer is multi-dimensional access. Since [[faceted classification treats notes as multi-dimensional objects rather than folder contents]], Ranganathan's framework explains why these YAML fields compose multiplicatively. Two facets with five values each narrow the search space by roughly 25x, not 10x. This is the formal justification for why combining `type` and `methodology` filters produces precision that neither achieves alone -- the same compositional power that graph databases provide through multi-attribute queries, achieved through independent YAML fields and piped grep commands.
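The multiplicative narrowing is visible in the pipe itself: each stage filters along one independent facet, and the survivors are the intersection. A sketch with invented field values, again using POSIX grep in place of ripgrep:

```shell
# Three hypothetical notes varying along two independent facets.
cd "$(mktemp -d)"
mkdir 01_thinking
printf -- '---\ntype: claim\nmethodology: Zettelkasten\n---\n' > 01_thinking/a.md
printf -- '---\ntype: claim\nmethodology: Original\n---\n' > 01_thinking/b.md
printf -- '---\ntype: tension\nmethodology: Zettelkasten\n---\n' > 01_thinking/c.md

# Facets compose by intersection: each piped filter narrows independently.
grep -rl '^type: claim' 01_thinking/ | xargs grep -l '^methodology: Zettelkasten'
```

With five values per facet, each stage keeps roughly a fifth of its input, which is exactly where the 25x figure comes from.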
 
The fourth layer is data integrity. Since [[schema enforcement via validation agents enables soft consistency]], asynchronous validation hooks maintain schema compliance without blocking creation. This is the consistency guarantee that keeps the "database" queryable over time -- without it, metadata fields drift, query patterns break, and the structured access degrades into the unstructured search it replaced.
 
What makes this a genuine synthesis rather than just listing four features is that the layers are architecturally dependent. Faceted queries (layer three) require structured metadata (layer two). Multi-hop traversal (layer one) requires consistent link targets, which validation (layer four) maintains. Remove any layer and the others degrade: wiki links without metadata give you traversal but not filtering; metadata without validation gives you queries that rot; faceted access without curated edges gives you filtered isolation without connection. This compositional pattern -- independent single-concern components producing emergent capability -- is the same mechanism that [[hook composition creates emergent methodology from independent single-concern components]]. There, nine hooks compose into quality pipelines and session bookends no single hook was designed to create. Here, four structural conventions compose into graph database capabilities no single convention provides. The shared principle is that composition creates architecture that lives between the components rather than inside any one of them.
 
The infrastructure cost is zero beyond what any filesystem provides. Since [[local-first file formats are inherently agent-native]], the entire graph database -- edges, properties, query engine, consistency checks -- lives in files that any text editor can read and any LLM can parse. No credentials, no server processes, no query language beyond ripgrep patterns. This means the graph database lives entirely within what [[four abstraction layers separate platform-agnostic from platform-dependent knowledge system features]] calls the foundation layer -- the bottom of the portability gradient, where exit velocity is maximum. Since [[data exit velocity measures how quickly content escapes vendor lock-in]], the infrastructure-free property is not just a philosophical claim but an auditable one: every graph operation (edge traversal, property query, faceted filtering) works with any text tool on any platform, and export means copying a folder.
 
There is a genuine question about whether this framing illuminates or merely repackages. Calling markdown files a "graph database" risks the kind of metaphorical inflation where everything becomes everything else. The honest test is whether the framing reveals something the component notes miss individually. What it reveals is the compositional architecture: these four layers are not independent features that happen to coexist but a system whose layers depend on each other in specific ways. That dependency structure -- edges need consistency, queries need metadata, access needs facets -- is the database architecture hiding in plain text. And because [[the system is the argument]], the vault itself is the proof: every ripgrep query in CLAUDE.md, every backlink search in the scripts, every wiki link in every note is this graph database in operation. The claim is not theoretical -- it is the mechanism the vault uses to function.
 
The claim remains open because the scaling question is unresolved. Graph databases handle millions of nodes and edges with indexed lookups. Ripgrep on YAML handles hundreds of notes with millisecond queries. At what scale does the infrastructure-free approach fail? Since [[navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts]], the answer likely follows the same regime boundaries: at Regime 1 (1-50 notes) ripgrep is effortless, at Regime 2 (50-500) it remains fast but the navigation layer matters more, and at Regime 3 (500-5000+) the query engine likely needs augmentation. The bet is that for vaults up to roughly 10,000 notes, the filesystem-based approach matches or exceeds purpose-built infrastructure because the curation quality of edges and the precision of metadata compensate for the lack of indexed queries. When that crossover arrives, since [[intermediate representation pattern enables reliable vault operations beyond regex]], the evolutionary path is not abandoning the filesystem model but adding a parsed layer above it -- an intermediate representation that provides structured query reliability while the files remain the source of truth. This follows [[complex systems evolve from simple working systems]]: the graph database starts as files with conventions and adds infrastructure only where friction emerges, rather than deploying a database server on day one. Beyond that threshold, the answer is genuinely uncertain.
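The parsed layer above the filesystem can start as something as small as a regenerated index file. A sketch of the shape only, with hypothetical paths and a tab-separated format chosen purely for illustration; the files remain the source of truth and the index is derived and disposable:

```shell
# Fabricate a small vault.
cd "$(mktemp -d)"
mkdir 01_thinking
printf -- '---\ntype: claim\n---\n' > 01_thinking/a.md
printf -- '---\ntype: tension\n---\n' > 01_thinking/b.md

# Regenerate the intermediate representation: one "path<TAB>type" row per note.
for f in 01_thinking/*.md; do
  printf '%s\t%s\n' "$f" "$(sed -n 's/^type: //p' "$f")"
done > index.tsv

# Queries now hit the parsed layer instead of re-scanning every file.
awk -F'\t' '$2 == "tension" { print $1 }' index.tsv
```

The same pattern extends to any frontmatter field: regeneration cost scales with vault size, but query cost collapses to a scan of one flat file.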
 
---
 
Source: /envision synthesis session (no source file -- synthesis of four existing notes)
---
 
Relevant Notes:
- [[wiki links implement GraphRAG without the infrastructure]] -- foundation: provides the graph traversal layer where curated wiki links replace entity extraction pipelines and graph databases
- [[metadata reduces entropy enabling precision over recall]] -- foundation: provides the query filter layer where YAML frontmatter pre-computes low-entropy representations that enable precision-first retrieval
- [[faceted classification treats notes as multi-dimensional objects rather than folder contents]] -- foundation: provides the multi-dimensional access layer where independent YAML fields compose multiplicatively for retrieval
- [[schema enforcement via validation agents enables soft consistency]] -- foundation: provides the consistency layer that keeps the database queryable over time without rigid constraints
- [[local-first file formats are inherently agent-native]] -- substrate: explains why this infrastructure-free property holds; plain text with embedded metadata requires no external dependencies
- [[structure enables navigation without reading everything]] -- the retrieval payoff: four structural mechanisms compose into discovery layers that make this graph database practically navigable
- [[type field enables structured queries without folder hierarchies]] -- concrete example: type metadata demonstrates one facet dimension in action, enabling category-based queries via ripgrep
- [[hook composition creates emergent methodology from independent single-concern components]] -- structural parallel: the same compositional pattern where independent single-concern components produce emergent capability that no component achieves alone
- [[four abstraction layers separate platform-agnostic from platform-dependent knowledge system features]] -- portability grounding: the graph database lives entirely in the foundation layer, the bottom of the portability gradient with maximum exit velocity
- [[data exit velocity measures how quickly content escapes vendor lock-in]] -- makes the infrastructure-free claim auditable: every graph operation works with any text tool, and export means copying a folder
- [[the system is the argument]] -- self-referential proof: the vault IS this graph database in operation; every ripgrep query and wiki link traversal demonstrates the claim
- [[navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts]] -- scaling framework: regime boundaries predict where ripgrep-on-YAML reaches its limits and what augmentation the query layer needs
- [[intermediate representation pattern enables reliable vault operations beyond regex]] -- evolutionary path: when ripgrep fragility emerges at scale, the next step is a parsed layer above the filesystem rather than abandoning the model
- [[complex systems evolve from simple working systems]] -- design principle: the graph database evolved from conventions at friction points rather than being designed as infrastructure upfront
- [[ten universal primitives form the kernel of every viable agent knowledge system]] -- kernel relationship: this note's four layers map to kernel primitives 1, 2, 6, and 7, showing the graph database is built from the non-negotiable base layer
 
Topics:
- [[graph-structure]]
- [[discovery-retrieval]]
 
---
description: A seedling/developing/evergreen maturity field could help agents prefer mature notes when context is tight and surface seedlings for development
kind: research
topics: ["[[discovery-retrieval]]"]
source: TFT research corpus (00_inbox/heinrich/)
---
 
# maturity field enables agent context prioritization
 
The digital gardening tradition suggests that notes exist on a development spectrum. A seedling is a captured idea that hasn't been fully developed — maybe just a title and rough description. A developing note has some content but needs connections or deeper treatment. An evergreen is mature — well-connected, thoroughly reasoned, ready to be built upon.
 
Currently, the system treats all notes equally during context loading. An agent building context for a task will load notes based on relevance (via semantic search or curated navigation), but two equally relevant notes get equal treatment even if one is a well-developed evergreen and the other is a half-formed seedling. Since [[LLM attention degrades as context fills]], this is a missed opportunity — the smart zone should be filled with the highest-value content, and note maturity signals value.
 
Adding a `maturity:` field to thinking notes with values like seedling, developing, evergreen would enable smarter context loading decisions. Agents could prefer mature content when token budget is tight and surface seedlings when looking for development opportunities.
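Mechanically, the field would cost one extra frontmatter line and one extra query dimension. A sketch with invented file names, using POSIX grep where the vault's own commands would use ripgrep:

```shell
# Two hypothetical notes at different development stages.
cd "$(mktemp -d)"
mkdir notes
printf -- '---\nmaturity: seedling\n---\nrough capture\n' > notes/draft.md
printf -- '---\nmaturity: evergreen\n---\nwell-developed claim\n' > notes/mature.md

# Tight token budget: load only mature notes.
grep -rl '^maturity: evergreen' notes/

# Development session: surface seedlings instead.
grep -rl '^maturity: seedling' notes/
```

The same file set answers both questions; only the filter changes with the agent's intent.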
 
Since [[progressive disclosure means reading right not reading less]], maturity signals would add another dimension to disclosure. Currently we have: file tree → description → outline → section → full note. Maturity could filter at the description level: "this is relevant but only a seedling, prioritize the evergreen alternative."
 
The question remains whether note development stage actually predicts note usefulness in context. If a seedling on exactly the right topic is as useful as an evergreen on a related topic, maturity filtering adds overhead without benefit. If evergreens consistently enable better agent outputs than seedlings, maturity becomes a valuable signal.
 
Since [[live index via periodic regeneration keeps discovery current]], maturity-based filtering becomes computationally free when pre-computed. A maintenance agent regenerating maturity indices at change boundaries would let agents filter seedlings from evergreens without query cost — the filtering decision happens before context loading, not during it.
---
 
Relevant Notes:
- [[progressive disclosure means reading right not reading less]] — maturity adds a fourth dimension to progressive disclosure: alongside relevance, recency, and depth, note development stage becomes another curation signal
- [[processing effort should follow retrieval demand]] — maturity signals which notes deserve development investment; seedlings need work, evergreens can be used as-is
- [[LLM attention degrades as context fills]] — maturity filtering maximizes value in the scarce smart zone by preferring well-developed content when attention budget is tight
- [[descriptions are retrieval filters not summaries]] — maturity is another retrieval filter dimension; descriptions filter by relevance, maturity filters by development completeness
- [[trails transform ephemeral navigation into persistent artifacts]] — sibling claim from the same batch, exploring TFT patterns for agent optimization
- [[spaced repetition scheduling could optimize vault maintenance]] — if maturity tracks development stage, scheduling uses that signal to allocate review intervals; seedlings get frequent checks, evergreens get sparse confirmation
- [[live index via periodic regeneration keeps discovery current]] — maturity-based indices become computationally free when pre-computed at change boundaries
 
Topics:
- [[discovery-retrieval]]
 
---
description: Three-space separation (notes, self, ops) -- why content boundaries matter for retrieval and trust
type: moc
---
 
# memory-architecture
 
The three-space architecture: notes/ (domain knowledge), self/ (agent identity), ops/ (operational scaffolding). Why boundaries matter, what happens when they dissolve, and how to detect boundary violations.
 
## Core Ideas
 
### Guidance
- [[build automatic memory through cognitive offloading and session handoffs]] -- How to build automatic memory systems that compound over time — cognitive offloading as foundation, the retrieval bottle
 
## Tensions
 
(Capture conflicts as they emerge)
 
## Open Questions
 
- When is self/ genuinely needed vs when does ops/ suffice?
- How do three-space boundaries interact with semantic search?
 
---
 
Topics:
- [[index]]