arscontexta 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +11 -0
- package/.claude-plugin/plugin.json +22 -0
- package/README.md +683 -0
- package/agents/knowledge-guide.md +49 -0
- package/bin/cli.mjs +66 -0
- package/generators/agents-md.md +240 -0
- package/generators/claude-md.md +379 -0
- package/generators/features/atomic-notes.md +124 -0
- package/generators/features/ethical-guardrails.md +58 -0
- package/generators/features/graph-analysis.md +188 -0
- package/generators/features/helper-functions.md +92 -0
- package/generators/features/maintenance.md +164 -0
- package/generators/features/methodology-knowledge.md +70 -0
- package/generators/features/mocs.md +144 -0
- package/generators/features/multi-domain.md +61 -0
- package/generators/features/personality.md +71 -0
- package/generators/features/processing-pipeline.md +428 -0
- package/generators/features/schema.md +149 -0
- package/generators/features/self-evolution.md +229 -0
- package/generators/features/self-space.md +78 -0
- package/generators/features/semantic-search.md +99 -0
- package/generators/features/session-rhythm.md +85 -0
- package/generators/features/templates.md +85 -0
- package/generators/features/wiki-links.md +88 -0
- package/generators/soul-md.md +121 -0
- package/hooks/hooks.json +45 -0
- package/hooks/scripts/auto-commit.sh +44 -0
- package/hooks/scripts/session-capture.sh +35 -0
- package/hooks/scripts/session-orient.sh +86 -0
- package/hooks/scripts/write-validate.sh +42 -0
- package/methodology/AI shifts knowledge systems from externalizing memory to externalizing attention.md +59 -0
- package/methodology/BM25 retrieval fails on full-length descriptions because query term dilution reduces match scores.md +39 -0
- package/methodology/IBIS framework maps claim-based architecture to structured argumentation.md +58 -0
- package/methodology/LLM attention degrades as context fills.md +49 -0
- package/methodology/MOC construction forces synthesis that automated generation from metadata cannot replicate.md +49 -0
- package/methodology/MOC maintenance investment compounds because orientation savings multiply across every future session.md +41 -0
- package/methodology/MOCs are attention management devices not just organizational tools.md +51 -0
- package/methodology/PKM failure follows a predictable cycle.md +50 -0
- package/methodology/ThreadMode to DocumentMode transformation is the core value creation step.md +52 -0
- package/methodology/WIP limits force processing over accumulation.md +53 -0
- package/methodology/Zeigarnik effect validates capture-first philosophy because open loops drain attention.md +42 -0
- package/methodology/academic research uses structured extraction with cross-source synthesis.md +566 -0
- package/methodology/adapt the four-phase processing pipeline to domain-specific throughput needs.md +197 -0
- package/methodology/agent notes externalize navigation intuition that search cannot discover and traversal cannot reconstruct.md +48 -0
- package/methodology/agent self-memory should be architecturally separate from user knowledge systems.md +48 -0
- package/methodology/agent session boundaries create natural automation checkpoints that human-operated systems lack.md +56 -0
- package/methodology/agent-cognition.md +107 -0
- package/methodology/agents are simultaneously methodology executors and subjects creating a unique trust asymmetry.md +66 -0
- package/methodology/aspect-oriented programming solved the same cross-cutting concern problem that hooks solve.md +39 -0
- package/methodology/associative ontologies beat hierarchical taxonomies because heterarchy adapts while hierarchy brittles.md +53 -0
- package/methodology/attention residue may have a minimum granularity that cannot be subdivided.md +46 -0
- package/methodology/auto-commit hooks eliminate prospective memory failures by converting remember-to-act into guaranteed execution.md +47 -0
- package/methodology/automated detection is always safe because it only reads state while automated remediation risks content corruption.md +42 -0
- package/methodology/automation should be retired when its false positive rate exceeds its true positive rate or it catches zero issues.md +56 -0
- package/methodology/backlinks implicitly define notes by revealing usage context.md +35 -0
- package/methodology/backward maintenance asks what would be different if written today.md +62 -0
- package/methodology/balance onboarding enforcement and questions to prevent premature complexity.md +229 -0
- package/methodology/basic level categorization determines optimal MOC granularity.md +51 -0
- package/methodology/batching by context similarity reduces switching costs in agent processing.md +43 -0
- package/methodology/behavioral anti-patterns matter more than tool selection.md +42 -0
- package/methodology/betweenness centrality identifies bridge notes connecting disparate knowledge domains.md +57 -0
- package/methodology/blueprints that teach construction outperform downloads that provide pre-built code for platform-dependent modules.md +42 -0
- package/methodology/bootstrapping principle enables self-improving systems.md +62 -0
- package/methodology/build automatic memory through cognitive offloading and session handoffs.md +285 -0
- package/methodology/capture the reaction to content not just the content itself.md +41 -0
- package/methodology/claims must be specific enough to be wrong.md +36 -0
- package/methodology/closure rituals create clean breaks that prevent attention residue bleed.md +44 -0
- package/methodology/cognitive offloading is the architectural foundation for vault design.md +46 -0
- package/methodology/cognitive outsourcing risk in agent-operated systems.md +55 -0
- package/methodology/coherence maintains consistency despite inconsistent inputs.md +96 -0
- package/methodology/coherent architecture emerges from wiki links spreading activation and small-world topology.md +48 -0
- package/methodology/community detection algorithms can inform when MOCs should split or merge.md +52 -0
- package/methodology/complete navigation requires four complementary types that no single mechanism provides.md +43 -0
- package/methodology/complex systems evolve from simple working systems.md +59 -0
- package/methodology/composable knowledge architecture builds systems from independent toggleable modules not monolithic templates.md +61 -0
- package/methodology/compose multi-domain systems through separate templates and shared graph.md +372 -0
- package/methodology/concept-orientation beats source-orientation for cross-domain connections.md +51 -0
- package/methodology/confidence thresholds gate automated action between the mechanical and judgment zones.md +50 -0
- package/methodology/configuration dimensions interact so choices in one create pressure on others.md +58 -0
- package/methodology/configuration paralysis emerges when derivation surfaces too many decisions.md +44 -0
- package/methodology/context files function as agent operating systems through self-referential self-extension.md +46 -0
- package/methodology/context phrase clarity determines how deep a navigation hierarchy can scale.md +46 -0
- package/methodology/continuous small-batch processing eliminates review dread.md +48 -0
- package/methodology/controlled disorder engineers serendipity through semantic rather than topical linking.md +51 -0
- package/methodology/creative writing uses worldbuilding consistency with character tracking.md +672 -0
- package/methodology/cross-links between MOC territories indicate creative leaps and integration depth.md +43 -0
- package/methodology/dangling links reveal which notes want to exist.md +62 -0
- package/methodology/data exit velocity measures how quickly content escapes vendor lock-in.md +74 -0
- package/methodology/decontextualization risk means atomicity may strip meaning that cannot be recovered.md +48 -0
- package/methodology/dense interlinked research claims enable derivation while sparse references only enable templating.md +47 -0
- package/methodology/dependency resolution through topological sort makes module composition transparent and verifiable.md +56 -0
- package/methodology/derivation generates knowledge systems from composable research claims not template customization.md +63 -0
- package/methodology/derivation-engine.md +27 -0
- package/methodology/derived systems follow a seed-evolve-reseed lifecycle.md +56 -0
- package/methodology/description quality for humans diverges from description quality for keyword search.md +73 -0
- package/methodology/descriptions are retrieval filters not summaries.md +112 -0
- package/methodology/design MOCs as attention management devices with lifecycle governance.md +318 -0
- package/methodology/design-dimensions.md +66 -0
- package/methodology/digital mutability enables note evolution that physical permanence forbids.md +54 -0
- package/methodology/discovery-retrieval.md +48 -0
- package/methodology/distinctiveness scoring treats description quality as measurable.md +69 -0
- package/methodology/does agent processing recover what fast capture loses.md +43 -0
- package/methodology/domain-compositions.md +37 -0
- package/methodology/dual-coding with visual elements could enhance agent traversal.md +55 -0
- package/methodology/each module must be describable in one sentence under 200 characters or it does too many things.md +45 -0
- package/methodology/each new note compounds value by creating traversal paths.md +55 -0
- package/methodology/eight configuration dimensions parameterize the space of possible knowledge systems.md +56 -0
- package/methodology/elaborative encoding is the quality gate for new notes.md +55 -0
- package/methodology/enforce schema with graduated strictness across capture processing and query zones.md +221 -0
- package/methodology/enforcing atomicity can create paralysis when ideas resist decomposition.md +43 -0
- package/methodology/engineering uses technical decision tracking with architectural memory.md +766 -0
- package/methodology/every knowledge domain shares a four-phase processing skeleton that diverges only in the process step.md +53 -0
- package/methodology/evolution observations provide actionable signals for system adaptation.md +67 -0
- package/methodology/external memory shapes cognition more than base model.md +60 -0
- package/methodology/faceted classification treats notes as multi-dimensional objects rather than folder contents.md +65 -0
- package/methodology/failure-modes.md +27 -0
- package/methodology/false universalism applies same processing logic regardless of domain.md +49 -0
- package/methodology/federated wiki pattern enables multi-agent divergence as feature not bug.md +59 -0
- package/methodology/flat files break at retrieval scale.md +75 -0
- package/methodology/forced engagement produces weak connections.md +48 -0
- package/methodology/four abstraction layers separate platform-agnostic from platform-dependent knowledge system features.md +47 -0
- package/methodology/fresh context per task preserves quality better than chaining phases.md +44 -0
- package/methodology/friction reveals architecture.md +63 -0
- package/methodology/friction-driven module adoption prevents configuration debt by adding complexity only at pain points.md +48 -0
- package/methodology/gardening cycle implements tend prune fertilize operations.md +41 -0
- package/methodology/generation effect gate blocks processing without transformation.md +40 -0
- package/methodology/goal-driven memory orchestration enables autonomous domain learning through directed compute allocation.md +41 -0
- package/methodology/good descriptions layer heuristic then mechanism then implication.md +57 -0
- package/methodology/graph-structure.md +65 -0
- package/methodology/guided notes might outperform post-hoc structuring for high-volume capture.md +37 -0
- package/methodology/health wellness uses symptom-trigger correlation with multi-dimensional tracking.md +819 -0
- package/methodology/hook composition creates emergent methodology from independent single-concern components.md +47 -0
- package/methodology/hook enforcement guarantees quality while instruction enforcement merely suggests it.md +51 -0
- package/methodology/hook-driven learning loops create self-improving methodology through observation accumulation.md +62 -0
- package/methodology/hooks are the agent habit system that replaces the missing basal ganglia.md +40 -0
- package/methodology/hooks cannot replace genuine cognitive engagement yet more automation is always tempting.md +87 -0
- package/methodology/hooks enable context window efficiency by delegating deterministic checks to external processes.md +47 -0
- package/methodology/idempotent maintenance operations are safe to automate because running them twice produces the same result as running them once.md +44 -0
- package/methodology/implement condition-based maintenance triggers for derived systems.md +255 -0
- package/methodology/implicit dependencies create distributed monoliths that fail silently across configurations.md +58 -0
- package/methodology/implicit knowledge emerges from traversal.md +55 -0
- package/methodology/incremental formalization happens through repeated touching of old notes.md +60 -0
- package/methodology/incremental reading enables cross-source connection finding.md +39 -0
- package/methodology/index.md +32 -0
- package/methodology/inline links carry richer relationship data than metadata fields.md +91 -0
- package/methodology/insight accretion differs from productivity in knowledge systems.md +41 -0
- package/methodology/intermediate packets enable assembly over creation.md +52 -0
- package/methodology/intermediate representation pattern enables reliable vault operations beyond regex.md +62 -0
- package/methodology/justification chains enable forward backward and evolution reasoning about configuration decisions.md +46 -0
- package/methodology/knowledge system architecture is parameterized by platform capabilities not fixed by methodology.md +51 -0
- package/methodology/knowledge systems become communication partners through complexity and memory humans cannot sustain.md +47 -0
- package/methodology/knowledge systems share universal operations and structural components across all methodology traditions.md +46 -0
- package/methodology/legal case management uses precedent chains with regulatory change propagation.md +892 -0
- package/methodology/live index via periodic regeneration keeps discovery current.md +58 -0
- package/methodology/local-first file formats are inherently agent-native.md +69 -0
- package/methodology/logic column pattern separates reasoning from procedure.md +35 -0
- package/methodology/maintenance operations are more universal than creative pipelines because structural health is domain-invariant.md +47 -0
- package/methodology/maintenance scheduling frequency should match consequence speed not detection capability.md +50 -0
- package/methodology/maintenance targeting should prioritize mechanism and theory notes.md +26 -0
- package/methodology/maintenance-patterns.md +72 -0
- package/methodology/markdown plus YAML plus ripgrep implements a queryable graph database without infrastructure.md +55 -0
- package/methodology/maturity field enables agent context prioritization.md +33 -0
- package/methodology/memory-architecture.md +27 -0
- package/methodology/metacognitive confidence can diverge from retrieval capability.md +42 -0
- package/methodology/metadata reduces entropy enabling precision over recall.md +91 -0
- package/methodology/methodology development should follow the trajectory from documentation to skill to hook as understanding hardens.md +80 -0
- package/methodology/methodology traditions are named points in a shared configuration space not competing paradigms.md +64 -0
- package/methodology/mnemonic medium embeds verification into navigation.md +46 -0
- package/methodology/module communication through shared YAML fields creates loose coupling without direct dependencies.md +44 -0
- package/methodology/module deactivation must account for structural artifacts that survive the toggle.md +49 -0
- package/methodology/multi-domain systems compose through separate templates and shared graph.md +61 -0
- package/methodology/multi-domain-composition.md +27 -0
- package/methodology/narrow folksonomy optimizes for single-operator retrieval unlike broad consensus tagging.md +53 -0
- package/methodology/navigation infrastructure passes through distinct scaling regimes that require qualitative strategy shifts.md +48 -0
- package/methodology/navigational vertigo emerges in pure association systems without local hierarchy.md +54 -0
- package/methodology/note titles should function as APIs enabling sentence transclusion.md +51 -0
- package/methodology/note-design.md +57 -0
- package/methodology/notes are skills — curated knowledge injected when relevant.md +62 -0
- package/methodology/notes function as cognitive anchors that stabilize attention during complex tasks.md +41 -0
- package/methodology/novel domains derive by mapping knowledge type to closest reference domain then adapting.md +50 -0
- package/methodology/nudge theory explains graduated hook enforcement as choice architecture for agents.md +59 -0
- package/methodology/observation and tension logs function as dead-letter queues for failed automation.md +51 -0
- package/methodology/operational memory and knowledge memory serve different functions in agent architecture.md +48 -0
- package/methodology/operational wisdom requires contextual observation.md +52 -0
- package/methodology/orchestrated vault creation transforms arscontexta from tool to autonomous knowledge factory.md +40 -0
- package/methodology/organic emergence versus active curation creates a fundamental vault governance tension.md +68 -0
- package/methodology/orphan notes are seeds not failures.md +38 -0
- package/methodology/over-automation corrupts quality when hooks encode judgment rather than verification.md +62 -0
- package/methodology/people relationships uses Dunbar-layered graphs with interaction tracking.md +659 -0
- package/methodology/personal assistant uses life area management with review automation.md +610 -0
- package/methodology/platform adapter translation is semantic not mechanical because hook event meanings differ.md +40 -0
- package/methodology/platform capability tiers determine which knowledge system features can be implemented.md +48 -0
- package/methodology/platform fragmentation means identical conceptual operations require different implementations across agent environments.md +44 -0
- package/methodology/premature complexity is the most common derivation failure mode.md +45 -0
- package/methodology/prevent domain-specific failure modes through the vulnerability matrix.md +336 -0
- package/methodology/processing effort should follow retrieval demand.md +57 -0
- package/methodology/processing-workflows.md +75 -0
- package/methodology/product management uses feedback pipelines with experiment tracking.md +789 -0
- package/methodology/productivity porn risk in meta-system building.md +30 -0
- package/methodology/programmable notes could enable property-triggered workflows.md +64 -0
- package/methodology/progressive disclosure means reading right not reading less.md +69 -0
- package/methodology/progressive schema validates only what active modules require not the full system schema.md +49 -0
- package/methodology/project management uses decision tracking with stakeholder context.md +776 -0
- package/methodology/propositional link semantics transform wiki links from associative to reasoned.md +87 -0
- package/methodology/prospective memory requires externalization.md +53 -0
- package/methodology/provenance tracks where beliefs come from.md +62 -0
- package/methodology/queries evolve during search so agents should checkpoint.md +35 -0
- package/methodology/question-answer metadata enables inverted search patterns.md +39 -0
- package/methodology/random note resurfacing prevents write-only memory.md +33 -0
- package/methodology/reconciliation loops that compare desired state to actual state enable drift correction without continuous monitoring.md +59 -0
- package/methodology/reflection synthesizes existing notes into new insight.md +100 -0
- package/methodology/retrieval utility should drive design over capture completeness.md +69 -0
- package/methodology/retrieval verification loop tests description quality at scale.md +81 -0
- package/methodology/role field makes graph structure explicit.md +94 -0
- package/methodology/scaffolding enables divergence that fine-tuning cannot.md +67 -0
- package/methodology/schema enforcement via validation agents enables soft consistency.md +60 -0
- package/methodology/schema evolution follows observe-then-formalize not design-then-enforce.md +65 -0
- package/methodology/schema field names are the only domain specific element in the universal note pattern.md +46 -0
- package/methodology/schema fields should use domain-native vocabulary not abstract terminology.md +47 -0
- package/methodology/schema templates reduce cognitive overhead at capture time.md +55 -0
- package/methodology/schema validation hooks externalize inhibitory control that degrades under cognitive load.md +48 -0
- package/methodology/schema-enforcement.md +27 -0
- package/methodology/self-extension requires context files to contain platform operations knowledge not just methodology.md +47 -0
- package/methodology/sense-making vs storage does compression lose essential nuance.md +73 -0
- package/methodology/session boundary hooks implement cognitive bookends for orientation and reflection.md +60 -0
- package/methodology/session handoff creates continuity without persistent memory.md +43 -0
- package/methodology/session outputs are packets for future selves.md +43 -0
- package/methodology/session transcript mining enables experiential validation that structural tests cannot provide.md +38 -0
- package/methodology/skill context budgets constrain knowledge system complexity on agent platforms.md +52 -0
- package/methodology/skills encode methodology so manual execution bypasses quality gates.md +50 -0
- package/methodology/small-world topology requires hubs and dense local links.md +99 -0
- package/methodology/source attribution enables tracing claims to foundations.md +38 -0
- package/methodology/spaced repetition scheduling could optimize vault maintenance.md +44 -0
- package/methodology/spreading activation models how agents should traverse.md +79 -0
- package/methodology/stale navigation actively misleads because agents trust curated maps completely.md +43 -0
- package/methodology/stigmergy coordinates agents through environmental traces without direct communication.md +62 -0
- package/methodology/storage versus thinking distinction determines which tool patterns apply.md +56 -0
- package/methodology/structure enables navigation without reading everything.md +52 -0
- package/methodology/structure without processing provides no value.md +56 -0
- package/methodology/student learning uses prerequisite graphs with spaced retrieval.md +770 -0
- package/methodology/summary coherence tests composability before filing.md +37 -0
- package/methodology/tag rot applies to wiki links because titles serve as both identifier and display text.md +50 -0
- package/methodology/temporal media must convert to spatial text for agent traversal.md +43 -0
- package/methodology/temporal processing priority creates age-based inbox urgency.md +45 -0
- package/methodology/temporal separation of capture and processing preserves context freshness.md +39 -0
- package/methodology/ten universal primitives form the kernel of every viable agent knowledge system.md +162 -0
- package/methodology/testing effect could enable agent knowledge verification.md +38 -0
- package/methodology/the AgentSkills standard embodies progressive disclosure at the skill level.md +40 -0
- package/methodology/the derivation engine improves recursively as deployed systems generate observations.md +49 -0
- package/methodology/the determinism boundary separates hook methodology from skill methodology.md +46 -0
- package/methodology/the fix-versus-report decision depends on determinism reversibility and accumulated trust.md +45 -0
- package/methodology/the generation effect requires active transformation not just storage.md +57 -0
- package/methodology/the no wrong patches guarantee ensures any valid module combination produces a valid system.md +58 -0
- package/methodology/the system is the argument.md +46 -0
- package/methodology/the vault constitutes identity for agents.md +86 -0
- package/methodology/the vault methodology transfers because it encodes cognitive science not domain specifics.md +47 -0
- package/methodology/therapy journal uses warm personality with pattern detection for emotional processing.md +584 -0
- package/methodology/three capture schools converge through agent-mediated synthesis.md +55 -0
- package/methodology/three concurrent maintenance loops operate at different timescales to catch different classes of problems.md +56 -0
- package/methodology/throughput matters more than accumulation.md +58 -0
- package/methodology/title as claim enables traversal as reasoning.md +50 -0
- package/methodology/topological organization beats temporal for knowledge work.md +52 -0
- package/methodology/trading uses conviction tracking with thesis-outcome correlation.md +699 -0
- package/methodology/trails transform ephemeral navigation into persistent artifacts.md +39 -0
- package/methodology/transform universal vocabulary to domain-native language through six levels.md +259 -0
- package/methodology/type field enables structured queries without folder hierarchies.md +53 -0
- package/methodology/use-case presets dissolve the tension between composability and simplicity.md +44 -0
- package/methodology/vault conventions may impose hidden rigidity on thinking.md +44 -0
- package/methodology/verbatim risk applies to agents too.md +31 -0
- package/methodology/vibe notetaking is the emerging industry consensus for AI-native self-organization.md +56 -0
- package/methodology/vivid memories need verification.md +45 -0
- package/methodology/vocabulary-transformation.md +27 -0
- package/methodology/voice capture is the highest-bandwidth channel for agent-delegated knowledge systems.md +45 -0
- package/methodology/wiki links are the digital evolution of analog indexing.md +73 -0
- package/methodology/wiki links as social contract transforms agents into stewards of incomplete references.md +52 -0
- package/methodology/wiki links create navigation paths that shape retrieval.md +63 -0
- package/methodology/wiki links implement GraphRAG without the infrastructure.md +101 -0
- package/methodology/writing for audience blocks authentic creation.md +22 -0
- package/methodology/you operate a system that takes notes.md +79 -0
- package/openclaw/SKILL.md +110 -0
- package/package.json +45 -0
- package/platforms/README.md +51 -0
- package/platforms/claude-code/generator.md +61 -0
- package/platforms/claude-code/hooks/README.md +186 -0
- package/platforms/claude-code/hooks/auto-commit.sh.template +38 -0
- package/platforms/claude-code/hooks/session-capture.sh.template +72 -0
- package/platforms/claude-code/hooks/session-orient.sh.template +189 -0
- package/platforms/claude-code/hooks/write-validate.sh.template +106 -0
- package/platforms/openclaw/generator.md +82 -0
- package/platforms/openclaw/hooks/README.md +89 -0
- package/platforms/openclaw/hooks/bootstrap.ts.template +224 -0
- package/platforms/openclaw/hooks/command-new.ts.template +165 -0
- package/platforms/openclaw/hooks/heartbeat.ts.template +214 -0
- package/platforms/shared/features/README.md +70 -0
- package/platforms/shared/skill-blocks/graph.md +145 -0
- package/platforms/shared/skill-blocks/learn.md +119 -0
- package/platforms/shared/skill-blocks/next.md +131 -0
- package/platforms/shared/skill-blocks/pipeline.md +326 -0
- package/platforms/shared/skill-blocks/ralph.md +616 -0
- package/platforms/shared/skill-blocks/reduce.md +1142 -0
- package/platforms/shared/skill-blocks/refactor.md +129 -0
- package/platforms/shared/skill-blocks/reflect.md +780 -0
- package/platforms/shared/skill-blocks/remember.md +524 -0
- package/platforms/shared/skill-blocks/rethink.md +574 -0
- package/platforms/shared/skill-blocks/reweave.md +680 -0
- package/platforms/shared/skill-blocks/seed.md +320 -0
- package/platforms/shared/skill-blocks/stats.md +145 -0
- package/platforms/shared/skill-blocks/tasks.md +171 -0
- package/platforms/shared/skill-blocks/validate.md +323 -0
- package/platforms/shared/skill-blocks/verify.md +562 -0
- package/platforms/shared/templates/README.md +35 -0
- package/presets/experimental/categories.yaml +1 -0
- package/presets/experimental/preset.yaml +38 -0
- package/presets/experimental/starter/README.md +7 -0
- package/presets/experimental/vocabulary.yaml +7 -0
- package/presets/personal/categories.yaml +7 -0
- package/presets/personal/preset.yaml +41 -0
- package/presets/personal/starter/goals.md +21 -0
- package/presets/personal/starter/index.md +17 -0
- package/presets/personal/starter/life-areas.md +21 -0
- package/presets/personal/starter/people.md +21 -0
- package/presets/personal/vocabulary.yaml +32 -0
- package/presets/research/categories.yaml +8 -0
- package/presets/research/preset.yaml +41 -0
- package/presets/research/starter/index.md +17 -0
- package/presets/research/starter/methods.md +21 -0
- package/presets/research/starter/open-questions.md +21 -0
- package/presets/research/vocabulary.yaml +33 -0
- package/reference/AUDIT-REPORT.md +238 -0
- package/reference/claim-map.md +172 -0
- package/reference/components.md +327 -0
- package/reference/conversation-patterns.md +542 -0
- package/reference/derivation-validation.md +649 -0
- package/reference/dimension-claim-map.md +134 -0
- package/reference/evolution-lifecycle.md +297 -0
- package/reference/failure-modes.md +235 -0
- package/reference/interaction-constraints.md +204 -0
- package/reference/kernel.yaml +242 -0
- package/reference/methodology.md +283 -0
- package/reference/open-questions.md +279 -0
- package/reference/personality-layer.md +302 -0
- package/reference/self-space.md +299 -0
- package/reference/semantic-vs-keyword.md +288 -0
- package/reference/session-lifecycle.md +298 -0
- package/reference/templates/base-note.md +16 -0
- package/reference/templates/companion-note.md +70 -0
- package/reference/templates/creative-note.md +16 -0
- package/reference/templates/learning-note.md +16 -0
- package/reference/templates/life-note.md +16 -0
- package/reference/templates/moc.md +26 -0
- package/reference/templates/relationship-note.md +17 -0
- package/reference/templates/research-note.md +19 -0
- package/reference/templates/session-log.md +24 -0
- package/reference/templates/therapy-note.md +16 -0
- package/reference/test-fixtures/edge-case-constraints.md +148 -0
- package/reference/test-fixtures/multi-domain.md +164 -0
- package/reference/test-fixtures/novel-domain-gaming.md +138 -0
- package/reference/test-fixtures/research-minimal.md +102 -0
- package/reference/test-fixtures/therapy-full.md +155 -0
- package/reference/testing-milestones.md +1087 -0
- package/reference/three-spaces.md +363 -0
- package/reference/tradition-presets.md +203 -0
- package/reference/use-case-presets.md +341 -0
- package/reference/validate-kernel.sh +432 -0
- package/reference/vocabulary-transforms.md +85 -0
- package/scripts/sync-thinking.sh +147 -0
- package/skill-sources/graph/SKILL.md +567 -0
- package/skill-sources/graph/skill.json +17 -0
- package/skill-sources/learn/SKILL.md +254 -0
- package/skill-sources/learn/skill.json +17 -0
- package/skill-sources/next/SKILL.md +407 -0
- package/skill-sources/next/skill.json +17 -0
- package/skill-sources/pipeline/SKILL.md +314 -0
- package/skill-sources/pipeline/skill.json +17 -0
- package/skill-sources/ralph/SKILL.md +604 -0
- package/skill-sources/ralph/skill.json +17 -0
- package/skill-sources/reduce/SKILL.md +1113 -0
- package/skill-sources/reduce/skill.json +17 -0
- package/skill-sources/refactor/SKILL.md +448 -0
- package/skill-sources/refactor/skill.json +17 -0
- package/skill-sources/reflect/SKILL.md +747 -0
- package/skill-sources/reflect/skill.json +17 -0
- package/skill-sources/remember/SKILL.md +534 -0
- package/skill-sources/remember/skill.json +17 -0
- package/skill-sources/rethink/SKILL.md +658 -0
- package/skill-sources/rethink/skill.json +17 -0
- package/skill-sources/reweave/SKILL.md +657 -0
- package/skill-sources/reweave/skill.json +17 -0
- package/skill-sources/seed/SKILL.md +303 -0
- package/skill-sources/seed/skill.json +17 -0
- package/skill-sources/stats/SKILL.md +371 -0
- package/skill-sources/stats/skill.json +17 -0
- package/skill-sources/tasks/SKILL.md +402 -0
- package/skill-sources/tasks/skill.json +17 -0
- package/skill-sources/validate/SKILL.md +310 -0
- package/skill-sources/validate/skill.json +17 -0
- package/skill-sources/verify/SKILL.md +532 -0
- package/skill-sources/verify/skill.json +17 -0
- package/skills/add-domain/SKILL.md +441 -0
- package/skills/add-domain/skill.json +17 -0
- package/skills/architect/SKILL.md +568 -0
- package/skills/architect/skill.json +17 -0
- package/skills/ask/SKILL.md +388 -0
- package/skills/ask/skill.json +17 -0
- package/skills/health/SKILL.md +760 -0
- package/skills/health/skill.json +17 -0
- package/skills/help/SKILL.md +348 -0
- package/skills/help/skill.json +17 -0
- package/skills/recommend/SKILL.md +553 -0
- package/skills/recommend/skill.json +17 -0
- package/skills/reseed/SKILL.md +385 -0
- package/skills/reseed/skill.json +17 -0
- package/skills/setup/SKILL.md +1688 -0
- package/skills/setup/skill.json +17 -0
- package/skills/tutorial/SKILL.md +496 -0
- package/skills/tutorial/skill.json +17 -0
- package/skills/upgrade/SKILL.md +395 -0
- package/skills/upgrade/skill.json +17 -0
|
@@ -0,0 +1,766 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: engineering-team knowledge system — inspirational composition showing derived architecture for technical decision tracking, dependency graphs, and architectural memory
|
|
3
|
+
kind: example
|
|
4
|
+
domain: engineering
|
|
5
|
+
topics: ["[[domain-compositions]]"]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# engineering uses technical decision tracking with architectural memory
|
|
9
|
+
|
|
10
|
+
A derived architecture for a team that needs to remember why things are the way they are, understand what depends on what, and learn from what went wrong. Not documentation in the traditional sense — a living architectural memory that connects decisions to consequences, incidents to root causes, and changes to their ripple effects across the system.
|
|
11
|
+
|
|
12
|
+
The agent translation here is about temporal scale and cross-referencing scope. Human engineers remember the decisions they were personally involved in, for about a year. After that, institutional knowledge degrades: people leave, context fades, and "why is it like this?" becomes unanswerable. An agent never forgets a decision, never loses the thread between an ADR from 2024 and an incident in 2026, and can trace the dependency graph across hundreds of services without holding any of it in working memory. The system transforms tribal knowledge into traversable architecture.
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
## Persona
|
|
17
|
+
|
|
18
|
+
**The Meridian Platform Team** is a 28-person engineering organization split into five squads, building a B2B analytics platform. They have been operating for four years, which means they have accumulated roughly 200 architecture decisions, 85 incident postmortems, a dependency graph spanning 40+ microservices, and a tech debt backlog that nobody has a complete picture of. Their documentation lives in Confluence (where it goes to die), Notion (where it gets half-updated), Slack threads (where it is unsearchable), and the heads of senior engineers (where it leaves when they do).
|
|
19
|
+
|
|
20
|
+
**Ravi Krishnamurthy**, the Staff Engineer, is the primary operator. He is the person people go to when they need to understand why a system was built a certain way, and he is running out of head-space. He has watched two other staff engineers leave in the past year, each taking irreplaceable context about subsystems they designed. Ravi does not want to be a bottleneck. He wants a system where the architecture explains itself — where a new engineer can follow a dependency edge from a service to the decisions that shaped it, to the incidents that tested it, to the trade-offs that constrain it.
|
|
21
|
+
|
|
22
|
+
Ravi's agent operates as the team's architectural memory. When someone proposes a change to the payment processing service, the agent surfaces: which ADRs constrain this service's design, which other services depend on it, what incidents have involved it, and what tech debt exists in it. The agent does not make the decision — it ensures the decision-maker has the full picture, including the parts nobody currently remembers.
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## Configuration
|
|
27
|
+
| Dimension | Position | Rationale |
|-----------|----------|-----------|
| **Granularity** | Atomic for decisions and incidents, compound for service descriptions | Each ADR is a distinct, referenceable decision. Each postmortem is a distinct event. But a service description is inherently compound — architecture, dependencies, operational characteristics, and known issues all belong together because they are accessed together. |
| **Organization** | Flat within categories, with type-based subdirectories | ADRs, postmortems, services, and runbooks each have their own directory. Within each directory, files are flat. The graph connects across directories: an ADR links to the services it affects, a postmortem links to the ADR that created the vulnerable architecture. |
| **Linking** | Heavily explicit with typed relationships | Engineering decisions have precise relationships: ADR-017 *supersedes* ADR-003. Service A *depends on* Service B. Postmortem PM-042 *was caused by* tech debt TD-019. Vague "related" links are nearly useless here — the type of relationship determines what you do with the information. |
| **Metadata** | Dense — structured fields enable automated impact analysis | Every ADR needs status, affected services, superseded_by. Every postmortem needs severity, affected services, root cause category. Every service needs dependencies, SLA, team owner. The density pays for itself through programmatic queries: "show me every P1 incident that involved the payment service in the last six months." |
| **Processing** | Medium — capture is lightweight, connection is heavy | Writing an ADR or postmortem is already a significant cognitive investment. The agent's processing work is in connections: ensuring every new ADR links to affected services, every postmortem links to relevant ADRs, every service's dependency graph stays current. |
| **Formalization** | High — templates enforce consistency, status lifecycles are explicit | ADRs without status tracking are just documents. Postmortems without action item tracking are just stories. The value comes from lifecycle management: proposed to accepted to deprecated to superseded. Templates enforce the fields that enable this. |
| **Review** | Event-triggered plus monthly staleness sweep | New deployments trigger dependency verification. Incidents trigger postmortem creation within 48 hours. Monthly sweeps check for stale runbooks, outdated ADRs, and unresolved postmortem action items. |
| **Scope** | Team-wide with per-squad ownership | The entire engineering org shares one architectural memory. Individual notes have owners (squads), but the graph is shared. A dependency does not care about team boundaries. |
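
The example query in the Metadata row can be sketched as a filter over parsed frontmatter. This is a minimal sketch in Python, assuming each postmortem's frontmatter has already been loaded into a dict with the fields from the postmortem schema. The helper `incidents_for`, the 183-day window, and the `PM-900` record are hypothetical and only there for illustration; the `PM-001` values are taken from the example note in this document.

```python
from datetime import date, timedelta

# In a real vault these records would be parsed from the YAML frontmatter
# of files under notes/postmortems/. PM-900 is a made-up record.
POSTMORTEMS = [
    {"pm_id": "PM-001", "date": date(2025, 3, 12), "severity": "P1",
     "affected_services": ["[[payment-service]]", "[[inventory-service]]",
                           "[[checkout-gateway]]"]},
    {"pm_id": "PM-900", "date": date(2025, 5, 2), "severity": "P2",
     "affected_services": ["[[payment-service]]"]},
]


def incidents_for(records, service, severity, today, window_days=183):
    """Return pm_ids for one service and severity within a trailing window."""
    cutoff = today - timedelta(days=window_days)
    link = f"[[{service}]]"  # services are stored as wiki links
    return [
        r["pm_id"]
        for r in records
        if r["severity"] == severity
        and link in r["affected_services"]
        and r["date"] >= cutoff
    ]
```

Calling `incidents_for(POSTMORTEMS, "payment-service", "P1", date(2025, 6, 1))` selects only the P1 record. The same filter answers pattern questions such as "how many P2 incidents touched this service in six months" by changing the severity argument and counting the result.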
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Vault Structure
|
|
42
|
+
```
vault/
├── self/
│   ├── identity.md               # Agent identity, operational principles
│   └── memory/
│       └── [operational learnings]
├── notes/
│   ├── index.md                  # Hub: entry point
│   ├── architecture.md           # MOC: architectural patterns and principles
│   ├── services.md               # MOC: service catalog with dependency overview
│   ├── incidents.md              # MOC: postmortem patterns and trends
│   ├── tech-debt.md              # MOC: debt landscape and prioritization
│   ├── migrations.md             # MOC: active and planned migrations
│   │
│   ├── adrs/
│   │   ├── adr-001-adopt-event-sourcing-for-analytics-pipeline.md
│   │   ├── adr-002-use-grpc-for-inter-service-communication.md
│   │   └── ...
│   ├── services/
│   │   ├── payment-service.md
│   │   ├── analytics-pipeline.md
│   │   ├── user-auth.md
│   │   └── ...
│   ├── postmortems/
│   │   ├── pm-001-2025-03-payment-timeout-cascade.md
│   │   ├── pm-002-2025-06-analytics-data-loss.md
│   │   └── ...
│   ├── runbooks/
│   │   ├── rb-payment-service-restart.md
│   │   ├── rb-analytics-pipeline-backfill.md
│   │   └── ...
│   ├── tech-debt/
│   │   ├── td-001-payment-service-retry-logic.md
│   │   ├── td-002-analytics-schema-migration-debt.md
│   │   └── ...
│   └── rfcs/
│       ├── rfc-001-migrate-to-kafka-streams.md
│       └── ...
├── ops/
│   ├── templates/
│   │   ├── adr.md
│   │   ├── postmortem.md
│   │   ├── service.md
│   │   ├── runbook.md
│   │   ├── tech-debt.md
│   │   └── rfc.md
│   ├── logs/
│   │   ├── staleness-alerts.md       # Agent-detected documentation drift
│   │   ├── dependency-changes.md     # Dependency graph mutations
│   │   └── action-item-tracker.md    # Cross-postmortem action items
│   └── derivation.md
└── inbox/
    └── [quick captures, meeting notes, slack excerpts]
```
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
## Note Schemas
|
|
101
|
+
|
|
102
|
+
### Architecture Decision Record (ADR)
|
|
103
|
+
```yaml
---
description: [one sentence stating the decision and primary motivation]
adr_id: ADR-NNN
status: proposed | accepted | deprecated | superseded
date_decided: YYYY-MM-DD
deciders: ["name (role)"]
affected_services: ["[[service-name]]"]
supersedes: "[[adr-nnn-title]]" | null
superseded_by: "[[adr-nnn-title]]" | null
topics: ["[[architecture]]"]
relevant_notes: ["[[note]] -- relationship context"]
---
```
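The `supersedes` and `superseded_by` fields only help if they mirror each other. A minimal sketch of that consistency check follows, assuming the frontmatter has been parsed into dicts and the wiki-link brackets have been stripped from the field values. The function name `supersession_errors` and the shortened note names are hypothetical.

```python
def supersession_errors(adrs):
    """Check that supersedes/superseded_by links mirror each other.

    `adrs` maps a note name to its parsed frontmatter (a dict).
    Returns a list of human-readable problems; an empty list means consistent.
    """
    errors = []
    for name, meta in adrs.items():
        newer = meta.get("superseded_by")
        if newer is not None:
            # A replaced ADR should carry the matching lifecycle status.
            if meta.get("status") != "superseded":
                errors.append(f"{name}: has superseded_by but status is {meta.get('status')}")
            if adrs.get(newer, {}).get("supersedes") != name:
                errors.append(f"{name}: {newer} does not point back via supersedes")
        older = meta.get("supersedes")
        if older is not None and adrs.get(older, {}).get("superseded_by") != name:
            errors.append(f"{name}: {older} does not point back via superseded_by")
    return errors
```

A check like this fits the monthly sweep: it is cheap to run over every ADR and reports one-sided links that a human edit left behind.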
|
|
118
|
+
|
|
119
|
+
### Incident Postmortem
|
|
120
|
+
```yaml
---
description: [one sentence describing what happened and the impact]
pm_id: PM-NNN
date: YYYY-MM-DD
severity: P1 | P2 | P3
duration_minutes: NNN
affected_services: ["[[service-name]]"]
root_cause_category: configuration | code-bug | dependency-failure | capacity | human-error
contributing_factors: ["description"]
action_items:
  - task: [what needs to happen]
    owner: name
    status: open | in-progress | done
    due: YYYY-MM-DD
related_adrs: ["[[adr-nnn-title]]"]
related_tech_debt: ["[[td-nnn-title]]"]
topics: ["[[incidents]]"]
---
```
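Structured `action_items` make a cross-postmortem tracker straightforward. The sketch below assumes parsed frontmatter with `due` already converted to `datetime.date`; the function name `overdue_action_items` is hypothetical, and the sample data is copied from the PM-001 example note in this document.

```python
from datetime import date

# Action items copied from the PM-001 example note.
PM_001 = {
    "pm_id": "PM-001",
    "action_items": [
        {"task": "Add circuit breakers to all inter-service calls in payment service",
         "owner": "Carlos Mendez", "status": "done", "due": date(2025, 3, 26)},
        {"task": "Implement load shedding in checkout gateway",
         "owner": "Aisha Johnson", "status": "done", "due": date(2025, 4, 9)},
        {"task": "Add latency budget alerts to all critical-path services",
         "owner": "Platform team", "status": "in-progress", "due": date(2025, 4, 30)},
        {"task": "Create pre-deployment load test gate for inventory service",
         "owner": "Commerce squad", "status": "open", "due": date(2025, 5, 15)},
    ],
}


def overdue_action_items(postmortems, today):
    """Return (pm_id, task, owner, days_overdue) for unfinished items past due."""
    rows = []
    for pm in postmortems:
        for item in pm.get("action_items", []):
            if item["status"] != "done" and item["due"] < today:
                rows.append((pm["pm_id"], item["task"], item["owner"],
                             (today - item["due"]).days))
    # Most overdue first, so the oldest unresolved commitments lead the report.
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Run against a list of all postmortems, the result is the kind of content a file such as `ops/logs/action-item-tracker.md` could be generated from.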
|
|
141
|
+
|
|
142
|
+
### Service Description
|
|
143
|
+
```yaml
---
description: [one sentence describing what this service does and why it exists]
service_id: SVC-NNN
team: squad-name
status: active | deprecated | migrating
depends_on: ["[[service-name]] -- interface type"]
depended_by: ["[[service-name]] -- interface type"]
sla: "99.9% availability, p99 < 200ms"
on_call: team-name
runbook: "[[rb-service-name]]"
last_architecture_review: YYYY-MM-DD
topics: ["[[services]]"]
relevant_notes: ["[[adr-nnn-title]] -- architectural decision constraining this service"]
---
```
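With `depends_on` recorded per service, impact analysis becomes a graph walk. The following is a sketch, assuming the `depends_on` lists have been reduced to plain service names. `transitive_dependents` is a hypothetical helper, and the sample edges are copied from the payment-service example note in this document.

```python
from collections import deque

# Edges copied from the payment-service example: service -> services it calls.
DEPENDS_ON = {
    "payment-service": ["inventory-service", "user-auth",
                        "notification-service", "event-store"],
    "checkout-gateway": ["payment-service"],
    "reporting-service": ["payment-service"],
    "refund-service": ["payment-service"],
}


def transitive_dependents(depends_on, service):
    """Return every service that directly or indirectly depends on `service`."""
    # Invert the depends_on edges so we can walk from a service to its callers.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    seen, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in seen and caller != service:
                seen.add(caller)
                queue.append(caller)
    return seen
```

Asking for the dependents of `inventory-service` returns the payment service and everything upstream of it, which is the blast radius a reviewer would want to see before changing that service. Deriving `depended_by` from `depends_on` in this way also avoids maintaining both directions by hand.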
|
|
160
|
+
|
|
161
|
+
### Runbook
|
|
162
|
+
```yaml
---
description: [one sentence describing when to use this runbook]
rb_id: RB-NNN
service: "[[service-name]]"
trigger: [when to execute this runbook]
last_verified: YYYY-MM-DD
verified_by: name
owner: squad-name
topics: ["[[service-name]]"]
---
```
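The `last_verified` field is what makes a staleness sweep possible. A short sketch follows, assuming parsed frontmatter with dates as `datetime.date`. The 90-day threshold and the name `stale_runbooks` are assumptions, since the document does not state a cutoff; the `RB-003` date comes from the runbook example in this document.

```python
from datetime import date

# last_verified copied from the RB-003 example note.
RUNBOOKS = [{"rb_id": "RB-003", "last_verified": date(2026, 1, 10)}]


def stale_runbooks(runbooks, today, max_age_days=90):
    """Return (rb_id, age_in_days) for runbooks not verified within the window."""
    stale = []
    for rb in runbooks:
        age = (today - rb["last_verified"]).days
        if age > max_age_days:
            stale.append((rb["rb_id"], age))
    return sorted(stale, key=lambda pair: pair[1], reverse=True)
```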
|
|
175
|
+
|
|
176
|
+
### Tech Debt Item
|
|
177
|
+
```yaml
---
description: [one sentence describing the debt and its risk]
td_id: TD-NNN
service: "[[service-name]]"
severity: critical | high | medium | low
estimated_effort: days | weeks | months
business_impact: [what happens if this is not addressed]
related_incidents: ["[[pm-nnn-title]]"]
created: YYYY-MM-DD
owner: squad-name
status: identified | scheduled | in-progress | resolved
topics: ["[[tech-debt]]"]
---
```
|
|
193
|
+
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Example Notes
|
|
197
|
+
|
|
198
|
+
### ADR: Adopt Event Sourcing for Analytics Pipeline
|
|
199
|
+
|
|
200
|
+
```markdown
|
|
201
|
+
---
|
|
202
|
+
description: Chose event sourcing over traditional CRUD for the analytics pipeline to enable retroactive reprocessing and audit trail requirements
|
|
203
|
+
adr_id: ADR-001
|
|
204
|
+
status: accepted
|
|
205
|
+
date_decided: 2024-03-15
|
|
206
|
+
deciders: ["Ravi Krishnamurthy (Staff Engineer)", "Lisa Park (Analytics Lead)"]
|
|
207
|
+
affected_services: ["[[analytics-pipeline]]", "[[event-store]]", "[[reporting-service]]"]
|
|
208
|
+
supersedes: null
|
|
209
|
+
superseded_by: null
|
|
210
|
+
topics: ["[[architecture]]"]
|
|
211
|
+
relevant_notes: ["[[adr-017-migrate-event-store-to-kafka-streams]] -- evolves the storage layer underneath this decision without changing the sourcing model", "[[pm-002-2025-06-analytics-data-loss]] -- incident that validated this decision: event sourcing enabled full reprocessing from the event log after corrupted aggregates were detected"]
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
# ADR-001: Adopt event sourcing for analytics pipeline
|
|
215
|
+
|
|
216
|
+
## Context
|
|
217
|
+
|
|
218
|
+
The analytics pipeline processes approximately 2 billion events per day from 40+ event sources. The original CRUD-based design made it impossible to reprocess historical data when business logic changed — every schema migration was a one-way door. Compliance requirements (SOC 2, GDPR audit trails) added a second forcing function: we needed an immutable record of every state change, not just current state.
|
|
219
|
+
|
|
220
|
+
## Decision
|
|
221
|
+
|
|
222
|
+
Adopt event sourcing as the primary data model for the analytics pipeline. All state changes are stored as immutable events in a time-ordered log. Current state is derived by replaying events through projection functions. The event store is the source of truth; materialized views are disposable.
|
|
223
|
+
|
|
224
|
+
## Consequences
|
|
225
|
+
|
|
226
|
+
**Positive:**
|
|
227
|
+
- Retroactive reprocessing becomes trivial — change the projection function, replay from any point
|
|
228
|
+
- Complete audit trail satisfies SOC 2 and GDPR requirements without additional infrastructure
|
|
229
|
+
- Debugging production issues reduces to "find the event sequence that produced this state"
|
|
230
|
+
|
|
231
|
+
**Negative:**
|
|
232
|
+
- Engineering team needs to learn event sourcing patterns (3-month ramp estimated, actual was 5 months)
|
|
233
|
+
- Storage costs increase significantly — events are append-only, never deleted
|
|
234
|
+
- Eventually consistent reads require careful UX design for the reporting service
|
|
235
|
+
- Operational complexity increases: the event store itself becomes a critical dependency
|
|
236
|
+
|
|
237
|
+
## Alternatives Considered
|
|
238
|
+
|
|
239
|
+
1. **CRUD with audit log** — Simpler but creates two sources of truth (current state vs audit log). Audit log inevitably drifts from reality.
|
|
240
|
+
2. **Change Data Capture (CDC)** — Captures changes at the database level but loses business semantics. "Row updated" is less useful than "user changed subscription tier."
|
|
241
|
+
3. **Hybrid: CRUD for hot path, event log for audit** — Considered but rejected because it splits the team's mental model and doubles the write path.
|
|
242
|
+
|
|
243
|
+
---
|
|
244
|
+
|
|
245
|
+
Relevant Notes:
|
|
246
|
+
- [[analytics-pipeline]] -- the service this decision primarily constrains
|
|
247
|
+
- [[adr-017-migrate-event-store-to-kafka-streams]] -- evolution of the storage layer
|
|
248
|
+
- [[pm-002-2025-06-analytics-data-loss]] -- incident that validated event sourcing's reprocessing capability
|
|
249
|
+
|
|
250
|
+
Topics:
|
|
251
|
+
- [[architecture]]
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
### Incident Postmortem: Payment Timeout Cascade
|
|
255
|
+
|
|
256
|
+
```markdown
---
description: Payment service timeout cascade caused 47-minute P1 outage affecting all checkout flows due to missing circuit breaker on inventory service dependency
pm_id: PM-001
date: 2025-03-12
severity: P1
duration_minutes: 47
affected_services: ["[[payment-service]]", "[[inventory-service]]", "[[checkout-gateway]]"]
root_cause_category: dependency-failure
contributing_factors: ["Missing circuit breaker on payment-to-inventory call", "Inventory service deployed new version with 10x slower response under load", "No load shedding in checkout gateway"]
action_items:
  - task: Add circuit breakers to all inter-service calls in payment service
    owner: Carlos Mendez
    status: done
    due: 2025-03-26
  - task: Implement load shedding in checkout gateway
    owner: Aisha Johnson
    status: done
    due: 2025-04-09
  - task: Add latency budget alerts to all critical-path services
    owner: Platform team
    status: in-progress
    due: 2025-04-30
  - task: Create pre-deployment load test gate for inventory service
    owner: Commerce squad
    status: open
    due: 2025-05-15
related_adrs: ["[[adr-002-use-grpc-for-inter-service-communication]]"]
related_tech_debt: ["[[td-001-payment-service-retry-logic]]"]
topics: ["[[incidents]]"]
---
|
|
287
|
+
|
|
288
|
+
# PM-001: Payment timeout cascade — March 12, 2025
|
|
289
|
+
|
|
290
|
+
## Timeline
|
|
291
|
+
|
|
292
|
+
- **14:23 UTC** — Commerce squad deploys inventory-service v2.3.1 with new stock validation logic
|
|
293
|
+
- **14:31 UTC** — Inventory service p99 latency increases from 45ms to 480ms under production load
|
|
294
|
+
- **14:33 UTC** — Payment service begins timing out on inventory checks. Default timeout is 5s with 3 retries, creating a 15s worst-case per request
|
|
295
|
+
- **14:35 UTC** — Payment service thread pool exhausted. All checkout requests begin failing
|
|
296
|
+
- **14:37 UTC** — PagerDuty alert fires for payment service error rate > 5%
|
|
297
|
+
- **14:38 UTC** — Checkout gateway begins returning 503s to all users. Revenue impact begins
|
|
298
|
+
- **14:42 UTC** — On-call engineer (Carlos) acknowledges. Initial hypothesis: payment service itself is down
|
|
299
|
+
- **14:51 UTC** — Carlos identifies inventory service as the source. Begins rollback
|
|
300
|
+
- **14:58 UTC** — Inventory service v2.3.0 deployed. Latency returns to normal
|
|
301
|
+
- **15:03 UTC** — Payment service thread pool recovers. Checkout flow restored
|
|
302
|
+
- **15:10 UTC** — All services confirmed healthy. Incident closed
|
|
303
|
+
|
|
304
|
+
## Root Cause
|
|
305
|
+
|
|
306
|
+
Inventory service v2.3.1 introduced a new stock validation query that performed a full table scan on a 200M-row table under specific product category combinations. The query worked fine in staging (small dataset) but degraded catastrophically under production load.
|
|
307
|
+
|
|
308
|
+
The payment service had no circuit breaker on its inventory dependency. When inventory slowed down, payment kept retrying, exhausting its own thread pool and cascading the failure upward to the checkout gateway.
|
|
309
|
+
|
|
310
|
+
## What went well
|
|
311
|
+
|
|
312
|
+
- PagerDuty alert fired within 2 minutes of customer impact
|
|
313
|
+
- Rollback was clean and fast (7 minutes from decision to deployment)
|
|
314
|
+
- No data loss — event sourcing in the analytics pipeline meant we could reprocess the gap
|
|
315
|
+
|
|
316
|
+
## What went poorly
|
|
317
|
+
|
|
318
|
+
- 9 minutes from acknowledgment to correct diagnosis (14 minutes from the alert itself) because the alert pointed at payment, not inventory
|
|
319
|
+
- There was no pre-deployment load test against production-scale data; such a test would have caught this
|
|
320
|
+
- The missing circuit breaker in payment service had been identified as tech debt ([[td-001-payment-service-retry-logic]]) six months earlier and never prioritized
|
|
321
|
+
|
|
322
|
+
## Lessons
|
|
323
|
+
|
|
324
|
+
The gap between "we know this is tech debt" and "we are going to fix it" is where incidents live. [[td-001-payment-service-retry-logic]] was logged, triaged as "medium," and deprioritized twice. This incident cost approximately $180K in lost revenue during the 47-minute outage. The tech debt fix would have taken an estimated two days of engineering time.
|
|
325
|
+
|
|
326
|
+
---
|
|
327
|
+
|
|
328
|
+
Relevant Notes:
|
|
329
|
+
- [[td-001-payment-service-retry-logic]] -- the tech debt that directly enabled this failure
|
|
330
|
+
- [[payment-service]] -- the service that cascaded
|
|
331
|
+
- [[inventory-service]] -- the service that originated the failure
|
|
332
|
+
- [[adr-002-use-grpc-for-inter-service-communication]] -- the communication pattern that lacked circuit breakers
|
|
333
|
+
|
|
334
|
+
Topics:
|
|
335
|
+
- [[incidents]]
|
|
336
|
+
```
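The intervals quoted in the postmortem can be recomputed from its timeline. This is a small sketch with the timestamps copied from the PM-001 timeline above; the dictionary keys and the helper `minutes_between` are introduced only for illustration.

```python
from datetime import datetime

# Timestamps (UTC, same day) copied from the PM-001 timeline.
TIMELINE = {
    "deploy": "14:23",
    "requests_failing": "14:35",
    "alert": "14:37",
    "acknowledged": "14:42",
    "diagnosed": "14:51",
    "rolled_back": "14:58",
    "closed": "15:10",
}


def minutes_between(start, end):
    """Whole minutes between two named timeline entries."""
    fmt = "%H:%M"
    delta = (datetime.strptime(TIMELINE[end], fmt)
             - datetime.strptime(TIMELINE[start], fmt))
    return int(delta.total_seconds() // 60)
```

With these timestamps, the 47-minute duration runs from the deploy to incident close, the alert fires 2 minutes after checkout requests start failing, and the rollback takes 7 minutes from diagnosis. The 9-minute diagnosis interval starts at acknowledgment (14:42); measured from the alert at 14:37 it is 14 minutes.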
|
|
337
|
+
|
|
338
|
+
### Service Description: Payment Service
|
|
339
|
+
|
|
340
|
+
```markdown
|
|
341
|
+
---
|
|
342
|
+
description: Processes all payment transactions for the platform — critical path for checkout, highest SLA requirement, owned by Commerce squad
|
|
343
|
+
service_id: SVC-003
|
|
344
|
+
team: commerce-squad
|
|
345
|
+
status: active
|
|
346
|
+
depends_on: ["[[inventory-service]] -- gRPC, stock validation before charge", "[[user-auth]] -- gRPC, token validation", "[[notification-service]] -- async, receipt emails", "[[event-store]] -- async, payment events"]
|
|
347
|
+
depended_by: ["[[checkout-gateway]] -- gRPC, payment processing", "[[reporting-service]] -- async, revenue aggregation", "[[refund-service]] -- gRPC, original transaction lookup"]
|
|
348
|
+
sla: "99.95% availability, p99 < 500ms"
|
|
349
|
+
on_call: commerce-squad
|
|
350
|
+
runbook: "[[rb-payment-service-restart]]"
|
|
351
|
+
last_architecture_review: 2025-09-15
|
|
352
|
+
topics: ["[[services]]"]
|
|
353
|
+
relevant_notes: ["[[adr-009-payment-service-idempotency-keys]] -- ensures exactly-once processing for payment requests", "[[adr-002-use-grpc-for-inter-service-communication]] -- communication protocol constraint", "[[pm-001-2025-03-payment-timeout-cascade]] -- incident revealing missing circuit breaker on inventory dependency", "[[td-001-payment-service-retry-logic]] -- known tech debt in retry/circuit breaker implementation"]
|
|
354
|
+
---
|
|
355
|
+
|
|
356
|
+
# Payment Service
|
|
357
|
+
|
|
358
|
+
The payment service is the most critical component in the checkout path. Every dollar of revenue flows through it. It processes charges, validates inventory availability, and emits payment events to the event store for downstream consumption.
|
|
359
|
+
|
|
360
|
+
## Architecture
|
|
361
|
+
|
|
362
|
+
The service is a stateless gRPC server running on Kubernetes (3 replicas minimum, autoscales to 12). State lives in PostgreSQL (primary + 2 read replicas) and the event store. Idempotency keys ensure exactly-once semantics for payment processing, which matters because retry logic in upstream services can produce duplicate requests.
|
|
363
|
+
|
|
364
|
+
## Known risks
|
|
365
|
+
|
|
366
|
+
The circuit breaker implementation ([[td-001-payment-service-retry-logic]]) remains incomplete. The basic breaker from the PM-001 remediation covers the inventory dependency but the pattern has not been generalized to all outbound calls. The user-auth dependency currently has no breaker — if auth service degrades, payment service will cascade just as it did with inventory in March 2025.
|
|
367
|
+
|
|
368
|
+
## Dependency notes
|
|
369
|
+
|
|
370
|
+
The inventory-service dependency is synchronous and on the critical path — payment will not process without stock validation. This is the dependency that caused [[pm-001-2025-03-payment-timeout-cascade]]. Notification-service and event-store dependencies are asynchronous and can tolerate temporary unavailability.
|
|
371
|
+
|
|
372
|
+
---
|
|
373
|
+
|
|
374
|
+
Relevant Notes:
|
|
375
|
+
- [[adr-009-payment-service-idempotency-keys]] -- exactly-once guarantee
|
|
376
|
+
- [[pm-001-2025-03-payment-timeout-cascade]] -- cascade incident from inventory dependency
|
|
377
|
+
- [[td-001-payment-service-retry-logic]] -- incomplete circuit breaker implementation
|
|
378
|
+
|
|
379
|
+
Topics:
|
|
380
|
+
- [[services]]
|
|
381
|
+
```
|
|
382
|
+
|
|
383
|
+
### Tech Debt Item: Payment Service Retry Logic
|
|
384
|
+
|
|
385
|
+
```markdown
|
|
386
|
+
---
|
|
387
|
+
description: Payment service lacks generalized circuit breakers on outbound dependencies — partial fix after PM-001 covers only inventory call, leaving auth and other dependencies vulnerable to the same cascade pattern
|
|
388
|
+
td_id: TD-001
|
|
389
|
+
service: "[[payment-service]]"
|
|
390
|
+
severity: high
|
|
391
|
+
estimated_effort: weeks
|
|
392
|
+
business_impact: Another dependency-failure cascade could cause a P1 outage identical to PM-001, with estimated revenue impact of $180K+ per hour
|
|
393
|
+
related_incidents: ["[[pm-001-2025-03-payment-timeout-cascade]]"]
|
|
394
|
+
created: 2024-09-20
|
|
395
|
+
owner: commerce-squad
|
|
396
|
+
status: in-progress
|
|
397
|
+
topics: ["[[tech-debt]]"]
|
|
398
|
+
---
|
|
399
|
+
|
|
400
|
+
# TD-001: Payment service retry logic and circuit breakers
|
|
401
|
+
|
|
402
|
+
## History
|
|
403
|
+
|
|
404
|
+
Originally logged in September 2024 after a code review identified that the payment service's outbound calls used a naive retry strategy (3 retries, 5s timeout, no backoff, no circuit breaker). Severity was assessed as "medium" at the time.
|
|
405
|
+
|
|
406
|
+
Deprioritized in Q4 2024 planning in favor of the billing migration project. Deprioritized again in Q1 2025 planning in favor of the analytics pipeline upgrade. Then [[pm-001-2025-03-payment-timeout-cascade]] happened in March 2025, converting this from theoretical risk to demonstrated vulnerability.
|
|
407
|
+
|
|
408
|
+
## Current state
|
|
409
|
+
|
|
410
|
+
After PM-001, Carlos Mendez implemented a circuit breaker specifically for the payment-to-inventory call. This addressed the immediate vulnerability but did not generalize the pattern. The payment service currently makes outbound calls to four services:
|
|
411
|
+
| Dependency | Circuit breaker | Timeout strategy |
|-----------|----------------|-----------------|
| [[inventory-service]] | Yes (post PM-001) | Exponential backoff, 2s initial, 3 retries |
| [[user-auth]] | **No** | Fixed 5s, 3 retries |
| [[notification-service]] | No (async, less critical) | Fire-and-forget with DLQ |
| [[event-store]] | No (async, less critical) | Async with local buffer |
|
|
418
|
+
|
|
419
|
+
The user-auth dependency is the remaining critical risk. It is synchronous, on the critical path, and uses the same naive retry strategy that caused the inventory cascade.
|
|
420
|
+
|
|
421
|
+
## Proposed resolution
|
|
422
|
+
|
|
423
|
+
1. Implement a shared circuit breaker library for all gRPC outbound calls
|
|
424
|
+
2. Configure per-dependency: timeout, retry count, backoff strategy, breaker threshold
|
|
425
|
+
3. Add circuit breaker state to service metrics dashboard
|
|
426
|
+
4. Add integration test that verifies breaker behavior under simulated dependency failure
|
|
427
|
+
|
|
428
|
+
---
|
|
429
|
+
|
|
430
|
+
Relevant Notes:
|
|
431
|
+
- [[pm-001-2025-03-payment-timeout-cascade]] -- the incident that proved this debt is real
|
|
432
|
+
- [[payment-service]] -- the service carrying this debt
|
|
433
|
+
- [[adr-002-use-grpc-for-inter-service-communication]] -- the protocol layer where breakers operate
|
|
434
|
+
|
|
435
|
+
Topics:
|
|
436
|
+
- [[tech-debt]]
|
|
437
|
+
```
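The proposed resolution in TD-001 centers on a shared circuit breaker with per-dependency settings. The sketch below shows the basic state machine in plain Python: closed, open after a run of consecutive failures, then a single trial call once a reset window has passed. It is an illustration under stated assumptions, not the team's library. The class names, thresholds, and injectable `clock` are all hypothetical, and a production breaker for gRPC calls would also need thread safety and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures while closed
        self._opened_at = None       # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Fail fast instead of tying up a worker on a slow dependency.
                raise CircuitOpenError("dependency call rejected: breaker open")
            # Reset window elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Hypothetical per-dependency settings; the numbers are placeholders.
BREAKERS = {
    "inventory-service": CircuitBreaker(failure_threshold=3, reset_after_s=30.0),
    "user-auth": CircuitBreaker(failure_threshold=3, reset_after_s=30.0),
}
```

The design point relevant to PM-001 is the fail-fast branch: once the breaker is open, callers get an immediate error rather than waiting out a timeout and retries, so a slow dependency cannot exhaust the caller's worker pool.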
|
|
438
|
+
|
|
439
|
+
### Runbook: Payment Service Restart
|
|
440
|
+
|
|
441
|
+
```markdown
|
|
442
|
+
---
|
|
443
|
+
description: Step-by-step procedure for restarting the payment service during degraded performance or after a deployment rollback
|
|
444
|
+
rb_id: RB-003
|
|
445
|
+
service: "[[payment-service]]"
|
|
446
|
+
trigger: Payment service p99 > 2s for 5+ minutes, or error rate > 2% after deployment
|
|
447
|
+
last_verified: 2026-01-10
|
|
448
|
+
verified_by: Carlos Mendez
|
|
449
|
+
owner: commerce-squad
|
|
450
|
+
topics: ["[[payment-service]]"]
|
|
451
|
+
---
|
|
452
|
+
|
|
453
|
+
# RB-003: Payment service restart procedure
|
|
454
|
+
|
|
455
|
+
## Pre-restart checks
|
|
456
|
+
|
|
457
|
+
1. Confirm the issue is the payment service, not a dependency
|
|
458
|
+
- Check [[inventory-service]] latency dashboard
|
|
459
|
+
- Check [[user-auth]] health endpoint
|
|
460
|
+
- If a dependency is degraded, the payment service circuit breaker should handle it — restart will not help
|
|
461
|
+
|
|
462
|
+
2. Check for in-flight transactions
|
|
463
|
+
```bash
|
|
464
|
+
kubectl exec -n production deploy/payment-service -- curl -s localhost:8080/admin/inflight
|
|
465
|
+
```
|
|
466
|
+
- If > 0 in-flight transactions, wait for them to complete (max 30s) or drain gracefully
|
|
467
|
+
|
|
468
|
+
3. Notify #incidents Slack channel: "Restarting payment service — expect brief degradation"
|
|
469
|
+
|
|
470
|
+
## Restart procedure
|
|
471
|
+
|
|
472
|
+
```bash
|
|
473
|
+
# Rolling restart (preferred — maintains availability)
|
|
474
|
+
kubectl rollout restart deployment/payment-service -n production
|
|
475
|
+
|
|
476
|
+
# Monitor rollout
|
|
477
|
+
kubectl rollout status deployment/payment-service -n production --timeout=120s
|
|
478
|
+
```
|
|
479
|
+
|
|
480
|
+
## Post-restart verification
|
|
481
|
+
|
|
482
|
+
1. Check health endpoint: `curl https://payment.internal/health`
|
|
483
|
+
2. Verify p99 latency < 500ms on dashboard
|
|
484
|
+
3. Verify error rate < 0.1% on dashboard
|
|
485
|
+
4. Run a test transaction via the staging checkout flow
|
|
486
|
+
5. Update #incidents: "Payment service restart complete, metrics nominal"
|
|
487
|
+
|
|
488
|
+
## Escalation
|
|
489
|
+
|
|
490
|
+
If restart does not resolve the issue within 5 minutes:
|
|
491
|
+
- Page the commerce squad lead (currently: Ravi Krishnamurthy)
|
|
492
|
+
- Consider rollback to previous deployment version
|
|
493
|
+
- If rollback also fails, engage platform team for infrastructure investigation
|
|
494
|
+
|
|
495
|
+
---
|
|
496
|
+
|
|
497
|
+
Topics:
|
|
498
|
+
- [[payment-service]]
|
|
499
|
+
```
|
|
500
|
+
|
|
501
|
+
---
|
|
502
|
+
|
|
503
|
+
## Processing Workflow
|
|
504
|
+
|
|
505
|
+
### Capture
|
|
506
|
+
|
|
507
|
+
Engineering knowledge enters the system through four channels:
|
|
508
|
+
|
|
509
|
+
1. **Decision meetings** — ADR drafted during or immediately after architecture discussions
|
|
510
|
+
2. **Incident response** — Postmortem created within 48 hours of incident resolution
|
|
511
|
+
3. **Code reviews** — Tech debt items created when reviewers identify structural issues
|
|
512
|
+
4. **Deployment events** — Service descriptions updated when dependencies or configurations change
|
|
513
|
+
|
|
514
|
+
The agent monitors these channels (Slack threads, PR comments, deployment logs) and prompts for capture when it detects events that should produce notes. A deployment to the payment service triggers: "Payment service was updated. Should the service description be reviewed?"
|
|
515
|
+
|
|
516
|
+
### Process
|
|
517
|
+
|
|
518
|
+
The agent's primary processing work is connection maintenance:
|
|
519
|
+
|
|
520
|
+
1. **ADR impact mapping** — When a new ADR is accepted, the agent identifies all services it affects and adds `relevant_notes` links bidirectionally. If the ADR supersedes a previous decision, the agent updates both ADRs' `supersedes`/`superseded_by` fields and checks whether any tech debt or runbook references the superseded ADR.
|
|
521
|
+
|
|
522
|
+
2. **Postmortem linking** — When a postmortem is created, the agent links it to all affected services, relevant ADRs (decisions that shaped the architecture that failed), and related tech debt items (known issues that contributed). The agent also checks for pattern matches: "This is the third P2 incident involving the inventory service in six months."
|
|
523
|
+
|
|
524
|
+
3. **Dependency graph maintenance** — When a service description changes its `depends_on` or `depended_by` fields, the agent verifies bidirectional consistency. If Service A says it depends on Service B, Service B's `depended_by` must include Service A.
|
|
525
|
+
|
|
526
|
+
4. **Tech debt correlation** — When a postmortem identifies tech debt as a contributing factor, the agent links them and updates the tech debt item's severity if the incident demonstrates higher risk than originally assessed.
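
The bidirectional consistency rule in item 3 can be sketched as a small shell check. This is a minimal illustration, not the system's actual implementation: it assumes one note per service named `<service>.md`, and it assumes `depends_on` and `depended_by` are written as inline frontmatter lists (the same form as `topics: ["[[payment-service]]"]`). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report depends_on edges that lack a matching depended_by entry.
# Assumed frontmatter form: depends_on: ["[[inventory-service]]", "[[user-auth]]"]

# Print the wiki-link targets found on one frontmatter field, one per line.
links_in_field() {
  grep -m1 "^$1:" "$2" 2>/dev/null | grep -o '\[\[[^]]*]]' | sed 's/^\[\[//; s/]]$//'
}

# For every "A depends_on B", verify that B's depended_by lists A.
check_dependency_consistency() {
  local dir=$1 note svc dep
  for note in "$dir"/*.md; do
    [ -e "$note" ] || continue
    svc=$(basename "$note" .md)
    links_in_field depends_on "$note" | while IFS= read -r dep; do
      [ -n "$dep" ] || continue
      if ! links_in_field depended_by "$dir/$dep.md" | grep -qxF "$svc"; then
        echo "MISSING: $dep.depended_by lacks $svc"
      fi
    done
  done
}
```

Running `check_dependency_consistency notes/services` prints one line per missing back-edge and nothing when the graph is consistent. The reverse direction (a `depended_by` entry with no matching `depends_on`) would need a second pass with the two field names swapped.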
|
|
527
|
+
|
|
528
|
+
### Connect
|
|
529
|
+
|
|
530
|
+
Cross-cutting connections that span note types:
|
|
531
|
+
|
|
532
|
+
- **Decision chains** — ADR-001 is superseded by ADR-017, which is constrained by ADR-023. The agent maintains these chains and can traverse them: "What is the current active decision about event storage?"
|
|
533
|
+
- **Incident patterns** — Postmortems cluster by root cause category, affected service, and time period. The agent detects patterns: "Three of the last five P1 incidents involved the same service boundary."
|
|
534
|
+
- **Debt-to-incident correlation** — Tech debt items linked to incidents gain quantified business impact. TD-001's severity changed from "medium" to "high" after PM-001 demonstrated $180K/hour revenue impact.
|
|
535
|
+
- **Runbook verification** — When a service description changes, the agent checks whether the runbook still matches. If the deployment procedure changed but the runbook was not updated, the agent flags the drift.
|
|
536
|
+
|
|
537
|
+
### Verify
|
|
538
|
+
|
|
539
|
+
Event-triggered and periodic checks:
|
|
540
|
+
|
|
541
|
+
1. **Post-deployment verification** — After any deployment, check: is the service description current? Is the runbook still accurate? Are dependency declarations still correct?
|
|
542
|
+
2. **Postmortem action item tracking** — Weekly sweep of all open action items across postmortems. Flag items past their due date
|
|
543
|
+
3. **Staleness detection** — Monthly check for service descriptions not updated in 90+ days, runbooks not verified in 180+ days, ADRs in "proposed" status for 30+ days
|
|
544
|
+
4. **Dependency graph integrity** — Monthly bidirectional consistency check across all service `depends_on`/`depended_by` fields
|
|
545
|
+
5. **ADR lifecycle review** — Quarterly scan for accepted ADRs whose context may have changed (referenced technologies deprecated, team structure changed, scale assumptions exceeded)
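
The staleness thresholds in item 3 can be checked without hard-coding a cutoff date. The sketch below is an illustration under stated assumptions: the date field holds a plain `YYYY-MM-DD` value (as `last_verified: 2026-01-10` does in the runbook example), years are 1970 or later, and the function names `days_from_date` and `stale_notes` are hypothetical. The date conversion uses integer arithmetic only, so it does not rely on platform-specific `date` flags.

```bash
#!/usr/bin/env bash
# Convert YYYY-MM-DD to days since 1970-01-01 using integer arithmetic only.
days_from_date() {
  local y m d rest era yoe mp doy
  y=${1%%-*}; rest=${1#*-}; m=${rest%%-*}; d=${rest#*-}
  m=${m#0}; d=${d#0}                          # strip a leading zero (avoids octal parsing)
  if [ "$m" -le 2 ]; then y=$((y - 1)); fi    # count Jan/Feb as months of the previous year
  era=$((y / 400)); yoe=$((y - era * 400))
  mp=$(( m > 2 ? m - 3 : m + 9 ))             # March-based month index
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  echo $(( era * 146097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719468 ))
}

# stale_notes <dir> <field> <max_age_days> <today as YYYY-MM-DD>
# Prints notes whose <field> date is at least <max_age_days> old.
stale_notes() {
  local dir=$1 field=$2 max_age=$3 today_n note value age
  today_n=$(days_from_date "$4")
  for note in "$dir"/*.md; do
    [ -e "$note" ] || continue
    value=$(sed -n "s/^${field}: *//p" "$note" | head -n 1)
    [ -n "$value" ] || continue               # skip notes without the field
    age=$(( today_n - $(days_from_date "$value") ))
    if [ "$age" -ge "$max_age" ]; then
      echo "STALE ($age days): $note"
    fi
  done
  return 0
}
```

For the runbook threshold this would be called as `stale_notes notes/runbooks last_verified 180 "$(date +%F)"`. The 90-day service check would use the same function with whatever field records the last update; that field name is not given in the source.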
|
|
546
|
+
|
|
547
|
+
---
|
|
548
|
+
|
|
549
|
+
## MOC Structure
|
|
550
|
+
|
|
551
|
+
### Hub (index.md)
|
|
552
|
+
|
|
553
|
+
```markdown
|
|
554
|
+
---
|
|
555
|
+
description: Entry point for the Meridian engineering knowledge system — navigate by category, by service, or by pattern
|
|
556
|
+
type: moc
|
|
557
|
+
---
|
|
558
|
+
|
|
559
|
+
# index
|
|
560
|
+
|
|
561
|
+
## By Category
|
|
562
|
+
- [[architecture]] -- ADRs, design principles, architectural patterns
|
|
563
|
+
- [[services]] -- service catalog, dependency graph overview
|
|
564
|
+
- [[incidents]] -- postmortem patterns, incident trends
|
|
565
|
+
- [[tech-debt]] -- debt landscape, prioritization, correlation to incidents
|
|
566
|
+
- [[migrations]] -- active and planned system migrations
|
|
567
|
+
|
|
568
|
+
## Quick Access
|
|
569
|
+
- [[services]] -- "what services exist and how do they connect?"
|
|
570
|
+
- [[incidents]] -- "what has gone wrong and what patterns emerge?"
|
|
571
|
+
- [[architecture]] -- "why was it built this way?"
|
|
572
|
+
|
|
573
|
+
## Maintenance
|
|
574
|
+
- [[staleness-alerts]] -- agent-detected documentation drift
|
|
575
|
+
- [[action-item-tracker]] -- cross-postmortem action item status
|
|
576
|
+
```
|
|
577
|
+
|
|
578
|
+
### Services MOC (services.md)
|
|
579
|
+
|
|
580
|
+
```markdown
|
|
581
|
+
---
|
|
582
|
+
description: Service catalog with dependency overview — the entry point for understanding what exists, what depends on what, and where the risks concentrate
|
|
583
|
+
type: moc
|
|
584
|
+
topics: ["[[index]]"]
|
|
585
|
+
---
|
|
586
|
+
|
|
587
|
+
# services
|
|
588
|
+
|
|
589
|
+
The Meridian platform consists of 40+ services owned by five squads. This MOC provides navigational entry and dependency awareness. For architectural decisions constraining specific services, follow the relevant_notes links from service descriptions to their ADRs.
|
|
590
|
+
|
|
591
|
+
## Critical Path Services
|
|
592
|
+
- [[payment-service]] -- highest SLA, revenue-critical, known circuit breaker debt
|
|
593
|
+
- [[checkout-gateway]] -- user-facing entry point, aggregates payment + inventory + auth
|
|
594
|
+
- [[user-auth]] -- token validation for all authenticated requests
|
|
595
|
+
|
|
596
|
+
## Data Infrastructure
|
|
597
|
+
- [[analytics-pipeline]] -- event sourcing architecture, 2B events/day
|
|
598
|
+
- [[event-store]] -- append-only event log, source of truth for analytics
|
|
599
|
+
- [[reporting-service]] -- materialized views over event store
|
|
600
|
+
|
|
601
|
+
## Commerce
|
|
602
|
+
- [[inventory-service]] -- stock validation, source of PM-001 cascade
|
|
603
|
+
- [[pricing-service]] -- dynamic pricing engine
|
|
604
|
+
- [[refund-service]] -- refund processing, depends on payment-service for original transaction lookup
|
|
605
|
+
|
|
606
|
+
## Dependency Risks
|
|
607
|
+
|
|
608
|
+
Three dependency patterns require monitoring:
|
|
609
|
+
|
|
610
|
+
1. **Payment service fan-out** — payment-service depends on 4 services synchronously. Any degradation cascades. Circuit breaker coverage is incomplete (see [[td-001-payment-service-retry-logic]]).
|
|
611
|
+
|
|
612
|
+
2. **Event store as single source of truth** — analytics-pipeline, reporting-service, and 6 other consumers depend on event-store. It has no hot standby.
|
|
613
|
+
|
|
614
|
+
3. **Auth as universal dependency** — every authenticated service depends on user-auth. None of its consumers use a circuit breaker, and user-auth itself has no fallback mode.
|
|
615
|
+
|
|
616
|
+
---
|
|
617
|
+
|
|
618
|
+
Agent Notes:
|
|
619
|
+
- When someone asks "what depends on X?", start with the service description's `depended_by` field, then check for implicit dependencies not yet documented. Slack threads mentioning both service names can reveal undocumented dependencies.
|
|
620
|
+
- The dependency graph is only as current as the last service description update. Flag services not updated in 90+ days during monthly staleness sweeps.
|
|
621
|
+
|
|
622
|
+
Topics:
|
|
623
|
+
- [[index]]
|
|
624
|
+
```
|
|
625
|
+
|
|
626
|
+
### Architecture MOC (architecture.md)
|
|
627
|
+
|
|
628
|
+
```markdown
|
|
629
|
+
---
|
|
630
|
+
description: Architectural decisions and patterns — the "why" behind system design, organized by decision status and domain
|
|
631
|
+
type: moc
|
|
632
|
+
topics: ["[[index]]"]
|
|
633
|
+
---
|
|
634
|
+
|
|
635
|
+
# architecture
|
|
636
|
+
|
|
637
|
+
Architecture decisions are documented as ADRs. Each ADR captures the context, the decision, the alternatives considered, and the consequences — both intended and actual. ADRs are living documents: their status changes as systems evolve.
|
|
638
|
+
|
|
639
|
+
## Active Decisions
|
|
640
|
+
- [[adr-001-adopt-event-sourcing-for-analytics-pipeline]] -- event sourcing as primary data model, validated by PM-002 incident recovery
|
|
641
|
+
- [[adr-002-use-grpc-for-inter-service-communication]] -- gRPC for synchronous inter-service calls, constrains circuit breaker implementation
|
|
642
|
+
- [[adr-009-payment-service-idempotency-keys]] -- exactly-once payment processing via idempotency keys
|
|
643
|
+
|
|
644
|
+
## Superseded Decisions
|
|
645
|
+
- [[adr-003-use-rest-for-internal-apis]] -- superseded by ADR-002 (gRPC migration), completed Q3 2024
|
|
646
|
+
|
|
647
|
+
## Proposed
|
|
648
|
+
- [[rfc-001-migrate-to-kafka-streams]] -- replace custom event store consumer with Kafka Streams, currently in design review
|
|
649
|
+
|
|
650
|
+
## Cross-Cutting Patterns
|
|
651
|
+
- Circuit breakers: partially implemented post-PM-001, see [[td-001-payment-service-retry-logic]]
|
|
652
|
+
- Idempotency: established pattern via [[adr-009-payment-service-idempotency-keys]], should extend to all write paths
|
|
653
|
+
- Event sourcing: foundational via [[adr-001-adopt-event-sourcing-for-analytics-pipeline]], constrains how all services emit state changes
|
|
654
|
+
|
|
655
|
+
---
|
|
656
|
+
|
|
657
|
+
Agent Notes:
|
|
658
|
+
- The supersession chain is the most important traversal pattern for ADRs. When someone asks "what is the current decision about X?", follow superseded_by links until you reach an ADR with no superseded_by — that is the active decision.
|
|
659
|
+
- When a new ADR is proposed, check whether it implicitly supersedes or conflicts with existing accepted ADRs. The decider may not be aware of all prior decisions.
|
|
660
|
+
|
|
661
|
+
Topics:
|
|
662
|
+
- [[index]]
|
|
663
|
+
```
|
|
664
|
+
|
|
665
|
+
---
|
|
666
|
+
|
|
667
|
+
## Graph Query Examples
|
|
668
|
+
|
|
669
|
+
```bash
|
|
670
|
+
# Find all ADRs affecting a specific service
|
|
671
|
+
rg 'affected_services:.*\[\[payment-service\]\]' notes/adrs/
|
|
672
|
+
|
|
673
|
+
# Find all P1 incidents (lists every match; narrow to the last 6 months by filtering on the incident date)
|
|
674
|
+
rg '^severity: P1' notes/postmortems/ -l
|
|
675
|
+
|
|
676
|
+
# Find all open postmortem action items
|
|
677
|
+
rg 'status: open' notes/postmortems/ -B 2
|
|
678
|
+
|
|
679
|
+
# Find all tech debt items correlated with incidents
|
|
680
|
+
rg '^related_incidents:' notes/tech-debt/ | rg -v 'null'
|
|
681
|
+
|
|
682
|
+
# Find services with no runbook
|
|
683
|
+
for svc in notes/services/*.md; do
|
|
684
|
+
rg -q '^runbook:' "$svc" || echo "NO RUNBOOK: $svc"
|
|
685
|
+
done
|
|
686
|
+
|
|
687
|
+
# Find all superseded ADRs (decisions that have been replaced)
|
|
688
|
+
rg '^superseded_by:' notes/adrs/ | rg -v 'null'
|
|
689
|
+
|
|
690
|
+
# List all accepted ADRs (to find the active decision for a superseded ADR, follow its superseded_by links to one of these)
|
|
691
|
+
rg '^status: accepted' notes/adrs/ -l
|
|
692
|
+
|
|
693
|
+
# Find stale runbooks (not verified in 180+ days)
|
|
694
|
+
rg '^last_verified:' notes/runbooks/ | while IFS= read -r line; do
|
|
695
|
+
file=$(echo "$line" | cut -d: -f1)
|
|
696
|
+
date=$(echo "$line" | awk '{print $2}')
|
|
697
|
+
if [[ "$date" < "2025-08-18" ]]; then  # cutoff = today minus 180 days; update before running
|
|
698
|
+
echo "STALE: $file — last verified: $date"
|
|
699
|
+
fi
|
|
700
|
+
done
|
|
701
|
+
|
|
702
|
+
# Dependency fan-out analysis: which services have the most dependents?
|
|
703
|
+
rg '^depended_by:' notes/services/ -A 10 | rg '\[\[' | \
|
|
704
|
+
sed 's/.*\[\[//' | sed 's/\]\].*//' | sort | uniq -c | sort -rn
|
|
705
|
+
```
|
|
706
|
+
|
|
707
|
+
---
|
|
708
|
+
|
|
709
|
+
## What Makes This Domain Unique
|
|
710
|
+
|
|
711
|
+
### The dependency graph is the primary structural artifact
|
|
712
|
+
|
|
713
|
+
In a research vault, the knowledge graph emerges from wiki links between claim notes — each link represents an intellectual relationship. In an engineering team system, the dependency graph is not emergent but declarative: services explicitly state what they depend on and what depends on them. This graph is not just navigation — it is an operational map. When a service changes, the dependency graph determines who needs to know, what might break, and how far a failure can cascade. The graph has engineering consequences that research graphs do not: a missing edge in the dependency graph means an undetected blast radius.
|
|
714
|
+
|
|
715
|
+
### ADR lifecycle creates a temporal knowledge layer
|
|
716
|
+
|
|
717
|
+
Research claims do not get "superseded" in the same way ADRs do. A claim about knowledge management might be refined, extended, or contradicted, but it remains relevant as a historical argument. An ADR that has been superseded is not just historically interesting — it is potentially dangerous if someone follows it without realizing it has been replaced. The `supersedes`/`superseded_by` chain creates a temporal layer that the agent must actively maintain: every new ADR must be checked against the chain, every reference to an ADR must verify that the referenced decision is still active, and every deprecated ADR must link forward to its replacement.
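
Following the chain forward to the active decision can be sketched in a few lines of shell. This is an illustration only. It assumes each ADR note is named after its wiki-link target (for example `adr-003-use-rest-for-internal-apis.md`), that `superseded_by` holds either a single `[[link]]` or `null`, and that 50 hops is a reasonable cycle guard. The function name `current_decision` is hypothetical.

```bash
#!/usr/bin/env bash
# current_decision <adr-dir> <adr-name>
# Follows superseded_by links until reaching an ADR that has none.
current_decision() {
  local dir=$1 adr=$2 next hops=0
  while :; do
    next=$(grep -m1 '^superseded_by:' "$dir/$adr.md" 2>/dev/null |
           grep -o '\[\[[^]]*]]' | sed 's/^\[\[//; s/]]$//' | head -n 1)
    [ -n "$next" ] || break                   # no forward link: this ADR is the end of the chain
    adr=$next
    hops=$((hops + 1))
    if [ "$hops" -gt 50 ]; then               # guard against a circular chain
      echo "cycle suspected at $adr" >&2
      return 1
    fi
  done
  echo "$adr"
}
```

A call such as `current_decision notes/adrs adr-003-use-rest-for-internal-apis` prints the name of the last ADR in the chain. The sketch does not verify that the final ADR's `status` is `accepted`; a fuller check would confirm that as well.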
|
|
718
|
+
|
|
719
|
+
### Incident patterns accumulate across postmortems invisibly
|
|
720
|
+
|
|
721
|
+
No individual postmortem reveals that the same service boundary has caused three P1 incidents. No individual tech debt item reveals that unprioritized debt has cost $540K in incident revenue loss. These patterns exist only in the aggregate, and humans are notoriously bad at maintaining aggregate awareness across months of blameless postmortems. The system's value is in making the invisible visible: correlating debt to incidents, tracking action item completion rates, detecting recurring failure patterns, and quantifying the business cost of architectural decisions.
|
|
722
|
+
|
|
723
|
+
---
|
|
724
|
+
|
|
725
|
+
## Agent-Native Advantages
|
|
726
|
+
|
|
727
|
+
### Exhaustive impact analysis across the dependency graph
|
|
728
|
+
|
|
729
|
+
When someone proposes changing the payment service's API, a human engineer thinks about the services they personally know depend on it. They might remember checkout-gateway and reporting-service. They will not remember that the refund-service also depends on the original transaction lookup endpoint, or that the analytics pipeline consumes payment events in a specific schema. The agent traces every edge in the dependency graph, every consumer of every endpoint, every downstream system that makes assumptions about the current behavior. The result is a complete blast radius assessment, not an approximate one.
|
|
730
|
+
|
|
731
|
+
This is not just graph traversal — it is graph traversal plus schema awareness plus temporal context. The agent knows not just that refund-service depends on payment-service, but what interface type they use (gRPC), which ADR defined that interface (ADR-002), and what incidents have involved that dependency (PM-001). The impact analysis is multi-dimensional in a way that no human engineer can sustain across 40+ services.
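
The graph-traversal part of this analysis can be sketched as a breadth-first walk over `depended_by` edges. This is a minimal sketch under assumptions: one note per service named `<service>.md`, `depended_by` written as an inline frontmatter list, and service names without spaces. It covers only the traversal, not the schema or incident context described above. The function name `blast_radius` is hypothetical.

```bash
#!/usr/bin/env bash
# blast_radius <services-dir> <service>
# Prints every service that depends on <service>, directly or transitively.
blast_radius() {
  local dir=$1 queue=$2 seen=" $2 " svc dep
  while [ -n "$queue" ]; do
    svc=${queue%% *}                          # pop the first queued service
    if [ "$svc" = "$queue" ]; then queue=""; else queue=${queue#* }; fi
    for dep in $(grep -m1 '^depended_by:' "$dir/$svc.md" 2>/dev/null |
                 grep -o '\[\[[^]]*]]' | sed 's/^\[\[//; s/]]$//'); do
      case "$seen" in *" $dep "*) continue ;; esac   # already visited (also handles cycles)
      seen="$seen$dep "
      queue="${queue:+$queue }$dep"
      echo "$dep"
    done
  done
}
```

`blast_radius notes/services payment-service` would list direct dependents first, then their dependents, each service once. A missing note ends that branch silently, which is where an undocumented edge would hide.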
|
|
732
|
+
|
|
733
|
+
### Automatic staleness detection through code-documentation divergence
|
|
734
|
+
|
|
735
|
+
Documentation rots because humans write it once and forget to update it. The agent never forgets that documentation exists. When a deployment changes service behavior, the agent checks: does the service description still match? Does the runbook still work? Are the dependency declarations still accurate? This is not periodic review — it is event-triggered verification. Every deployment, every PR that touches interface definitions, every infrastructure change triggers a staleness check on the relevant documentation.
|
|
736
|
+
|
|
737
|
+
The staleness detection extends to ADRs: when a technology referenced in an ADR reaches end-of-life, or when a scaling assumption documented in an ADR's context section is exceeded by actual load, the agent flags the ADR for re-evaluation. A human engineer would need to periodically re-read every ADR and compare its assumptions to current reality. The agent does this continuously.
|
|
738
|
+
|
|
739
|
+
### Postmortem pattern analysis that builds institutional learning
|
|
740
|
+
|
|
741
|
+
Individual postmortems are valuable. The pattern across postmortems is transformative. The agent maintains a running analysis:
|
|
742
|
+
|
|
743
|
+
- **Root cause category distribution** — "60% of our P1 incidents are dependency failures, not code bugs. Our investment in code review is high; our investment in circuit breakers is low."
|
|
744
|
+
- **Service hotspots** — "The inventory service boundary has been involved in 4 of the last 10 incidents. This is not bad luck — it is architectural."
|
|
745
|
+
- **Action item completion rate** — "We complete 73% of postmortem action items. The 27% we do not complete are disproportionately 'systemic' items (circuit breakers, load tests) vs 'targeted' items (specific bug fixes)."
|
|
746
|
+
- **Time-to-repeat** — "The mean time between incidents with the same contributing factor is 4.2 months. We are not learning from postmortems at the systemic level."
|
|
747
|
+
|
|
748
|
+
No human maintains this analysis. Postmortems are written, reviewed in a blameless retro, and filed. The patterns between them remain invisible. The agent makes patterns visible, turning individual failures into institutional learning.
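
As one concrete instance, the service-hotspot count can be approximated with standard text tools. The sketch assumes postmortem notes carry an inline `affected_services` list in their frontmatter; the source shows that field on ADRs, so its presence on postmortems is an assumption, as is the function name `service_hotspots`.

```bash
#!/usr/bin/env bash
# service_hotspots <postmortem-dir>
# Prints "<count> <service>" lines, most frequently affected service first.
service_hotspots() {
  grep -h '^affected_services:' "$1"/*.md 2>/dev/null |
    grep -o '\[\[[^]]*]]' | sed 's/^\[\[//; s/]]$//' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'                    # normalize uniq's padded output
}
```

`service_hotspots notes/postmortems | head -n 5` would show the five services named in the most postmortems. Restricting the count to a time window or a severity level would require filtering the file list first.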
|
|
749
|
+
|
|
750
|
+
### Tech debt quantification through incident correlation
|
|
751
|
+
|
|
752
|
+
Tech debt is notoriously hard to prioritize because the cost of not fixing it is speculative — until it causes an incident, at which point it is no longer speculative but also no longer preventable. The agent closes this loop by correlating tech debt items with incidents after the fact:
|
|
753
|
+
|
|
754
|
+
- TD-001 (payment service retry logic) was logged in September 2024 as "medium" severity. PM-001 occurred in March 2025, costing $180K in lost revenue over 47 minutes. The agent retroactively updates TD-001's business impact with real numbers. The next quarterly prioritization discussion starts from "this tech debt has already cost us $180K" rather than "this tech debt might cause problems."
|
|
755
|
+
|
|
756
|
+
Across the full tech debt backlog, the agent calculates: total incident cost attributable to known tech debt, percentage of incidents involving previously identified but unfixed debt, and mean time between debt identification and incident occurrence. These numbers transform tech debt prioritization from opinion-driven to evidence-driven.
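
Two of these figures, the share of incidents linked to known debt and their total cost, can be sketched as follows. The field names `related_tech_debt` and `revenue_impact_usd` are hypothetical, chosen for illustration; the source does not specify how postmortems record these values. The function name `debt_incident_summary` is also an assumption.

```bash
#!/usr/bin/env bash
# debt_incident_summary <postmortem-dir>
# Summarizes how many postmortems link to tech debt and what those incidents cost.
debt_incident_summary() {
  local dir=$1 note cost total=0 linked=0 count=0
  for note in "$dir"/*.md; do
    [ -e "$note" ] || continue
    count=$((count + 1))
    # A postmortem counts as debt-linked if the field contains at least one wiki link.
    if grep -q '^related_tech_debt:.*\[\[' "$note"; then
      linked=$((linked + 1))
      cost=$(sed -n 's/^revenue_impact_usd: *\([0-9][0-9]*\).*/\1/p' "$note" | head -n 1)
      total=$((total + ${cost:-0}))           # a missing cost field counts as 0
    fi
  done
  if [ "$count" -eq 0 ]; then
    echo "no postmortems found"
    return 0
  fi
  echo "incidents=$count linked_to_debt=$linked share=$((100 * linked / count))% cost_usd=$total"
}
```

The share is integer-truncated, which is adequate for a prioritization discussion but not for reporting. Mean time between debt identification and incident occurrence would need creation dates on both note types and is left out of this sketch.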
|
|
757
|
+
|
|
758
|
+
### ADR chain traversal that maintains decision coherence
|
|
759
|
+
|
|
760
|
+
When a team operates for four years, the ADR chain becomes a complex graph. ADR-001 establishes event sourcing. ADR-017 migrates the event store to Kafka Streams (evolving the storage layer without changing the sourcing model). ADR-023 introduces a new event schema format (constrained by ADR-001's immutability guarantee). RFC-001 proposes replacing custom consumers with Kafka Streams (potentially conflicting with ADR-023's schema assumptions).
|
|
761
|
+
|
|
762
|
+
A new engineer reading RFC-001 would need to trace backward through three ADRs to understand the full constraint landscape. A human reviewer might catch the ADR-023 conflict if they remember it exists. The agent traces every chain automatically and surfaces conflicts that span years of decisions. It answers the question no human can: "What is the complete set of constraints that this proposal must satisfy, including constraints established before anyone currently on the team was here?"
|
|
763
|
+
---
|
|
764
|
+
|
|
765
|
+
Topics:
|
|
766
|
+
- [[domain-compositions]]
|