@bgicli/bgicli 2.2.8 → 2.2.10

Files changed (113)
  1. package/data/skills/anthropic-algorithmic-art/SKILL.md +405 -0
  2. package/data/skills/anthropic-canvas-design/SKILL.md +130 -0
  3. package/data/skills/anthropic-claude-api/SKILL.md +243 -0
  4. package/data/skills/anthropic-doc-coauthoring/SKILL.md +375 -0
  5. package/data/skills/anthropic-docx/SKILL.md +590 -0
  6. package/data/skills/anthropic-frontend-design/SKILL.md +42 -0
  7. package/data/skills/anthropic-internal-comms/SKILL.md +32 -0
  8. package/data/skills/anthropic-mcp-builder/SKILL.md +236 -0
  9. package/data/skills/anthropic-pdf/SKILL.md +314 -0
  10. package/data/skills/anthropic-pptx/SKILL.md +232 -0
  11. package/data/skills/anthropic-skill-creator/SKILL.md +485 -0
  12. package/data/skills/anthropic-webapp-testing/SKILL.md +96 -0
  13. package/data/skills/anthropic-xlsx/SKILL.md +292 -0
  14. package/data/skills/arxiv-database/SKILL.md +362 -0
  15. package/data/skills/astropy/SKILL.md +329 -0
  16. package/data/skills/ctx-advanced-evaluation/SKILL.md +402 -0
  17. package/data/skills/ctx-bdi-mental-states/SKILL.md +311 -0
  18. package/data/skills/ctx-context-compression/SKILL.md +272 -0
  19. package/data/skills/ctx-context-degradation/SKILL.md +206 -0
  20. package/data/skills/ctx-context-fundamentals/SKILL.md +201 -0
  21. package/data/skills/ctx-context-optimization/SKILL.md +195 -0
  22. package/data/skills/ctx-evaluation/SKILL.md +251 -0
  23. package/data/skills/ctx-filesystem-context/SKILL.md +287 -0
  24. package/data/skills/ctx-hosted-agents/SKILL.md +260 -0
  25. package/data/skills/ctx-memory-systems/SKILL.md +225 -0
  26. package/data/skills/ctx-multi-agent-patterns/SKILL.md +257 -0
  27. package/data/skills/ctx-project-development/SKILL.md +291 -0
  28. package/data/skills/ctx-tool-design/SKILL.md +271 -0
  29. package/data/skills/dhdna-profiler/SKILL.md +162 -0
  30. package/data/skills/generate-image/SKILL.md +183 -0
  31. package/data/skills/geomaster/SKILL.md +365 -0
  32. package/data/skills/get-available-resources/SKILL.md +275 -0
  33. package/data/skills/hamelsmu-build-review-interface/SKILL.md +96 -0
  34. package/data/skills/hamelsmu-error-analysis/SKILL.md +164 -0
  35. package/data/skills/hamelsmu-eval-audit/SKILL.md +183 -0
  36. package/data/skills/hamelsmu-evaluate-rag/SKILL.md +177 -0
  37. package/data/skills/hamelsmu-generate-synthetic-data/SKILL.md +131 -0
  38. package/data/skills/hamelsmu-validate-evaluator/SKILL.md +212 -0
  39. package/data/skills/hamelsmu-write-judge-prompt/SKILL.md +144 -0
  40. package/data/skills/hf-cli/SKILL.md +174 -0
  41. package/data/skills/hf-mcp/SKILL.md +178 -0
  42. package/data/skills/hugging-face-dataset-viewer/SKILL.md +121 -0
  43. package/data/skills/hugging-face-datasets/SKILL.md +542 -0
  44. package/data/skills/hugging-face-evaluation/SKILL.md +651 -0
  45. package/data/skills/hugging-face-jobs/SKILL.md +1042 -0
  46. package/data/skills/hugging-face-model-trainer/SKILL.md +717 -0
  47. package/data/skills/hugging-face-paper-pages/SKILL.md +239 -0
  48. package/data/skills/hugging-face-paper-publisher/SKILL.md +624 -0
  49. package/data/skills/hugging-face-tool-builder/SKILL.md +110 -0
  50. package/data/skills/hugging-face-trackio/SKILL.md +115 -0
  51. package/data/skills/hugging-face-vision-trainer/SKILL.md +593 -0
  52. package/data/skills/huggingface-gradio/SKILL.md +245 -0
  53. package/data/skills/matlab/SKILL.md +376 -0
  54. package/data/skills/modal/SKILL.md +381 -0
  55. package/data/skills/openai-cloudflare-deploy/SKILL.md +224 -0
  56. package/data/skills/openai-develop-web-game/SKILL.md +149 -0
  57. package/data/skills/openai-doc/SKILL.md +80 -0
  58. package/data/skills/openai-figma/SKILL.md +42 -0
  59. package/data/skills/openai-figma-implement-design/SKILL.md +264 -0
  60. package/data/skills/openai-gh-address-comments/SKILL.md +25 -0
  61. package/data/skills/openai-gh-fix-ci/SKILL.md +69 -0
  62. package/data/skills/openai-imagegen/SKILL.md +174 -0
  63. package/data/skills/openai-jupyter-notebook/SKILL.md +107 -0
  64. package/data/skills/openai-linear/SKILL.md +87 -0
  65. package/data/skills/openai-netlify-deploy/SKILL.md +247 -0
  66. package/data/skills/openai-notion-knowledge-capture/SKILL.md +56 -0
  67. package/data/skills/openai-notion-meeting-intelligence/SKILL.md +60 -0
  68. package/data/skills/openai-notion-research-documentation/SKILL.md +59 -0
  69. package/data/skills/openai-notion-spec-to-implementation/SKILL.md +58 -0
  70. package/data/skills/openai-openai-docs/SKILL.md +69 -0
  71. package/data/skills/openai-pdf/SKILL.md +67 -0
  72. package/data/skills/openai-playwright/SKILL.md +147 -0
  73. package/data/skills/openai-render-deploy/SKILL.md +479 -0
  74. package/data/skills/openai-screenshot/SKILL.md +267 -0
  75. package/data/skills/openai-security-best-practices/SKILL.md +86 -0
  76. package/data/skills/openai-security-ownership-map/SKILL.md +206 -0
  77. package/data/skills/openai-security-threat-model/SKILL.md +81 -0
  78. package/data/skills/openai-sentry/SKILL.md +123 -0
  79. package/data/skills/openai-sora/SKILL.md +178 -0
  80. package/data/skills/openai-speech/SKILL.md +144 -0
  81. package/data/skills/openai-spreadsheet/SKILL.md +145 -0
  82. package/data/skills/openai-transcribe/SKILL.md +81 -0
  83. package/data/skills/openai-vercel-deploy/SKILL.md +77 -0
  84. package/data/skills/openai-yeet/SKILL.md +28 -0
  85. package/data/skills/pennylane/SKILL.md +224 -0
  86. package/data/skills/polars-bio/SKILL.md +374 -0
  87. package/data/skills/primekg/SKILL.md +97 -0
  88. package/data/skills/pymatgen/SKILL.md +689 -0
  89. package/data/skills/qiskit/SKILL.md +273 -0
  90. package/data/skills/qutip/SKILL.md +316 -0
  91. package/data/skills/recursive-decomposition/SKILL.md +185 -0
  92. package/data/skills/rowan/SKILL.md +427 -0
  93. package/data/skills/scholar-evaluation/SKILL.md +298 -0
  94. package/data/skills/sentry-create-alert/SKILL.md +210 -0
  95. package/data/skills/sentry-fix-issues/SKILL.md +126 -0
  96. package/data/skills/sentry-pr-code-review/SKILL.md +105 -0
  97. package/data/skills/sentry-python-sdk/SKILL.md +317 -0
  98. package/data/skills/sentry-setup-ai-monitoring/SKILL.md +217 -0
  99. package/data/skills/stable-baselines3/SKILL.md +297 -0
  100. package/data/skills/sympy/SKILL.md +498 -0
  101. package/data/skills/trailofbits-ask-questions-if-underspecified/SKILL.md +85 -0
  102. package/data/skills/trailofbits-audit-context-building/SKILL.md +302 -0
  103. package/data/skills/trailofbits-differential-review/SKILL.md +220 -0
  104. package/data/skills/trailofbits-insecure-defaults/SKILL.md +117 -0
  105. package/data/skills/trailofbits-modern-python/SKILL.md +333 -0
  106. package/data/skills/trailofbits-property-based-testing/SKILL.md +123 -0
  107. package/data/skills/trailofbits-semgrep-rule-creator/SKILL.md +172 -0
  108. package/data/skills/trailofbits-sharp-edges/SKILL.md +292 -0
  109. package/data/skills/trailofbits-variant-analysis/SKILL.md +142 -0
  110. package/data/skills/transformers.js/SKILL.md +637 -0
  111. package/data/skills/writing/SKILL.md +419 -0
  112. package/dist/bgi.js +66 -2
  113. package/package.json +1 -1
@@ -0,0 +1,260 @@
1
+ ---
2
+ name: hosted-agents
3
+ description: This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents, sandboxed VMs, agent infrastructure, Modal sandboxes, self-spawning agents, or remote coding environments.
4
+ ---
5
+
6
+ # Hosted Agent Infrastructure
7
+
8
+ Hosted agents run in remote sandboxed environments rather than on local machines. When designed well, they provide unlimited concurrency, consistent execution environments, and multiplayer collaboration. The critical insight is that session speed should be limited only by model provider time-to-first-token, with all infrastructure setup completed before the user starts their session.
9
+
10
+ ## When to Activate
11
+
12
+ Activate this skill when:
13
+ - Building background coding agents that run independently of user devices
14
+ - Designing sandboxed execution environments for agent workloads
15
+ - Implementing multiplayer agent sessions with shared state
16
+ - Creating multi-client agent interfaces (Slack, Web, Chrome extensions)
17
+ - Scaling agent infrastructure beyond local machine constraints
18
+ - Building systems where agents spawn sub-agents for parallel work
19
+
20
+ ## Core Concepts
21
+
22
+ Move agent execution to remote sandboxed environments to eliminate the fundamental limits of local execution: resource contention, environment inconsistency, and single-user constraints. Remote sandboxes unlock unlimited concurrency, reproducible environments, and collaborative workflows because each session gets its own isolated compute with a known-good environment image.
23
+
24
+ Design the architecture in three layers because each layer scales independently. Build sandbox infrastructure for isolated execution, an API layer for state management and client coordination, and client interfaces for user interaction across platforms. Keep these layers cleanly separated so sandbox changes do not ripple into clients.
25
+
26
+ ## Detailed Topics
27
+
28
+ ### Sandbox Infrastructure
29
+
30
+ **The Core Challenge**
31
+ Eliminate sandbox spin-up latency because users perceive anything over a few seconds as broken. Development environments require cloning repositories, installing dependencies, and running build steps -- do all of this before the user ever submits a prompt.
32
+
33
+ **Image Registry Pattern**
34
+ Pre-build environment images on a regular cadence (every 30 minutes works well) because this makes synchronization with the latest code a fast delta rather than a full clone. Include in each image:
35
+ - Cloned repository at a known commit
36
+ - All runtime dependencies installed
37
+ - Initial setup and build commands completed
38
+ - Cached files from running the app and test suite once
39
+
40
+ When starting a session, spin up a sandbox from the most recent image. The repository is at most 30 minutes out of date, making the remaining git sync fast.
41
+
42
+ **Snapshot and Restore**
43
+ Take filesystem snapshots at key points to enable instant restoration for follow-up prompts without re-running setup:
44
+ - After initial image build (base snapshot)
45
+ - When agent finishes making changes (session snapshot)
46
+ - Before sandbox exit for potential follow-up
47
+
48
+ **Git Configuration for Background Agents**
49
+ Configure git identity explicitly in every sandbox because background agents are not tied to a specific user during image builds:
50
+ - Generate GitHub app installation tokens for repository access during clone
51
+ - Set git config `user.name` and `user.email` when committing and pushing changes
52
+ - Use the prompting user's identity for commits, not the app identity
53
+
54
+ **Warm Pool Strategy**
55
+ Maintain a pool of pre-warmed sandboxes for high-volume repositories because cold starts are the primary source of user frustration:
56
+ - Keep sandboxes ready before users start sessions
57
+ - Expire and recreate pool entries as new image builds complete
58
+ - Start warming a sandbox as soon as a user begins typing (predictive warm-up)
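
The pool behavior above can be sketched as follows. This is a minimal illustration, not a provider integration: `Sandbox`, `WarmPool`, and `_boot` are hypothetical names, and `_boot` stands in for whatever call actually starts a sandbox from an image.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Sandbox:
    sandbox_id: int
    image_version: int


class WarmPool:
    """Keeps `target_size` idle sandboxes built from the newest image."""

    def __init__(self, target_size: int, image_version: int):
        self.target_size = target_size
        self.image_version = image_version
        self._ids = count(1)
        self._idle = deque()
        self._refill()

    def _boot(self) -> Sandbox:
        # Hypothetical placeholder for the provider call that starts a sandbox.
        return Sandbox(next(self._ids), self.image_version)

    def _refill(self) -> None:
        while len(self._idle) < self.target_size:
            self._idle.append(self._boot())

    def on_image_built(self, new_version: int) -> None:
        # Expire entries built from an older image, then recreate the pool.
        self.image_version = new_version
        self._idle = deque(s for s in self._idle if s.image_version == new_version)
        self._refill()

    def acquire(self) -> Sandbox:
        # Hand out a warm sandbox if one exists; otherwise fall back to a cold boot.
        sandbox = self._idle.popleft() if self._idle else self._boot()
        self._refill()
        return sandbox


pool = WarmPool(target_size=2, image_version=1)
first = pool.acquire()
pool.on_image_built(new_version=2)
second = pool.acquire()
print(first, second)
```

A production pool would likely refill asynchronously and size itself from traffic; the synchronous refill here only keeps the sketch short.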
59
+
60
+ ### Agent Framework Selection
61
+
62
+ **Server-First Architecture**
63
+ Structure the agent framework as a server first, with TUI and desktop apps as thin clients, because this prevents duplicating agent logic across surfaces:
64
+ - Multiple custom clients share one agent backend
65
+ - Consistent behavior across all interaction surfaces
66
+ - Plugin systems extend functionality without client changes
67
+ - Event-driven architectures deliver real-time updates to any connected client
68
+
69
+ **Code as Source of Truth**
70
+ Select frameworks where the agent can read its own source code to understand behavior. Prioritize this because having code as source of truth prevents the agent from hallucinating about its own capabilities -- an underrated failure mode in AI development.
71
+
72
+ **Plugin System Requirements**
73
+ Require a plugin system that supports runtime interception because this enables safety controls and observability without modifying core agent logic:
74
+ - Listen to tool execution events (e.g., `tool.execute.before`)
75
+ - Block or modify tool calls conditionally
76
+ - Inject context or state at runtime
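
As a rough sketch of what runtime interception might look like, the snippet below runs hooks before every tool call. Only the event name `tool.execute.before` comes from the text above; `ToolRunner`, `on_before_execute`, and both example hooks are hypothetical and do not reflect any particular framework's plugin API.

```python
from typing import Callable, Optional

# A hook receives the tool name and arguments and returns the (possibly
# modified) arguments, or None to block the call.
Hook = Callable[[str, dict], Optional[dict]]


class ToolRunner:
    def __init__(self, tools: dict):
        self.tools = tools
        self.before_hooks = []

    def on_before_execute(self, hook: Hook) -> None:
        # Stand-in for subscribing to an event like `tool.execute.before`.
        self.before_hooks.append(hook)

    def execute(self, name: str, args: dict) -> dict:
        for hook in self.before_hooks:
            args = hook(name, args)
            if args is None:
                return {"status": "blocked", "tool": name}
        return {"status": "ok", "result": self.tools[name](**args)}


def block_force_push(name: str, args: dict) -> Optional[dict]:
    # Safety control: refuse a dangerous command without touching agent logic.
    if name == "shell" and "push --force" in args.get("command", ""):
        return None
    return args


def inject_session(name: str, args: dict) -> Optional[dict]:
    # Context injection: add state the tool needs at runtime.
    return {**args, "session_id": "demo-session"}


runner = ToolRunner(
    {"shell": lambda command, session_id: f"[{session_id}] ran: {command}"}
)
runner.on_before_execute(block_force_push)
runner.on_before_execute(inject_session)

allowed = runner.execute("shell", {"command": "git status"})
blocked = runner.execute("shell", {"command": "git push --force"})
print(allowed, blocked)
```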
77
+
78
+ ### Speed Optimizations
79
+
80
+ **Predictive Warm-Up**
81
+ Start warming the sandbox as soon as a user begins typing their prompt, not when they submit it, because the typing interval (5-30 seconds) is enough to complete most setup:
82
+ - Clone latest changes in parallel with user typing
83
+ - Run initial setup before user hits enter
84
+ - With a fast enough spin-up, the sandbox can be ready before the user finishes typing
85
+
86
+ **Parallel File Reading**
87
+ Allow the agent to start reading files immediately even if sync from latest base branch is not complete, because in large repositories incoming prompts rarely touch recently-changed files:
88
+ - Agent can research immediately without waiting for git sync
89
+ - Block file edits (not reads) until synchronization completes
90
+ - This separation is safe because read-time data staleness of 30 minutes rarely matters for research
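
One way to express the read/write split is a gate that lets reads through and holds edits until the sync completes. The sketch below assumes an in-memory `files` dict in place of a real working tree; `SyncGate` and its methods are hypothetical names.

```python
import threading


class SyncGate:
    """Lets reads through immediately and holds edits until git sync finishes."""

    def __init__(self):
        self._synced = threading.Event()

    def mark_synced(self) -> None:
        self._synced.set()

    def read_file(self, files: dict, path: str) -> str:
        # Reads never wait: possibly stale content is acceptable for research.
        return files[path]

    def write_file(self, files: dict, path: str, content: str,
                   timeout: float = 30.0) -> bool:
        # Edits wait for the sync; returns False if it did not finish in time.
        if not self._synced.wait(timeout):
            return False
        files[path] = content
        return True


files = {"README.md": "old contents"}
gate = SyncGate()

read_before_sync = gate.read_file(files, "README.md")
write_before_sync = gate.write_file(files, "README.md", "edited", timeout=0.01)

gate.mark_synced()  # in a real system the sync worker would call this
write_after_sync = gate.write_file(files, "README.md", "edited")
print(read_before_sync, write_before_sync, write_after_sync)
```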
91
+
92
+ **Maximize Build-Time Work**
93
+ Move everything possible to the image build step because build-time duration is invisible to users:
94
+ - Full dependency installation
95
+ - Database schema setup
96
+ - Initial app and test suite runs (populates caches)
97
+
98
+ ### Self-Spawning Agents
99
+
100
+ **Agent-Spawned Sessions**
101
+ Build tools that allow agents to spawn new sessions because frontier models are capable of decomposing work and coordinating sub-tasks:
102
+ - Research tasks across different repositories
103
+ - Parallel subtask execution for large changes
104
+ - Multiple smaller PRs from one major task
105
+
106
+ Expose three primitives: start a new session with specified parameters, read status of any session (check-in capability), and continue main work while sub-sessions run in parallel.
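
The three primitives might look roughly like this. The sketch uses a thread pool as a stand-in for real sandboxed sessions, and `SessionManager`, `start_session`, `read_status`, and `research` are illustrative names, not part of any named framework.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class SessionManager:
    def __init__(self, max_workers: int = 4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._sessions = {}

    def start_session(self, task, *args) -> str:
        # Primitive 1: start a new session; returns immediately with an ID.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = self._executor.submit(task, *args)
        return session_id

    def read_status(self, session_id: str) -> dict:
        # Primitive 2: non-blocking check-in on any session.
        future = self._sessions[session_id]
        if not future.done():
            return {"state": "running"}
        if future.exception() is not None:
            return {"state": "failed", "error": str(future.exception())}
        return {"state": "done", "result": future.result()}

    def wait_all(self) -> None:
        self._executor.shutdown(wait=True)


def research(repo: str) -> str:
    # Hypothetical sub-task; a real one would drive an agent in its own sandbox.
    return f"summary of {repo}"


manager = SessionManager()
ids = [manager.start_session(research, repo) for repo in ("api", "web")]
# Primitive 3: the parent keeps doing its own work here while sub-sessions run.
manager.wait_all()
statuses = [manager.read_status(sid) for sid in ids]
print(statuses)
```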
107
+
108
+ **Prompt Engineering for Self-Spawning**
109
+ Engineer prompts that guide when agents should spawn sub-sessions rather than doing work inline:
110
+ - Research tasks that require cross-repository exploration
111
+ - Breaking monolithic changes into smaller PRs
112
+ - Parallel exploration of different approaches
113
+
114
+ ### API Layer
115
+
116
+ **Per-Session State Isolation**
117
+ Isolate state per session (SQLite per session works well) because cross-session interference is a subtle and hard-to-debug failure mode:
118
+ - Dedicated database per session
119
+ - No session can impact another's performance
120
+ - Architecture handles hundreds of concurrent sessions
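
A minimal sketch of the database-per-session idea using Python's standard `sqlite3` module follows. The `events` table, the `SessionStore` class, and the file naming are assumptions made for illustration; the sketch also assumes session IDs are safe to use as file names.

```python
import sqlite3
import tempfile
from pathlib import Path


class SessionStore:
    """One SQLite file per session, so sessions never share locks or tables."""

    def __init__(self, root: Path):
        self.root = root

    def _connect(self, session_id: str) -> sqlite3.Connection:
        conn = sqlite3.connect(self.root / f"{session_id}.db")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        return conn

    def append(self, session_id: str, kind: str, payload: str) -> None:
        conn = self._connect(session_id)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (kind, payload) VALUES (?, ?)", (kind, payload)
            )
        conn.close()

    def events(self, session_id: str) -> list:
        conn = self._connect(session_id)
        rows = conn.execute("SELECT kind, payload FROM events ORDER BY id").fetchall()
        conn.close()
        return rows


root = Path(tempfile.mkdtemp())
store = SessionStore(root)
store.append("session-a", "prompt", "fix the flaky test")
store.append("session-b", "prompt", "add dark mode")

events_a = store.events("session-a")
db_files = sorted(p.name for p in root.glob("*.db"))
print(events_a, db_files)
```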
121
+
122
+ **Real-Time Streaming**
123
+ Stream all agent work in real-time because high-frequency feedback is critical for user trust:
124
+ - Token streaming from model providers
125
+ - Tool execution status updates
126
+ - File change notifications
127
+
128
+ Use WebSocket connections with hibernation APIs to reduce compute costs during idle periods while maintaining open connections.
129
+
130
+ **Synchronization Across Clients**
131
+ Build a single state system that synchronizes across all clients (chat interfaces, Slack bots, Chrome extensions, web interfaces, VS Code instances) because users switch surfaces frequently and expect continuity. All changes sync to the session state, enabling seamless client switching.
132
+
133
+ ### Multiplayer Support
134
+
135
+ **Why Multiplayer Matters**
136
+ Design for multiplayer from day one because it is nearly free to add with proper synchronization architecture, and it unlocks high-value workflows:
137
+ - Teaching non-engineers to use AI effectively
138
+ - Live QA sessions with multiple team members
139
+ - Real-time PR review with immediate changes
140
+ - Collaborative debugging sessions
141
+
142
+ **Implementation Requirements**
143
+ Build the data model so sessions are not tied to single authors because multiplayer fails silently if authorship is hardcoded:
144
+ - Pass authorship info to each prompt
145
+ - Attribute code changes to the prompting user
146
+ - Share session links for instant collaboration
147
+
148
+ ### Authentication and Authorization
149
+
150
+ **User-Based Commits**
151
+ Use GitHub authentication to open PRs on behalf of the user (not the app) because this preserves the audit trail and, since GitHub does not let authors approve their own pull requests, prevents users from approving their own AI-generated changes:
152
+ - Obtain user tokens for PR creation
153
+ - PRs appear as authored by the human, not the bot
154
+
155
+ **Sandbox-to-API Flow**
156
+ Follow this sequence because it keeps sandbox permissions minimal while letting the API handle sensitive operations:
157
+ 1. Sandbox pushes changes (updating git user config)
158
+ 2. Sandbox sends event to API with branch name and session ID
159
+ 3. API uses user's GitHub token to create PR
160
+ 4. GitHub webhooks notify API of PR events
161
+
162
+ ### Client Implementations
163
+
164
+ **Slack Integration**
165
+ Prioritize Slack as the first distribution channel for internal adoption because it creates a virality loop as team members see others using it:
166
+ - No syntax required, natural chat interface
167
+ - Build a classifier (fast model with repo descriptions) to determine which repository to work in
168
+ - Include hints for common repositories; allow "unknown" for ambiguous cases
169
+
170
+ **Web Interface**
171
+ Build a web interface with these features because it serves as the primary power-user surface:
172
+ - Real-time streaming of agent work on desktop and mobile
173
+ - Hosted VS Code instance running inside sandbox
174
+ - Streamed desktop view for visual verification
175
+ - Before/after screenshots for PRs
176
+ - Statistics page: sessions resulting in merged PRs (primary metric), usage over time, live "humans prompting" count
177
+
178
+ **Chrome Extension**
179
+ Build a Chrome extension for non-engineering users because DOM and React internals extraction gives higher precision than raw screenshots at lower token cost:
180
+ - Sidebar chat interface with screenshot tool
181
+ - Extract DOM/React internals instead of raw images
182
+ - Distribute via managed device policy (bypasses Chrome Web Store)
183
+
184
+ ## Practical Guidance
185
+
186
+ ### Follow-Up Message Handling
187
+
188
+ Choose between queueing and inserting follow-up messages sent during execution. Prefer queueing because it is simpler to manage and lets users send thoughts on next steps while the agent works. Build a mechanism to stop the agent mid-execution when needed, because without it users feel trapped.
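
A queue-plus-stop-flag design could be sketched like this. `AgentSession` and the `on_message` callback are hypothetical; the callback simulates a user sending messages while the agent works, and a real agent would probably also check the stop flag between tool calls, not only between messages.

```python
import threading
from collections import deque


class AgentSession:
    def __init__(self):
        self._pending = deque()
        self._stop = threading.Event()
        self.handled = []

    def send(self, message: str) -> None:
        # Follow-ups are queued, never injected into the step in progress.
        self._pending.append(message)

    def stop(self) -> None:
        self._stop.set()

    @property
    def pending(self) -> list:
        return list(self._pending)

    def run(self, on_message=None) -> None:
        # Process queued messages in order until the queue drains or stop is set.
        while self._pending and not self._stop.is_set():
            message = self._pending.popleft()
            self.handled.append(message)
            if on_message is not None:
                on_message(self, message)


def user_activity(session: AgentSession, message: str) -> None:
    # Simulates a user typing while the agent is busy with `message`.
    if message == "fix the failing test":
        session.send("also update the changelog")
    elif message == "also update the changelog":
        session.send("then refactor everything")
        session.stop()


session = AgentSession()
session.send("fix the failing test")
session.run(on_message=user_activity)
print(session.handled, session.pending)
```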
189
+
190
+ ### Metrics That Matter
191
+
192
+ Track these metrics because they indicate real value rather than vanity usage:
193
+ - Sessions resulting in merged PRs (primary success metric)
194
+ - Time from session start to first model response
195
+ - PR approval rate and revision count
196
+ - Agent-written code percentage across repositories
197
+
198
+ ### Adoption Strategy
199
+
200
+ Drive internal adoption through visibility rather than mandates because forced usage breeds resentment:
201
+ - Work in public spaces (Slack channels) for visibility
202
+ - Let the product create virality loops
203
+ - Do not force usage over existing tools
204
+ - Build to people's needs, not hypothetical requirements
205
+
206
+ ## Guidelines
207
+
208
+ 1. Pre-build environment images on a regular cadence (30 minutes is a good default)
209
+ 2. Start warming sandboxes when users begin typing, not when they submit
210
+ 3. Allow file reads before git sync completes; block only writes
211
+ 4. Structure the agent framework as server-first with clients as thin wrappers
212
+ 5. Isolate state per session to prevent cross-session interference
213
+ 6. Attribute commits to the user who prompted, not the app
214
+ 7. Track merged PRs as the primary success metric
215
+ 8. Build for multiplayer from the start; it is nearly free with proper sync architecture
216
+
217
+ ## Gotchas
218
+
219
+ 1. **Cold start latency**: First sandbox spin-up takes 30-60s and users perceive this as broken. Use warm pools and predictive warm-up on keystroke to eliminate perceived wait time.
220
+ 2. **Image staleness**: Infrequent image rebuilds mean agents run with outdated dependencies or code. Set a 30-minute rebuild cadence and monitor image age; alert if builds fail silently.
221
+ 3. **Sandbox cost runaway**: Long-running agents without timeout or budget caps accumulate unexpected costs. Set hard timeout limits (default 4 hours) and per-session cost ceilings.
222
+ 4. **Auth token expiration mid-session**: Long tasks fail when GitHub tokens expire partway through. Implement token refresh logic and check token validity before sensitive operations like PR creation.
223
+ 5. **Git config in sandboxes**: Missing `user.name` or `user.email` causes commit failures in background agents. Always set git identity explicitly during sandbox configuration, never assume it carries over from the image.
224
+ 6. **State loss on sandbox recycle**: Agents lose completed work if the sandbox is recycled or times out before results are extracted. Always snapshot before termination and extract artifacts (branches, PRs, files) before letting the sandbox die.
225
+ 7. **Oversubscribing warm pools**: Maintaining too many warm sandboxes wastes money during low-traffic periods. Scale pool size based on traffic patterns and time-of-day; use autoscaling rather than fixed pool sizes.
226
+ 8. **Missing output extraction**: Agents complete work inside the sandbox but results never get pulled out to the user. Build explicit extraction steps (push branch, create PR, return file contents) into the session teardown flow.
227
+
228
+ ## Integration
229
+
230
+ This skill builds on multi-agent-patterns for agent coordination and tool-design for agent-tool interfaces. It connects to:
231
+
232
+ - multi-agent-patterns - Self-spawning agents follow supervisor patterns
233
+ - tool-design - Building tools for agent spawning and status checking
234
+ - context-optimization - Managing context across distributed sessions
235
+ - filesystem-context - Using filesystem for session state and artifacts
236
+
237
+ ## References
238
+
239
+ Internal reference:
240
+ - [Infrastructure Patterns](./references/infrastructure-patterns.md) - Read when: implementing sandbox lifecycle, image builds, or warm pool logic for the first time
241
+
242
+ Related skills in this collection:
243
+ - multi-agent-patterns - Read when: designing self-spawning or supervisor coordination patterns
244
+ - tool-design - Read when: building tools for agent session management or status checking
245
+ - context-optimization - Read when: context windows fill up across distributed agent sessions
246
+
247
+ External resources:
248
+ - [Ramp](https://builders.ramp.com/post/why-we-built-our-background-agent) - Read when: evaluating whether to build vs. buy background agent infrastructure
249
+ - [Modal Sandboxes](https://modal.com/docs/guide/sandbox) - Read when: choosing a cloud sandbox provider or comparing isolation models
250
+ - [Cloudflare Durable Objects](https://developers.cloudflare.com/durable-objects/) - Read when: designing per-session state management with WebSocket hibernation
251
+ - [OpenCode](https://github.com/sst/opencode) - Read when: selecting a server-first agent framework or studying plugin architectures
252
+
253
+ ---
254
+
255
+ ## Skill Metadata
256
+
257
+ **Created**: 2026-01-12
258
+ **Last Updated**: 2026-03-17
259
+ **Author**: Agent Skills for Context Engineering Contributors
260
+ **Version**: 1.1.0
@@ -0,0 +1,225 @@
1
+ ---
2
+ name: memory-systems
3
+ description: >
4
+ Guides implementation of agent memory systems, compares production frameworks
5
+ (Mem0, Zep/Graphiti, Letta, LangMem, Cognee), and designs persistence architectures
6
+ for cross-session knowledge retention. Use when the user asks to "implement
7
+ agent memory", "persist state across sessions", "build knowledge graph for agents",
8
+ "track entities over time", "add long-term memory", "choose a memory framework",
9
+ or mentions temporal knowledge graphs, vector stores, entity memory, adaptive memory, dynamic memory or memory benchmarks (LoCoMo, LongMemEval).
10
+ ---
11
+
12
+ # Memory System Design
13
+
14
+ Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
15
+
16
+ ## When to Activate
17
+
18
+ Activate this skill when:
19
+ - Building agents that must persist knowledge across sessions
20
+ - Choosing between memory frameworks (Mem0, Zep/Graphiti, Letta, LangMem, Cognee)
21
+ - Needing to maintain entity consistency across conversations
22
+ - Implementing reasoning over accumulated knowledge
23
+ - Designing memory architectures that scale in production
24
+ - Evaluating memory systems against benchmarks (LoCoMo, LongMemEval, DMR)
25
+ - Building dynamic memory with automatic entity/relationship extraction and self-improving memory (Cognee)
26
+
27
+ ## Core Concepts
28
+
29
+ Think of memory as a spectrum from volatile context window to persistent storage. Default to the simplest layer that meets retrieval needs, because benchmark evidence shows **tool complexity matters less than reliable retrieval** — Letta's filesystem agents scored 74% on LoCoMo using basic file operations, beating Mem0's specialized tools at 68.5%. Add structure (graphs, temporal validity) only when retrieval quality degrades or the agent needs multi-hop reasoning, relationship traversal, or time-travel queries.
30
+
31
+ ## Detailed Topics
32
+
33
+ ### Production Framework Landscape
34
+
35
+ Select a framework based on the dominant retrieval pattern the agent requires. Use this table to narrow the shortlist, then validate with the benchmark data below.
36
+
37
+ | Framework | Architecture | Best For | Trade-off |
38
+ |-----------|-------------|----------|-----------|
39
+ | **Mem0** | Vector store + graph memory, pluggable backends | Multi-tenant systems, broad integrations | Less specialized for multi-agent |
40
+ | **Zep/Graphiti** | Temporal knowledge graph, bi-temporal model | Enterprise requiring relationship modeling + temporal reasoning | Advanced features cloud-locked |
41
+ | **Letta** | Self-editing memory with tiered storage (in-context/core/archival) | Full agent introspection, stateful services | Complexity for simple use cases |
42
+ | **Cognee** | Multi-layer semantic graph via an ECL pipeline with customizable Tasks | Evolving agent memory that adapts and learns; multi-hop reasoning | Heavier ingest-time processing |
43
+ | **LangMem** | Memory tools for LangGraph workflows | Teams already on LangGraph | Tightly coupled to LangGraph |
44
+ | **File-system** | Plain files with naming conventions | Simple agents, prototyping | No semantic search, no relationships |
45
+
46
+ Choose Zep/Graphiti when the agent needs bi-temporal modeling (tracking both when events occurred and when they were ingested) because its three-tier knowledge graph (episode, semantic entity, community subgraphs) excels at temporal queries. Choose Mem0 when the priority is fast time-to-production with managed infrastructure. Choose Letta when the agent needs deep self-introspection through its Agent Development Environment. Choose Cognee when the agent must build dense multi-layer semantic graphs — it layers text chunks and entity types as nodes with detailed relationship edges, and every core piece (ingestion, entity extraction, post-processing, retrieval) is customizable.
47
+
48
+ **Benchmark Performance Comparison**
49
+
50
+ Consult these benchmarks to set expectations, but treat them as signals for specific retrieval dimensions rather than absolute rankings. No single benchmark is definitive.
51
+
52
+ | System | DMR Accuracy | LoCoMo | HotPotQA (multi-hop) | Latency |
53
+ |--------|-------------|--------|---------------------|---------|
54
+ | Cognee | — | — | Highest on EM, F1, Correctness | Variable |
55
+ | Zep (Temporal KG) | 94.8% | — | Mid-range across metrics | 2.58s |
56
+ | Letta (filesystem) | — | 74.0% | — | — |
57
+ | Mem0 | — | 68.5% | Lowest across metrics | — |
58
+ | MemGPT | 93.4% | — | — | Variable |
59
+ | GraphRAG | ~75-85% | — | — | Variable |
60
+ | Vector RAG baseline | ~60-70% | — | — | Fast |
61
+
62
+ Key takeaways: Zep achieves up to 18.5% accuracy improvement on LongMemEval while reducing latency by 90%. Cognee outperformed Mem0, Graphiti, and LightRAG on HotPotQA multi-hop reasoning benchmarks across Exact Match, F1, and human-like correctness metrics. Letta's filesystem-based agents achieved 74% on LoCoMo using basic file operations, confirming that reliable retrieval beats tool sophistication.
63
+
64
+ ### Memory Layers (Decision Points)
65
+
66
+ Pick the shallowest memory layer that satisfies the persistence requirement. Each deeper layer adds infrastructure cost and operational complexity, so only escalate when the shallower layer cannot meet the retrieval or durability need.
67
+
68
+ | Layer | Persistence | Implementation | When to Use |
69
+ |-------|------------|----------------|-------------|
70
+ | **Working** | Context window only | Scratchpad in system prompt | Always — optimize with attention-favored positions |
71
+ | **Short-term** | Session-scoped | File-system, in-memory cache | Intermediate tool results, conversation state |
72
+ | **Long-term** | Cross-session | Key-value store → graph DB | User preferences, domain knowledge, entity registries |
73
+ | **Entity** | Cross-session | Entity registry + properties | Maintaining identity ("John Doe" = same person across conversations) |
74
+ | **Temporal KG** | Cross-session + history | Graph with validity intervals | Facts that change over time, time-travel queries, preventing context clash |
75
+
76
+ ### Retrieval Strategies
77
+
78
+ Match the retrieval strategy to the query shape. Semantic search handles direct factual lookups well but degrades on multi-hop reasoning; entity-based traversal handles "everything about X" queries but requires graph structure; temporal filtering handles changing facts but requires validity metadata. When accuracy is paramount and infrastructure budget allows, combine strategies into hybrid retrieval.
79
+
80
+ | Strategy | Use When | Limitation |
81
+ |----------|----------|------------|
82
+ | **Semantic** (embedding similarity) | Direct factual queries | Degrades on multi-hop reasoning |
83
+ | **Entity-based** (graph traversal) | "Tell me everything about X" | Requires graph structure |
84
+ | **Temporal** (validity filter) | Facts change over time | Requires validity metadata |
85
+ | **Hybrid** (semantic + keyword + graph) | Best overall accuracy | Most infrastructure |
86
+
87
+ Zep's hybrid approach achieves 90% latency reduction (2.58s vs 28.9s) by retrieving only relevant subgraphs. Cognee implements hybrid retrieval through its 14 search modes — each mode combines different strategies from its three-store architecture (graph, vector, relational), letting agents select the retrieval strategy that fits the query type rather than using a one-size-fits-all approach.
88
+
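One common way to merge rankings from separate retrievers is reciprocal rank fusion (RRF). The sketch below is illustrative only: it is not how Zep or Cognee implement hybrid search, and the memory IDs, the three result lists, and the `k=60` constant are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one ranking.

    Each item earns 1 / (k + rank) for every list it appears in (rank starts
    at 1), so items ranked highly by several strategies rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# Hypothetical result lists from three retrievers for the same query.
semantic = ["m3", "m1", "m7"]
keyword = ["m1", "m9", "m3"]
graph = ["m1", "m3", "m4"]

fused = reciprocal_rank_fusion([semantic, keyword, graph])
```

Rank-based fusion avoids having to normalize scores that come from different stores, which is why it is a frequent starting point before tuning weighted combinations.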
89
+ ### Memory Consolidation
90
+
91
+ Run consolidation periodically to prevent unbounded growth, because unchecked memory accumulation degrades retrieval quality over time. **Invalidate but do not discard** — preserving history matters for temporal queries that need to reconstruct past states. Trigger consolidation on memory count thresholds, degraded retrieval quality, or scheduled intervals. See [Implementation Reference](./references/implementation.md) for working consolidation code.
92
+
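The "invalidate but do not discard" rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the Implementation Reference: the `Fact` record, the `consolidate` function, and the `max_active` threshold are hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    key: str                                  # what the fact is about
    value: str
    valid_from: datetime
    valid_until: Optional[datetime] = None    # None means still current


def consolidate(facts, max_active=1000):
    """Close out superseded facts instead of deleting them.

    Runs only when the number of active facts exceeds `max_active`.
    Returns the number of facts that were invalidated.
    """
    active = [f for f in facts if f.valid_until is None]
    if len(active) <= max_active:
        return 0
    by_key = defaultdict(list)
    for fact in active:
        by_key[fact.key].append(fact)
    invalidated = 0
    for versions in by_key.values():
        versions.sort(key=lambda f: f.valid_from)
        # Each older version ends when its successor begins.
        for older, newer in zip(versions, versions[1:]):
            older.valid_until = newer.valid_from
            invalidated += 1
    return invalidated


facts = [
    Fact("alice.theme", "dark", datetime(2024, 1, 1)),
    Fact("alice.theme", "light", datetime(2024, 6, 1)),
    Fact("alice.language", "Python", datetime(2024, 2, 1)),
]
closed = consolidate(facts, max_active=2)
```

The list keeps all three records after consolidation; only the superseded one gains a `valid_until`, so a query about January 2024 can still reconstruct the earlier state.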
93
+ ## Practical Guidance
94
+
95
+ ### Choosing a Memory Architecture
96
+
97
+ **Start with the simplest viable layer and add complexity only when retrieval quality degrades.** Most agents do not need a temporal knowledge graph on day one. Follow this escalation path:
98
+
99
+ 1. **Prototype**: Use file-system memory. Store facts as structured JSON with timestamps. This validates agent behavior before committing to infrastructure.
+ 2. **Scale**: Move to Mem0 or a vector store with metadata when the agent needs semantic search and multi-tenant isolation, because file-based lookup cannot handle similarity queries.
+ 3. **Complex reasoning**: Add Zep/Graphiti when the agent needs relationship traversal, temporal validity, or cross-session synthesis. Graphiti uses structured ties with generic relations, keeping graphs simple and easy to reason about; Cognee builds denser multi-layer semantic graphs with detailed relationship edges. Choose based on whether the agent needs bi-temporal modeling (Graphiti) or richer interconnected knowledge structures (Cognee).
+ 4. **Full control**: Use Letta or Cognee when the agent must self-manage its own memory with deep introspection, because these frameworks expose memory operations as first-class agent actions.
103
+
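The prototype step of the escalation path above (structured JSON with timestamps on the file system) might look like the sketch below. The JSON Lines layout, the field names, and the `remember`/`recall` helpers are assumptions for illustration; any append-only structured file would serve the same purpose.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def remember(store: Path, key: str, value: str) -> None:
    """Append one fact as a JSON line with a UTC timestamp."""
    record = {
        "key": key,
        "value": value,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def recall(store: Path, key: str):
    """Return the most recently written value for `key`, or None."""
    if not store.exists():
        return None
    latest = None
    with store.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["key"] == key:
                latest = record["value"]  # later lines override earlier ones
    return latest


# A temporary directory stands in for the agent's working directory.
store = Path(tempfile.mkdtemp()) / "memory.jsonl"
remember(store, "alice.theme", "dark")
remember(store, "alice.theme", "light")
current_theme = recall(store, "alice.theme")
```

Appending rather than overwriting keeps the full history on disk, which makes a later migration to a store with validity intervals straightforward.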
104
+ ### Integration with Context
105
+
106
+ Load memories just-in-time rather than preloading everything, because large context payloads are expensive and degrade attention quality. Place retrieved memories in attention-favored positions (beginning or end of context) to maximize their influence on generation.
107
+
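As a rough sketch of just-in-time loading, the function below keeps only the highest-scoring memories and places them directly before the final question, near the end of the context. The `build_prompt` helper, the `(score, text)` tuple shape, and the prompt layout are hypothetical; real systems typically assemble structured messages rather than one string.

```python
def build_prompt(system, history, memories, question, max_memories=3):
    """Assemble a prompt with a small memory block near the end of the context."""
    # Keep only the highest-scoring memories instead of preloading everything.
    top = sorted(memories, key=lambda m: m[0], reverse=True)[:max_memories]
    memory_block = "\n".join(f"- {text}" for _, text in top)
    parts = [system, *history]
    if memory_block:
        parts.append("Relevant memories:\n" + memory_block)
    parts.append(question)  # the question stays last
    return "\n\n".join(parts)


# Hypothetical (relevance score, text) pairs returned by a retriever.
memories = [
    (0.91, "User prefers light mode"),
    (0.40, "User once asked about snakes"),
    (0.85, "User works in Python 3.12"),
]
question = "user: which theme should the editor use?"
prompt = build_prompt(
    system="You are a helpful assistant.",
    history=["user: hi", "assistant: hello"],
    memories=memories,
    question=question,
    max_memories=2,
)
```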
108
+ ### Error Recovery
109
+
110
+ Handle retrieval failures gracefully because memory systems are inherently noisy. Match the recovery strategy to the failure mode:
111
+
112
+ - **Empty retrieval**: Fall back to broader search (remove entity filter, widen time range). If still empty, prompt user for clarification.
+ - **Stale results**: Check `valid_until` timestamps. If most results are expired, trigger consolidation before retrying.
+ - **Conflicting facts**: Prefer the fact with the most recent `valid_from`. Surface the conflict to the user if confidence is low.
+ - **Storage failure**: Queue writes for retry. Never block the agent's response on a memory write.
116
+
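Two of the recovery strategies above, broadening an empty search and queueing failed writes, can be sketched like this. Every name here (`search_with_fallback`, `WriteQueue`, the `entity` and `time_range` filters, and the fake backend) is an assumption for illustration and not part of any memory framework's API.

```python
from collections import deque


def search_with_fallback(search, query, entity=None, time_range=None):
    """Try progressively broader searches; return (results, needs_clarification)."""
    attempts = [
        {"entity": entity, "time_range": time_range},
        {"entity": None, "time_range": time_range},  # drop the entity filter
        {"entity": None, "time_range": None},        # widen the time range
    ]
    for filters in attempts:
        results = search(query, **filters)
        if results:
            return results, False
    return [], True  # still empty: ask the user to clarify


class WriteQueue:
    """Buffer failed writes so the agent's response never waits on storage."""

    def __init__(self, write):
        self._write = write
        self.pending = deque()

    def save(self, record):
        try:
            self._write(record)
        except OSError:
            self.pending.append(record)  # retried later by flush()

    def flush(self):
        # Retry each pending record once; failures are queued again by save().
        for _ in range(len(self.pending)):
            self.save(self.pending.popleft())


def fake_search(query, entity=None, time_range=None):
    # Hypothetical backend: only a fully unfiltered search finds anything.
    if entity is None and time_range is None:
        return ["User moved to Berlin"]
    return []


results, needs_clarification = search_with_fallback(
    fake_search,
    "where does the user live?",
    entity="alice",
    time_range=("2024-01-01", "2024-03-01"),
)
```

Calling `flush()` from a background task or at the start of the next turn keeps retries off the response path.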
117
+ ## Examples
118
+
119
+ **Example 1: Mem0 Integration**
120
+ ```python
+ from mem0 import Memory
+
+ # Memory() uses the library's default configuration
+ m = Memory()
+ m.add("User prefers dark mode and Python 3.12", user_id="alice")
+ m.add("User switched to light mode", user_id="alice")
+
+ # Intended result: the current preference (light mode), not the outdated one
+ results = m.search("What theme does the user prefer?", user_id="alice")
+ ```
130
+
131
+ **Example 2: Temporal Query**
132
+ ```python
+ from datetime import datetime
+
+ # `graph`, `user_node`, and `address_node` are assumed to be defined elsewhere
+
+ # Track entity with validity periods
+ graph.create_temporal_relationship(
+     source_id=user_node,
+     rel_type="LIVES_AT",
+     target_id=address_node,
+     valid_from=datetime(2024, 1, 15),
+     valid_until=datetime(2024, 9, 1),  # moved out
+ )
+
+ # Query: Where did user live on March 1, 2024?
+ results = graph.query_at_time(
+     {"type": "LIVES_AT", "source_label": "User"},
+     query_time=datetime(2024, 3, 1),
+ )
+ ```
148
+
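To make the validity-interval semantics in Example 2 concrete, here is a toy in-memory stand-in. It is an assumption-laden sketch, not the `graph` object used above: `ToyTemporalGraph` is hypothetical, its `query_at_time` takes a plain relationship type instead of a pattern dict, and treating intervals as half-open (`valid_from` inclusive, `valid_until` exclusive) is a design choice made for the example.

```python
from datetime import datetime


class ToyTemporalGraph:
    """Stores edges with validity intervals in a plain list."""

    def __init__(self):
        self.edges = []

    def create_temporal_relationship(self, source_id, rel_type, target_id,
                                     valid_from, valid_until=None):
        self.edges.append({
            "source_id": source_id,
            "type": rel_type,
            "target_id": target_id,
            "valid_from": valid_from,
            "valid_until": valid_until,  # None means still valid
        })

    def query_at_time(self, rel_type, query_time):
        """Return edges of `rel_type` whose interval contains `query_time`."""
        return [
            edge for edge in self.edges
            if edge["type"] == rel_type
            and edge["valid_from"] <= query_time
            and (edge["valid_until"] is None or query_time < edge["valid_until"])
        ]


graph = ToyTemporalGraph()
graph.create_temporal_relationship(
    "user-1", "LIVES_AT", "addr-old",
    valid_from=datetime(2024, 1, 15), valid_until=datetime(2024, 9, 1),
)
graph.create_temporal_relationship(
    "user-1", "LIVES_AT", "addr-new", valid_from=datetime(2024, 9, 1),
)
march = graph.query_at_time("LIVES_AT", datetime(2024, 3, 1))
```

With half-open intervals, a query at exactly the move-out instant matches only the new address, so adjacent intervals never both claim the same moment.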
149
+ **Example 3: Cognee Memory Ingestion and Search**
150
+ ```python
+ import asyncio
+
+ import cognee
+ from cognee.modules.search.types import SearchType
+
+
+ async def main():
+     # Ingest and build knowledge graph
+     await cognee.add("./docs/")
+     await cognee.add("any data")
+     await cognee.cognify()
+
+     # Enrich memory
+     await cognee.memify()
+
+     # Agent retrieves relationship-aware context
+     results = await cognee.search(
+         query_text="Any query for your memory",
+         query_type=SearchType.GRAPH_COMPLETION,
+     )
+     return results
+
+
+ # The cognee calls are coroutines, so run them inside an event loop
+ asyncio.run(main())
+ ```
168
+
169
+ ## Guidelines
170
+
171
+ 1. Start with file-system memory; add complexity only when retrieval quality demands it
172
+ 2. Track temporal validity for any fact that can change over time
173
+ 3. Use hybrid retrieval (semantic + keyword + graph) for best accuracy
174
+ 4. Consolidate memories periodically — invalidate but don't discard
175
+ 5. Design for retrieval failure: always have a fallback when memory lookup returns nothing
176
+ 6. Consider privacy implications of persistent memory (retention policies, deletion rights)
177
+ 7. Benchmark your memory system against LoCoMo or LongMemEval before and after changes
178
+ 8. Monitor memory growth and retrieval latency in production
179
+
180
+ ## Gotchas
181
+
182
+ 1. **Stuffing everything into context**: Loading all available memories into the prompt is expensive and degrades attention quality. Use just-in-time retrieval with relevance filtering instead.
+ 2. **Ignoring temporal validity**: Facts go stale. Without validity tracking, outdated information poisons the context and the agent acts on wrong assumptions.
+ 3. **Over-engineering early**: A filesystem agent can outperform complex memory tooling (Letta's filesystem-based agent scored 74% on LoCoMo vs Mem0's 68.5%). Add sophistication only when simple approaches demonstrably fail.
+ 4. **No consolidation strategy**: Unbounded memory growth degrades retrieval quality over time. Set memory count thresholds or scheduled intervals to trigger consolidation.
+ 5. **Embedding model mismatch**: Writing memories with one embedding model and reading with another produces poor retrieval because vector spaces are not interchangeable. Pin a single embedding model for each memory store and re-embed all entries if the model changes.
+ 6. **Graph schema rigidity**: Over-structured graph schemas (rigid node types, fixed relationship labels) break when the domain evolves. Prefer generic relation types and flexible property bags so new entity kinds do not require schema migrations.
+ 7. **Stale memory poisoning**: Old memories that contradict the current state corrupt agent behavior silently. Implement expiry policies or confidence decay so the agent deprioritizes aged facts, and surface contradictions explicitly when detected.
+ 8. **Memory-context mismatch**: Retrieving memories that are topically related but contextually wrong (e.g., a memory about "Python" the snake when the agent is discussing Python the language) misleads the agent. Mitigate by including session or domain metadata in memory entries and filtering on it during retrieval.
190
+
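The confidence decay mentioned under stale memory poisoning could be implemented as exponential decay with a half-life. This is one possible scheme, not a prescribed one: the `decayed_confidence` function and the 90-day default half-life are assumptions chosen for the sketch.

```python
from datetime import datetime, timedelta


def decayed_confidence(confidence, recorded_at, now, half_life_days=90.0):
    """Halve a memory's confidence for every `half_life_days` of age."""
    age_days = (now - recorded_at).total_seconds() / 86400.0
    return confidence * 0.5 ** (age_days / half_life_days)


now = datetime(2025, 1, 1)
fresh = decayed_confidence(0.9, now, now)
aged = decayed_confidence(0.9, now - timedelta(days=90), now)
```

Ranking by decayed confidence deprioritizes aged facts without deleting them, which stays consistent with the invalidate-but-keep approach to consolidation.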
191
+ ## Integration
192
+
193
+ This skill builds on context-fundamentals. It connects to:
194
+
195
+ - multi-agent-patterns - Shared memory across agents
196
+ - context-optimization - Memory-based context loading
197
+ - evaluation - Evaluating memory quality
198
+
199
+ ## References
200
+
201
+ Internal references:
202
+ - [Implementation Reference](./references/implementation.md) - Read when: implementing vector stores, property graphs, temporal queries, or memory consolidation logic from scratch
203
+
204
+ Related skills in this collection:
205
+ - context-fundamentals - Read when: designing the context layer that memory feeds into
206
+ - multi-agent-patterns - Read when: multiple agents need to share or coordinate memory state
207
+
208
+ External resources:
209
+ - Zep temporal knowledge graph paper (arXiv:2501.13956) - Read when: evaluating bi-temporal modeling or Graphiti's architecture
210
+ - Mem0 production architecture paper (arXiv:2504.19413) - Read when: assessing managed memory infrastructure trade-offs
211
+ - Cognee optimized knowledge graph + LLM reasoning paper (arXiv:2505.24478) - Read when: comparing multi-layer semantic graph approaches
212
+ - LoCoMo benchmark (Snap Research) - Read when: evaluating long-conversation memory retention
213
+ - MemBench evaluation framework (ACL 2025) - Read when: designing memory evaluation suites
214
+ - Graphiti open-source temporal KG engine (github.com/getzep/graphiti) - Read when: implementing temporal knowledge graphs
215
+ - Cognee open-source knowledge graph memory (github.com/topoteretes/cognee) - Read when: building customizable ECL pipelines for memory
216
+ - [Cognee comparison: Form vs Function](https://www.cognee.ai/blog/deep-dives/competition-comparison-form-vs-function) - Read when: comparing graph structures across Mem0, Graphiti, LightRAG, Cognee
217
+
218
+ ---
219
+
220
+ ## Skill Metadata
221
+
222
+ **Created**: 2025-12-20
223
+ **Last Updated**: 2026-03-17
224
+ **Author**: Agent Skills for Context Engineering Contributors
225
+ **Version**: 4.0.0