agent-mind 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/LICENSE +21 -0
  2. package/README.md +229 -0
  3. package/bin/cli.js +38 -0
  4. package/package.json +33 -0
  5. package/src/commands/doctor.js +269 -0
  6. package/src/commands/init.js +345 -0
  7. package/src/commands/meta.js +34 -0
  8. package/src/commands/upgrade.js +177 -0
  9. package/src/index.js +18 -0
  10. package/src/utils/detect-tools.js +62 -0
  11. package/src/utils/inject-adapter.js +65 -0
  12. package/src/utils/template.js +103 -0
  13. package/src/utils/version.js +71 -0
  14. package/template/.am-tools/compact.sh +171 -0
  15. package/template/.am-tools/guide.md +274 -0
  16. package/template/.am-tools/health-check.sh +165 -0
  17. package/template/.am-tools/validate.sh +174 -0
  18. package/template/BOOT.md +71 -0
  19. package/template/README.md +109 -0
  20. package/template/VERSION.md +57 -0
  21. package/template/adapters/claude.md +56 -0
  22. package/template/adapters/codex.md +33 -0
  23. package/template/adapters/cursor.md +35 -0
  24. package/template/adapters/gemini.md +32 -0
  25. package/template/config.md +33 -0
  26. package/template/history/episodes/_index.md +13 -0
  27. package/template/history/maintenance-log.md +9 -0
  28. package/template/history/reflections/_index.md +11 -0
  29. package/template/knowledge/domains/_template/failures/_index.md +19 -0
  30. package/template/knowledge/domains/_template/patterns.md +21 -0
  31. package/template/knowledge/insights.md +23 -0
  32. package/template/knowledge/stack/_template.md +20 -0
  33. package/template/protocols/compaction.md +101 -0
  34. package/template/protocols/maintenance.md +99 -0
  35. package/template/protocols/memory-ops.md +89 -0
  36. package/template/protocols/quality-gate.md +66 -0
  37. package/template/protocols/workflow.md +81 -0
  38. package/template/workspace/.gitkeep +0 -0
@@ -0,0 +1,35 @@
+ # Cursor Integration
+
+ ## Setup
+
+ Create `.cursor/rules/agent-mind.md` in your project root:
+
+ ```markdown
+ ## Agent Mind Memory System
+ This project uses Agent Mind for structured memory management.
+ At the start of every session, read `.agent-mind/BOOT.md` and follow its protocols.
+ Use `.agent-mind/workspace/` as working memory for the current task.
+ After completing a task, follow `.agent-mind/protocols/compaction.md`.
+ When asked about memory health, follow `.agent-mind/protocols/maintenance.md`.
+ ```
+
+ Alternatively, add the snippet to an existing `.cursorrules` file in your project root.
+
+ ## How It Works
+
+ - Cursor reads `.cursor/rules/*.md` and `.cursorrules` at session start
+ - The snippet points Cursor to the `.agent-mind/` system
+ - Cursor can read project files, so all `.agent-mind/` content is accessible
+
+ ## Coexistence
+
+ - Cursor has its own rules system and "Memory Bank" community patterns
+ - Agent Mind provides a more structured approach with explicit protocols
+ - If you use Cursor's Memory Bank pattern (productContext.md etc.), Agent Mind's `knowledge/` serves a similar but more rigorous purpose
+ - You can use both — they don't conflict
+
+ ## Cursor-Specific Tips
+
+ - Cursor's composer and chat modes both read rules files
+ - In composer mode, the full workflow protocol may be too heavy. Consider a lighter version for quick edits.
+ - Cursor works well with the warm-tier knowledge files — it can load domain patterns alongside your code context
@@ -0,0 +1,32 @@
+ # Gemini CLI Integration
+
+ ## Setup
+
+ Add this block to your project's `GEMINI.md`:
+
+ ```markdown
+ ## Agent Mind Memory System
+ This project uses Agent Mind for structured memory management.
+ At the start of every session, read `.agent-mind/BOOT.md` and follow its protocols.
+ Use `.agent-mind/workspace/` as working memory for the current task.
+ After completing a task, follow `.agent-mind/protocols/compaction.md`.
+ When asked about memory health, follow `.agent-mind/protocols/maintenance.md`.
+ ```
+
+ ## How It Works
+
+ - Gemini CLI reads `GEMINI.md` at session start
+ - The snippet points Gemini to the `.agent-mind/` system
+ - Gemini CLI can read files from the filesystem, so all content is accessible
+
+ ## Coexistence
+
+ - Gemini CLI has its own instruction system via GEMINI.md
+ - Agent Mind provides structured persistent memory that Gemini lacks natively
+ - Particularly valuable for Gemini since its native memory across sessions is limited
+
+ ## Gemini-Specific Tips
+
+ - Gemini's instruction following varies by model tier. Gemini Pro follows protocols well. Flash may skip steps.
+ - Keep the GEMINI.md snippet short — let BOOT.md handle the detailed instructions
+ - Gemini handles structured markdown well. The format of Agent Mind files is compatible.
@@ -0,0 +1,33 @@
+ # Agent Mind — Configuration
+
+ ## Project
+ - **Name:** {{PROJECT_NAME}}
+ - **Description:** {{PROJECT_DESCRIPTION}}
+ - **Created:** {{DATE}}
+
+ ## Domains
+ Knowledge domains relevant to this project. Each domain listed here should have a folder in `knowledge/domains/`.
+
+ {{DOMAINS}}
+
+ ## Stack
+ Technologies used in this project. Each entry can have a matching file in `knowledge/stack/`.
+
+ {{STACK}}
+
+ ## Agent Preferences
+ - **Primary agent:** {{PRIMARY_AGENT}}
+ - **Thinking depth:** adaptive (scale protocol depth to task size)
+ - **Memory writes:** quality-gated (all writes to knowledge/ pass through protocols/quality-gate.md)
+ - **Maintenance frequency:** every 2 weeks or on request
+
+ ## Project Context
+ Add project-specific context the agent should know across all sessions. Things like: architecture decisions, team conventions, important constraints, deployment targets.
+
+ (Add as the project progresses. Keep under 50 lines — this loads every session as hot memory.)
+
+ ## Notes
+ - This file loads every session (hot tier). Keep it concise.
+ - For domain-specific knowledge, use `knowledge/domains/` instead.
+ - For tech-specific knowledge, use `knowledge/stack/` instead.
+ - This file is for project-wide context that doesn't fit elsewhere.
@@ -0,0 +1,13 @@
+ # Episode Index
+
+ Task history. One line per completed task. Append-only — never delete entries, only archive old ones during maintenance.
+
+ **Format:** `YYYY-MM-DD | domain(s) | outcome | task-slug | One-line summary`
+
+ **Outcomes:** completed | failed | abandoned
+
+ ---
+
+ <!-- Episodes will accumulate here as tasks are completed and compacted.
+ Search this file to find related past work before starting a new task.
+ During maintenance, entries older than 90 days may be moved to _archive.md. -->
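Because the index is one pipe-delimited line per task, it can be parsed mechanically when searching past work. A minimal sketch in JavaScript (the function name and field names are illustrative, not part of the package):

```javascript
// Parse one episode-index line of the form:
// YYYY-MM-DD | domain(s) | outcome | task-slug | One-line summary
function parseEpisodeLine(line) {
  const parts = line.split("|").map((s) => s.trim());
  if (parts.length < 5) return null; // not a valid index entry
  const [date, domains, outcome, slug, ...rest] = parts;
  return {
    date,
    domains: domains.split(",").map((d) => d.trim()),
    outcome, // completed | failed | abandoned
    slug,
    summary: rest.join(" | "), // rejoin in case the summary itself contains "|"
  };
}

const entry = parseEpisodeLine(
  "2026-03-15 | auth, api | completed | jwt-refresh | Added token refresh flow"
);
// entry.domains → ["auth", "api"], entry.outcome → "completed"
```

Filtering the parsed entries by domain or outcome then gives the "search this file for related past work" step a concrete shape.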
@@ -0,0 +1,9 @@
+ # Maintenance Log
+
+ Record of every maintenance action taken on the memory system.
+
+ **Format:** `YYYY-MM-DD | actions taken | triggered by (human request / scheduled / agent-initiated)`
+
+ ---
+
+ <!-- Log maintenance actions here. This provides an audit trail for how the memory system has evolved. -->
@@ -0,0 +1,11 @@
+ # Reflection Index
+
+ Failure analysis log. One line per failed task that was analyzed. Each entry has a matching detailed reflection file.
+
+ **Format:** `YYYY-MM-DD | domain | slug | What went wrong (one line)`
+
+ ---
+
+ <!-- Reflections accumulate here when tasks fail and are analyzed during compaction.
+ These are Reflexion-style self-critiques: what went wrong, why, what to do differently.
+ Search this when debugging similar problems or when a domain keeps having issues. -->
@@ -0,0 +1,19 @@
+ # [Domain Name] — Known Failures
+
+ One line per failure. Scan this during Phase 3 (Think Critically) to catch known problems before they happen.
+
+ **Format:** `date | slug | trigger-condition | one-line summary`
+
+ <!-- Add failures below. Each should have a matching detailed file: [slug].md
+
+ Example:
+ 2026-03-15 | jwt-token-expiry | task involves JWT + multiple sessions | Token expiry race condition when user has multiple tabs open
+
+ For each failure in this index, there should be a detailed file with:
+ - What was built
+ - What was undefined at spec time
+ - What broke
+ - Root cause
+ - What fixed it
+ - Detection condition (how to spot this in future tasks)
+ -->
@@ -0,0 +1,21 @@
+ # [Domain Name] — Patterns
+
+ Proven approaches for this domain. Each entry was quality-gated and verified.
+
+ ## How to Use This File
+ - Load this when a task touches this domain (see `protocols/workflow.md` Phase 2)
+ - Each pattern describes WHAT works, WHEN to use it, and WHY
+ - Patterns with more recent dates are more likely to be relevant
+ - Check `failures/_index.md` alongside this — knowing what breaks is as valuable as knowing what works
+
+ ---
+
+ <!-- Add patterns below. Format:
+
+ ### [Pattern Name]
+ **When:** [conditions when this pattern applies]
+ **What:** [the approach/technique/solution]
+ **Why:** [reasoning — why this works, what it prevents]
+ **Added:** YYYY-MM-DD | From: [originating task slug]
+
+ -->
@@ -0,0 +1,23 @@
+ # Cross-Domain Insights
+
+ Generalizable learnings that apply across multiple domains. Managed by vote count — high-vote insights are promoted to domain patterns, low-vote insights are pruned.
+
+ **Operations:** ADD (new insight) | UPVOTE (confirmed) | DOWNVOTE (contradicted) | PROMOTE (votes>5 → move to domain patterns) | REMOVE (votes<-2 after 10+ tasks)
+
+ **Format:**
+ ```
+ ### [Insight title]
+ - **Insight:** [the learning]
+ - **Domains:** [which domains this applies to]
+ - **Votes:** [number]
+ - **Added:** YYYY-MM-DD | **Last touched:** YYYY-MM-DD
+ - **Evidence:** [brief: what tasks confirmed or contradicted this]
+ ```
+
+ ---
+
+ <!-- No insights yet. They emerge from work.
+
+ As you complete tasks and follow the compaction protocol, insights will accumulate here.
+ High-vote insights get promoted to domain patterns. Low-vote insights get pruned.
+ This file is the system's learning frontier — where new knowledge is tested before becoming established. -->
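For illustration, a hypothetical filled-in entry in the format above might look like this (all content here is invented for the example, not shipped with the template):

```markdown
### Prefer idempotent migrations
- **Insight:** Write schema migrations so re-running them is a no-op.
- **Domains:** database, deployment
- **Votes:** 3
- **Added:** 2026-02-01 | **Last touched:** 2026-03-10
- **Evidence:** Confirmed by three deploy tasks; no contradictions yet.
```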
@@ -0,0 +1,20 @@
+ # [Technology Name] — Stack Knowledge
+
+ What the agent should know when working with this technology in this project.
+
+ ## Project-Specific Setup
+ <!-- How this tech is configured in THIS project. Versions, config, conventions. -->
+
+ ## Patterns
+ <!-- Proven approaches specific to this technology. -->
+
+ ## Gotchas
+ <!-- Common pitfalls. Things that look right but aren't. -->
+
+ ## Conventions
+ <!-- Team/project conventions for this technology. Naming, structure, etc. -->
+
+ <!--
+ Added: YYYY-MM-DD
+ Keep this file under 200 lines. If it grows beyond that, split into sub-files.
+ -->
@@ -0,0 +1,101 @@
+ # Compaction Protocol
+
+ Run this at the end of every task. Goal: capture what matters, discard noise, keep memory healthy. This is the learning loop — the mechanism that makes the system smarter over time.
+
+ Backed by: ExpeL (cross-task learning, +31% on ALFWorld), Reflexion (failure analysis, +22% on ALFWorld), SimpleMem (quality-gated writes, 26.4% improvement over Mem0).
+
+ ---
+
+ ## Step 1: Create Episode Summary
+
+ Create a new file: `history/episodes/YYYY-MM/[task-slug].md`
+
+ Use this format:
+ ```
+ # [Task Slug]
+ **Date:** YYYY-MM-DD
+ **Domain(s):** [domains touched]
+ **Outcome:** completed | failed | abandoned
+ **Summary:** [2-3 sentences: what was done and what the result was]
+ **Key insight:** [1 sentence: the most important thing learned, or "none"]
+ **Assumptions made:** [brief list, or "none"]
+ ```
+
+ Add a one-line entry to `history/episodes/_index.md`:
+ ```
+ YYYY-MM-DD | domain(s) | outcome | task-slug | One-line summary
+ ```
+
+ **Every task gets an episode.** Even trivial ones get a one-liner in the index.
+
+ ## Step 2: Quality Gate
+
+ Before writing ANYTHING to `knowledge/`, pass through `protocols/quality-gate.md`.
+
+ Ask three questions:
+ 1. **Is it new?** Not already captured in knowledge/.
+ 2. **Is it generalizable?** Applies beyond this specific task.
+ 3. **Was the outcome verified?** Tests passed, human confirmed, or logic holds.
+
+ If yes to all three → proceed to Step 3.
+ If uncertain → tag `[UNVERIFIED]` and proceed.
+ If no → stop here. The episode summary is enough.
+
+ ## Step 3: Extract Learnings
+
+ ### Path A: Task Completed Successfully
+
+ **Check insights:** Does this task confirm an existing insight in `knowledge/insights.md`?
+ - Yes → UPVOTE that insight (increment vote count)
+ - No → Is there a new generalizable learning? → ADD with `votes: 1`
+
+ **Check patterns:** Did you use an approach worth remembering?
+ - If new reusable pattern → append to `knowledge/domains/[domain]/patterns.md`
+ - Include: what the pattern is, when to use it, date, originating task
+
+ **Check for promotion:** Any insight in insights.md with votes > 5?
+ - Yes → move it to the relevant domain's patterns.md (it's proven enough)
+
+ ### Path B: Task Failed
+
+ **Write reflection** to `history/reflections/YYYY-MM-DD-[slug].md`:
+ ```
+ # Reflection: [Task Slug]
+ **Date:** YYYY-MM-DD
+ **What was attempted:** [brief description]
+ **What went wrong:** [what actually happened]
+ **Root cause:** [why it happened — not symptoms, the actual cause]
+ **What to do differently:** [concrete change for next time]
+ **Detection condition:** [how to spot this failure pattern in future tasks]
+ ```
+
+ Add entry to `history/reflections/_index.md`:
+ ```
+ YYYY-MM-DD | domain | slug | One-line: what went wrong
+ ```
+
+ **Update failure library:** Check `knowledge/domains/[domain]/failures/`:
+ - New failure pattern → create entry file, add to `_index.md`
+ - Known failure that wasn't caught → update its detection conditions
+ - Failure in `_index.md` format: `date | slug | trigger condition | one-line summary`
+
+ **Check insights:**
+ - Should an existing insight have prevented this? → UPVOTE that insight
+ - New insight from this failure? → ADD with `votes: 1`
+
+ ### Path C: Task Abandoned
+
+ Just log the episode (Step 1) with outcome "abandoned" and a note on why. No knowledge extraction. Abandoned tasks don't teach reliably — the outcome is unknown.
+
+ ## Step 4: Clear Workspace
+
+ After Steps 1-3 are complete:
+ - All valuable information is now in `history/` or `knowledge/`
+ - Delete all files from `workspace/`
+ - Workspace is now clean for the next task
+
+ ## Scaling
+
+ **Quick task:** Step 1 (1-line index entry only), Step 4. Skip Steps 2-3.
+ **Normal task:** Full Steps 1-4.
+ **Failed task:** Full Steps 1-4 with emphasis on Step 3 Path B.
@@ -0,0 +1,99 @@
+ # Maintenance Protocol
+
+ Memory degrades over time without maintenance. Stale insights mislead. Oversized files get partially ignored. Wrong patterns compound errors. This protocol catches those problems.
+
+ Run this when:
+ - The human asks for a memory health check
+ - You notice memory is getting large or stale during normal work
+ - It's been 2+ weeks since last maintenance (check `history/maintenance-log.md`)
+ - After a cluster of failed tasks (something might be wrong with knowledge/)
+
+ ---
+
+ ## Step 1: Size Check
+
+ Check every file against its size limit (from `protocols/memory-ops.md`):
+
+ | File | Max | Action if exceeded |
+ |------|-----|-------------------|
+ | BOOT.md | 150 lines | Must trim. This file's adherence matters most. |
+ | Each protocol file | 200 lines | Split or trim. |
+ | Each domain patterns.md | 200 lines | Archive older/less-used patterns to `_archive.md` |
+ | Each failure _index.md | 100 lines | Archive old entries |
+ | insights.md | 100 entries | Prune lowest-vote entries |
+ | episode _index.md | Unlimited | Archive entries older than 90 days to `_archive.md` |
+
+ Flag any oversized files with their exact line counts.
+
+ ## Step 2: Stale Memory Check
+
+ - **Zero-vote insights** untouched for 30+ days → flag for review. Are they worth keeping?
+ - **[UNVERIFIED] entries** older than 14 days → ask human to verify or remove
+ - **Domain patterns** not referenced by any task in 60+ days → flag as potentially stale
+ - **Stack knowledge** for tech no longer in `config.md` → flag for archival
+
+ ## Step 3: Contradiction Check
+
+ This is the most important step. Bad memories compound.
+
+ - Did any recent task **fail** in a domain where `patterns.md` was loaded?
+   → The loaded pattern might have been wrong. Cross-reference the failure with the pattern.
+ - Are there insights with **negative votes**?
+   → List them with vote counts. Recommend removal for votes < -2.
+ - Are there **contradictory entries** — two patterns that give conflicting advice?
+   → Flag both with the contradiction. Human decides which to keep.
+ - Did the agent **ignore a failure pattern** that turned out to be relevant?
+   → The failure's detection conditions need updating.
+
+ ## Step 4: Growth Review
+
+ - How many episodes were created since last maintenance?
+ - How many new knowledge entries were written?
+ - How many insights were added vs promoted vs removed?
+ - Is the system learning? (Are insights getting upvoted? Are failures being caught?)
+ - Is the system degrading? (Increasing failure rate? Patterns not helping?)
+
+ ## Step 5: Produce Report
+
+ Create a report for the human. Don't act on it — present it.
+
+ ```markdown
+ ## Memory Health Report — YYYY-MM-DD
+
+ ### Overall Status: [Healthy | Needs Attention | Issues Found]
+
+ ### Size Audit
+ - [File]: [current] / [max] lines — [OK | OVER — recommend: trim/split/archive]
+
+ ### Stale Entries (need human decision)
+ - [Entry description] — last relevant: [date] — recommend: [keep/remove/verify]
+
+ ### Suspicious Patterns (possible memory poisoning)
+ - [Pattern] in [domain] — evidence: [what went wrong] — recommend: [review/remove/update]
+
+ ### Contradictions Found
+ - [Pattern A] vs [Pattern B] — recommend: [human decides]
+
+ ### Growth Summary
+ - Episodes: [N] new since last maintenance
+ - Knowledge writes: [N] (passed gate: [N], tagged unverified: [N])
+ - Insights: [N] added, [N] upvoted, [N] downvoted, [N] promoted, [N] removed
+
+ ### Recommendations
+ 1. [Specific actionable recommendation]
+ 2. [...]
+ ```
+
+ ## Step 6: Execute Approved Changes
+
+ After the human reviews the report:
+ - Execute only the changes they approve
+ - Log what was done to `history/maintenance-log.md`:
+ ```
+ YYYY-MM-DD | Actions: [what was done] | Triggered by: [human request / scheduled]
+ ```
+
+ ## Critical Rule
+
+ You NEVER autonomously delete or modify `knowledge/` during maintenance.
+ You analyze. You report. You recommend. You wait. The human decides.
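The Step 2 staleness thresholds (30, 14, and 60 days) all reduce to one date comparison. A sketch of that check, assuming entries carry an ISO last-touched date — the field names and helper are illustrative, not part of the package:

```javascript
// Whole days elapsed between an entry's last-touched date and "now".
function daysSince(isoDate, now = new Date()) {
  return Math.floor((now - new Date(isoDate)) / 86_400_000);
}

// Apply the Step 2 thresholds from the maintenance protocol to one entry.
function staleFlags(entry, now = new Date()) {
  const age = daysSince(entry.lastTouched, now);
  return {
    zeroVoteStale: entry.votes === 0 && age >= 30,      // zero-vote insight, 30+ days
    unverifiedStale: entry.unverified === true && age >= 14, // [UNVERIFIED], 14+ days
    patternStale: entry.kind === "pattern" && age >= 60,     // unreferenced pattern, 60+ days
  };
}

staleFlags(
  { votes: 0, unverified: false, kind: "insight", lastTouched: "2026-01-01" },
  new Date("2026-02-15")
);
// → zeroVoteStale: true (45 days untouched), the other flags false
```

Note the sketch only flags; per the Critical Rule above, acting on the flags stays with the human.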
@@ -0,0 +1,89 @@
+ # Memory Operations Protocol
+
+ How to read from and write to each memory tier. The goal: load the right context at the right time, write only what deserves to persist.
+
+ ---
+
+ ## Reading Memory
+
+ ### Hot Tier — Always Loaded
+ These load at session start. No decision needed.
+ - `BOOT.md` — operating instructions
+ - `config.md` — project context
+ - `protocols/*` — operating procedures (loaded as needed during work)
+
+ ### Warm Tier — Loaded by Relevance
+ Load these based on what the current task needs.
+ - `knowledge/domains/[domain]/patterns.md` — when task touches that domain
+ - `knowledge/domains/[domain]/failures/_index.md` — scan before implementation
+ - `knowledge/stack/[tech].md` — when task involves that technology
+ - `knowledge/insights.md` — scan for applicable cross-domain learnings
+
+ **How to decide what to load:**
+ 1. Identify the task's domain(s) from the description
+ 2. Load matching domain patterns + failure indexes
+ 3. Load relevant stack knowledge
+ 4. Scan insights.md for entries tagged with matching domains
+ 5. Write what you loaded and why to `workspace/context.md`
+
+ **Don't overload context.** If you match 5+ domains, prioritize the 2-3 most relevant. Loading too much degrades performance. Research shows context rot is real — more isn't better.
+
+ ### Cold Tier — Searched on Demand
+ Only access when you specifically need historical context.
+ - `history/episodes/_index.md` — search for related past work
+ - `history/episodes/YYYY-MM/[slug].md` — read specific episode details
+ - `history/reflections/_index.md` — search for relevant failure analysis
+ - `knowledge/domains/[domain]/failures/[slug].md` — detailed failure context
+
+ ---
+
+ ## Writing Memory
+
+ ### workspace/ (Working Memory)
+ - **When:** During any active task
+ - **Rules:** Write freely. This is scratch space. Cleared after compaction.
+ - **Files:** task.md, context.md, questions.md, assumptions.md, decisions.md, progress.md
+ - **No gate required.** This is ephemeral.
+
+ ### history/episodes/ (Episodic Memory)
+ - **When:** During compaction (protocols/compaction.md) only
+ - **Rules:** Append-only. Never edit or delete existing episodes.
+ - **Format:** Add entry to `_index.md`, create episode file in `YYYY-MM/`
+ - **No gate required.** Every completed task gets an episode. But keep summaries concise (5-10 lines).
+
+ ### history/reflections/ (Failure Analysis)
+ - **When:** During compaction, only when a task failed
+ - **Rules:** Append-only. Follow the reflection format in compaction.md.
+ - **Format:** Add entry to `_index.md`, create reflection file
+
+ ### knowledge/ (Semantic Memory)
+ - **When:** During compaction, and ONLY after passing quality gate
+ - **Rules:** MUST pass `protocols/quality-gate.md` before writing
+ - **Prefer updates over creation.** If a pattern already exists, update it rather than creating a new entry.
+ - **Include provenance.** Every entry should note the date and originating task.
+ - **Tag uncertainty.** If outcome wasn't verified, tag `[UNVERIFIED]`.
+
+ ### knowledge/insights.md (Cross-Domain Learnings)
+ - **Operations:**
+   - `ADD` — new generalizable learning, set `votes: 1`
+   - `UPVOTE` — task confirms existing insight, `votes + 1`
+   - `DOWNVOTE` — task contradicts existing insight, `votes - 1`
+   - `PROMOTE` — insight with `votes > 5` moves to relevant domain's patterns.md
+   - `REMOVE` — insight with `votes < -2` after appearing in 10+ tasks
+
+ ---
+
+ ## File Size Limits
+
+ Keep these in check. Oversized files degrade agent performance.
+
+ | File | Max Size | Action if exceeded |
+ |------|----------|-------------------|
+ | BOOT.md | 150 lines | Trim or split into referenced files |
+ | Each protocol file | 200 lines | Split into sub-protocols |
+ | Each domain patterns.md | 200 lines | Archive older patterns, keep most relevant |
+ | Each failure _index.md | 100 lines | Archive old entries to _archive.md |
+ | insights.md | 100 entries | Prune low-vote entries, promote high-vote ones |
+ | Episode _index.md | Unlimited | But archive entries older than 90 days |
+
+ These limits come from research: files under 200 lines achieve >92% instruction adherence. Beyond that, agents start skipping content.
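The line-count limits in the table can be enforced mechanically. A minimal sketch that counts lines and flags files over their cap — the limits are copied from the table, but the path matching and function names are simplified assumptions of mine (insights.md is capped by entries rather than lines, so it is omitted from this line-based check):

```javascript
// Line-count limits from the File Size Limits table.
const LINE_LIMITS = [
  { match: /(^|\/)BOOT\.md$/, max: 150 },
  { match: /(^|\/)protocols\/[^/]+\.md$/, max: 200 },
  { match: /(^|\/)domains\/[^/]+\/patterns\.md$/, max: 200 },
  { match: /(^|\/)failures\/_index\.md$/, max: 100 },
];

// Report whether one file is within its limit.
function sizeCheck(path, content) {
  const rule = LINE_LIMITS.find((r) => r.match.test(path));
  if (!rule) return { path, status: "no-limit" };
  const lines = content.split("\n").length;
  return { path, lines, max: rule.max, status: lines > rule.max ? "OVER" : "OK" };
}

sizeCheck("BOOT.md", "line\n".repeat(200));
// flags BOOT.md as OVER (201 split lines against a 150-line cap)
```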
@@ -0,0 +1,66 @@
+ # Quality Gate Protocol
+
+ Not everything deserves to be remembered. Bad memories poison the system. Research confirms: agents using naive "remember everything" strategies show sustained performance decline after an initial improvement. This gate prevents that.
+
+ Inspired by: SimpleMem (quality-gated writes, 26.4% improvement over Mem0), Xiong et al. (self-degradation in long-running agents).
+
+ ---
+
+ ## The Three Questions
+
+ Before writing anything to `knowledge/`, answer these:
+
+ ### 1. Is it new?
+ - Does this information already exist in `knowledge/`?
+ - If it's a **duplicate** → don't write. The existing entry is enough.
+ - If it's a **correction** → edit the existing entry. Note the date and why it changed.
+ - If it's an **extension** → update the existing entry with the new information.
+
+ ### 2. Is it generalizable?
+ - Will this apply to future tasks beyond this specific one?
+ - **Generalizable:** "Always validate JWT expiry with a clock skew buffer" — applies to any JWT implementation
+ - **Not generalizable:** "The user table has a column called `display_name`" — specific to this project
+ - Project-specific facts belong in `config.md` or episode summaries, not in `knowledge/`
+
+ ### 3. Was the outcome verified?
+ - Did the approach actually work? Evidence:
+   - Tests passed
+   - Human confirmed the result
+   - The logic holds under scrutiny
+   - The approach was used successfully in production
+ - **Unverified outcomes** (agent finished, but no confirmation) should be tagged `[UNVERIFIED]`
+
+ ## The Decision
+
+ | Question 1 (New?) | Question 2 (General?) | Question 3 (Verified?) | Action |
+ |---|---|---|---|
+ | Yes | Yes | Yes | Write to knowledge/ |
+ | Yes | Yes | Uncertain | Write with `[UNVERIFIED]` tag |
+ | Yes | No | Any | Don't write. Episode summary is enough. |
+ | No (duplicate) | Any | Any | Don't write. |
+ | No (correction) | Yes | Yes | Edit existing entry. |
+ | No (extension) | Yes | Yes | Update existing entry. |
+
+ ## Memory Poisoning Prevention
+
+ The biggest risk to this system is a wrong entry in `knowledge/`. A bad pattern will be loaded and applied to every future task in that domain. Defenses:
+
+ 1. **Provenance:** Every entry in `knowledge/` includes the date and originating task, so you can trace where a questionable pattern came from.
+
+ 2. **Uncertainty tagging:** If you're not sure, tag `[UNVERIFIED]`. The maintenance protocol reviews these.
+
+ 3. **Contradiction detection:** If a task fails and the failure matches a pattern you loaded from `knowledge/`, that pattern might be wrong. Flag it immediately — don't wait for maintenance.
+
+ 4. **Vote decay:** Insights in `insights.md` with negative votes after multiple tasks are likely wrong. Remove at `votes < -2` after 10+ task appearances.
+
+ 5. **Human review:** During maintenance (`protocols/maintenance.md`), surface all `[UNVERIFIED]` entries and suspicious patterns for human decision.
+
+ ## Insight Voting Rules
+
+ `knowledge/insights.md` uses a vote system to surface what's true and prune what's not:
+
+ - **ADD:** New generalizable learning. Set `votes: 1`. Tag with relevant domain(s).
+ - **UPVOTE:** A subsequent task confirms this insight. `votes + 1`.
+ - **DOWNVOTE:** A subsequent task contradicts this insight. `votes - 1`.
+ - **PROMOTE:** `votes > 5` → move to the relevant domain's `patterns.md`. It's proven.
+ - **REMOVE:** `votes < -2` after the insight has existed for 10+ tasks. It's wrong or useless.
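The voting lifecycle (ADD at 1, PROMOTE above 5, REMOVE below -2 after 10+ tasks) is small enough to sketch as one state update. This is an illustration under my own naming — the package itself keeps these rules as prose, and the decision table's unlisted cells (e.g. a verified "no") are treated conservatively as "don't write":

```javascript
// Apply one vote operation to an insight record and report its fate.
// ADD is just creating a record with votes: 1 before calling this.
function applyVote(insight, op) {
  const next = { ...insight };
  if (op === "UPVOTE") next.votes += 1;
  if (op === "DOWNVOTE") next.votes -= 1;
  if (next.votes > 5) return { ...next, fate: "promote" }; // → domain patterns.md
  if (next.votes < -2 && next.tasksSeen >= 10) return { ...next, fate: "remove" };
  return { ...next, fate: "keep" };
}

applyVote({ title: "Prefer updates over creation", votes: 5, tasksSeen: 7 }, "UPVOTE");
// votes reach 6 → fate "promote"
```

One design note: REMOVE requires both the vote threshold and a minimum task count, so a new insight cannot be pruned off a single bad outing.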
@@ -0,0 +1,81 @@
+ # Workflow Protocol
+
+ How you approach every task. Not a rigid pipeline — a thinking process. Scale depth to task size. A 5-minute fix doesn't need a full spec. A multi-day feature does.
+
+ ---
+
+ ## Phase 1: Understand
+
+ Before anything else, understand what's really being asked.
+
+ - What is the actual goal? (Not just what was said — what is the human trying to achieve?)
+ - What is the scope? (What's included? What's explicitly not?)
+ - Is this a new task, or a continuation of something in `workspace/`?
+
+ Write your understanding to `workspace/task.md`. Keep it to 5-15 lines. If you can't summarize it concisely, you don't understand it yet.
+
+ ## Phase 2: Load Context
+
+ Check what you already know that's relevant.
+
+ 1. **Domain knowledge**: Scan `knowledge/domains/` — does this task touch a known domain?
+    - If yes: read that domain's `patterns.md` and `failures/_index.md`
+ 2. **Stack knowledge**: Scan `knowledge/stack/` — does the task involve a known technology?
+    - If yes: read the relevant stack file
+ 3. **Cross-domain insights**: Check `knowledge/insights.md` for applicable learnings
+ 4. **Past work**: Search `history/episodes/_index.md` — have you done something similar before?
+    - If yes: read that episode for context
+
+ Don't load everything. Load what's relevant. Write what you loaded and why to `workspace/context.md`. This audit trail matters for maintenance.
+
+ ## Phase 3: Think Critically
+
+ This is where the real value is. Before doing any work:
+
+ **Check failures:** For each matched domain, scan the failures index. Does this task have conditions that match a known failure pattern? If yes, explicitly address it in your approach.
+
+ **Identify unknowns:**
+ - **BLOCKING unknowns** — you cannot proceed without the answer
+   - Write to `workspace/questions.md`
+   - HALT. Surface these to the human. Do not continue until resolved.
+ - **Assumable unknowns** — reasonable defaults exist
+   - Write to `workspace/assumptions.md` with the default you're choosing and why
+
+ **Check edges:** What's the simplest thing that could go wrong?
+ - What if the input is empty, null, huge, or malformed?
+ - What if this runs concurrently? What about race conditions?
+ - What does this interact with that could break?
+ - What's the failure mode? How does the human know something went wrong?
+
+ ## Phase 4: Work
+
+ Now do the actual work. As you work:
+
+ - Write key decisions to `workspace/decisions.md` with your reasoning
+ - If you discover new unknowns mid-work, go back to Phase 3
+ - If something breaks that matches a known failure pattern, note it — your knowledge was correct
+ - If something breaks that's NOT in your failure library, note it — this is learning fuel for Phase 5
+
+ ## Phase 5: Capture
+
+ After the task is done (or failed), follow `protocols/compaction.md` to:
+
+ 1. Summarize what happened → episode log
+ 2. Extract any insights worth remembering → quality gate
+ 3. If failed: write a reflection (what went wrong, why, what to do differently)
+ 4. Clear `workspace/`
+
+ ---
+
+ ## Scaling to Task Size
+
+ **Quick task (< 30 min):**
+ Phase 1 (2-3 lines). Phase 2 (quick scan). Phase 3 (mental check only). Phase 4. Phase 5 (1-line episode entry).
+
+ **Medium task (1-4 hours):**
+ Full Phases 1-5. Write task.md before coding. Check failures properly.
+
+ **Large task (multi-day):**
+ Full Phases 1-5 with deep Phase 3. Break into sub-tasks. Multiple episode entries. Maintain workspace/ across sessions.
+
+ The protocol adapts to the work. The principle doesn't change: understand before you build, load what you know, think critically, capture what you learn.