anamne-0.3.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,16 @@
# ANAMNE — Environment Configuration
# Copy this file to .env, then run: anamne init

# Pick ONE of these (in priority order — first non-empty wins):

# Best quality, paid (~$0.003/commit). Get a key at https://platform.anthropic.com
# ANTHROPIC_API_KEY=sk-ant-...

# Free tier, good quality. Get a key at https://aistudio.google.com/apikey
# GEMINI_API_KEY=...

# Or use a local Ollama model (free, offline, ~4GB disk, lower quality)
# MODEL=ollama/llama3.2

# Optional: where to store the knowledge base (default: ~/.anamne)
# DATA_DIR=C:/Users/YourName/.anamne
@@ -0,0 +1,20 @@
# Consistent line endings across all platforms
* text=auto

# Source files always LF
*.py text eol=lf
*.md text eol=lf
*.toml text eol=lf
*.yml text eol=lf
*.yaml text eol=lf
*.txt text eol=lf
*.json text eol=lf
*.sh text eol=lf

# Binary files — never touch line endings
*.db binary
*.sqlite binary
*.png binary
*.jpg binary
*.docx binary
*.pdf binary
@@ -0,0 +1,39 @@
name: CI

on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]

jobs:
  test:
    name: Tests (Python ${{ matrix.python-version }})
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install package and dev dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: python -m pytest tests/ -v
@@ -0,0 +1,40 @@
name: Publish to PyPI

# Triggers when you push a version tag (e.g. v0.3.0)
on:
  push:
    tags:
      - "v[0-9]+.[0-9]+.[0-9]+"

jobs:
  build-and-publish:
    name: Build and publish to PyPI
    runs-on: ubuntu-latest
    environment: pypi  # requires a "pypi" environment configured in repo settings

    permissions:
      id-token: write  # required for Trusted Publishing (no PyPI token needed)

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install build tools
        run: |
          python -m pip install --upgrade pip
          pip install build

      - name: Build distributions
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        # Uses PyPI Trusted Publishing — no token needed, configure here:
        #   https://pypi.org/manage/account/publishing/
        #   Repository: venumittapalli576/anamne
        #   Workflow: publish.yml
        #   Environment: pypi
@@ -0,0 +1,31 @@
# Environment — NEVER commit this
.env

# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
.venv/
venv/

# ANAMNE data — stays local
.anamne/

# IDE
.vscode/
.idea/
*.swp

# OS
.DS_Store
Thumbs.db

# Generated artifacts
ANAMNE_*.docx
PROVENANCE_*.docx
scripts/node_modules/
scripts/package-lock.json
scripts/package.json
test-repo/
@@ -0,0 +1,249 @@
1
+ # I Built a Tool, Found a Competitor, Read Two Research Papers, and Pivoted in 48 Hours
2
+
3
+ *A software engineering war story about rapid iteration, research literacy, and the right time to stop.*
4
+
5
+ ---
6
+
7
+ ## The Original Idea
8
+
9
+ Every codebase has a graveyard of decisions. Why did we switch from MySQL to Postgres? Why does the
10
+ payment service live in a separate repo? Why is there a Redis cluster when the database handles
11
+ sessions fine?
12
+
13
+ These answers live in Slack threads, ancient commit messages, people's heads. They're lost as soon
14
+ as the team grows or memory fades. I wanted to fix that.
15
+
16
+ The idea: index git history with an LLM, extract "why" decisions from commit messages and ADRs,
17
+ store them in a searchable graph, and surface them via an MCP server directly in your editor.
18
+
19
+ I called it **ANAMNE** and built it in a weekend.
20
+
21
+ ```bash
22
+ anamne index ./my-repo
23
+ anamne ask "why was Redis added to this codebase?"
24
+ ```
25
+
26
+ It worked. Claude answered with citations from actual commit messages. The demo was clean.
27
+
28
+ ---
29
+
30
+ ## The Competitor Problem
31
+
32
+ Then I did what you should always do before claiming novelty: searched for existing tools.
33
+
34
+ **Repowise** does exactly what I built. So does **GitMind**. Multiple well-funded products with
35
+ better UI, more integrations, and a head start.
36
+
37
+ The first reaction: rebuild something else. Start over. Find a gap that doesn't exist yet.
38
+
39
+ I nearly did. I spent hours searching for "the next idea" — AI memory tools, code documentation
40
+ generators, context compression systems. Every category had competition. Some had more than one.
41
+
42
+ The actual lesson wasn't "the idea is wrong." It was: **this is 2026. Every obvious idea has
43
+ competition.** Starting over doesn't solve that.
44
+
45
+ ---
46
+
47
+ ## The Research Pivot
48
+
49
+ Instead of rebuilding, I read papers.
50
+
51
+ Three caught my eye:
52
+
53
+ **LIGHT** (arXiv 2510.27246) — a 2026 paper proposing a three-layer memory architecture for AI
54
+ agents. The analogy to human memory was explicit:
55
+
56
+ - *Episodic memory* (hippocampal long-term store): full records of past events
57
+ - *Scratchpad* (semantic memory): distilled facts and truths
58
+ - *Working memory* (prefrontal cortex): what you're holding in your head right now
59
+
60
+ The paper showed that combining all three layers with explicit conflict resolution produced
61
+ significantly better recall than single-store approaches.
62
+
63
+ **Agent Cognitive Compressor** — "bounded compressed state": as an AI's memory grows, you can't
64
+ fit all of it in the context window. The solution is hierarchical: keep the top-K items verbatim,
65
+ compress the lower-priority tail into a compact summary. This bounds the prompt size regardless of
66
+ how much history you've stored.
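The bounding mechanism is small enough to sketch in full. This is illustrative only, not ANAMNE's actual code: the real version asks the LLM for a faithful summary of the tail, so a plain string join stands in for that call here.

```python
def bound_context(items, k=3, summarize=None):
    """Keep the top-k items verbatim; compress the tail into one summary."""
    verbatim = items[:k]
    tail = items[k:]
    if not tail:
        return verbatim, None
    # Stand-in for the LLM call: a real implementation would ask the model
    # for a faithful summary of the tail items.
    summarize = summarize or (lambda xs: "; ".join(x[:40] for x in xs))
    return verbatim, summarize(tail)

history = [f"decision {i}" for i in range(100)]
verbatim, summary = bound_context(history, k=3)
# Prompt size is now bounded: 3 full items plus 1 summary, however long the history.
```

The point is the shape, not the summarizer: whatever the store grows to, the prompt carries a constant number of verbatim entries plus one compressed blob.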

**ACT-R memory architecture** — cognitive science model of how humans retrieve memories. Key
insight: retrieval probability isn't just about relevance — it's modulated by recency and frequency
of use. Items used more recently and more often have higher "activation" and are more likely to be
retrieved.

Reading these, I realized: **my architecture was already halfway there.** I had ChromaDB for
semantic search (episodic), SQLite for structured storage, and an LLM for synthesis. The core
pieces were right. I just needed to implement the full three-layer design and ground the
abstractions in the actual papers.

---

## What Changed

Over two days I refactored ANAMNE from "git WHY tool" to "personal memory layer":

**Layer 1 — Episodic memory** (already existed, renamed/clarified):
- ChromaDB semantic search over all past decisions
- SQLite for bi-temporal storage (created_at, valid_until)
- Indexed from git history, ADR files

**Layer 2 — Scratchpad** (new):
- Explicit `remember()` API for durable facts
- LLM-based distillation: paste a wall of text, get N atomic facts extracted
- ACT-R activation tracking: `last_used`, `use_count` updated on every recall
- `forget()` for explicit deletion
- `consolidate()` to merge redundant facts (analogous to sleep-phase consolidation)

**Layer 3 — Working memory** (new):
- Short-lived session notes with TTL expiry
- "Currently debugging the auth middleware" — gone in an hour without action
- No LLM call needed, pure recency-weighted retrieval
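A TTL layer like that needs almost no machinery. A minimal sketch (class name, defaults, and the `now` parameter are illustrative, not the real API):

```python
import time

class WorkingMemory:
    """Short-lived session notes that expire after ttl seconds."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._notes = []  # list of (timestamp, text)

    def add(self, text, now=None):
        self._notes.append((now if now is not None else time.time(), text))

    def active(self, now=None):
        now = now if now is not None else time.time()
        # Drop expired notes; return the survivors newest-first.
        live = [(t, s) for t, s in self._notes if now - t < self.ttl]
        return [s for t, s in sorted(live, reverse=True)]

wm = WorkingMemory(ttl=3600)
wm.add("debugging the auth middleware", now=0)
wm.add("staging DB is flaky today", now=1800)
wm.active(now=3599)  # both notes still live, newest first
wm.active(now=3601)  # the first note has aged out
```

Expiry is just a filter at read time, which is why this layer gets away with no LLM call and no background job.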

**Oracle agent** (refactored):
- Queries all three layers simultaneously
- Formats with explicit citations: `[episodic #3]`, `[fact #a2b1]`, `[working]`
- ACC-style bounded context: top 3 verbatim + tail compressed into a summary
- Layer conflict resolution (scratchpad beats working beats episodic)
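The conflict rule reduces to a priority lookup. An illustrative sketch of the "scratchpad beats working beats episodic" ordering (the dict shape and function name are mine, not the actual implementation):

```python
# Higher number wins when two layers disagree about the same fact.
PRIORITY = {"scratchpad": 2, "working": 1, "episodic": 0}

def resolve(conflicting_hits):
    """Given hits from multiple layers, keep the most authoritative one."""
    return max(conflicting_hits, key=lambda hit: PRIORITY[hit["layer"]])

resolve([
    {"layer": "episodic", "text": "sessions stored in Redis"},
    {"layer": "scratchpad", "text": "sessions moved to Postgres in 2025"},
])
# The scratchpad entry wins: an explicitly stated fact outranks inferred history.
```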

**New capture paths** (Phase 2):
- `anamne journal` — timestamped entry, one command, no ceremony
- `anamne import-chat` — point at an exported Claude/ChatGPT JSON, extract durable facts

---

## The Design That Emerged

The thing I realized while implementing this: **the problem I'm solving is different from Repowise's.**

Repowise solves "why was this code written this way?" for teams. It's a knowledge management tool
for codebases.

ANAMNE solves "what do I know about everything I've worked on?" for individuals. It's a
personal memory layer that works across all your AI tools, all your projects, your preferences,
your constraints, your history.

Those are different markets. One is team-facing (requires enterprise sales, permission models,
onboarding). The other is individual-facing (one-command install, local-first, bring your own key).

The pivot didn't require rebuilding anything. It required reframing the problem.

---

## What the Code Looks Like

The Oracle agent — the core of the recall system — ends up surprisingly clean:

```python
def ask(self, question: str, ...) -> str:
    # Pull from all three layers
    episodic = self._store.search(question, n_results=8)
    facts = self._store.search_facts(question, limit=8)
    working = self._store.working_active()[:10]

    # ACT-R: update activation on retrieved facts
    if facts:
        self._store.touch_facts([f["id"] for f in facts])

    # ACC: top-3 verbatim, tail compressed
    verbatim = episodic[:3]
    tail = episodic[3:]
    compressed = self._compress_tail(tail, question) if tail else None

    # Format with citations, send to LLM
    prompt = _ORACLE_PROMPT.format(
        working=self._format_working(working),
        facts=self._format_facts(facts),
        decisions=self._format_decisions(verbatim),
        compressed_section=f"BACKGROUND: {compressed}\n\n" if compressed else "",
        question=question,
    )
    return self._llm.complete(prompt, max_tokens=2048).text
```

Every claim in the answer is cited back to its layer and entry. The LLM is instructed to surface
staleness warnings and call out conflicts between layers.

The consolidation step — analogous to sleep-phase memory consolidation in neuroscience — clusters
scratchpad facts by keyword overlap and merges each cluster via LLM:

```bash
$ anamne consolidate --dry-run

Merge 1:
  - I prefer Python for backend services
  - I prefer Python over Go for scripting
  -> Prefer Python over Go for all backend and scripting work

Merge 2:
  - Database uses PostgreSQL
  - Switched from MySQL to Postgres in 2024 for better JSON support
  -> Using PostgreSQL (migrated from MySQL in 2024 for native JSON support)
```
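The clustering half of that pipeline needs no model at all. Here's a sketch of Jaccard-overlap grouping; the tokenizer, threshold, and function names are stand-ins, not ANAMNE's actual values:

```python
def keywords(fact):
    # Crude tokenizer: lowercase words longer than 3 characters.
    return {w for w in fact.lower().split() if len(w) > 3}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_overlap(facts, threshold=0.2):
    """Greedy single-pass clustering: a fact joins the first cluster it overlaps."""
    clusters = []
    for fact in facts:
        kw = keywords(fact)
        for cluster in clusters:
            if any(jaccard(kw, keywords(f)) >= threshold for f in cluster):
                cluster.append(fact)
                break
        else:
            clusters.append([fact])
    return clusters

facts = [
    "I prefer Python for backend services",
    "I prefer Python over Go for scripting",
    "Database uses PostgreSQL",
]
cluster_by_overlap(facts)
# The two Python-preference facts cluster together; the database fact stands alone.
```

Each resulting cluster then becomes one LLM merge prompt, which is what the `--dry-run` output above is previewing.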

---

## What Surprised Me

**Reading papers was faster than searching for ideas.** The four papers I read took maybe three
hours total. They gave me a clear design vocabulary (episodic, semantic, working memory), specific
algorithms (ACT-R activation formula, ACC compression), and actual citations I can put in the README.

That's worth more than any "find a gap" exercise.

**The competitive landscape is a feature, not a bug.** Yes, Mem0 and Supermemory exist. They're
building cloud SDKs for developers. I'm building a local-first tool for individual AI users. The
fact that there's a funded market means people want this — I'm just serving a different slice of it.

**"Brain-inspired" is a useful metaphor, not a liability.** I was initially worried it would sound
like marketing fluff. But grounding it in actual papers — LIGHT, ACT-R, hippocampal indexing
theory — makes it defensible. When someone asks "why three layers?", I have a real answer.

---

## The Honest Assessment

ANAMNE is a personal portfolio project. It won't replace Mem0. It won't scale to teams.

What it is:
- A working CLI demo with cited recall across three memory layers
- A real MCP server (tested with Claude Code and Cursor)
- A brain-inspired memory architecture grounded in actual 2026 research papers
- An honest README that doesn't overclaim

What it demonstrates:
- The ability to read research papers and translate them into working code
- Rapid iteration under uncertainty (pivot in 48 hours, don't rebuild)
- Engineering judgment (scope-appropriate design, no overengineering)

The "build a thing, find a competitor, read papers, pivot" story is more interesting to a
hiring manager than "I built X, here is the feature list."

---

## What's Next

Phase 2 is already started:
- `anamne import-chat` — import exported Claude/ChatGPT conversations, extract facts
- `anamne journal` — one-command timestamped notes, no ceremony

The interesting open question is Phase 3: **ACT-R decay**. Right now "activation" is tracked
(last_used, use_count) but there's no actual decay formula. The real ACT-R formula is:

```
activation = ln(Σ t_i^(-d)) + noise
```

where `t_i` is the recency of each retrieval and `d` is the decay parameter. Implementing that
would make the "brain-inspired" claim genuinely precise rather than loosely metaphorical.
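For the curious, the base-level term is only a few lines of Python. A sketch (the noise term is omitted; d=0.5 is the conventional default in the ACT-R literature):

```python
import math

def activation(retrieval_ages, d=0.5):
    """ACT-R base-level activation: ln of the sum of t_i^-d over past retrievals.

    retrieval_ages: seconds elapsed since each past retrieval of the item.
    """
    if not retrieval_ages:
        return float("-inf")  # never retrieved: effectively unrecallable
    return math.log(sum(t ** -d for t in retrieval_ages))

# A fact retrieved recently and often outranks one retrieved once, long ago.
fresh = activation([60, 3600, 7200])  # three retrievals, latest a minute ago
stale = activation([86400 * 30])      # one retrieval, a month ago
assert fresh > stale
```

With a `retrieval_log` of timestamps per fact, ranking is just sorting facts by this score at query time.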

---

## Code

[github.com/venumittapalli576/anamne](https://github.com/venumittapalli576/anamne)

MIT license. One-command install. Bring your own key. Zero telemetry.

```bash
pip install anamne
anamne init
```
@@ -0,0 +1,122 @@
# Changelog

All notable changes to ANAMNE are documented here.

---

## [0.3.0] — 2026-05-10

### Added — Phase 3 memory upgrades

**Semantic scratchpad search**
- Facts are now embedded into a dedicated ChromaDB `scratchpad` collection on write
- `search_facts_semantic()` — embedding-based search; finds conceptually related facts
  even when exact keywords don't match (e.g. "database" finds "PostgreSQL" facts)
- `search_facts_ranked()` now merges substring + semantic candidates, deduplicates,
  then re-ranks by ACT-R activation — best of both retrieval strategies
- One-time migration: existing facts are back-filled into ChromaDB on first startup
- `forget_fact()` now also deletes from the ChromaDB scratchpad collection

**Incremental indexing**
- New `indexed_commits` SQL table tracks which commits have already been processed
- `is_commit_indexed()`, `mark_commit_indexed()`, `indexed_commit_count()` on the store
- `HistorianAgent.index_repo(incremental=True)` skips already-indexed commits
- New CLI command: `anamne sync <repo>` — re-indexes only new commits,
  saving API calls when you run it after `git pull` or `git commit`

**Auto-consolidation daemon**
- New CLI command: `anamne watch` — runs `consolidate` on a configurable schedule
  (default: every 3600s). Background memory maintenance, an analog of sleep-phase
  consolidation in cognitive science. Press Ctrl+C to stop.

**New CLI commands** (search, export, capture-clipboard added in this release too)
- `anamne search <query>` — direct ACT-R-ranked scratchpad search, no API key needed
- `anamne export` — dump all memories to JSON or Markdown for backup/migration
- `anamne capture-clipboard` — read the clipboard and save it as a scratchpad fact

### Fixed
- `status` command: removed dead `ollama` reference in API key check
- `recall` and MCP `search_facts`: upgraded to `search_facts_ranked()` (was unranked)

### Tests
- 41 tests total (was 31 → 34 → 41), all passing
- New: semantic search (3), incremental indexing (4), list_all_decisions (3)

---

## [0.2.0] — 2026-05-10

### Changed (breaking)
- **Renamed**: project `provenance` → `anamne` (CLI command, package, data dir `~/.anamne`)
- **Removed**: Ollama support (local models too weak for structured JSON extraction tasks)
- **Removed**: dead FastAPI/uvicorn/jinja2 dependencies

### Added — Memory architecture (LIGHT + ACC frameworks)
- **Three-layer memory** following the LIGHT framework (arXiv 2510.27246):
  - *Episodic* — long-term decisions from git history (ChromaDB semantic search)
  - *Scratchpad* — durable user-stated facts (SQLite, full-text search)
  - *Working memory* — short-lived session context with TTL auto-expiry
- **Real ACT-R decay formula**: `A_i = ln(Σ t_j^-d)` where `t_j` = seconds since retrieval `j`
  - New `retrieval_log` SQL table — every fact access is timestamped
  - `activation_score()` — computes true ACT-R base-level activation
  - `search_facts_ranked()` — re-ranks search results by ACT-R activation
- **ACC bounded context compression**: top-3 episodic results verbatim, tail LLM-compressed
- **Fact consolidation** (`anamne consolidate`): Jaccard keyword clustering + LLM merge
- **Layer-conflict priority**: scratchpad > working > episodic (per LIGHT design)
- **Staleness flags**: episodic items with `valid_until` in the past show `[POTENTIALLY STALE]`

### Added — CLI commands

| Command | Description |
|---|---|
| `anamne journal` | Timestamped scratchpad entry with auto `journal` tag |
| `anamne import-chat` | Extract durable facts from exported Claude / ChatGPT JSON |
| `anamne consolidate` | Merge redundant facts (Jaccard overlap + LLM) |
| `anamne search` | Direct scratchpad search, ACT-R ranked, no API key needed |
| `anamne export` | Backup all memories to JSON or Markdown |
| `anamne capture-clipboard` | Save clipboard text as a scratchpad fact |
| `anamne recall` | Cross-layer recall (upgraded to use ACT-R ranking) |

### Added — MCP tools (11 total)

| Tool | Layer |
|---|---|
| `ask_why` | Cross-layer (Oracle) |
| `search_decisions` | Episodic |
| `get_file_context` | Episodic |
| `get_stats` | All |
| `remember` | Scratchpad |
| `list_facts` | Scratchpad |
| `forget_fact` | Scratchpad |
| `search_facts` | Scratchpad (ACT-R ranked) |
| `consolidate_facts` | Scratchpad |
| `working_memory_add` | Working |
| `working_memory_active` | Working |

### Added — Infrastructure
- **Test suite** (31 tests, 100% pass):
  - `tests/test_models.py` — Decision model, staleness, serialisation
  - `tests/test_store.py` — all three memory layers + ACT-R activation
  - `tests/test_clustering.py` — `_cluster_by_overlap` threshold behaviour
- **GitHub Actions CI** (`.github/workflows/ci.yml`) — runs on every push/PR
- **pyproject.toml**: authors, keywords, classifiers, project URLs, dev extras
- **LICENSE**: MIT, copyright Venu Mittapalli
- **BLOG_POST.md**: origin story and technical deep-dive

### Fixed
- `status` command: removed dead `ollama` reference in API key check
- `recall` command: now uses `search_facts_ranked()` (was unranked)
- MCP `search_facts` tool: now uses `search_facts_ranked()` (was unranked)
- README, ROADMAP: all `provenance` references updated to `anamne`

---

## [0.1.0] — 2026-04 (internal)

First working version under the name `provenance`.

- `Decision` data model with bi-temporal fields
- SQLite + ChromaDB dual store
- Historian Agent — git history extraction via LLM
- Oracle Agent — recall with citations
- FastMCP server with 4 tools
- CLI: `init`, `index`, `ask`, `status`, `mcp-server`
- Claude + Gemini multi-model LLM client
anamne-0.3.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Venu Mittapalli

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.