agentic-codememory 0.1.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92) hide show
  1. agentic_codememory-0.1.5/.claude/handoffs/2026-03-20-new-project-init.md +190 -0
  2. agentic_codememory-0.1.5/.claude/settings.local.json +11 -0
  3. agentic_codememory-0.1.5/.env.example +51 -0
  4. agentic_codememory-0.1.5/.github/workflows/ci.yml +110 -0
  5. agentic_codememory-0.1.5/.gitignore +12 -0
  6. agentic_codememory-0.1.5/.planning/PROJECT.md +118 -0
  7. agentic_codememory-0.1.5/.planning/ROADMAP.md +188 -0
  8. agentic_codememory-0.1.5/.planning/STATE.md +60 -0
  9. agentic_codememory-0.1.5/.planning/codebase/ARCHITECTURE.md +261 -0
  10. agentic_codememory-0.1.5/.planning/codebase/CONCERNS.md +245 -0
  11. agentic_codememory-0.1.5/.planning/codebase/CONVENTIONS.md +225 -0
  12. agentic_codememory-0.1.5/.planning/codebase/INTEGRATIONS.md +160 -0
  13. agentic_codememory-0.1.5/.planning/codebase/STACK.md +135 -0
  14. agentic_codememory-0.1.5/.planning/codebase/STRUCTURE.md +200 -0
  15. agentic_codememory-0.1.5/.planning/codebase/TESTING.md +360 -0
  16. agentic_codememory-0.1.5/.planning/config.json +13 -0
  17. agentic_codememory-0.1.5/.planning/research/ARCHITECTURE.md +328 -0
  18. agentic_codememory-0.1.5/.planning/research/FEATURES.md +207 -0
  19. agentic_codememory-0.1.5/.planning/research/PITFALLS.md +438 -0
  20. agentic_codememory-0.1.5/.planning/research/STACK.md +276 -0
  21. agentic_codememory-0.1.5/.planning/research/SUMMARY.md +190 -0
  22. agentic_codememory-0.1.5/.pre-commit-config.yaml +29 -0
  23. agentic_codememory-0.1.5/4-stage-ingestion-with-prep.md +18 -0
  24. agentic_codememory-0.1.5/4_pass_ingestion_with_prep_hybridgraphRAG.py +749 -0
  25. agentic_codememory-0.1.5/5_continuous_ingestion.py +776 -0
  26. agentic_codememory-0.1.5/5_continuous_ingestion_jina.py +779 -0
  27. agentic_codememory-0.1.5/AGENTS.md +375 -0
  28. agentic_codememory-0.1.5/CONTRIBUTING.md +772 -0
  29. agentic_codememory-0.1.5/DOCUMENTATION_SUMMARY.md +206 -0
  30. agentic_codememory-0.1.5/Dockerfile +29 -0
  31. agentic_codememory-0.1.5/GIT-INTEGRATION-SPEC.md +231 -0
  32. agentic_codememory-0.1.5/GRAPHRAG_README.md +78 -0
  33. agentic_codememory-0.1.5/LICENSE +21 -0
  34. agentic_codememory-0.1.5/PKG-INFO +343 -0
  35. agentic_codememory-0.1.5/README.md +316 -0
  36. agentic_codememory-0.1.5/SPEC.md +115 -0
  37. agentic_codememory-0.1.5/TODO.md +400 -0
  38. agentic_codememory-0.1.5/debug_extraction.py +95 -0
  39. agentic_codememory-0.1.5/docker-compose.yml +90 -0
  40. agentic_codememory-0.1.5/docs/API.md +1150 -0
  41. agentic_codememory-0.1.5/docs/ARCHITECTURE.md +1060 -0
  42. agentic_codememory-0.1.5/docs/FIELD_TEST_RESULTS_2026-02-24.md +161 -0
  43. agentic_codememory-0.1.5/docs/FIELD_TEST_TEMPLATE.md +99 -0
  44. agentic_codememory-0.1.5/docs/GIT_GRAPH.md +213 -0
  45. agentic_codememory-0.1.5/docs/INSTALLATION.md +668 -0
  46. agentic_codememory-0.1.5/docs/MCP_INTEGRATION.md +859 -0
  47. agentic_codememory-0.1.5/docs/NEO4J_BROWSER_VISUALIZATION.md +84 -0
  48. agentic_codememory-0.1.5/docs/TOOL_USE_ANNOTATION.md +95 -0
  49. agentic_codememory-0.1.5/docs/TROUBLESHOOTING.md +432 -0
  50. agentic_codememory-0.1.5/docs/evaluation-decision.md +41 -0
  51. agentic_codememory-0.1.5/docs/skill-adapter-security.md +127 -0
  52. agentic_codememory-0.1.5/docs/skill-adapter-workflows.md +132 -0
  53. agentic_codememory-0.1.5/evaluation/README.md +41 -0
  54. agentic_codememory-0.1.5/evaluation/__init__.py +2 -0
  55. agentic_codememory-0.1.5/evaluation/results/.gitkeep +1 -0
  56. agentic_codememory-0.1.5/evaluation/schemas/benchmark_results.schema.json +270 -0
  57. agentic_codememory-0.1.5/evaluation/scripts/create_run_scaffold.py +214 -0
  58. agentic_codememory-0.1.5/evaluation/scripts/summarize_results.py +194 -0
  59. agentic_codememory-0.1.5/evaluation/skills/skill-adapter-workflow.md +35 -0
  60. agentic_codememory-0.1.5/evaluation/tasks/benchmark_tasks.json +191 -0
  61. agentic_codememory-0.1.5/evaluation/templates/decision_memo_template.md +46 -0
  62. agentic_codememory-0.1.5/examples/README.md +127 -0
  63. agentic_codememory-0.1.5/examples/basic_usage.md +527 -0
  64. agentic_codememory-0.1.5/examples/docker_setup.md +786 -0
  65. agentic_codememory-0.1.5/examples/mcp_prompt_examples.md +588 -0
  66. agentic_codememory-0.1.5/graphrag_requirements.txt +21 -0
  67. agentic_codememory-0.1.5/pyproject.toml +138 -0
  68. agentic_codememory-0.1.5/requirements.txt +8 -0
  69. agentic_codememory-0.1.5/skills/agentic-memory-adapter/SKILL.md +87 -0
  70. agentic_codememory-0.1.5/skills/agentic-memory-adapter/scripts/health_check.sh +185 -0
  71. agentic_codememory-0.1.5/skills/agentic-memory-adapter/scripts/run_codememory.sh +201 -0
  72. agentic_codememory-0.1.5/src/codememory/__init__.py +0 -0
  73. agentic_codememory-0.1.5/src/codememory/cli.py +1242 -0
  74. agentic_codememory-0.1.5/src/codememory/config.py +237 -0
  75. agentic_codememory-0.1.5/src/codememory/docker/docker-compose.yml +48 -0
  76. agentic_codememory-0.1.5/src/codememory/ingestion/__init__.py +13 -0
  77. agentic_codememory-0.1.5/src/codememory/ingestion/git_graph.py +488 -0
  78. agentic_codememory-0.1.5/src/codememory/ingestion/graph.py +1423 -0
  79. agentic_codememory-0.1.5/src/codememory/ingestion/parser.py +254 -0
  80. agentic_codememory-0.1.5/src/codememory/ingestion/watcher.py +424 -0
  81. agentic_codememory-0.1.5/src/codememory/server/__init__.py +1 -0
  82. agentic_codememory-0.1.5/src/codememory/server/app.py +750 -0
  83. agentic_codememory-0.1.5/src/codememory/server/tools.py +104 -0
  84. agentic_codememory-0.1.5/src/codememory/telemetry.py +307 -0
  85. agentic_codememory-0.1.5/tests/__init__.py +1 -0
  86. agentic_codememory-0.1.5/tests/conftest.py +16 -0
  87. agentic_codememory-0.1.5/tests/test_cli.py +694 -0
  88. agentic_codememory-0.1.5/tests/test_git_graph.py +122 -0
  89. agentic_codememory-0.1.5/tests/test_graph.py +199 -0
  90. agentic_codememory-0.1.5/tests/test_parser.py +107 -0
  91. agentic_codememory-0.1.5/tests/test_server.py +352 -0
  92. agentic_codememory-0.1.5/upload_checkpoint.py +65 -0
@@ -0,0 +1,190 @@
1
+ # Session Handoff: New Project Initialization
2
+
3
+ **Created:** 2026-03-20
4
+ **Project:** D:\code\agentic-memory
5
+ **Branch:** main
6
+ **Session:** fa780870-3e1d-4373-add9-6a0e936326d8
7
+
8
+ ---
9
+
10
+ ## Current State Summary
11
+
12
+ The `/gsd:new-project` workflow is ~60% complete. Research phase just finished (4 research files written). The next step is to spawn the **research synthesizer** to create `SUMMARY.md`, then proceed to requirements definition.
13
+
14
+ **What's done:**
15
+ - Codebase mapped (7 docs in `.planning/codebase/`)
16
+ - Deep questioning completed — project scope fully defined
17
+ - `PROJECT.md` created and committed (3b3e332)
18
+ - `config.json` created and committed (646a0b8)
19
+ - 4 research files created in `.planning/research/` (STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md)
20
+
21
+ **What's next immediately:**
22
+ 1. Spawn `gsd-research-synthesizer` agent to create `.planning/research/SUMMARY.md`
23
+ 2. Define v1 requirements (per-category scoping questions)
24
+ 3. Create roadmap with phases
25
+ 4. Initialize STATE.md
26
+
27
+ ---
28
+
29
+ ## Important Context
30
+
31
+ ### What This Project Is
32
+
33
+ Expanding the existing `codememory` CLI/MCP tool (code-only knowledge graph) into a **modular multi-type knowledge graph** with two new v1 modules:
34
+
35
+ 1. **Web Research Memory** — crawl4ai + Brave Search + Playwright/agent-browser, scheduled research pipelines, PDF ingestion, Gemini multimodal embeddings
36
+ 2. **Agent Conversation Memory** — auto-capture or manual import, session tracking, context retrieval for AI agents
37
+
38
+ **Key architectural decisions already made:**
39
+ - **Separate Neo4j databases per module** to prevent embedding model conflicts (code uses OpenAI, web/chat use Gemini)
40
+ - Code: port 7687, Web: 7688, Chat: 7689
41
+ - Modular architecture — each module standalone, unified via MCP routing
42
+ - Gemini embeddings (gemini-embedding-2-preview) for non-code modules
43
+ - Crawl4AI for web extraction, Brave Search API for automated research
44
+ - Vercel agent-browser for dynamic content
45
+ - CLI + MCP interface (extends existing pattern)
46
+ - Long-term vision: universal adapter layer for any AI workflow
47
+
48
+ ### What the User Wants
49
+
50
+ The user is building this for personal use (research & analysis pipelines) but wants it adaptable for anyone. Key UX goals:
51
+ - "One click install to whatever AI system they use"
52
+ - Automated capture by default (no friction)
53
+ - Deep research automation with scheduled variations
54
+ - "Seamless integration with any AI" = the magic
55
+
56
+ ### User Profile
57
+ - Technical, building production-quality systems
58
+ - YOLO mode preference (auto-approve tools)
59
+ - Wants research/plan-check/verifier agents enabled
60
+ - Balanced model profile
61
+ - Parallel execution enabled
62
+ - Git tracking enabled
63
+
64
+ ### GSD Workflow Config
65
+
66
+ ```json
67
+ {
68
+ "mode": "yolo",
69
+ "granularity": "standard",
70
+ "parallelization": true,
71
+ "commit_docs": true,
72
+ "model_profile": "balanced",
73
+ "workflow": {
74
+ "research": true,
75
+ "plan_check": true,
76
+ "verifier": true,
77
+ "nyquist_validation": true
78
+ }
79
+ }
80
+ ```
81
+
82
+ ---
83
+
84
+ ## Research Files Created
85
+
86
+ All 4 files are in `.planning/research/`:
87
+
88
+ | File | Contents |
89
+ |------|----------|
90
+ | `STACK.md` | Technology recommendations: Gemini embeddings, Crawl4AI, Playwright, Brave API, Neo4j multi-db |
91
+ | `FEATURES.md` | Table stakes vs differentiators vs anti-features for both modules + MVP recommendations |
92
+ | `ARCHITECTURE.md` | Hub-and-spoke pattern, component boundaries, 4-pass ingestion, anti-patterns to avoid |
93
+ | `PITFALLS.md` | 18 pitfalls categorized by severity (critical/moderate/minor) with prevention strategies + phase mapping |
94
+
95
+ **SUMMARY.md does NOT exist yet** — synthesizer hasn't run.
96
+
97
+ ---
98
+
99
+ ## Immediate Next Steps
100
+
101
+ 1. **Spawn gsd-research-synthesizer** to create `.planning/research/SUMMARY.md`
102
+ - Prompt: "Synthesize the research outputs from the 4 files in D:\code\agentic-memory\.planning\research\ (STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md) into a SUMMARY.md. This is for a project adding Web Research Memory and Agent Conversation Memory modules to an existing code-only knowledge graph tool."
103
+
104
+ 2. **Display research complete banner** with key findings summary to user
105
+
106
+ 3. **Requirements definition** — present feature categories and use AskUserQuestion to scope each for v1:
107
+ - Web Research Memory features
108
+ - Conversation Memory features
109
+ - Shared infrastructure features
110
+
111
+ 4. **Create roadmap** — phases mapping requirements to implementation
112
+
113
+ 5. **Initialize STATE.md**
114
+
115
+ ---
116
+
117
+ ## Key Patterns From Existing Codebase
118
+
119
+ From `.planning/codebase/` analysis:
120
+ - Uses FastMCP for MCP server
121
+ - 4-pass ingestion pipeline already exists for code
122
+ - OpenAI embeddings (text-embedding-3-large, 3072d) for code
123
+ - Neo4j with vector indexes
124
+ - Tree-sitter for code parsing
125
+ - Config via `.codememory/config.json`
126
+ - Existing concerns: silent embedding failures, text truncation, single-threaded embedding
127
+
128
+ **New modules should fix these patterns**, not replicate them.
129
+
130
+ ---
131
+
132
+ ## Critical Files
133
+
134
+ | File | Purpose |
135
+ |------|---------|
136
+ | `.planning/PROJECT.md` | Full project scope, requirements, constraints, decisions |
137
+ | `.planning/config.json` | GSD workflow configuration |
138
+ | `.planning/codebase/ARCHITECTURE.md` | Existing codebase architecture |
139
+ | `.planning/codebase/CONCERNS.md` | Known issues to avoid repeating |
140
+ | `.planning/research/FEATURES.md` | MVP feature recommendations |
141
+ | `.planning/research/PITFALLS.md` | 18 pitfalls with phase-specific warnings |
142
+ | `src/codememory/` | Existing code module to extend |
143
+
144
+ ---
145
+
146
+ ## Potential Gotchas
147
+
148
+ 1. **Research agent file permissions** — In previous session, subagents had Write tool auto-denied. Files were manually created in main conversation. If spawning agents again, confirm Write permissions are available.
149
+
150
+ 2. **Embedding dimension conflict** — Do NOT allow OpenAI + Gemini embeddings in same Neo4j database. This is Pitfall #1 in PITFALLS.md. Separate databases is the validated approach.
151
+
152
+ 3. **STACK.md content** — Created from agent summary, not full output. May be less detailed than FEATURES.md/ARCHITECTURE.md/PITFALLS.md which were written with more complete agent outputs.
153
+
154
+ 4. **Uncommitted changes** — `.planning/research/` is untracked (`??` in git status). Commit after SUMMARY.md is created.
155
+
156
+ 5. **Main branch vs master** — Working on `main` but `master` is listed as the "main branch" for PRs. Use `main` for development.
157
+
158
+ ---
159
+
160
+ ## Decisions Made (With Rationale)
161
+
162
+ | Decision | Rationale |
163
+ |----------|-----------|
164
+ | Separate Neo4j databases per module | Prevent embedding model conflicts (OpenAI 3072d vs Gemini 768d incompatible) |
165
+ | Gemini for web/chat, OpenAI for code | Code module already validated with OpenAI; Gemini multimodal needed for non-code |
166
+ | Both modules in v1 | User wants full web+chat scope from the start |
167
+ | Crawl4AI primary, agent-browser for dynamic | Crawl4AI handles most cases; JS-heavy sites need Playwright/agent-browser |
168
+ | Brave Search API (configurable) | User's preference; other options possible via config |
169
+ | CLI + MCP (existing pattern) | Extends what works; universal adapter layer is future vision |
170
+ | YOLO + parallel + balanced model | User's explicit selections during workflow setup |
171
+
172
+ ---
173
+
174
+ ## GSD Workflow State
175
+
176
+ **Phase:** New Project Initialization
177
+ **Step:** Research Synthesis (post-research, pre-requirements)
178
+
179
+ The workflow context when the session ended: research agents had completed but couldn't write files. Files were manually created. Next is synthesizer → requirements → roadmap.
180
+
181
+ **Workflow: `/gsd:new-project`**
182
+ - [x] Deep questioning
183
+ - [x] PROJECT.md created
184
+ - [x] config.json created
185
+ - [x] 4 research agents spawned
186
+ - [x] Research files written (manually, due to permission issue)
187
+ - [ ] SUMMARY.md synthesized
188
+ - [ ] Requirements defined
189
+ - [ ] Roadmap created
190
+ - [ ] STATE.md initialized
@@ -0,0 +1,11 @@
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "mcp__zread__get_repo_structure",
5
+ "Bash(git add:*)",
6
+ "Bash(git commit:*)",
7
+ "Bash(dir:*)",
8
+ "Bash(test:*)"
9
+ ]
10
+ }
11
+ }
@@ -0,0 +1,51 @@
1
+ # Agentic Memory - Environment Configuration
2
+ # Copy this file to .env and fill in your values
3
+
4
+ # ============================================================================
5
+ # NEO4J DATABASE
6
+ # ============================================================================
7
+
8
+ # Neo4j connection URI
9
+ # - Local Neo4j: bolt://localhost:7687
10
+ # - Neo4j Aura: neo4j+s://your-instance.databases.neo4j.io
11
+ NEO4J_URI=bolt://localhost:7687
12
+
13
+ # Neo4j authentication
14
+ NEO4J_USER=neo4j
15
+ NEO4J_PASSWORD=your_neo4j_password
16
+
17
+ # Note: Neo4j 5.18+ is required for vector index support
18
+
19
+ # ============================================================================
20
+ # OPENAI API
21
+ # ============================================================================
22
+
23
+ # OpenAI API key for embeddings (text-embedding-3-large)
24
+ # Get yours at: https://platform.openai.com/api-keys
25
+ OPENAI_API_KEY=sk-your-openai-api-key-here
26
+
27
+ # Optional: Override embedding model (default: text-embedding-3-large)
28
+ # EMBEDDING_MODEL=text-embedding-3-large
29
+
30
+ # ============================================================================
31
+ # INGESTION SETTINGS
32
+ # ============================================================================
33
+
34
+ # Optional: Repository path to index (can also be passed as CLI argument)
35
+ # REPO_PATH=/path/to/your/codebase
36
+
37
+ # Optional: Supported file extensions (comma-separated)
38
+ # SUPPORTED_EXTENSIONS=.py,.js,.ts,.tsx,.jsx
39
+
40
+ # Optional: Directories to ignore during indexing (comma-separated)
41
+ # IGNORE_DIRS=node_modules,__pycache__,.git,dist,build,.venv,venv
42
+
43
+ # Optional: Logging level (DEBUG, INFO, WARNING, ERROR)
44
+ # LOG_LEVEL=INFO
45
+
46
+ # ============================================================================
47
+ # MCP SERVER
48
+ # ============================================================================
49
+
50
+ # Optional: Port for MCP server (default: varies by client)
51
+ # MCP_PORT=8000
@@ -0,0 +1,110 @@
1
+ name: CI
2
+
3
+ on:
4
+ push:
5
+ branches: [main, master]
6
+ pull_request:
7
+ branches: [main, master]
8
+
9
+ jobs:
10
+ lint-and-format:
11
+ runs-on: ubuntu-latest
12
+ steps:
13
+ - uses: actions/checkout@v4
14
+
15
+ - name: Set up Python
16
+ uses: actions/setup-python@v5
17
+ with:
18
+ python-version: '3.11'
19
+
20
+ - name: Install dependencies
21
+ run: |
22
+ python -m pip install --upgrade pip
23
+ pip install -e ".[dev]"
24
+
25
+ - name: Run Ruff linter
26
+ run: ruff check src/
27
+
28
+ - name: Run Black format check
29
+ run: black --check src/ tests/
30
+
31
+ - name: Run MyPy type check
32
+ run: mypy src/
33
+
34
+ test:
35
+ runs-on: ubuntu-latest
36
+ strategy:
37
+ matrix:
38
+ python-version: ['3.10', '3.11', '3.12']
39
+
40
+ steps:
41
+ - uses: actions/checkout@v4
42
+
43
+ - name: Set up Python ${{ matrix.python-version }}
44
+ uses: actions/setup-python@v5
45
+ with:
46
+ python-version: ${{ matrix.python-version }}
47
+
48
+ - name: Install dependencies
49
+ run: |
50
+ python -m pip install --upgrade pip
51
+ pip install -e ".[dev]"
52
+
53
+ - name: Run unit tests with coverage
54
+ run: pytest --cov=codememory --cov-report=xml --cov-report=term-missing -m "not integration" tests/
55
+
56
+ - name: Upload coverage to Codecov
57
+ uses: codecov/codecov-action@v3
58
+ with:
59
+ files: ./coverage.xml
60
+ fail_ci_if_error: false
61
+ verbose: true
62
+
63
+ integration-test:
64
+ runs-on: ubuntu-latest
65
+ services:
66
+ neo4j:
67
+ image: neo4j:5.18-community
68
+ env:
69
+ NEO4J_AUTH: neo4j/testpassword
70
+ NEO4J_PLUGINS: '["apoc"]'
71
+ ports:
72
+ - 7687:7687
73
+ - 7474:7474
74
+ options: >-
75
+ --health-cmd "cypher-shell -u neo4j -p testpassword 'RETURN 1'"
76
+ --health-interval 10s
77
+ --health-timeout 5s
78
+ --health-retries 5
79
+
80
+ steps:
81
+ - uses: actions/checkout@v4
82
+
83
+ - name: Set up Python
84
+ uses: actions/setup-python@v5
85
+ with:
86
+ python-version: '3.11'
87
+
88
+ - name: Install dependencies
89
+ run: |
90
+ python -m pip install --upgrade pip
91
+ pip install -e ".[dev]"
92
+
93
+ - name: Wait for Neo4j
94
+ run: |
95
+ for i in {1..30}; do
96
+ if curl -s http://localhost:7474/ > /dev/null 2>&1; then
97
+ echo "Neo4j is ready"
98
+ break
99
+ fi
100
+ echo "Waiting for Neo4j... ($i)"
101
+ sleep 2
102
+ done
103
+
104
+ - name: Run integration tests
105
+ env:
106
+ NEO4J_URI: bolt://localhost:7687
107
+ NEO4J_USER: neo4j
108
+ NEO4J_PASSWORD: testpassword
109
+ OPENAI_API_KEY: sk-test-key
110
+ run: pytest -m integration tests/
@@ -0,0 +1,12 @@
1
+ __pycache__/
2
+ *.pyc
3
+ .venv/
4
+ venv/
5
+ .env
6
+ .DS_Store
7
+ .pytest_cache/
8
+ .coverage
9
+ htmlcov/
10
+ dist/
11
+ build/
12
+ *.egg-info/
@@ -0,0 +1,118 @@
1
+ # Agentic Memory - Universal Knowledge Graph
2
+
3
+ ## What This Is
4
+
5
+ A modular knowledge graph system that gives AI agents long-term memory across any content type. Currently handles code repositories via tree-sitter parsing and Neo4j graph storage. Expanding with two new modules: Web Research Memory for automated research pipelines (web search, crawling, PDFs) and Agent Conversation Memory for persistent chat/conversation context. Each module operates independently with its own database or optionally shares a unified graph. Agents access memory via MCP tools.
6
+
7
+ ## Core Value
8
+
9
+ AI agents get seamless, persistent memory that works regardless of content type or AI system - making workflows feel magical and enabling deep, cumulative research over time.
10
+
11
+ ## Requirements
12
+
13
+ ### Validated
14
+
15
+ <!-- Existing code memory capabilities - proven and working -->
16
+
17
+ - ✓ Code repository indexing with tree-sitter (Python, JavaScript/TypeScript) — existing
18
+ - ✓ Multi-pass ingestion pipeline (structure scan → entities → relationships → embeddings) — existing
19
+ - ✓ Neo4j graph database with vector search — existing
20
+ - ✓ MCP server exposing search, dependency, and impact analysis tools — existing
21
+ - ✓ CLI interface (init, index, watch, serve, search, deps, impact) — existing
22
+ - ✓ Incremental file watching for code changes — existing
23
+ - ✓ Git history graph ingestion (commits, provenance tracking) — existing
24
+ - ✓ OpenAI text embeddings for semantic code search — existing
25
+ - ✓ Per-repository configuration with environment variable fallbacks — existing
26
+
27
+ ### Active
28
+
29
+ <!-- v1 scope - building these now -->
30
+
31
+ **Web Research Memory Module:**
32
+ - [ ] Ingest web pages via URL (manual input)
33
+ - [ ] Auto-crawl from web search results (Brave Search API)
34
+ - [ ] Parse and index PDF documents
35
+ - [ ] Semantic search across all ingested web content
36
+ - [ ] Crawl4AI integration for robust web content extraction (primary)
37
+ - [ ] Vercel agent-browser fallback for JS-rendered/dynamic content (Playwright abstraction optimized for agent workflows — more efficient than raw Playwright)
38
+ - [ ] Smart scheduled research: prompt templates with variables; LLM fills variables each run based on past research graph + conversation history; avoids repeating covered topics
39
+ - [ ] Google Gemini multimodal embeddings (gemini-embedding-2-preview)
40
+ - [ ] Separate Neo4j database for web research content (port 7688)
41
+ - [ ] MCP tools: search_web_memory, ingest_url, schedule_research, run_research_session
42
+
43
+ **Agent Conversation Memory Module:**
44
+ - [ ] Ingest conversation logs and chat transcripts (manual import: JSON/JSONL)
45
+ - [ ] Fully automated set-and-forget capture: once configured, conversations are captured without user or agent intervention
46
+ - [ ] Provider-specific automatic integration: Claude Code stop-session hook; survey and implement equivalent zero-friction hooks for other major providers (ChatGPT, Cursor, Windsurf, etc.)
47
+ - [ ] MCP tool (add_message) as universal fallback for providers without native hook support
48
+ - [ ] Query conversational context (retrieve relevant past exchanges)
49
+ - [ ] Incremental message updates (add new messages without full re-index)
50
+ - [ ] User/session tracking (who said what, conversation boundaries, provider attribution)
51
+ - [ ] Google Gemini multimodal embeddings (gemini-embedding-2-preview)
52
+ - [ ] Separate Neo4j database for conversation content (port 7689)
53
+ - [ ] MCP tools: search_conversations, add_message, get_conversation_context
54
+
55
+ **Shared Infrastructure:**
56
+ - [ ] Modular architecture supporting independent or unified databases
57
+ - [ ] Configurable embedding model selection: Gemini, OpenAI, Nvidia Nemotron
58
+ - [ ] Config validation: warn if mixing embedding models in unified database
59
+ - [ ] CLI commands: web-init, web-ingest, web-search, chat-init, chat-ingest
60
+ - [ ] Documentation for module setup and configuration
61
+
62
+ ### Out of Scope
63
+
64
+ - Web UI dashboard — Nice-to-have, not v1 priority
65
+ - IDE extensions (VS Code, Cursor) — Future, after proven via MCP
66
+ - Desktop Electron app — Future, CLI + MCP proven first
67
+ - Real-time collaboration features — Single-user focus for v1
68
+ - Advanced conversation analytics (sentiment, topic modeling) — Basic retrieval first
69
+ - Video/audio transcription — Rely on external tools, ingest transcripts only
70
+ - OpenClaw/Codex-specific adapters — Universal adapter layer is post-v1
71
+ - Simple cron scheduling (repeat same query) — Replaced by smart scheduled research with LLM-driven variable substitution
72
+
73
+ ## Context
74
+
75
+ **Existing system:**
76
+ - Proven architecture with Neo4j + MCP + CLI for code memory
77
+ - Multi-pass ingestion pipeline adaptable to new content types
78
+ - Production telemetry system tracking tool usage for research
79
+
80
+ **User's immediate use case:**
81
+ - Research pipeline for deep topic exploration
82
+ - Daily automated research on evolving questions
83
+ - Build cumulative knowledge graph on specific domains
84
+
85
+ **Long-term vision:**
86
+ - One-click install for any AI workflow
87
+ - Universal adapter layer for OpenClaw, Claude Code, Codex, etc.
88
+ - Seamless integration regardless of which AI system users choose
89
+
90
+ **Technical foundation:**
91
+ - Tree-sitter works for code; Crawl4AI + agent-browser handle web/documents
92
+ - OpenAI embeddings proven for code; Google Gemini for multimodal content
93
+ - Separate databases by default prevents embedding model conflicts
94
+
95
+ ## Constraints
96
+
97
+ - **Embedding consistency**: If unified database, all modules must use same embedding model
98
+ - **Existing code memory**: Must maintain full functionality of current code ingestion
99
+ - **Modular independence**: Each module works standalone (no hard cross-dependencies)
100
+ - **Tech stack**: Python 3.10+, Neo4j 5.18+, existing CLI/MCP patterns
101
+ - **API availability**: Requires Google Vertex AI access, Brave Search API key
102
+ - **One-click install**: Must be pip/CLI installable without complex setup
103
+
104
+ ## Key Decisions
105
+
106
+ | Decision | Rationale | Outcome |
107
+ |----------|-----------|---------|
108
+ | Separate databases by default | Prevents embedding model conflicts (OpenAI 3072d vs Gemini 768d incompatible in same vector index) | ✓ Confirmed |
109
+ | Google Gemini embeddings for web/chat | Multimodal support (text, images, future video/audio); OpenAI stays for code module | ✓ Confirmed |
110
+ | Nvidia Nemotron in v1 | NIM API is OpenAI-compatible — ~20 line addition once abstraction layer exists; near-zero cost | ✓ Confirmed |
111
+ | Crawl4AI primary + agent-browser fallback | Crawl4AI handles static pages; Vercel agent-browser for JS-rendered dynamic content (more efficient than raw Playwright for agent workflows) | ✓ Confirmed |
112
+ | Brave Search API as default | Free tier available, good results, configurable for alternatives | ✓ Confirmed |
113
+ | Smart scheduled research (not simple cron) | Prompt templates with LLM-driven variable substitution; context-aware (no topic repetition); steered by past research + conversation history | ✓ Confirmed |
114
+ | Set-and-forget automated capture | UX goal: configure once, captures forever with zero friction; provider-native hooks where available (Claude Code confirmed); MCP tool as fallback for unsupported providers | ✓ Confirmed |
115
+ | Modular architecture | Each module independently usable, scales to future content types | ✓ Confirmed |
116
+
117
+ ---
118
+ *Last updated: 2026-03-20 after requirements definition*
@@ -0,0 +1,188 @@
1
+ # Agentic Memory — v1 Roadmap
2
+
3
+ **Project:** Modular Knowledge Graph (Code + Web Research + Conversation Memory)
4
+ **Created:** 2026-03-20
5
+ **Status:** Planning
6
+
7
+ ---
8
+
9
+ ## Milestone: v1.0 — Full Multi-Module Memory System
10
+
11
+ **Goal:** Extend the existing code memory tool into a universal agent memory system with Web Research Memory and Agent Conversation Memory modules, accessible via CLI and MCP.
12
+
13
+ ---
14
+
15
+ ## Phase 1: Foundation
16
+
17
+ **Goal:** Establish the shared infrastructure all modules build on. Must be done first — retrofitting these patterns later is costly.
18
+
19
+ **Deliverables:**
20
+ - Abstract ingestion base classes (`BaseIngestor`, `BaseEmbeddingService`, `BaseGraphWriter`)
21
+ - Embedding service abstraction layer supporting Gemini, OpenAI, and Nvidia Nemotron (NIM-compatible, OpenAI SDK with `base_url` override)
22
+ - Config validation system — detects embedding model mismatches across databases, warns loudly
23
+ - Multi-database connection manager (routes to :7687 code, :7688 web, :7689 chat)
24
+ - Docker Compose updated with web and chat Neo4j instances (ports 7688, 7689)
25
+ - CLI scaffolding for new commands (`web-init`, `web-ingest`, `web-search`, `chat-init`, `chat-ingest`) — structure only, not yet implemented
26
+ - Unit tests for embedding service abstraction and config validation
27
+
28
+ **Success Criteria:**
29
+ - All three Neo4j instances start cleanly via `docker-compose up`
30
+ - Embedding service abstraction passes correct model/dimensions to each database
31
+ - Config validation catches and rejects mixed embedding model configurations
32
+ - Existing code module continues to work unchanged
33
+
34
+ **Key Risks:**
35
+ - Gemini embedding API specifics (model name, dimensionality, auth method) — verify early
36
+ - Neo4j Community Edition multi-database support — confirm before designing connection manager
37
+
38
+ ---
39
+
40
+ ## Phase 2: Web Research Core
41
+
42
+ **Goal:** Functional web research ingestion — URLs, PDFs, and web search results land in the knowledge graph and are semantically searchable.
43
+
44
+ **Deliverables:**
45
+ - Crawl4AI integration: URL ingestion, content filtering (boilerplate removal), metadata extraction (title, author, date, source URL)
46
+ - PDF parsing via Crawl4AI built-in support
47
+ - Vercel agent-browser integration as fallback for JS-rendered/dynamic content (more efficient than raw Playwright for agent workflows)
48
+ - Brave Search API integration: web search → auto-ingest top results
49
+ - Gemini multimodal embedding service (gemini-embedding-2-preview) for web content
50
+ - Neo4j web database schema + vector indexes
51
+ - Content deduplication (hash-based, update vs create logic)
52
+ - MCP tools: `ingest_url`, `search_web_memory`
53
+ - CLI commands: `web-init`, `web-ingest`, `web-search` (fully functional)
54
+
55
+ **Success Criteria:**
56
+ - `codememory web-ingest <url>` ingests a static page and makes it searchable
57
+ - PDF documents ingested and retrievable via semantic search
58
+ - JS-rendered pages fall back to agent-browser automatically, transparently
59
+ - `codememory web-search "query"` runs Brave Search and auto-ingests results
60
+ - No duplicate entries for the same URL on re-ingest (updates instead)
61
+ - Semantic search returns relevant results across all ingested web content
62
+
63
+ **Key Risks:**
64
+ - Crawl4AI version stability and JS rendering reliability
65
+ - Vercel agent-browser API surface — verify current documentation
66
+ - Brave Search API rate limits and response schema
67
+ - Gemini embedding API access (Vertex AI vs AI Studio auth)
68
+
69
+ ---
70
+
71
+ ## Phase 3: Web Research Scheduling
72
+
73
+ **Goal:** Smart automated research pipeline — set a research template, system runs it on a schedule with LLM-driven variation, building cumulative knowledge over time.
74
+
75
+ **Deliverables:**
76
+ - Prompt template system with variable placeholders (e.g. `{topic}`, `{angle}`, `{timeframe}`)
77
+ - LLM-driven variable substitution each run: reads existing research graph + conversation history to select variable values that explore new angles, avoids repeating covered topics
78
+ - Topic coverage tracker: graph-based record of what has been researched, used to steer future runs
79
+ - Schedule management: cron-based execution, configurable frequency (daily, weekly, custom)
80
+ - Research session orchestrator: template → variable fill → search → ingest → update coverage
81
+ - Circuit breakers: rate limit handling, cost caps, graceful degradation on API failures
82
+ - MCP tools: `schedule_research`, `run_research_session`, `list_research_schedules`
83
+ - CLI commands: `web-schedule`, `web-run-research`
84
+
85
+ **Success Criteria:**
86
+ - User defines a research template once; system runs autonomously on schedule
87
+ - Each run produces meaningfully different queries based on what's already in the graph
88
+ - Coverage tracker correctly identifies and avoids already-researched topics
89
+ - Failed runs (API errors, rate limits) are logged and retried gracefully
90
+ - Research output is cumulative — graph grows richer over time without duplication
91
+
92
+ **Key Risks:**
93
+ - LLM variable substitution quality — prompt engineering for consistent, useful variation
94
+ - Cost management for automated LLM calls on schedule
95
+ - Scheduler library choice (APScheduler vs system cron vs custom)
96
+
97
+ ---
98
+
99
+ ## Phase 4: Conversation Memory
100
+
101
+ **Goal:** Set-and-forget conversation capture — configure once, all conversations are automatically stored and semantically searchable across providers.
102
+
103
+ **Deliverables:**
104
+ - Neo4j conversation database schema: conversations, messages, participants, sessions (port 7689)
105
+ - Gemini embeddings for conversation content
106
+ - Claude Code integration: stop-session hook auto-exports and ingests conversation on session end
107
+ - Provider survey: research hook/integration mechanisms for ChatGPT, Cursor, Windsurf, and other major agent platforms
108
+ - Provider-specific integrations for surveyed platforms (wherever native hooks exist)
109
+ - Manual import fallback: JSON/JSONL conversation log ingestion
110
+ - MCP tool fallback: `add_message()` for providers with no native hook support
111
+ - Incremental message updates (append-only, no full re-index on new messages)
112
+ - User/session tracking: provider attribution, conversation boundaries, role tagging (user/assistant/system)
113
+ - MCP tools: `search_conversations`, `add_message`, `get_conversation_context`
114
+ - CLI commands: `chat-init`, `chat-ingest`, `chat-search`
115
+
116
+ **Success Criteria:**
117
+ - Claude Code sessions captured automatically with zero user action after initial setup
118
+ - At least two additional providers integrated with native hooks
119
+ - Manual import handles real-world conversation export formats
120
+ - Semantic search retrieves relevant past exchanges across all captured conversations
121
+ - `get_conversation_context` returns ranked relevant history for a given query
122
+ - Provider attribution is correct (no mixing conversations across providers)
123
+
124
+ **Key Risks:**
125
+ - Provider hook availability varies significantly — some may have no hook mechanism
126
+ - Conversation data privacy — clear scoping of what gets captured vs excluded
127
+ - Schema must be locked before first ingest (hard to migrate conversation graph later)
128
+
129
+ ---
130
+
131
+ ## Phase 5: Cross-Module Integration & Hardening
132
+
133
+ **Goal:** Unified agent interface across all three modules, Nvidia Nemotron embedding support, production hardening.
134
+
135
+ **Deliverables:**
136
+ - Unified MCP router: single server aggregates code + web + conversation results
137
+ - Cross-module search: `search_all_memory` queries all databases, merges and ranks results
138
+ - Nvidia Nemotron embedding service (NIM API, OpenAI-compatible — ~20 lines via existing abstraction)
139
+ - Structured logging and observability across all modules
140
+ - Error recovery and retry logic standardized across modules
141
+ - Documentation: setup guides, MCP tool reference, provider integration guides
142
+ - End-to-end integration tests across all three modules
143
+
144
+ **Success Criteria:**
145
+ - Single MCP server exposes all tools from all three modules
146
+ - `search_all_memory` returns coherent ranked results across code, web, and conversation content
147
+ - Nvidia Nemotron can be selected as embedding model via config
148
+ - All three modules pass integration tests end-to-end
149
+ - Setup guide enables a new user to have all three modules running in under 30 minutes
150
+
151
+ **Key Risks:**
152
+ - Cross-module result ranking/merging quality
153
+ - MCP server routing complexity with many tools
154
+ - Neo4j Community Edition limits on concurrent connections across 3 databases
155
+
156
+ ---
157
+
158
+ ## Phase Dependencies
159
+
160
+ ```
161
+ Phase 1 (Foundation)
162
+ ├── Phase 2 (Web Research Core)
163
+ │    └── Phase 3 (Web Research Scheduling)
164
+ └── Phase 4 (Conversation Memory)
165
+ Phase 2 + Phase 4
166
+ └── Phase 5 (Cross-Module Integration)
167
+ ```
168
+
169
+ Phases 2 and 4 can run in parallel after Phase 1 completes.
170
+ Phase 3 depends on Phase 2 (requires working ingestion pipeline).
171
+ Phase 5 depends on all prior phases.
172
+
173
+ ---
174
+
175
+ ## Open Research Questions (Pre-Implementation)
176
+
177
+ | Question | Blocks | Priority |
178
+ |----------|--------|----------|
179
+ | Gemini embedding API: model name, dimensionality, auth (Vertex AI vs AI Studio) | Phase 1, 2 | Critical |
180
+ | Neo4j Community Edition: multi-database support on single instance | Phase 1 | Critical |
181
+ | Vercel agent-browser: current API surface, install method, JS rendering reliability | Phase 2 | High |
182
+ | Crawl4AI: current stable version, PDF support status | Phase 2 | High |
183
+ | Brave Search: rate limits, response schema, free tier constraints | Phase 2, 3 | High |
184
+ | Cursor/Windsurf/ChatGPT: available hooks or integration points for conversation capture | Phase 4 | Medium |
185
+ | APScheduler vs system cron vs custom: best fit for research scheduling | Phase 3 | Medium |
186
+
187
+ ---
188
+ *Last updated: 2026-03-20 after requirements definition*