swarm-notes 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,228 @@
1
+ Metadata-Version: 2.3
2
+ Name: swarm-notes
3
+ Version: 0.1.0
4
+ Summary: Automated research paper tracking and knowledge synthesis
5
+ Author: LM
6
+ Author-email: LM <hi@leima.is>
7
+ Requires-Dist: beautifulsoup4>=4.14.3
8
+ Requires-Dist: pydantic>=2.12.5
9
+ Requires-Dist: pydantic-ai>=1.71.0
10
+ Requires-Dist: python-dotenv>=1.2.2
11
+ Requires-Dist: python-frontmatter>=1.1.0
12
+ Requires-Dist: pyyaml>=6.0.3
13
+ Requires-Dist: requests>=2.32.5
14
+ Requires-Dist: typer>=0.24.1
15
+ Requires-Python: >=3.11
16
+ Description-Content-Type: text/markdown
17
+
18
+ # Swarm Notes Core Package
19
+
20
+ An autonomous, serverless, multi-agent system that tracks academic papers, extracts structured data, and weaves the results into a local, interconnected Markdown knowledge graph — a **Second Brain** for ML research.
21
+ Built to eventually communicate with other identical systems, forming a decentralised **Hive Mind**.
22
+
23
+ ---
24
+
25
+ ## Architecture
26
+
27
+ ```
28
+ ┌─────────────────────────────────────────────────────┐
29
+ │ GitHub Actions CI │
30
+ │ (weekly schedule + workflow_dispatch) │
31
+ └─────────────────────┬───────────────────────────────┘
32
+
33
+ ┌────────────▼────────────┐
34
+ │ Federation Agent │ ← consumes external public_feed.json feeds
35
+ └────────────┬────────────┘
36
+
37
+ ┌────────────▼────────────┐
38
+ │ Watcher │ ← queries ArXiv API by keyword
39
+ └────────────┬────────────┘
40
+ │ RawPaper[]
41
+ ┌────────────▼────────────┐
42
+ │ Router (Skill │ ← routes each paper to a domain skill
43
+ │ Registry) │ (NLP, Vision, TimeSeries, …)
44
+ └────────────┬────────────┘
45
+ │ Skill
46
+ ┌────────────▼────────────┐
47
+ │ Analyst │ ← pydantic-ai structured extraction
48
+ │ (pydantic-ai) │ with taxonomy injection
49
+ └────────────┬────────────┘
50
+ │ PaperAnalysis
51
+ ┌────────────▼────────────┐
52
+ │ Vault Writer │ ← writes .md to tmp_vault/
53
+ │ │ generates concept stubs
54
+ │ │ updates public_feed.json
55
+ └────────────┬────────────┘
56
+ │ atomic move
57
+ ┌────────────▼────────────┐
58
+ │ /vault │ ← permanent, file-based knowledge graph
59
+ │ papers/ concepts/ │
60
+ │ datasets/ │
61
+ └─────────────────────────┘
62
+ ```
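
The data flow above amounts to a short sequential loop. A minimal sketch, using hypothetical stand-in functions for each stage (the real implementations live in the `swarm_notes/` modules, and the stage signatures here are assumptions):

```python
# Sketch of the pipeline shown in the diagram above.
# Every stage function here is a hypothetical stand-in, passed in as a callable.

def run_pipeline(fetch, route, analyse, write, publish):
    """Fetch papers, route each to a skill, analyse, write notes, then publish."""
    papers = fetch()                       # Watcher: returns RawPaper[]
    notes = []
    for paper in papers:
        skill = route(paper)               # Router: pick a domain skill
        analysis = analyse(paper, skill)   # Analyst: structured extraction
        notes.append(write(analysis))      # Vault Writer: .md into tmp_vault/
    publish(notes)                         # atomic move tmp_vault/ -> vault/
    return notes
```

Keeping each stage as a plain callable makes the flow easy to test: any stage can be replaced with a stub without touching the others.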
63
+
64
+ ## Directory Structure
65
+
66
+ ```
67
+ research-cruise/
68
+ ├── .github/
69
+ │ └── workflows/
70
+ │ └── autonomous-tracker.yml # CI/CD pipeline
71
+ ├── vault/
72
+ │ ├── papers/ # One .md file per paper
73
+ │ ├── concepts/ # Auto-generated concept stubs
74
+ │ └── datasets/ # Dataset stubs
75
+ ├── swarm_notes/
76
+ │ ├── config.py # Configuration & env vars
77
+ │ ├── vault_manager.py # Staging pattern (tmp_vault → vault)
78
+ │ ├── watcher.py # Configurable paper-source watcher
79
+ │ ├── router.py # Skill registry router
80
+ │ ├── analyst.py # pydantic-ai extraction agent
81
+ │ ├── vault_writer.py # Markdown writer + public_feed.json
82
+ │ ├── federation.py # Hive Mind federation agent
83
+ │ └── main.py # Pipeline orchestrator
84
+ ├── taxonomy.json # Controlled vocabulary (tags, domains)
85
+ ├── public_feed.json # Rolling feed of last 20 papers (for federation)
86
+ └── requirements.txt
87
+ ```
88
+
89
+ ## Quick Start
90
+
91
+ ### Prerequisites
92
+
93
+ - Python 3.11+
94
+ - An OpenAI-compatible API key
95
+
96
+ ### Local Run
97
+
98
+ ```bash
99
+ # Install dependencies
100
+ pip install -r requirements.txt
101
+
102
+ # Set your API key
103
+ export LLM_API_KEY="sk-..."
104
+
105
+ # Optionally customise keywords
106
+ export PAPER_KEYWORDS="mamba,diffusion model,retrieval augmented generation"
107
+
108
+ # Optional: switch the watcher to Semantic Scholar
109
+ export PAPER_SOURCE="semantic_scholar"
110
+ export SEMANTIC_SCHOLAR_API_KEY="..."
111
+
112
+ # Run the pipeline
113
+ python -m swarm_notes.main
114
+ ```
115
+
116
+ ### Configuration (Environment Variables)
117
+
118
+ | Variable | Default | Description |
119
+ |---|---|---|
120
+ | `LLM_API_KEY` | *(required)* | API key for the LLM provider |
121
+ | `LLM_MODEL` | `openai:gpt-4o-mini` | pydantic-ai model string |
122
+ | `PAPER_SOURCE` | `arxiv` | Paper search backend: `arxiv` or `semantic_scholar` |
123
+ | `PAPER_KEYWORDS` | See `config.py` | Comma-separated search terms |
124
+ | `PAPER_MAX_RESULTS_PER_KEYWORD` | `5` | Papers fetched per keyword |
125
+ | `PAPER_TOTAL_CAP` | `20` | Hard cap on total papers per run |
126
+ | `SEMANTIC_SCHOLAR_API_KEY` | *(empty)* | Optional Semantic Scholar API key sent as `x-api-key` |
127
+ | `FEDERATION_FEEDS` | *(empty)* | Comma-separated external feed URLs |
128
+ | `PUBLIC_FEED_MAX_ITEMS` | `20` | Max entries kept in `public_feed.json` |
129
+
130
+ When `PAPER_SOURCE=semantic_scholar`, the watcher queries Semantic Scholar's Graph API and keeps only results that can be mapped back to an ArXiv identifier. That preserves compatibility with the rest of the pipeline, which still stores papers by `arxiv_id`.
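
The mapping step can be sketched as a filter over the API response. Semantic Scholar's Graph API reports arXiv identifiers under `externalIds.ArXiv` when one exists; the exact record shape the watcher consumes is an assumption here:

```python
def keep_arxiv_only(results):
    """Keep only Semantic Scholar results that map back to an arXiv identifier."""
    papers = []
    for item in results:
        # externalIds can be missing or null; ArXiv is absent for non-arXiv papers
        arxiv_id = (item.get("externalIds") or {}).get("ArXiv")
        if arxiv_id:  # drop anything the rest of the pipeline cannot store
            papers.append({"arxiv_id": arxiv_id, "title": item.get("title")})
    return papers
```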
131
+
132
+ Legacy `ARXIV_KEYWORDS`, `ARXIV_MAX_RESULTS_PER_KEYWORD`, and `ARXIV_TOTAL_CAP` are still accepted for backward compatibility, but `PAPER_*` names are now canonical.
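
The fallback rule reads as: canonical `PAPER_*` name first, then the legacy `ARXIV_*` alias, then the built-in default. A sketch of plausible `config.py` behaviour (not its actual code):

```python
import os

def setting(name, legacy_name, default):
    """Resolve a config value: canonical env var, then legacy alias, then default."""
    value = os.environ.get(name)
    if value is None:
        value = os.environ.get(legacy_name)  # legacy ARXIV_* names still accepted
    return value if value is not None else default

# e.g. keywords = setting("PAPER_KEYWORDS", "ARXIV_KEYWORDS", "transformer")
```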
133
+
134
+ ## CI/CD Setup
135
+
136
+ ### 1. Fork the repository
137
+
138
+ Click **Fork** on GitHub to create your own copy of this repository.
139
+
140
+ ### 2. Add the required secret
141
+
142
+ The pipeline needs an OpenAI-compatible API key to run the LLM analyst step.
143
+
144
+ 1. Open your forked repository on GitHub.
145
+ 2. Go to **Settings → Secrets and variables → Actions**.
146
+ 3. Click **New repository secret**.
147
+ 4. Set **Name** to `LLM_API_KEY` and **Secret** to your API key (e.g. `sk-...`).
148
+ 5. Click **Add secret**.
149
+
150
+ > **Note:** The workflow exposes `LLM_API_KEY` as both `LLM_API_KEY` and `OPENAI_API_KEY`
151
+ > so that pydantic-ai's OpenAI provider picks it up automatically.
152
+
153
+ ### 3. (Optional) Override the model
154
+
155
+ By default the pipeline uses `openai:gpt-4o-mini`. To use a different model, add a
156
+ second repository secret (or variable) named `LLM_MODEL` with the pydantic-ai model
157
+ string, e.g. `openai:gpt-4o` or `anthropic:claude-3-5-haiku`.
158
+
159
+ You can also set `LLM_MODEL` in the workflow's `env:` block directly if you prefer not
160
+ to use a secret.
161
+
162
+ ### 4. Run the pipeline
163
+
164
+ - **Scheduled:** the pipeline fires automatically every **Monday at 06:00 UTC**.
165
+ - **Manual:** go to **Actions → Autonomous Research Tracker → Run workflow**, and
166
+ optionally override `keywords`, `federation_feeds`, and `max_results` in the dispatch form.
167
+
168
+ ## The Hive Mind (Federation)
169
+
170
+ Every successful run updates `public_feed.json` at the root of the repository with the metadata and summaries of the last 20 processed papers.
171
+
172
+ To subscribe to another agent's feed, pass their raw `public_feed.json` URL:
173
+
174
+ ```bash
175
+ export FEDERATION_FEEDS="https://raw.githubusercontent.com/alice/research-cruise/main/public_feed.json,https://raw.githubusercontent.com/bob/research-cruise/main/public_feed.json"
176
+ python -m swarm_notes.main
177
+ ```
178
+
179
+ Or set `federation_feeds` in the **workflow_dispatch** inputs.
180
+
181
+ **Conflict resolution:** If an external feed contains a review of a paper that already exists locally, the local metadata is preserved. The external summary is appended under a `### External Perspectives` section:
182
+
183
+ ```markdown
184
+ ### External Perspectives
185
+
186
+ > "Transformers are over-engineered for this dataset." - @Agent_alice
187
+ > *(Retrieved 2024-01-15)*
188
+ ```
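
The merge rule — local metadata always wins, external summaries are only appended — can be sketched as follows. This is a hypothetical helper, not the actual `federation.py`:

```python
SECTION = "### External Perspectives"

def merge_external(note_body, summary, agent, retrieved):
    """Append an external summary to a note, creating the section once if needed."""
    if SECTION not in note_body:  # local content is never overwritten or reordered
        note_body += f"\n{SECTION}\n"
    return note_body + f'\n> "{summary}" - @{agent}\n> *(Retrieved {retrieved})*\n'
```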
189
+
190
+ ## Vault File Format
191
+
192
+ Each paper note uses hybrid YAML frontmatter (CSL-compatible fields + custom fields):
193
+
194
+ ```yaml
195
+ ---
196
+ # CSL-compatible fields
197
+ title: "Attention Is All You Need"
198
+ author:
199
+ - literal: "Ashish Vaswani"
200
+ issued:
201
+ date-parts:
202
+ - [2017, 6, 12]
203
+ url: "https://arxiv.org/abs/1706.03762"
204
+
205
+ # Custom fields
206
+ arxiv_id: "1706.03762"
207
+ domain: "nlp"
208
+ tags:
209
+ - "transformer"
210
+ - "attention-mechanism"
211
+ architectures:
212
+ - "encoder-decoder"
213
+ datasets:
214
+ - "WMT 2014"
215
+ skill: "NLPSkill"
216
+ processed_at: "2024-01-15T06:00:00Z"
217
+ ---
218
+ ```
219
+
220
+ Body sections: **Summary**, **Key Contributions**, **Key Concepts** (with relative links to `../concepts/`), **Datasets**, **Limitations**, **Links**.
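
The package depends on `python-frontmatter` for parsing these notes; as a stdlib-only illustration of the file layout, a note splits on its `---` fences into metadata and body:

```python
def split_note(text):
    """Split a vault note into (frontmatter_text, body) on its '---' fences."""
    if text.startswith("---\n"):
        frontmatter, _, body = text[4:].partition("\n---\n")
        return frontmatter, body.lstrip("\n")
    return "", text  # no frontmatter block: the whole file is body
```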
221
+
222
+ ## Taxonomy
223
+
224
+ `taxonomy.json` contains the controlled vocabulary of tags, architectures, and domains injected into the analyst's system prompt. Constraining the analyst to this vocabulary curbs hallucinated metadata and keeps notes consistent across runs. Edit `taxonomy.json` to add new terms.
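
One way to see what the vocabulary buys: extracted fields can be validated against the taxonomy and anything outside it dropped. A sketch with a stand-in taxonomy (in the real package the constraint is applied via the system prompt and pydantic-ai validation):

```python
# Stand-in for taxonomy.json; the real file defines the full vocabulary.
TAXONOMY = {
    "tags": ["transformer", "attention-mechanism"],
    "domains": ["nlp", "vision", "time-series"],
}

def constrain(analysis, taxonomy=TAXONOMY):
    """Keep only tags and domains that exist in the controlled vocabulary."""
    allowed = set(taxonomy["tags"])
    return {
        "tags": [t for t in analysis["tags"] if t in allowed],
        "domain": analysis["domain"] if analysis["domain"] in taxonomy["domains"] else None,
    }
```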
225
+
226
+ ## License
227
+
228
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,211 @@
1
+ # Swarm Notes Core Package
2
+
3
+ An autonomous, serverless, multi-agent system that tracks academic papers, extracts structured data, and weaves the results into a local, interconnected Markdown knowledge graph — a **Second Brain** for ML research.
4
+ Built to eventually communicate with other identical systems, forming a decentralised **Hive Mind**.
5
+
6
+ ---
7
+
8
+ ## Architecture
9
+
10
+ ```
11
+ ┌─────────────────────────────────────────────────────┐
12
+ │ GitHub Actions CI │
13
+ │ (weekly schedule + workflow_dispatch) │
14
+ └─────────────────────┬───────────────────────────────┘
15
+
16
+ ┌────────────▼────────────┐
17
+ │ Federation Agent │ ← consumes external public_feed.json feeds
18
+ └────────────┬────────────┘
19
+
20
+ ┌────────────▼────────────┐
21
+ │ Watcher │ ← queries ArXiv API by keyword
22
+ └────────────┬────────────┘
23
+ │ RawPaper[]
24
+ ┌────────────▼────────────┐
25
+ │ Router (Skill │ ← routes each paper to a domain skill
26
+ │ Registry) │ (NLP, Vision, TimeSeries, …)
27
+ └────────────┬────────────┘
28
+ │ Skill
29
+ ┌────────────▼────────────┐
30
+ │ Analyst │ ← pydantic-ai structured extraction
31
+ │ (pydantic-ai) │ with taxonomy injection
32
+ └────────────┬────────────┘
33
+ │ PaperAnalysis
34
+ ┌────────────▼────────────┐
35
+ │ Vault Writer │ ← writes .md to tmp_vault/
36
+ │ │ generates concept stubs
37
+ │ │ updates public_feed.json
38
+ └────────────┬────────────┘
39
+ │ atomic move
40
+ ┌────────────▼────────────┐
41
+ │ /vault │ ← permanent, file-based knowledge graph
42
+ │ papers/ concepts/ │
43
+ │ datasets/ │
44
+ └─────────────────────────┘
45
+ ```
46
+
47
+ ## Directory Structure
48
+
49
+ ```
50
+ research-cruise/
51
+ ├── .github/
52
+ │ └── workflows/
53
+ │ └── autonomous-tracker.yml # CI/CD pipeline
54
+ ├── vault/
55
+ │ ├── papers/ # One .md file per paper
56
+ │ ├── concepts/ # Auto-generated concept stubs
57
+ │ └── datasets/ # Dataset stubs
58
+ ├── swarm_notes/
59
+ │ ├── config.py # Configuration & env vars
60
+ │ ├── vault_manager.py # Staging pattern (tmp_vault → vault)
61
+ │ ├── watcher.py # Configurable paper-source watcher
62
+ │ ├── router.py # Skill registry router
63
+ │ ├── analyst.py # pydantic-ai extraction agent
64
+ │ ├── vault_writer.py # Markdown writer + public_feed.json
65
+ │ ├── federation.py # Hive Mind federation agent
66
+ │ └── main.py # Pipeline orchestrator
67
+ ├── taxonomy.json # Controlled vocabulary (tags, domains)
68
+ ├── public_feed.json # Rolling feed of last 20 papers (for federation)
69
+ └── requirements.txt
70
+ ```
71
+
72
+ ## Quick Start
73
+
74
+ ### Prerequisites
75
+
76
+ - Python 3.11+
77
+ - An OpenAI-compatible API key
78
+
79
+ ### Local Run
80
+
81
+ ```bash
82
+ # Install dependencies
83
+ pip install -r requirements.txt
84
+
85
+ # Set your API key
86
+ export LLM_API_KEY="sk-..."
87
+
88
+ # Optionally customise keywords
89
+ export PAPER_KEYWORDS="mamba,diffusion model,retrieval augmented generation"
90
+
91
+ # Optional: switch the watcher to Semantic Scholar
92
+ export PAPER_SOURCE="semantic_scholar"
93
+ export SEMANTIC_SCHOLAR_API_KEY="..."
94
+
95
+ # Run the pipeline
96
+ python -m swarm_notes.main
97
+ ```
98
+
99
+ ### Configuration (Environment Variables)
100
+
101
+ | Variable | Default | Description |
102
+ |---|---|---|
103
+ | `LLM_API_KEY` | *(required)* | API key for the LLM provider |
104
+ | `LLM_MODEL` | `openai:gpt-4o-mini` | pydantic-ai model string |
105
+ | `PAPER_SOURCE` | `arxiv` | Paper search backend: `arxiv` or `semantic_scholar` |
106
+ | `PAPER_KEYWORDS` | See `config.py` | Comma-separated search terms |
107
+ | `PAPER_MAX_RESULTS_PER_KEYWORD` | `5` | Papers fetched per keyword |
108
+ | `PAPER_TOTAL_CAP` | `20` | Hard cap on total papers per run |
109
+ | `SEMANTIC_SCHOLAR_API_KEY` | *(empty)* | Optional Semantic Scholar API key sent as `x-api-key` |
110
+ | `FEDERATION_FEEDS` | *(empty)* | Comma-separated external feed URLs |
111
+ | `PUBLIC_FEED_MAX_ITEMS` | `20` | Max entries kept in `public_feed.json` |
112
+
113
+ When `PAPER_SOURCE=semantic_scholar`, the watcher queries Semantic Scholar's Graph API and keeps only results that can be mapped back to an ArXiv identifier. That preserves compatibility with the rest of the pipeline, which still stores papers by `arxiv_id`.
114
+
115
+ Legacy `ARXIV_KEYWORDS`, `ARXIV_MAX_RESULTS_PER_KEYWORD`, and `ARXIV_TOTAL_CAP` are still accepted for backward compatibility, but `PAPER_*` names are now canonical.
116
+
117
+ ## CI/CD Setup
118
+
119
+ ### 1. Fork the repository
120
+
121
+ Click **Fork** on GitHub to create your own copy of this repository.
122
+
123
+ ### 2. Add the required secret
124
+
125
+ The pipeline needs an OpenAI-compatible API key to run the LLM analyst step.
126
+
127
+ 1. Open your forked repository on GitHub.
128
+ 2. Go to **Settings → Secrets and variables → Actions**.
129
+ 3. Click **New repository secret**.
130
+ 4. Set **Name** to `LLM_API_KEY` and **Secret** to your API key (e.g. `sk-...`).
131
+ 5. Click **Add secret**.
132
+
133
+ > **Note:** The workflow exposes `LLM_API_KEY` as both `LLM_API_KEY` and `OPENAI_API_KEY`
134
+ > so that pydantic-ai's OpenAI provider picks it up automatically.
135
+
136
+ ### 3. (Optional) Override the model
137
+
138
+ By default the pipeline uses `openai:gpt-4o-mini`. To use a different model, add a
139
+ second repository secret (or variable) named `LLM_MODEL` with the pydantic-ai model
140
+ string, e.g. `openai:gpt-4o` or `anthropic:claude-3-5-haiku`.
141
+
142
+ You can also set `LLM_MODEL` in the workflow's `env:` block directly if you prefer not
143
+ to use a secret.
144
+
145
+ ### 4. Run the pipeline
146
+
147
+ - **Scheduled:** the pipeline fires automatically every **Monday at 06:00 UTC**.
148
+ - **Manual:** go to **Actions → Autonomous Research Tracker → Run workflow**, and
149
+ optionally override `keywords`, `federation_feeds`, and `max_results` in the dispatch form.
150
+
151
+ ## The Hive Mind (Federation)
152
+
153
+ Every successful run updates `public_feed.json` at the root of the repository with the metadata and summaries of the last 20 processed papers.
154
+
155
+ To subscribe to another agent's feed, pass their raw `public_feed.json` URL:
156
+
157
+ ```bash
158
+ export FEDERATION_FEEDS="https://raw.githubusercontent.com/alice/research-cruise/main/public_feed.json,https://raw.githubusercontent.com/bob/research-cruise/main/public_feed.json"
159
+ python -m swarm_notes.main
160
+ ```
161
+
162
+ Or set `federation_feeds` in the **workflow_dispatch** inputs.
163
+
164
+ **Conflict resolution:** If an external feed contains a review of a paper that already exists locally, the local metadata is preserved. The external summary is appended under a `### External Perspectives` section:
165
+
166
+ ```markdown
167
+ ### External Perspectives
168
+
169
+ > "Transformers are over-engineered for this dataset." - @Agent_alice
170
+ > *(Retrieved 2024-01-15)*
171
+ ```
172
+
173
+ ## Vault File Format
174
+
175
+ Each paper note uses hybrid YAML frontmatter (CSL-compatible fields + custom fields):
176
+
177
+ ```yaml
178
+ ---
179
+ # CSL-compatible fields
180
+ title: "Attention Is All You Need"
181
+ author:
182
+ - literal: "Ashish Vaswani"
183
+ issued:
184
+ date-parts:
185
+ - [2017, 6, 12]
186
+ url: "https://arxiv.org/abs/1706.03762"
187
+
188
+ # Custom fields
189
+ arxiv_id: "1706.03762"
190
+ domain: "nlp"
191
+ tags:
192
+ - "transformer"
193
+ - "attention-mechanism"
194
+ architectures:
195
+ - "encoder-decoder"
196
+ datasets:
197
+ - "WMT 2014"
198
+ skill: "NLPSkill"
199
+ processed_at: "2024-01-15T06:00:00Z"
200
+ ---
201
+ ```
202
+
203
+ Body sections: **Summary**, **Key Contributions**, **Key Concepts** (with relative links to `../concepts/`), **Datasets**, **Limitations**, **Links**.
204
+
205
+ ## Taxonomy
206
+
207
+ `taxonomy.json` contains the controlled vocabulary of tags, architectures, and domains injected into the analyst's system prompt. Constraining the analyst to this vocabulary curbs hallucinated metadata and keeps notes consistent across runs. Edit `taxonomy.json` to add new terms.
208
+
209
+ ## License
210
+
211
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,37 @@
1
+ [project]
2
+ name = "swarm-notes"
3
+ version = "0.1.0"
4
+ description = "Automated research paper tracking and knowledge synthesis"
5
+ readme = "README.md"
6
+ authors = [
7
+ { name = "LM", email = "hi@leima.is" }
8
+ ]
9
+ requires-python = ">=3.11"
10
+ dependencies = [
11
+ "beautifulsoup4>=4.14.3",
12
+ "pydantic>=2.12.5",
13
+ "pydantic-ai>=1.71.0",
14
+ "python-dotenv>=1.2.2",
15
+ "python-frontmatter>=1.1.0",
16
+ "pyyaml>=6.0.3",
17
+ "requests>=2.32.5",
18
+ "typer>=0.24.1",
19
+ ]
20
+
21
+ [project.scripts]
22
+ swarm-notes = "swarm_notes.main:app"
23
+
24
+ [build-system]
25
+ requires = ["uv_build>=0.8.15,<0.9.0"]
26
+ build-backend = "uv_build"
27
+
28
+ [dependency-groups]
29
+ dev = [
30
+ "pytest>=9.0.2",
31
+ ]
32
+
33
+ [tool.pytest.ini_options]
34
+ pythonpath = ["src"]
35
+ markers = [
36
+ "integration: mark a test as an integration test that hits external APIs."
37
+ ]
@@ -0,0 +1 @@
1
+ """research-cruise agent package."""