iflow-mcp_hulupeep_ruvscan-mcp 0.5.0__py3-none-any.whl
- iflow_mcp_hulupeep_ruvscan_mcp-0.5.0.dist-info/METADATA +1222 -0
- iflow_mcp_hulupeep_ruvscan_mcp-0.5.0.dist-info/RECORD +19 -0
- iflow_mcp_hulupeep_ruvscan_mcp-0.5.0.dist-info/WHEEL +4 -0
- iflow_mcp_hulupeep_ruvscan_mcp-0.5.0.dist-info/entry_points.txt +2 -0
- mcp/__init__.py +7 -0
- mcp/bindings/__init__.py +5 -0
- mcp/bindings/rust_client.py +166 -0
- mcp/endpoints/query.py +144 -0
- mcp/endpoints/scan.py +112 -0
- mcp/mcp_stdio_server.py +212 -0
- mcp/monitoring.py +126 -0
- mcp/reasoning/__init__.py +6 -0
- mcp/reasoning/embeddings.py +208 -0
- mcp/reasoning/fact_cache.py +212 -0
- mcp/reasoning/safla_agent.py +282 -0
- mcp/server.py +268 -0
- mcp/storage/__init__.py +21 -0
- mcp/storage/db.py +216 -0
- mcp/storage/models.py +77 -0
|
@@ -0,0 +1,1222 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: iflow-mcp_hulupeep_ruvscan-mcp
|
|
3
|
+
Version: 0.5.0
|
|
4
|
+
Summary: RuvScan MCP Server - Sublinear intelligence for GitHub discovery
|
|
5
|
+
Project-URL: Homepage, https://github.com/ruvnet/ruvscan
|
|
6
|
+
Project-URL: Documentation, https://github.com/ruvnet/ruvscan#readme
|
|
7
|
+
Project-URL: Repository, https://github.com/ruvnet/ruvscan
|
|
8
|
+
Project-URL: Issues, https://github.com/ruvnet/ruvscan/issues
|
|
9
|
+
Author-email: Ruvnet <support@ruvnet.ai>
|
|
10
|
+
License: MIT OR Apache-2.0
|
|
11
|
+
Keywords: ai,claude,github,mcp,semantic-search,sublinear
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Intended Audience :: Developers
|
|
14
|
+
Classifier: License :: OSI Approved :: Apache Software License
|
|
15
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
|
+
Classifier: Programming Language :: Python :: 3
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
20
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
21
|
+
Classifier: Topic :: Software Development :: Libraries
|
|
22
|
+
Requires-Python: >=3.10
|
|
23
|
+
Requires-Dist: fastapi>=0.109.0
|
|
24
|
+
Requires-Dist: httpx>=0.24.0
|
|
25
|
+
Requires-Dist: mcp[cli]>=1.2.0
|
|
26
|
+
Requires-Dist: pydantic>=2.5.0
|
|
27
|
+
Requires-Dist: python-dotenv>=1.0.0
|
|
28
|
+
Requires-Dist: uvicorn[standard]>=0.27.0
|
|
29
|
+
Provides-Extra: dev
|
|
30
|
+
Requires-Dist: black>=24.0.0; extra == 'dev'
|
|
31
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
|
|
32
|
+
Requires-Dist: pytest>=7.4.0; extra == 'dev'
|
|
33
|
+
Requires-Dist: ruff>=0.1.0; extra == 'dev'
|
|
34
|
+
Description-Content-Type: text/markdown
|
|
35
|
+
|
|
36
|
+
# RuvScan - MCP Server for Intelligent GitHub Discovery
|
|
37
|
+
|
|
38
|
+
|
|
43
|
+
|
|
44
|
+
> **Give Claude the power to discover GitHub tools with sublinear intelligence.**
|
|
45
|
+
|
|
46
|
+
RuvScan is a **Model Context Protocol (MCP) server** that connects to Claude Code CLI, Codex, and Claude Desktop. It turns GitHub into your AI's personal innovation scout, finding tools, frameworks, and solutions you'd never think to search for.
|
|
47
|
+
|
|
48
|
+
*Oh, it's a work in progress, so suggest changes to make it better.*
|
|
49
|
+
|
|
50
|
+
|
|
51
|
+
|
|
52
|
+
It comes packaged with the ruvnet repos, but you can add any other repo, such as Andrej Karpathy's or those of other people working at the edge of your field.
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## What Is This?
|
|
59
|
+
|
|
60
|
+
**A GitHub search that actually understands what you're trying to build.**
|
|
61
|
+
|
|
62
|
+
### The Problem
|
|
63
|
+
|
|
64
|
+
You're building something new (an app or feature). You know there's probably a library, framework, or algorithm out there that could 10× your project. But:
|
|
65
|
+
|
|
66
|
+
- **Search is broken** - You'd have to know the exact keywords
- **Too many options** - Millions of repos, most irrelevant
- **Wrong domain** - The best solution might be in a totally different field
- **Takes forever** - Hours of browsing docs and READMEs
|
|
70
|
+
|
|
71
|
+
### The Solution
|
|
72
|
+
|
|
73
|
+
**RuvScan thinks like a creative developer**, not a search engine:
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
You: "I'm building an AI app. Context recall is too slow."
|
|
77
|
+
|
|
78
|
+
RuvScan: "Here's a sublinear-time solver that could replace your
|
|
79
|
+
vector database queries. It's from scientific computing,
|
|
80
|
+
but the O(log n) algorithm applies perfectly to semantic
|
|
81
|
+
search. Here's how to integrate it..."
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
**It finds:**
|
|
85
|
+
- **Outside-the-box solutions** - Tools from other domains that apply to yours
- **Performance wins** - Algorithms you didn't know existed
- **Easy integration** - Tells you exactly how to use what it finds
- **Creative transfers** - "This solved X, but you can use it for Y"
|
|
92
|
+
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
How you phrase your request steers the tool toward either straightforward help or more experimental, edge-of-the-field solutions. Here are a few examples of phrasings that surface different kinds of results (more examples appear further on).
|
|
96
|
+
|
|
97
|
+
### Example requests
|
|
98
|
+
|
|
99
|
+
The actual response will be in plain English while still suggesting state-of-the-art options.
|
|
100
|
+
|
|
101
|
+
|
|
102
|
+
|
|
103
|
+
1. *"I just want a drop-in script that downloads my inbox and saves each email as JSON. What should I try?"* → byroot/mail or DusanKasan/parsemail for dead-simple IMAP/MIME to structured JSON.
2. *"Give me a starter repo that already watches Gmail and writes summaries to a Notion page."* → openai/gpt-email-summarizer-style templates or lucasmic/imap-to-webhook for plug-and-play workflows.
3. *"Show me open-source email parsers I can drop into a Python summarizer: IMAP fetch, MIME decoding, nothing fancy."* → DusanKasan/parsemail or inboxkitten/mail-parser for turnkey IMAP/MIME handling.
4. *"I'm summarizing email on cheap Chromebooks. Which repos include tiny embeddings or approximate search so I can stay under 1 GB RAM?"* → ruvnet/sublinear-time-solver or facebook/faiss-lite to slot in sublinear similarity on low-RAM hardware.
5. *"Need policy/compliance topic detectors with clear audit trails. Point me to rule-based or interpretable NLP projects built for email streams."* → ruvnet/FACT plus CaselawAccessProject/legal-topic-models for deterministic caching plus transparent classifiers.
6. *"My pipeline can only see messages once. Find streaming or incremental NLP algorithms (reservoir sampling, online transformers, CRDT logs) that pair well with an email summarizer."* → ruvnet/MidStream or openmessaging/stream-query for single-pass, reservoir-style processing.
7. *"Newsletters are 90% of my inbox. Recommend DOM-first or layout-aware extraction toolkits I can chain before summarization so tables and sections survive."* → postlight/mercury-parser or mozilla/readability to strip and structure HTML before summarizing.
8. *"Legal demands reproducible summaries. Surface repos that memoize LLM calls (FACT-style hashing, deterministic agents) so the same thread always yields the same text."* → ruvnet/FACT or explosion/spaCy-ray patterns that hash embeddings/results for audit trails.
9. *"I'm willing to repurpose exotic tooling (sublinear solvers, sparse matrix DOM walkers, flow-based streaming engines) if you can explain how they'd accelerate large-scale email summarization. What should I investigate?"* → ruvnet/sublinear-time-solver (DOM walker mode), apache/arrow (columnar email batches), and ruvnet/flow-nexus (cost-propagation for batched summarization) as creative transfers.
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## Install in 30ish Seconds
|
|
124
|
+
|
|
125
|
+
RuvScan works with **Claude Code CLI**, **Codex CLI**, and **Claude Desktop**. Pick your platform:
|
|
126
|
+
|
|
127
|
+
|
|
128
|
+
|
|
129
|
+
Note: two things need to happen for this to work:

1. The backend (Docker) must be running in a separate terminal window.
2. The MCP server must be added to your CLI or Claude.

After installing, run `/mcp` and check that the server is installed correctly (a broken install shows an x or, worse, no tools at all). If either is the case, just ask Claude: "hey, fix my ruvscan mcp server."
|
|
134
|
+
|
|
135
|
+
### For Claude Code CLI
|
|
136
|
+
|
|
137
|
+
```bash
|
|
138
|
+
# 1. Start RuvScan backend
|
|
139
|
+
git clone https://github.com/ruvnet/ruvscan.git && cd ruvscan
|
|
140
|
+
docker compose up -d
|
|
141
|
+
|
|
142
|
+
# 2. Add MCP server to Claude
|
|
143
|
+
claude mcp add ruvscan --scope user --env GITHUB_TOKEN=ghp_your_token -- uvx ruvscan-mcp
|
|
144
|
+
|
|
145
|
+
# 3. Start using it!
|
|
146
|
+
claude
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
### For Codex CLI (Quick Install)
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
# 1. Start RuvScan backend
|
|
153
|
+
git clone https://github.com/ruvnet/ruvscan.git && cd ruvscan
|
|
154
|
+
docker compose up -d
|
|
155
|
+
|
|
156
|
+
# 2. Install globally with pipx
|
|
157
|
+
pipx install -e .
|
|
158
|
+
|
|
159
|
+
# 3. Configure in ~/.codex/config.toml
|
|
160
|
+
# See "For Codex CLI" section below for configuration details
|
|
161
|
+
|
|
162
|
+
# 4. Start using it!
|
|
163
|
+
codex
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
> **GitHub personal access token required.** RuvScan calls the GitHub API heavily; without a token you will immediately hit anonymous rate limits and scans will fail. Create a fine-grained or classic token with `repo` (read) and `read:org` scope, then expose it as `GITHUB_TOKEN` everywhere you run the MCP client and backend.
|
|
167
|
+
|
|
168
|
+
### For Claude Desktop
|
|
169
|
+
|
|
170
|
+
**1. Start the backend:**
|
|
171
|
+
```bash
|
|
172
|
+
git clone https://github.com/ruvnet/ruvscan.git && cd ruvscan
|
|
173
|
+
docker compose up -d
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
**2. Add to config** (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
|
|
177
|
+
```json
|
|
178
|
+
{
  "mcpServers": {
    "ruvscan": {
      "command": "uvx",
      "args": ["ruvscan-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_your_github_token_here"
      }
    }
  }
}
|
|
189
|
+
```
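
If Claude Desktop already has other MCP servers configured, pasting the snippet over the file would drop them. The following is a minimal sketch, not part of RuvScan, of merging the `ruvscan` entry into an existing config; the helper names `add_ruvscan_server` and `update_config_file` are hypothetical.

```python
import json
from pathlib import Path


def add_ruvscan_server(config: dict, github_token: str) -> dict:
    """Return a copy of a Claude Desktop config with a ruvscan entry added."""
    merged = dict(config)
    # Copy the existing server map so other MCP servers are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["ruvscan"] = {
        "command": "uvx",
        "args": ["ruvscan-mcp"],
        "env": {"GITHUB_TOKEN": github_token},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, github_token: str) -> None:
    """Read the config at `path` (if present), merge, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = add_ruvscan_server(existing, github_token)
    path.write_text(json.dumps(merged, indent=2))
```

Call `update_config_file` with the config path for your platform, then restart Claude Desktop as described in the next step.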
|
|
190
|
+
|
|
191
|
+
**3. Restart Claude Desktop** (Cmd+Q and reopen)
|
|
192
|
+
|
|
193
|
+
### For Codex CLI
|
|
194
|
+
|
|
195
|
+
Codex CLI speaks the same MCP protocol. After starting the Docker backend:
|
|
196
|
+
|
|
197
|
+
**Step 1: Install RuvScan globally with pipx**
|
|
198
|
+
|
|
199
|
+
```bash
|
|
200
|
+
cd ruvscan
|
|
201
|
+
pipx install -e .
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
**Step 2: Configure Codex**
|
|
205
|
+
|
|
206
|
+
Edit `~/.codex/config.toml` and add:
|
|
207
|
+
|
|
208
|
+
```toml
|
|
209
|
+
[mcp_servers.ruvscan]
|
|
210
|
+
command = "ruvscan-mcp"
|
|
211
|
+
|
|
212
|
+
[mcp_servers.ruvscan.env]
|
|
213
|
+
GITHUB_TOKEN = "ghp_your_github_token_here"
|
|
214
|
+
RUVSCAN_API_URL = "http://localhost:8000"
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
**Step 3: Test it works**
|
|
218
|
+
|
|
219
|
+
```bash
|
|
220
|
+
# From any directory
|
|
221
|
+
cd /tmp
|
|
222
|
+
codex mcp list | grep ruvscan
|
|
223
|
+
# Should show: ruvscan ruvscan-mcp - GITHUB_TOKEN=*****, RUVSCAN_API_URL=***** - enabled
|
|
224
|
+
|
|
225
|
+
# Start a conversation
|
|
226
|
+
codex
|
|
227
|
+
> Can you scan the anthropics GitHub organization?
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
> **Global Installation**: RuvScan is now available in ALL projects and directories!
|
|
231
|
+
|
|
232
|
+
---
|
|
233
|
+
|
|
234
|
+
#### Alternative: Using codex mcp add (if available)
|
|
235
|
+
|
|
236
|
+
If your Codex build includes the `mcp add` command:
|
|
237
|
+
|
|
238
|
+
```bash
|
|
239
|
+
codex mcp add --env GITHUB_TOKEN=ghp_your_token --env RUVSCAN_API_URL=http://localhost:8000 -- ruvscan-mcp ruvscan
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
> When experimenting with `mcp dev`, run `mcp dev --transport sse src/ruvscan_mcp/mcp_stdio_server.py`.
|
|
243
|
+
> The server now performs a health check and shuts down with a clear explanation if no client completes the handshake within five minutes (for example, when the transport is mismatched).
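
The handshake timeout described above can be pictured as a simple watchdog. This is an illustrative sketch using `asyncio.wait_for`, not RuvScan's actual implementation; the function names and the short demo timeouts are assumptions (the real limit is stated as five minutes).

```python
import asyncio


async def wait_for_handshake(handshake_done: asyncio.Event, timeout_seconds: float) -> bool:
    """Return True if the handshake event fires before the timeout, else False."""
    try:
        await asyncio.wait_for(handshake_done.wait(), timeout=timeout_seconds)
        return True
    except asyncio.TimeoutError:
        # A real server would log a clear explanation here and shut down.
        return False


async def demo() -> tuple[bool, bool]:
    # Case 1: nobody ever connects, so the watchdog gives up.
    never = asyncio.Event()
    timed_out = await wait_for_handshake(never, timeout_seconds=0.05)

    # Case 2: a client completes the handshake before the deadline.
    ready = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, ready.set)
    connected = await wait_for_handshake(ready, timeout_seconds=1.0)
    return timed_out, connected
```

Running `asyncio.run(demo())` exercises both the timeout path and the successful-handshake path.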
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
|
|
247
|
+
#### Troubleshooting Codex CLI
|
|
248
|
+
|
|
249
|
+
**Check MCP server status:**
|
|
250
|
+
```bash
|
|
251
|
+
codex mcp list
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
**Verify command exists:**
|
|
255
|
+
```bash
|
|
256
|
+
which ruvscan-mcp
|
|
257
|
+
# Should output: /home/your-user/.local/bin/ruvscan-mcp
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
**Test command directly:**
|
|
261
|
+
```bash
|
|
262
|
+
ruvscan-mcp --help
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
**View Codex logs:**
|
|
266
|
+
```bash
|
|
267
|
+
tail -f ~/.codex/log/codex-tui.log
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
**Detailed Codex Setup Guide:** [docs/CODEX_CLI_SETUP.md](docs/CODEX_CLI_SETUP.md)
|
|
271
|
+
|
|
272
|
+
### GitHub Token Checklist
|
|
273
|
+
|
|
274
|
+
- Create a personal access token (classic or fine-grained) with read access to the repos you care about plus `read:org`. GitHub's walkthrough lives here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic
|
|
275
|
+
- Export it in your shell (`export GITHUB_TOKEN=ghp_...`) before running `docker compose`, `uvicorn`, or `codex/claude mcp add` so the backend can authenticate API calls.
|
|
276
|
+
- For Docker-based runs, copy `.env.example` to `.env` and drop the token there so the containers inherit it.
|
|
277
|
+
- Optionally add the same value to `.env.local`; `scripts/seed_database.py` will pick it up automatically when seeding.
|
|
278
|
+
- **Cost:** GitHub does not charge for issuing or using a PAT. Your scans only consume API rate quota on the account that created the token; standard rate limits refresh hourly. If you're on an enterprise plan, the usage just rolls into the org's normal API allowances.
|
|
279
|
+
- Treat the token like a password. Store it in your secret manager and revoke it from https://github.com/settings/tokens if it ever leaks.
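
Before starting a scan, it can help to confirm that the token works and see how much quota is left. The sketch below queries GitHub's public `/rate_limit` endpoint using only the standard library; the helper names are hypothetical and this script is not part of RuvScan.

```python
import json
import os
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_rate_limit_request(token: str) -> urllib.request.Request:
    """Build an authenticated request for GitHub's rate-limit endpoint."""
    return urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def summarize_rate_limit(payload: dict) -> str:
    """Turn the endpoint's JSON body into a one-line summary."""
    core = payload["resources"]["core"]
    return f"{core['remaining']}/{core['limit']} core requests remaining"


def check_token() -> str:
    """Call GitHub with GITHUB_TOKEN from the environment (needs network access)."""
    request = build_rate_limit_request(os.environ["GITHUB_TOKEN"])
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_rate_limit(json.load(response))
```

With `GITHUB_TOKEN` exported, `print(check_token())` reports the remaining core quota; an invalid token makes `urlopen` raise an HTTP 401 error.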
|
|
280
|
+
|
|
281
|
+
### What `docker compose up` Runs
|
|
282
|
+
|
|
283
|
+
- `mcp-server` (Python/FastAPI): hosts the MCP HTTP API on port 8000, reads `GITHUB_TOKEN`, writes data to `./data/ruvscan.db`, and exposes `/scan`, `/query`, `/compare`, and `/analyze` endpoints.
- `scanner` (Go): background workers (port 8081 on the host → 8080 in-container) that call the GitHub REST API, fetch README/topic metadata, and POST results back to the MCP server at `/ingest`.
- `rust-engine` (Rust): optional gRPC service for Johnson-Lindenstrauss O(log n) similarity; disabled by default and only launched when you run `docker compose --profile rust-debug up`.
- Shared volumes: `./data` and `./logs` are bind-mounted so your SQLite DB and logs persist across container restarts.
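
To confirm the default services are actually listening after `docker compose up -d`, a quick TCP probe of the host ports listed above is usually enough. This is a rough sketch; the `SERVICES` mapping and function names are illustrative, and an open port does not prove the service is healthy.

```python
import socket

# Host ports taken from the service list above.
SERVICES = {"mcp-server": 8000, "scanner": 8081}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its host port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}
```

`print(service_status())` shows at a glance which of the two default containers is reachable from the host.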
|
|
287
|
+
|
|
288
|
+
**Full Installation Guide:** [docs/MCP_INSTALL.md](docs/MCP_INSTALL.md)
|
|
289
|
+
|
|
290
|
+
---
|
|
291
|
+
|
|
292
|
+
## Sample Data & Optional Seeding
|
|
293
|
+
|
|
294
|
+
Out of the box, RuvScan already includes a `data/ruvscan.db` file packed with ~100 public repositories from the **ruvnet** organization. That means a fresh clone can answer questions like "What do we have for real-time streaming?" as soon as the MCP server starts, with no extra steps required.
|
|
295
|
+
|
|
296
|
+
### When would I run the seed script?
|
|
297
|
+
|
|
298
|
+
- **Refresh the included catalog** (pick up new ruvnet repos or README changes).
|
|
299
|
+
- **Add another user/org** so your local MCP knows about your own code.
|
|
300
|
+
- **Rebuild the database** after deleting `data/ruvscan.db`.
|
|
301
|
+
|
|
302
|
+
```bash
|
|
303
|
+
# Refresh the bundled ruvnet dataset
|
|
304
|
+
python3 scripts/seed_database.py --org ruvnet
|
|
305
|
+
|
|
306
|
+
# Add a different org or user (ex. OpenAI)
|
|
307
|
+
python3 scripts/seed_database.py --org openai --limit 30
|
|
308
|
+
|
|
309
|
+
# Skip README downloads for a quick metadata-only pass
|
|
310
|
+
python3 scripts/seed_database.py --no-readmes
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
Prefer clicks over scripts? Tell your MCP client:
|
|
314
|
+
- **Claude / Codex prompt:** "Use scan_github on org anthropics with a limit of 25."
|
|
315
|
+
- **CLI:** `./scripts/ruvscan scan org anthropics --limit 25`
|
|
316
|
+
|
|
317
|
+
Either route stores the new repos alongside the preloaded ruvnet entries so every future query can reference them.
|
|
318
|
+
|
|
319
|
+
**Check what's inside:**
|
|
320
|
+
```bash
|
|
321
|
+
sqlite3 data/ruvscan.db "SELECT COUNT(*), MIN(org), MAX(org) FROM repos;"
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
### What does RuvScan store locally?
|
|
325
|
+
|
|
326
|
+
- Everything lives in the `data/ruvscan.db` SQLite file. Each row captures the repo's owner, name, description, topics, README text, star count, primary language, and the `last_scan` timestamp so we know when it was fetched.
- The MCP tools only read from this file; the only way new repos show up is when you seed or run a `scan_github` command (either via CLI or Claude).
- No background internet crawling happens after a scan completes; what you see is exactly what's stored in SQLite.
|
|
329
|
+
|
|
330
|
+
### How do I see which repos are cached?
|
|
331
|
+
|
|
332
|
+
```bash
|
|
333
|
+
# Show every org/user currently in the catalog
|
|
334
|
+
sqlite3 data/ruvscan.db "
|
|
335
|
+
SELECT org, COUNT(*) AS repos
|
|
336
|
+
FROM repos
|
|
337
|
+
GROUP BY org
|
|
338
|
+
ORDER BY repos DESC;"
|
|
339
|
+
|
|
340
|
+
# Peek at the latest entries to confirm what's fresh
|
|
341
|
+
sqlite3 data/ruvscan.db "
|
|
342
|
+
SELECT full_name, stars, datetime(last_scan) AS last_seen
|
|
343
|
+
FROM repos
|
|
344
|
+
ORDER BY last_scan DESC
|
|
345
|
+
LIMIT 10;"
|
|
346
|
+
```
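
The same two checks can be run from Python with the standard `sqlite3` module. This sketch reuses the column names from the queries above (`org`, `full_name`, `stars`, `last_scan`); the in-memory table in `demo_connection` is a guessed stand-in for `data/ruvscan.db`, and its rows are made-up placeholders.

```python
import sqlite3


def repos_per_org(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count cached repositories per org/user, largest first."""
    return conn.execute(
        "SELECT org, COUNT(*) AS repos FROM repos GROUP BY org ORDER BY repos DESC, org"
    ).fetchall()


def latest_entries(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int, str]]:
    """Return the most recently scanned repositories."""
    return conn.execute(
        "SELECT full_name, stars, datetime(last_scan) FROM repos "
        "ORDER BY last_scan DESC LIMIT ?",
        (limit,),
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for data/ruvscan.db (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repos (org TEXT, full_name TEXT, stars INTEGER, last_scan TEXT)")
    conn.executemany(
        "INSERT INTO repos VALUES (?, ?, ?, ?)",
        [
            # Placeholder rows for illustration only, not real catalog data.
            ("ruvnet", "ruvnet/FACT", 10, "2025-01-02 00:00:00"),
            ("ruvnet", "ruvnet/MidStream", 5, "2025-01-03 00:00:00"),
            ("openai", "openai/example", 7, "2025-01-01 00:00:00"),
        ],
    )
    return conn
```

Against the real file, replace `demo_connection()` with `sqlite3.connect("data/ruvscan.db")`.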
|
|
347
|
+
|
|
348
|
+
Prefer a friendlier view? Run `./scripts/ruvscan cards --limit 20` to list the top cached repos with summaries.
|
|
349
|
+
|
|
350
|
+
### How do I wipe the catalog and start over?
|
|
351
|
+
|
|
352
|
+
1. Stop whatever is talking to RuvScan (`docker compose down` or Ctrl-C the dev server).
|
|
353
|
+
2. (Optional) Back up the old database: `cp data/ruvscan.db data/ruvscan.db.bak`.
|
|
354
|
+
3. Remove the file: `rm -f data/ruvscan.db`.
|
|
355
|
+
4. Seed again with whatever scope you want:
|
|
356
|
+
|
|
357
|
+
```bash
|
|
358
|
+
python3 scripts/seed_database.py --org ruvnet --limit 100
|
|
359
|
+
# or
|
|
360
|
+
./scripts/ruvscan scan org my-company --limit 50
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
Restart the MCP server and it will only know about the repos you just seeded or scanned.
|
|
364
|
+
|
|
365
|
+
**Reminder:** the database keeps `last_scan` timestamps. Updating the same org simply refreshes the rows instead of duplicating them. If you rely on the bundled sample data, consider re-running the refresh monthly so the catalog stays current.
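
The refresh-instead-of-duplicate behaviour is what an SQLite upsert provides. The sketch below shows the general technique under the assumption that `full_name` is the unique key; RuvScan's real schema and SQL may differ, and the function names are hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a simplified repos table keyed by full_name (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "full_name TEXT PRIMARY KEY, org TEXT, stars INTEGER, last_scan TEXT)"
    )


def upsert_repo(conn: sqlite3.Connection, full_name: str, stars: int, last_scan: str) -> None:
    """Insert a repo, or refresh its row if it is already cached."""
    org = full_name.split("/", 1)[0]
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.execute(
        """
        INSERT INTO repos (full_name, org, stars, last_scan)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(full_name) DO UPDATE SET
            stars = excluded.stars,
            last_scan = excluded.last_scan
        """,
        (full_name, org, stars, last_scan),
    )
```

Calling `upsert_repo` twice for the same repository leaves a single row carrying the newer star count and timestamp.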
|
|
366
|
+
|
|
367
|
+
**Full Guide:** [Database Seeding Documentation](docs/DATABASE_SEEDING.md)
|
|
368
|
+
|
|
369
|
+
---
|
|
370
|
+
|
|
371
|
+
## How RuvScan Suggests Some Tools (and Skips Others)
|
|
372
|
+
|
|
373
|
+
RuvScan scores every cached repository against your intent using three simple signals:
|
|
374
|
+
|
|
375
|
+
1. **Token overlap**: does the repo description/README mention the same concepts you typed?
2. **Efficiency boost**: extra credit for words like "optimize," "streaming," "sublinear," etc.
3. **Reality check**: star count and recent scans nudge mature, maintained projects upward.
|
|
378
|
+
|
|
379
|
+
The goal is to surface repos that obviously help without making you stretch too far.
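
To make the three signals concrete, here is a toy scorer. It is an illustration only: the weights, the tokenizer, the log-scaled star term, and the omission of scan recency are all assumptions and do not reproduce RuvScan's actual formula.

```python
import math
import re

# Example efficiency keywords quoted in the list above.
EFFICIENCY_WORDS = {"optimize", "streaming", "sublinear"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_repo(
    intent: str,
    repo_text: str,
    stars: int,
    overlap_weight: float = 0.7,  # assumed weight
    boost_weight: float = 0.2,    # assumed weight
    stars_weight: float = 0.1,    # assumed weight
) -> float:
    """Combine token overlap, an efficiency boost, and a star-based reality check."""
    intent_tokens = tokenize(intent)
    repo_tokens = tokenize(repo_text)
    # Signal 1: fraction of intent tokens that also appear in the repo text.
    overlap = len(intent_tokens & repo_tokens) / len(intent_tokens) if intent_tokens else 0.0
    # Signal 2: flat bonus when the repo mentions any efficiency keyword.
    boost = 1.0 if repo_tokens & EFFICIENCY_WORDS else 0.0
    # Signal 3: log-scaled star count, capped at 1.0.
    maturity = min(math.log10(stars + 1) / 5, 1.0)
    return overlap_weight * overlap + boost_weight * boost + stars_weight * maturity
```

With weights like these, a repo that shares most of its vocabulary with the request outranks one that only earns the efficiency bonus, which matches the "obvious fit first" behaviour described above.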
|
|
380
|
+
|
|
381
|
+
### Real example: "Scan email for policy updates"
|
|
382
|
+
|
|
383
|
+
- **Your ask:** "Build a tool that scans incoming email for important policy updates and compliance requirements."
- **What surfaced:** `freeCodeCamp/mail-for-good`, `DusanKasan/parsemail`, `ruvnet/FACT`, etc. Those repos talk about *email parsing*, *campaign pipelines*, and *deterministic summaries*, keywords that overlap the request almost perfectly.
- **What you might have expected:** `ruvnet/sublinear-time-solver` (which includes a DOM extractor that could chew through large HTML archives).
- **Why it was skipped:** the solver's README highlights *Johnson-Lindenstrauss projection*, *sparse matrix solvers*, and *Flow-Nexus streaming*. None of those tokens match "email," "policy," or "compliance," so its overlap score stayed below the default `min_score=0.6`. RuvScan saw it as "clever infrastructure, but unrelated to your words," so it deferred to mail-focused repos.
|
|
387
|
+
|
|
388
|
+
### How to explore outside-the-box options
|
|
389
|
+
|
|
390
|
+
- **Nudge the intent:** mention the bridge explicitly ("...or should I repurpose sublinear-time-solver's DOM tool for compliance emails?"). Now the tokenizer sees "sublinear" and "DOM," boosting that repo.
- **Lower the threshold:** call `query_leverage` with `min_score=0.4` and `max_results=10` to let more fringe ideas through.
- **Widen the context:** add an engineering note or PRD link so the SAFLA reasoning layer understands why a matrix solver might help an email scanner.
|
|
393
|
+
|
|
394
|
+
By default, RuvScan errs on the side of *obvious fit*. If you want it to wander into "this sounds weird but might work" territory, just give it permission with a hint or a looser score cutoff.
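
The effect of loosening `min_score` and raising `max_results` can be sketched as a plain filter-and-cap step. The function name, the default `max_results`, and the sample scores are hypothetical; only the `min_score=0.6` default comes from the text above.

```python
def select_results(
    scored: list[tuple[str, float]],
    min_score: float = 0.6,
    max_results: int = 5,  # assumed default
) -> list[tuple[str, float]]:
    """Keep candidates at or above min_score, best first, capped at max_results."""
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_results]


# Illustrative scores only, not real RuvScan output.
candidates = [("mail-parser", 0.82), ("matrix-solver", 0.45), ("summary-cache", 0.66)]
strict = select_results(candidates)
loose = select_results(candidates, min_score=0.4, max_results=10)
```

Here `strict` keeps only the two mail-adjacent candidates, while `loose` also lets the lower-scoring solver through.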
|
|
395
|
+
|
|
396
|
+
---
|
|
397
|
+
|
|
398
|
+
## Using RuvScan in Claude
|
|
399
|
+
|
|
400
|
+
Once installed, just talk to Claude naturally:
|
|
401
|
+
|
|
402
|
+
### Example 1: Scan GitHub Organizations
|
|
403
|
+
|
|
404
|
+
**You:** "Scan the Anthropics GitHub organization"
|
|
405
|
+
|
|
406
|
+
**Claude:** *Uses `scan_github` tool*
|
|
407
|
+
```
|
|
408
|
+
Scan initiated for org: anthropics
|
|
409
|
+
Status: initiated
|
|
410
|
+
Estimated repositories: 50
|
|
411
|
+
Message: Scan initiated - workers processing in background
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
### Example 2: Make Reasoning Reproducible
|
|
415
|
+
|
|
416
|
+
**You:** "I need to debug why my agent made a decision yesterday. Any deterministic tooling?"
|
|
417
|
+
|
|
418
|
+
**Claude:** *Uses `query_leverage` and surfaces FACT*
|
|
419
|
+
|
|
420
|
+
```
|
|
421
|
+
Repository: ruvnet/FACT
|
|
422
|
+
Relevance Score: 0.89
|
|
423
|
+
Complexity: O(1)
|
|
424
|
+
|
|
425
|
+
Summary: Deterministic caching framework that replays every LLM call with SHA256 hashes.
|
|
426
|
+
|
|
427
|
+
Why This Helps: Guarantees identical outputs for the same prompts, letting you trace agent decisions step by step.
|
|
428
|
+
|
|
429
|
+
How to Use: pip install fact-cache && from fact import FACTCache
|
|
430
|
+
|
|
431
|
+
Capabilities: Deterministic replay, prompt hashing, audit trails
|
|
432
|
+
```
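
The idea of replaying calls by SHA-256 hash can be illustrated with a few lines of Python. This is a generic sketch of hash-keyed memoization, not the FACT library's API; the `PromptCache` class and its methods are hypothetical.

```python
import hashlib
import json


class PromptCache:
    """Memoize model calls under a SHA-256 hash of the prompt and parameters."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    @staticmethod
    def key_for(prompt: str, params: dict) -> str:
        # Serialize with sorted keys so identical inputs always hash the same way.
        canonical = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt: str, params: dict, compute):
        """Return the cached answer, calling compute(prompt) only on a miss."""
        key = self.key_for(prompt, params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(prompt)
        return self._store[key]
```

Because the key depends only on the inputs, repeating a prompt returns the stored answer instead of calling the model again, which is what makes a past decision traceable.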
|
|
433
|
+
|
|
434
|
+
### Example 3: Compare Frameworks
|
|
435
|
+
|
|
436
|
+
**You:** "Compare facebook/react and vuejs/core for me"
|
|
437
|
+
|
|
438
|
+
**Claude:** *Uses `compare_repositories` tool*
|
|
439
|
+
```
|
|
440
|
+
Repository Comparison (O(log n) complexity)
|
|
441
|
+
|
|
442
|
+
facebook/react vs vuejs/core
|
|
443
|
+
|
|
444
|
+
Similarity Score: 0.78
|
|
445
|
+
Complexity: O(log n)
|
|
446
|
+
|
|
447
|
+
Analysis: Both are component-based UI frameworks with virtual DOM, but React
|
|
448
|
+
has larger ecosystem and more enterprise adoption. Vue has simpler learning
|
|
449
|
+
curve and better built-in state management.
|
|
450
|
+
```
|
|
451
|
+
|
|
452
|
+
### Example 4: Understand the Reasoning
|
|
453
|
+
|
|
454
|
+
**You:** "Show me the reasoning chain for why you recommended that solver"
|
|
455
|
+
|
|
456
|
+
**Claude:** *Uses `analyze_reasoning` tool*
|
|
457
|
+
```
|
|
458
|
+
Reasoning Chain for ruvnet/sublinear-time-solver:
|
|
459
|
+
|
|
460
|
+
- Detected performance optimization intent
|
|
461
|
+
- Matched O(log n) complexity with vector search problem
|
|
462
|
+
- Found Johnson-Lindenstrauss dimension reduction capability
|
|
463
|
+
- Cross-domain transfer from scientific computing to AI/ML
|
|
464
|
+
- Verified WASM support for browser integration
|
|
465
|
+
|
|
466
|
+
(Retrieved from FACT deterministic cache)
|
|
467
|
+
```
|
|
468
|
+
|
|
469
|
+
### Example 5: Mine Existing Ruvnet Stacks
|
|
470
|
+
|
|
471
|
+
**You:** "I already have the ruvnet repos seeded. What should I reuse for real-time streaming?"
|
|
472
|
+
|
|
473
|
+
**Claude:** *Calls `query_leverage` and surfaces existing entries*
|
|
474
|
+
```
|
|
475
|
+
Repository: ruvnet/MidStream
|
|
476
|
+
Relevance Score: 0.91
|
|
477
|
+
|
|
478
|
+
Summary: WASM-accelerated multiplexing layer for realtime inference
|
|
479
|
+
|
|
480
|
+
Why This Helps: Drop it in front of your LangChain stack to swap synchronous
|
|
481
|
+
requests for bidirectional streams. Built to pair with sublinear-time-solver.
|
|
482
|
+
|
|
483
|
+
How to Use: docker pull ghcr.io/ruvnet/midstream:latest
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
---
|
|
487
|
+
|
|
488
|
+
## What Can You Build With This?
|
|
489
|
+
|
|
490
|
+
RuvScan powers **3 types of killer tools**:
|
|
491
|
+
|
|
492
|
+
### 1. Builder Co-Pilot (IDE Integration)
|
|
493
|
+
|
|
494
|
+
**Imagine**: Your code editor that suggests relevant libraries as you type.
|
|
495
|
+
|
|
496
|
+
```javascript
|
|
497
|
+
// You're writing:
|
|
498
|
+
async function improveContextRetrieval(query) {
|
|
499
|
+
// ...
|
|
500
|
+
}
|
|
501
|
+
|
|
502
|
+
// RuvScan suggests:
|
|
503
|
+
// 💡 Found: sublinear-time-solver
// "Replace linear search with O(log n) similarity"
// Relevance: 0.94 | Integration: 2 minutes
|
|
506
|
+
```
|
|
507
|
+
|
|
508
|
+
**Use Cases**:
|
|
509
|
+
- VS Code extension
|
|
510
|
+
- Cursor integration
|
|
511
|
+
- GitHub Copilot alternative
|
|
512
|
+
- JetBrains plugin
|
|
513
|
+
|
|
514
|
+
### 2. AI Agent Intelligence Layer
|
|
515
|
+
|
|
516
|
+
**Imagine**: Your AI agents that automatically discover and integrate new tools.
|
|
517
|
+
|
|
518
|
+
```python
|
|
519
|
+
# Your AI agent:
|
|
520
|
+
agent.goal("Optimize database queries")
|
|
521
|
+
|
|
522
|
+
# RuvScan finds and explains:
|
|
523
|
+
{
|
|
524
|
+
"tool": "cached-sublinear-solver",
|
|
525
|
+
"why": "Replace O(n²) joins with O(log n) approximations",
|
|
526
|
+
"how": "pip install sublinear-solver && ..."
|
|
527
|
+
}
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
**Use Cases**:
|
|
531
|
+
- Autonomous coding agents
|
|
532
|
+
- DevOps automation
|
|
533
|
+
- System optimization bots
|
|
534
|
+
- Research assistants
|
|
535
|
+
|
|
536
|
+
### 3. Discovery Engine (Product/Research)
|
|
537
|
+
|
|
538
|
+
**Imagine**: A tool that finds innovation opportunities across your entire tech stack.
|
|
539
|
+
|
|
540
|
+
```bash
|
|
541
|
+
$ ruvscan scan --org mycompany
|
|
542
|
+
$ ruvscan query "What could 10× our ML pipeline?"
|
|
543
|
+
|
|
544
|
+
Found 8 leverage opportunities:
|
|
545
|
+
1. Replace sklearn with sublinear solver (600× faster)
|
|
546
|
+
2. Use MidStream for real-time inference (80% cost savings)
|
|
547
|
+
3. ...
|
|
548
|
+
```
|
|
549
|
+
|
|
550
|
+
**Use Cases**:
|
|
551
|
+
- Tech stack audits
|
|
552
|
+
- Performance optimization hunts
|
|
553
|
+
- Architecture reviews
|
|
554
|
+
- Competitive research
|
|
555
|
+
|
|
556
|
+
---
|
|
557
|
+
|
|
558
|
+
## What Tools Does Claude Get?
|
|
559
|
+
|
|
560
|
+
When you install RuvScan as an MCP server, Claude gains 4 powerful tools:
|
|
561
|
+
|
|
562
|
+
| Tool | What It Does | Example Use |
|
|
563
|
+
|------|--------------|-------------|
|
|
564
|
+
| **`scan_github`** | Scan any GitHub org, user, or topic | "Scan the openai organization" |
|
|
565
|
+
| **`query_leverage`** | Find relevant tools with O(log n) semantic search | "Find tools for real-time collaboration" |
|
|
566
|
+
| **`compare_repositories`** | Compare repos with sublinear similarity | "Compare NextJS vs Remix" |
|
|
567
|
+
| **`analyze_reasoning`** | View FACT cache reasoning chains | "Why did you recommend that library?" |
|
|
568
|
+
|
|
569
|
+
---
|
|
570
|
+
|
|
571
|
+
**What's new:**
|
|
572
|
+
|
|
573
|
+
- RuvScan now fetches up to 200 repositories per scan, starting with a fast README sweep before deeper analysis.
|
|
574
|
+
- The first time the MCP server starts it automatically preloads the entire `ruvnet` organization, so you can ask questions immediately.
|
|
575
|
+
- Query responses include a concise summary and a structured Markdown briefing that highlights the opportunity, expected benefit, and integration path for each recommendation.
|
|
576
|
+
- Every answer reminds you to share a Product Requirements Document (PRD) or similar artifact so the follow-up analysis can be even more specific.
|
|
577
|
+
- The server now performs a health check and shuts down with a clear explanation if no client completes the handshake within five minutes. This prevents the server from hanging silently when run with the wrong transport (for example, `mcp dev` without `--transport sse`) or when the backend API is unreachable.
|
|
578
|
+
|
|
579
|
+
## Demo: Complete Workflow
|
|
580
|
+
|
|
581
|
+
### In Claude Code CLI
|
|
582
|
+
|
|
583
|
+
```bash
|
|
584
|
+
$ claude
|
|
585
|
+
|
|
586
|
+
You: I'm working on a Python project that processes large datasets.
|
|
587
|
+
The performance is terrible. What GitHub tools could help?
|
|
588
|
+
|
|
589
|
+
Claude: Let me search for high-performance data processing tools...
|
|
590
|
+
[Uses query_leverage tool]
|
|
591
|
+
|
|
592
|
+
I found several relevant projects:
|
|
593
|
+
|
|
594
|
+
1. ruvnet/sublinear-time-solver (Relevance: 0.94)
|
|
595
|
+
- TRUE O(log n) algorithms for matrix operations
|
|
596
|
+
- Could replace your O(n²) operations with O(log n)
|
|
597
|
+
- Install: pip install sublinear-solver
|
|
598
|
+
|
|
599
|
+
2. apache/arrow (Relevance: 0.88)
|
|
600
|
+
- Columnar data format for fast analytics
|
|
601
|
+
- 100× faster than pandas for large datasets
|
|
602
|
+
|
|
603
|
+
Would you like me to scan the Apache organization to find more tools?
|
|
604
|
+
|
|
605
|
+
You: Yes, scan the apache organization
|
|
606
|
+
|
|
607
|
+
Claude: [Uses scan_github tool]
|
|
608
|
+
Scanning Apache Foundation repositories...
|
|
609
|
+
Found 150+ repositories. Indexing them now.
|
|
610
|
+
```
|
|
611
|
+
|
|
612
|
+
### In Claude Desktop
|
|
613
|
+
|
|
614
|
+
<img src="https://via.placeholder.com/800x400/1e1e1e/00ff00?text=Claude+Desktop+Screenshot" alt="Claude Desktop with RuvScan" />
|
|
615
|
+
|
|
616
|
+
1. Open Claude Desktop
|
|
617
|
+
2. See the tools icon (🔧) showing RuvScan is connected
|
|
618
|
+
3. Ask questions naturally - Claude uses RuvScan automatically
|
|
619
|
+
4. Get intelligent suggestions with reasoning chains
|
|
620
|
+
|
|
621
|
+
---
|
|
622
|
+
|
|
623
|
+
## ⚡ Alternative: Run as Standalone API (2 Minutes)
|
|
624
|
+
|
|
625
|
+
### Option 1: Docker (For Direct API Use)
|
|
626
|
+
|
|
627
|
+
```bash
|
|
628
|
+
# 1. Clone and setup
|
|
629
|
+
git clone https://github.com/ruvnet/ruvscan.git
|
|
630
|
+
cd ruvscan
|
|
631
|
+
cp .env.example .env
|
|
632
|
+
|
|
633
|
+
# 2. Add your GitHub token to .env
|
|
634
|
+
# GITHUB_TOKEN=ghp_your_token_here
|
|
635
|
+
|
|
636
|
+
# 3. Start everything
|
|
637
|
+
docker compose up -d
|
|
638
|
+
|
|
639
|
+
# 4. Try it!
|
|
640
|
+
./scripts/ruvscan query "Find tools for real-time AI performance"
|
|
641
|
+
```
|
|
642
|
+
|
|
643
|
+
### Option 2: Direct HTTP API
|
|
644
|
+
|
|
645
|
+
```bash
|
|
646
|
+
# Query for leverage
|
|
647
|
+
curl -X POST http://localhost:8000/query \
|
|
648
|
+
-H "Content-Type: application/json" \
|
|
649
|
+
-d '{
|
|
650
|
+
"intent": "How can I speed up my vector database?",
|
|
651
|
+
"max_results": 5
|
|
652
|
+
}'
|
|
653
|
+
```
|
|
654
|
+
|
|
655
|
+
### Option 3: Python Integration
|
|
656
|
+
|
|
657
|
+
```python
|
|
658
|
+
import asyncio

import httpx


async def find_leverage(what_you_are_building):
    """POST the intent to a local RuvScan API and return the parsed JSON."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/query",
            json={"intent": what_you_are_building},
        )
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()


async def main():
    # Use it
    ideas = await find_leverage(
        "Building a real-time collaboration editor"
    )

    for idea in ideas:
        print(f"💡 {idea['repo']}")
        print(f"   {idea['outside_box_reasoning']}")
        print(f"   Integration: {idea['integration_hint']}")


# `await` is only valid inside a coroutine, so drive it with asyncio.run().
asyncio.run(main())
|
|
677
|
+
```
|
|
678
|
+
|
|
679
|
+
---
|
|
680
|
+
|
|
681
|
+
## 🎨 Real-World Examples
|
|
682
|
+
|
|
683
|
+
### Example 1: Performance Optimization
|
|
684
|
+
|
|
685
|
+
**You ask:**
|
|
686
|
+
```
|
|
687
|
+
"Pandas melts when I process multi-GB analytics data. I need something columnar."
|
|
688
|
+
```
|
|
689
|
+
|
|
690
|
+
**RuvScan finds:**
|
|
691
|
+
```json
|
|
692
|
+
{
  "repo": "apache/arrow",
  "outside_box_reasoning": "Arrow gives you a columnar in-memory format with vectorized kernels. Swap it in to keep data compressed on the wire and eliminate Python GIL bottlenecks.",
  "integration_hint": "pip install pyarrow, then load data with pyarrow.dataset.dataset(path).to_table()"
}
|
|
699
|
+
```
|
|
700
|
+
|
|
701
|
+
### Example 2: Architecture Discovery
|
|
702
|
+
|
|
703
|
+
**You ask:**
|
|
704
|
+
```
|
|
705
|
+
"Need a way to replay AI reasoning for debugging."
|
|
706
|
+
```
|
|
707
|
+
|
|
708
|
+
**RuvScan finds:**
|
|
709
|
+
```json
|
|
710
|
+
{
  "repo": "ruvnet/FACT",
  "outside_box_reasoning": "FACT caches every LLM interaction with deterministic hashing. Replay any conversation exactly as it happened. Built for reproducible AI.",
  "integration_hint": "from fact import FACTCache; cache = FACTCache()"
}
|
|
718
|
+
```
|
|
719
|
+
|
|
720
|
+
### Example 3: Domain Transfer
|
|
721
|
+
|
|
722
|
+
**You ask:**
|
|
723
|
+
```
|
|
724
|
+
"Building a recommendation system. Need fast similarity."
|
|
725
|
+
```
|
|
726
|
+
|
|
727
|
+
**RuvScan finds:**
|
|
728
|
+
```json
|
|
729
|
+
{
  "repo": "scientific-computing/spectral-graph",
  "outside_box_reasoning": "This is from bioinformatics, but the spectral clustering algorithm works perfectly for collaborative filtering. O(n log n) vs O(n²).",
  "integration_hint": "Adapt the adjacency matrix code to your user-item matrix"
}
|
|
737
|
+
```
|
|
738
|
+
|
|
739
|
+
---
|
|
740
|
+
|
|
741
|
+
## 🔥 Why RuvScan Is Different
|
|
742
|
+
|
|
743
|
+
### Traditional Search
|
|
744
|
+
```
|
|
745
|
+
You → "vector database speed" → GitHub
|
|
746
|
+
Results: 10,000 vector DB libraries
|
|
747
|
+
Problem: You already KNEW about vector databases
|
|
748
|
+
```
|
|
749
|
+
|
|
750
|
+
### RuvScan
|
|
751
|
+
```
|
|
752
|
+
You → "My vector DB is slow" → RuvScan
|
|
753
|
+
Results: Sublinear algorithms, compression techniques,
|
|
754
|
+
caching strategies from OTHER domains
|
|
755
|
+
Problem: SOLVED with ideas you'd never have found
|
|
756
|
+
```
|
|
757
|
+
|
|
758
|
+
**The secret**: RuvScan uses:
|
|
759
|
+
- 🧠 **Semantic understanding** (not keyword matching)
|
|
760
|
+
- **Cross-domain reasoning** (finds solutions from other fields)
|
|
761
|
+
- ⚡ **Sublinear algorithms** (TRUE O(log n) similarity search)
|
|
762
|
+
- 🎯 **Deterministic AI** (same question = same answer, always)
|
|
763
|
+
|
|
764
|
+
---
|
|
765
|
+
|
|
766
|
+
## For Engineers: How It Works
|
|
767
|
+
|
|
768
|
+
Now let's get technical...
|
|
769
|
+
|
|
770
|
+
### Architecture: Tri-Language Hybrid System
|
|
771
|
+
|
|
772
|
+
RuvScan is built as a **hybrid intelligence system** combining:
|
|
773
|
+
|
|
774
|
+
```
|
|
775
|
+
🐍 Python  →  MCP Orchestrator (FastAPI)
           →  FACT Cache (deterministic reasoning)
           →  SAFLA Agent (analogical inference)

🦀 Rust    →  Sublinear Engine (gRPC)
           →  Johnson-Lindenstrauss projection
           →  TRUE O(log n) semantic comparison

🐹 Go      →  Concurrent Scanner (GitHub API)
           →  Rate-limited fetching
           →  Parallel processing
|
|
786
|
+
```
|
|
787
|
+
|
|
788
|
+
### The Intelligence Stack
|
|
789
|
+
|
|
790
|
+
#### 1. Sublinear Similarity (Rust)
|
|
791
|
+
|
|
792
|
+
**Problem**: Comparing your query to 10,000 repos is O(n), which is too slow.
|
|
793
|
+
|
|
794
|
+
**Solution**: Johnson-Lindenstrauss dimension reduction.
|
|
795
|
+
|
|
796
|
+
```rust
|
|
797
|
+
// Reduce 1536-dimensional vectors to O(log n) dimensions
|
|
798
|
+
let jl = JLProjection::new(1536, 0.5);
|
|
799
|
+
let reduced = jl.project(&embedding);
|
|
800
|
+
|
|
801
|
+
// Now compare in compressed space
|
|
802
|
+
let similarity = sublinear_similarity(&query, &corpus);
|
|
803
|
+
// Complexity: O(log n) vs O(n)
|
|
804
|
+
```
|
|
805
|
+
|
|
806
|
+
**Mathematical guarantee**: Pairwise distances are preserved within a factor of (1 ± ε).
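
To make the guarantee concrete, here is a minimal pure-Python sketch of a Gaussian random projection. It is not the Rust engine's implementation: `make_projection`, `project`, and the dimensions (512 down to 128, chosen to keep the demo fast) are assumptions for illustration only.

```python
import math
import random


def make_projection(d_in: int, d_out: int, seed: int = 0) -> list[list[float]]:
    """Build a d_out x d_in Gaussian matrix scaled by 1/sqrt(d_out)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)] for _ in range(d_out)]


def project(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply the projection matrix by the vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two random "embeddings" (stand-ins for real 1536-dimensional vectors).
rng = random.Random(1)
d_in, d_out = 512, 128
u = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
v = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

P = make_projection(d_in, d_out)
ratio = distance(project(P, u), project(P, v)) / distance(u, v)
print(f"projected / original distance = {ratio:.3f}")
```

The printed ratio should be close to 1: the projected vectors are four times shorter, yet the distance between them stays within the (1 ± ε) band for the ε = 0.5 used in the Rust snippet.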
|
|
807
|
+
|
|
808
|
+
#### 2. FACT Cache (Python)
|
|
809
|
+
|
|
810
|
+
**Problem**: LLM reasoning is non-deterministic, so results can't be reproduced.
|
|
811
|
+
|
|
812
|
+
**Solution**: Deterministic prompt caching with SHA256 hashing.
|
|
813
|
+
|
|
814
|
+
```python
|
|
815
|
+
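import hashlib


def cached_reasoning(prompt, fact_cache):
    """Return the cached result for `prompt`, or None on a cache miss.

    `fact_cache` is assumed to be dict-like (it only needs a .get() method).
    """
    # Same input always produces same output
    cache_hash = hashlib.sha256(prompt.encode()).hexdigest()
    cached_result = fact_cache.get(cache_hash)

    if cached_result is not None:
        return cached_result  # 100% reproducible
    return None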
|
|
821
|
+
```
|
|
822
|
+
|
|
823
|
+
**Benefit**: Every insight is reproducible, auditable, and versioned.
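
The same idea can be sketched end to end with SQLite as the backing store, which the component table lists alongside SHA256. This is an illustration under stated assumptions, not the actual FACT cache: the `PromptCache` class, its table layout, and the `fake_llm` stand-in are all hypothetical.

```python
import hashlib
import sqlite3


class PromptCache:
    """Hypothetical prompt cache: SHA256 of the prompt maps to a stored result."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (hash TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    @staticmethod
    def key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt: str, compute) -> str:
        """Return the stored result, calling compute(prompt) only on a miss."""
        digest = self.key(prompt)
        row = self.conn.execute(
            "SELECT result FROM cache WHERE hash = ?", (digest,)
        ).fetchone()
        if row is not None:
            return row[0]
        result = compute(prompt)
        self.conn.execute(
            "INSERT INTO cache (hash, result) VALUES (?, ?)", (digest, result)
        )
        self.conn.commit()
        return result


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a non-deterministic model: the answer changes on every call."""
    calls.append(prompt)
    return f"answer #{len(calls)} for: {prompt}"


cache = PromptCache()
first = cache.get_or_compute("speed up vector search", fake_llm)
second = cache.get_or_compute("speed up vector search", fake_llm)
```

Because the second lookup is served from the table, `first` and `second` are identical even though `fake_llm` would have answered differently if called again.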
|
|
824
|
+
|
|
825
|
+
#### 3. SAFLA Reasoning (Python)
|
|
826
|
+
|
|
827
|
+
**Problem**: Literal similarity misses creative reuse opportunities.
|
|
828
|
+
|
|
829
|
+
**Solution**: Analogical reasoning across domains.
|
|
830
|
+
|
|
831
|
+
```python
|
|
832
|
+
# Detect domain overlap
|
|
833
|
+
intent_concepts = ["performance", "search", "real-time"]
|
|
834
|
+
repo_capabilities = ["O(log n)", "sublinear", "algorithms"]
|
|
835
|
+
|
|
836
|
+
# Generate creative transfer
|
|
837
|
+
insight = safla.generate_outside_box_reasoning(
|
|
838
|
+
query="speed up vector search",
|
|
839
|
+
repo="scientific-computing/sparse-solver"
|
|
840
|
+
)
|
|
841
|
+
# → "Use sparse matrix techniques for approximate NN"
|
|
842
|
+
```
|
|
843
|
+
|
|
844
|
+
**Benefit**: Finds solutions from completely different fields.
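
A small sketch of why literal matching falls short here. It scores the two concept lists from the snippet above with plain Jaccard overlap, then again after expanding the intent through a hand-written analogy table. The `ANALOGIES` table and both helper functions are hypothetical and only illustrate the idea; they are not how SAFLA works internally.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity between two collections of concept labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical bridge from problem-side concepts to solution-side capabilities.
ANALOGIES = {
    "performance": {"sublinear", "O(log n)"},
    "search": {"algorithms"},
}


def expand(concepts) -> set:
    """Add every analogous concept to the original set."""
    expanded = set(concepts)
    for concept in concepts:
        expanded |= ANALOGIES.get(concept, set())
    return expanded


intent_concepts = ["performance", "search", "real-time"]
repo_capabilities = ["O(log n)", "sublinear", "algorithms"]

literal = jaccard(intent_concepts, repo_capabilities)
bridged = jaccard(expand(intent_concepts), repo_capabilities)
print(f"literal overlap: {literal}, bridged overlap: {bridged}")
```

The literal overlap is zero because the two lists share no labels, while the bridged score is positive once the intent is mapped into the repository's vocabulary.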
|
|
845
|
+
|
|
846
|
+
#### 4. Concurrent Scanning (Go)
|
|
847
|
+
|
|
848
|
+
**Problem**: GitHub has 100M+ repos, so you can't scan them all.
|
|
849
|
+
|
|
850
|
+
**Solution**: Parallel workers with smart rate limiting.
|
|
851
|
+
|
|
852
|
+
```go
|
|
853
|
+
// Simplified: launches one goroutine per repo; the scanner is described as capping this at 10 workers
|
|
854
|
+
for _, repo := range repos {
|
|
855
|
+
go scanner.processRepo(repo)
|
|
856
|
+
}
|
|
857
|
+
|
|
858
|
+
// Auto rate-limit
|
|
859
|
+
scanner.checkRateLimit()
|
|
860
|
+
// Sleeps if < 100 requests remaining
|
|
861
|
+
```
|
|
862
|
+
|
|
863
|
+
**Benefit**: Scan 100s of repos/minute without hitting limits.
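
For readers following along in Python, the same pattern (a fixed worker cap plus a pause when the quota runs low) can be sketched with `asyncio`. The `RateLimit` class, the reset value of 5000, and the sleep durations are assumptions for illustration; the real scanner is the Go service described above.

```python
import asyncio

MAX_WORKERS = 10     # "10 concurrent workers" from the Go snippet
MIN_REMAINING = 100  # pause threshold from the Go comment


class RateLimit:
    """Hypothetical quota tracker: pauses when few requests remain."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining
        self.pauses = 0

    async def check(self) -> None:
        if self.remaining < MIN_REMAINING:
            self.pauses += 1
            await asyncio.sleep(0.01)  # stand-in for waiting until the quota resets
            self.remaining = 5000      # assumed reset value, for illustration
        self.remaining -= 1


async def scan(repos, limiter):
    """Process repos with at most MAX_WORKERS in flight; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(MAX_WORKERS)
    active = 0
    peak = 0

    async def process(repo: str) -> str:
        nonlocal active, peak
        async with semaphore:
            await limiter.check()
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.001)  # placeholder for the real GitHub API call
            active -= 1
            return f"scanned:{repo}"

    results = await asyncio.gather(*(process(repo) for repo in repos))
    return results, peak


repos = [f"org/repo-{i}" for i in range(25)]
limiter = RateLimit(remaining=105)
results, peak = asyncio.run(scan(repos, limiter))
```

The semaphore is what enforces the cap: 25 tasks are created at once, but no more than ten are ever between acquiring and releasing it.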
|
|
864
|
+
|
|
865
|
+
---
|
|
866
|
+
|
|
867
|
+
## 🏗️ Technical Architecture
|
|
868
|
+
|
|
869
|
+
### Data Flow
|
|
870
|
+
|
|
871
|
+
```
|
|
872
|
+
┌─────────────┐
│    User     │
│    Query    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────┐
│  Python MCP Server (FastAPI)    │
│  ┌────────────┬────────────────┐│
│  │  Generate  │  Check FACT    ││
│  │  Embedding │  Cache         ││
│  └─────┬──────┴────────┬───────┘│
└────────┼───────────────┼────────┘
         │               │
         ▼               ▼
   ┌──────────┐     ┌──────────┐
   │   Rust   │     │  Cache   │
   │  Engine  │     │  Hit!    │
   └─────┬────┘     └────┬─────┘
         │               │
         ▼               │
   Compute O(log n)      │
   Similarities          │
         │               │
         └───────┬───────┘
                 ▼
          ┌─────────────┐
          │    SAFLA    │
          │  Reasoning  │
          └──────┬──────┘
                 ▼
          ┌─────────────┐
          │  Leverage   │
          │    Cards    │
          └─────────────┘
|
|
907
|
+
```
|
|
908
|
+
|
|
909
|
+
### System Components
|
|
910
|
+
|
|
911
|
+
| Component | Tech | Purpose | Complexity |
|
|
912
|
+
|-----------|------|---------|------------|
|
|
913
|
+
| MCP Server | Python 3.11 + FastAPI | API orchestration | O(1) |
|
|
914
|
+
| FACT Cache | SQLite + SHA256 | Deterministic storage | O(1) lookup |
|
|
915
|
+
| SAFLA Agent | Python + LLM | Analogical reasoning | O(k) prompts |
|
|
916
|
+
| Sublinear Engine | Rust + gRPC | Semantic comparison | **O(log n)** |
|
|
917
|
+
| Scanner | Go + goroutines | GitHub ingestion | O(n) parallel |
|
|
918
|
+
|
|
919
|
+
### Performance Characteristics
|
|
920
|
+
|
|
921
|
+
```
|
|
922
|
+
Query Response Time: <3 seconds
|
|
923
|
+
Scan Throughput: 50+ repos/minute
|
|
924
|
+
Memory Footprint: <500MB
|
|
925
|
+
CPU Usage: <1 core
|
|
926
|
+
Complexity: TRUE O(log n)
|
|
927
|
+
Determinism: 100% (FACT cache)
|
|
928
|
+
```
|
|
929
|
+
|
|
930
|
+
---
|
|
931
|
+
|
|
932
|
+
## 🛠️ Building Systems With RuvScan
|
|
933
|
+
|
|
934
|
+
### System 1: AI Code Assistant
|
|
935
|
+
|
|
936
|
+
**Stack**: RuvScan + Claude + VS Code Extension
|
|
937
|
+
|
|
938
|
+
```typescript
|
|
939
|
+
// VS Code extension
|
|
940
|
+
vscode.workspace.onDidChangeTextDocument(async (event) => {
|
|
941
|
+
const context = extractContext(event.document);
|
|
942
|
+
|
|
943
|
+
const suggestions = await ruvscan.query({
|
|
944
|
+
intent: `Optimize this code: ${context}`,
|
|
945
|
+
max_results: 3
|
|
946
|
+
});
|
|
947
|
+
|
|
948
|
+
showInlineSuggestions(suggestions);
|
|
949
|
+
});
|
|
950
|
+
```
|
|
951
|
+
|
|
952
|
+
**Value**: Developer gets library suggestions as they code.
|
|
953
|
+
|
|
954
|
+
### System 2: Autonomous Agent
|
|
955
|
+
|
|
956
|
+
**Stack**: RuvScan + LangChain + OpenAI
|
|
957
|
+
|
|
958
|
+
```python
|
|
959
|
+
# Sketch only: RuvScanClient, analyze() and apply() are placeholders to implement.
class BuilderAgent:
    def __init__(self):
        self.ruvscan = RuvScanClient()

    async def optimize(self, codebase):
        # Scan for bottlenecks
        bottlenecks = await self.analyze(codebase)

        # Find solutions
        for issue in bottlenecks:
            solutions = await self.ruvscan.query(
                f"Solve: {issue.description}"
            )

            # Auto-apply best solution, if any was found
            if solutions:
                await self.apply(solutions[0])
|
|
975
|
+
```
|
|
976
|
+
|
|
977
|
+
**Value**: Agent autonomously improves your code.
|
|
978
|
+
|
|
979
|
+
### System 3: Research Platform
|
|
980
|
+
|
|
981
|
+
**Stack**: RuvScan + Supabase + Next.js
|
|
982
|
+
|
|
983
|
+
```javascript
|
|
984
|
+
// Research dashboard
|
|
985
|
+
async function discoverInnovations(techStack) {
|
|
986
|
+
// Scan your current stack
|
|
987
|
+
const current = await ruvscan.scan({
|
|
988
|
+
source_type: "org",
|
|
989
|
+
source_name: "your-company"
|
|
990
|
+
});
|
|
991
|
+
|
|
992
|
+
// Find improvements
|
|
993
|
+
const opportunities = await Promise.all(
|
|
994
|
+
current.map(repo =>
|
|
995
|
+
ruvscan.query(`Improve ${repo.name}`)
|
|
996
|
+
)
|
|
997
|
+
);
|
|
998
|
+
|
|
999
|
+
return rankByImpact(opportunities);
|
|
1000
|
+
}
|
|
1001
|
+
```
|
|
1002
|
+
|
|
1003
|
+
**Value**: Continuous innovation discovery.
|
|
1004
|
+
|
|
1005
|
+
---
|
|
1006
|
+
|
|
1007
|
+
## API Reference
|
|
1008
|
+
|
|
1009
|
+
### Core Endpoints
|
|
1010
|
+
|
|
1011
|
+
#### POST `/query` - Find Leverage
|
|
1012
|
+
|
|
1013
|
+
```bash
|
|
1014
|
+
curl -X POST http://localhost:8000/query \
|
|
1015
|
+
-H "Content-Type: application/json" \
|
|
1016
|
+
-d '{
|
|
1017
|
+
"intent": "Your problem or goal",
|
|
1018
|
+
"max_results": 10,
|
|
1019
|
+
"min_score": 0.7
|
|
1020
|
+
}'
|
|
1021
|
+
```
|
|
1022
|
+
|
|
1023
|
+
**Response:**
|
|
1024
|
+
```json
|
|
1025
|
+
[{
|
|
1026
|
+
"repo": "org/repo-name",
|
|
1027
|
+
"capabilities": ["feature1", "feature2"],
|
|
1028
|
+
"summary": "What this repo does",
|
|
1029
|
+
"outside_box_reasoning": "Why this applies to your problem",
|
|
1030
|
+
"integration_hint": "How to use it",
|
|
1031
|
+
"relevance_score": 0.92,
|
|
1032
|
+
"runtime_complexity": "O(log n)",
|
|
1033
|
+
"cached": true
|
|
1034
|
+
}]
|
|
1035
|
+
```
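
A client may want typed access to these fields. The sketch below parses a response body in the shape shown above, keeps cards at or above a score threshold, and sorts them by relevance. The `LeverageCard` dataclass, `parse_cards`, and the sample data are hypothetical helpers, not part of the RuvScan package.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LeverageCard:
    """Subset of the documented response fields, as a typed record."""
    repo: str
    summary: str
    outside_box_reasoning: str
    integration_hint: str
    relevance_score: float
    cached: bool


def parse_cards(payload: str, min_score: float = 0.7) -> list[LeverageCard]:
    """Parse a /query response body; keep cards scoring at least min_score, best first."""
    cards = [
        LeverageCard(
            repo=item["repo"],
            summary=item["summary"],
            outside_box_reasoning=item["outside_box_reasoning"],
            integration_hint=item["integration_hint"],
            relevance_score=float(item["relevance_score"]),
            cached=bool(item["cached"]),
        )
        for item in json.loads(payload)
    ]
    kept = [card for card in cards if card.relevance_score >= min_score]
    return sorted(kept, key=lambda card: card.relevance_score, reverse=True)


def _sample_item(repo: str, score: float) -> dict:
    """Build one made-up response entry in the documented shape."""
    return {
        "repo": repo,
        "capabilities": ["feature1"],
        "summary": "What this repo does",
        "outside_box_reasoning": "Why this applies to your problem",
        "integration_hint": "How to use it",
        "relevance_score": score,
        "runtime_complexity": "O(log n)",
        "cached": True,
    }


sample = json.dumps(
    [_sample_item("org/a", 0.65), _sample_item("org/b", 0.81), _sample_item("org/c", 0.92)]
)
cards = parse_cards(sample)
print([card.repo for card in cards])
```

Filtering on the client mirrors the `min_score` request parameter, which is useful when the same response is reused with a stricter threshold.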
|
|
1036
|
+
|
|
1037
|
+
#### POST `/scan` - Scan Repositories
|
|
1038
|
+
|
|
1039
|
+
```bash
|
|
1040
|
+
curl -X POST http://localhost:8000/scan \
|
|
1041
|
+
-H "Content-Type: application/json" \
|
|
1042
|
+
-d '{
|
|
1043
|
+
"source_type": "org",
|
|
1044
|
+
"source_name": "ruvnet",
|
|
1045
|
+
"limit": 50
|
|
1046
|
+
}'
|
|
1047
|
+
```
|
|
1048
|
+
|
|
1049
|
+
#### POST `/compare` - Compare Repos
|
|
1050
|
+
|
|
1051
|
+
```bash
|
|
1052
|
+
curl -X POST http://localhost:8000/compare \
|
|
1053
|
+
-H "Content-Type: application/json" \
|
|
1054
|
+
-d '{
|
|
1055
|
+
"repo_a": "org/repo-1",
|
|
1056
|
+
"repo_b": "org/repo-2"
|
|
1057
|
+
}'
|
|
1058
|
+
```
|
|
1059
|
+
|
|
1060
|
+
### MCP Integration
|
|
1061
|
+
|
|
1062
|
+
RuvScan implements the Model Context Protocol for IDE/Agent integration:
|
|
1063
|
+
|
|
1064
|
+
```json
|
|
1065
|
+
{
|
|
1066
|
+
"mcpServers": {
|
|
1067
|
+
"ruvscan": {
|
|
1068
|
+
"command": "docker",
|
|
1069
|
+
"args": ["run", "-p", "8000:8000", "ruvscan/mcp-server"]
|
|
1070
|
+
}
|
|
1071
|
+
}
|
|
1072
|
+
}
|
|
1073
|
+
```
|
|
1074
|
+
|
|
1075
|
+
**Compatible with:**
|
|
1076
|
+
- Claude Desktop
|
|
1077
|
+
- Cursor
|
|
1078
|
+
- TabStax
|
|
1079
|
+
- Any MCP-compatible tool
|
|
1080
|
+
|
|
1081
|
+
---
|
|
1082
|
+
|
|
1083
|
+
## Deployment
|
|
1084
|
+
|
|
1085
|
+
### Development (Local)
|
|
1086
|
+
|
|
1087
|
+
```bash
|
|
1088
|
+
# Using Docker
|
|
1089
|
+
docker compose up -d
|
|
1090
|
+
|
|
1091
|
+
# Manual
|
|
1092
|
+
bash scripts/setup.sh
|
|
1093
|
+
make dev
|
|
1094
|
+
```
|
|
1095
|
+
|
|
1096
|
+
### Production (Cloud)
|
|
1097
|
+
|
|
1098
|
+
**Docker Compose:**
|
|
1099
|
+
```bash
|
|
1100
|
+
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
|
|
1101
|
+
```
|
|
1102
|
+
|
|
1103
|
+
**Kubernetes:**
|
|
1104
|
+
```bash
|
|
1105
|
+
kubectl apply -f k8s/deployment.yaml
|
|
1106
|
+
```
|
|
1107
|
+
|
|
1108
|
+
**Cloud Platforms:**
|
|
1109
|
+
- AWS: ECS, EKS
|
|
1110
|
+
- Google Cloud: Cloud Run, GKE
|
|
1111
|
+
- Azure: ACI, AKS
|
|
1112
|
+
|
|
1113
|
+
See [DEPLOYMENT.md](docs/DEPLOYMENT.md) for the full guide.
|
|
1114
|
+
|
|
1115
|
+
---
|
|
1116
|
+
|
|
1117
|
+
## 🧪 Testing
|
|
1118
|
+
|
|
1119
|
+
```bash
|
|
1120
|
+
# Run all tests
|
|
1121
|
+
./scripts/run_tests.sh
|
|
1122
|
+
|
|
1123
|
+
# Or specific suites
|
|
1124
|
+
pytest tests/test_server.py # API tests
|
|
1125
|
+
pytest tests/test_embeddings.py # Embedding tests
|
|
1126
|
+
pytest tests/test_fact_cache.py # Cache tests
|
|
1127
|
+
pytest tests/test_integration.py # E2E tests
|
|
1128
|
+
```
|
|
1129
|
+
|
|
1130
|
+
---
|
|
1131
|
+
|
|
1132
|
+
## Documentation
|
|
1133
|
+
|
|
1134
|
+
- **[Quick Start](docs/QUICK_START.md)** - Get running in 5 minutes
|
|
1135
|
+
- **[Architecture](docs/ARCHITECTURE.md)** - Deep technical dive
|
|
1136
|
+
- **[API Reference](docs/api/MCP_PROTOCOL.md)** - Complete API docs
|
|
1137
|
+
- **[Deployment](docs/DEPLOYMENT.md)** - Production deployment
|
|
1138
|
+
- **[Examples](examples/)** - Code examples
|
|
1139
|
+
|
|
1140
|
+
---
|
|
1141
|
+
|
|
1142
|
+
## 🎯 Roadmap
|
|
1143
|
+
|
|
1144
|
+
### v0.5 (Current) ✅
|
|
1145
|
+
- MCP server with 5 endpoints
|
|
1146
|
+
- TRUE O(log n) algorithms
|
|
1147
|
+
- FACT deterministic caching
|
|
1148
|
+
- SAFLA analogical reasoning
|
|
1149
|
+
- Docker + Kubernetes deployment
|
|
1150
|
+
|
|
1151
|
+
### v0.6 (Next)
|
|
1152
|
+
- [ ] Real-time streaming (MidStream)
|
|
1153
|
+
- [ ] Authentication & API keys
|
|
1154
|
+
- [ ] Rate limiting
|
|
1155
|
+
- [ ] Prometheus metrics
|
|
1156
|
+
- [ ] Enhanced LLM reasoning
|
|
1157
|
+
|
|
1158
|
+
### v0.7
|
|
1159
|
+
- [ ] Advanced query DSL
|
|
1160
|
+
- [ ] Graph visualization
|
|
1161
|
+
- [ ] Multi-LLM support
|
|
1162
|
+
- [ ] WebSocket API
|
|
1163
|
+
- [ ] Plugin system
|
|
1164
|
+
|
|
1165
|
+
### v1.0
|
|
1166
|
+
- [ ] Self-optimizing agent
|
|
1167
|
+
- [ ] Federated nodes
|
|
1168
|
+
- [ ] Community marketplace
|
|
1169
|
+
- [ ] Enterprise features
|
|
1170
|
+
|
|
1171
|
+
---
|
|
1172
|
+
|
|
1173
|
+
## 🤝 Contributing
|
|
1174
|
+
|
|
1175
|
+
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md).
|
|
1176
|
+
|
|
1177
|
+
**Areas we need help:**
|
|
1178
|
+
- 🧪 Testing edge cases
|
|
1179
|
+
- Documentation improvements
|
|
1180
|
+
- Language translations
|
|
1181
|
+
- IDE integrations
|
|
1182
|
+
- 🎨 UI/Dashboard
|
|
1183
|
+
|
|
1184
|
+
---
|
|
1185
|
+
|
|
1186
|
+
## License
|
|
1187
|
+
|
|
1188
|
+
MIT OR Apache-2.0 - Choose whichever works for you.
|
|
1189
|
+
|
|
1190
|
+
---
|
|
1191
|
+
|
|
1192
|
+
## Built On
|
|
1193
|
+
|
|
1194
|
+
RuvScan stands on the shoulders of giants:
|
|
1195
|
+
|
|
1196
|
+
- **[sublinear-time-solver](https://github.com/ruvnet/sublinear-time-solver)** - TRUE O(log n) algorithms
|
|
1197
|
+
- **[FACT](https://github.com/ruvnet/FACT)** - Deterministic AI framework
|
|
1198
|
+
- **[MidStream](https://github.com/ruvnet/MidStream)** - Real-time streaming
|
|
1199
|
+
- **[FastAPI](https://fastapi.tiangolo.com/)** - Modern Python web framework
|
|
1200
|
+
- **[Rust](https://www.rust-lang.org/)** - Performance-critical code
|
|
1201
|
+
- **[Go](https://go.dev/)** - Concurrent systems
|
|
1202
|
+
|
|
1203
|
+
---
|
|
1204
|
+
|
|
1205
|
+
|
|
1206
|
+
|
|
1207
|
+
---
|
|
1208
|
+
|
|
1209
|
+
## ✨ The Vision
|
|
1210
|
+
|
|
1211
|
+
**RuvScan makes every developer 10× more productive by turning the entire open-source world into their personal innovation engine.**
|
|
1212
|
+
|
|
1213
|
+
Instead of reinventing the wheel, developers discover existing solutions, even ones from completely different domains, and apply them creatively to their problems.
|
|
1214
|
+
|
|
1215
|
+
**The result**: Faster builds, better architectures, and constant innovation.
|
|
1216
|
+
|
|
1217
|
+
---
|
|
1218
|
+
|
|
1219
|
+
|
|
1220
|
+
|
|
1221
|
+
---
|
|
1222
|
+
|