@gmickel/gno 0.3.0 → 0.3.4

This diff shows the changes between publicly released versions of the package as they appear in their respective public registries, and is provided for informational purposes only.
Files changed (2)
  1. package/README.md +109 -181
  2. package/package.json +6 -1
package/README.md CHANGED
@@ -1,256 +1,184 @@
- # GNO: Your Local Second Brain
+ # GNO
 
- **Index, Search, and Synthesize Your Entire Digital Life.**
+ **Your Local Second Brain** — Index, search, and synthesize your entire digital life.
 
- GNO is a **Local Knowledge Engine** designed for privacy-conscious individuals and AI agents. It indexes your notes, code, documents (Markdown, PDF, Office, and more), and meeting transcripts, providing lightning-fast semantic search and AI-powered answers—all on your machine.
+ [![npm version](https://img.shields.io/npm/v/@gmickel/gno.svg)](https://www.npmjs.com/package/@gmickel/gno)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
 
- ---
-
- ## ✨ Key Features
-
- * **Universal Indexing**: Effortlessly ingest and search across Markdown, PDF, DOCX, XLSX, PPTX, and plain text files.
- * **Hybrid Search Pipeline**: Combines **BM25 keyword search** with **vector semantic search** and **AI re-ranking** for unparalleled retrieval accuracy.
- * **Local LLM Integration**: Get grounded AI answers with citations using **node-llama-cpp** and auto-downloaded GGUF models. No external services, maximum privacy.
- * **Agent-First Design (MCP)**: Seamlessly integrate GNO with AI agents via the Model Context Protocol (MCP) server.
- * **Deterministic Output**: Stable, schema-driven JSON, file-line, and markdown outputs for reliable scripting.
- * **Multilingual Support**: Robust handling of multiple languages in indexing and retrieval.
- * **Privacy-Preserving**: All processing happens locally. Your data never leaves your device.
- * **World-Class Engineering**: Spec-driven development, rigorous testing, and eval gates ensure reliability and quality.
+ GNO is a local knowledge engine for privacy-conscious individuals and AI agents. Index your notes, code, PDFs, and Office docs. Get lightning-fast semantic search and AI-powered answers—all on your machine.
 
  ---
 
- ## 🚀 Quick Start
-
- Get searching in minutes with the 3-command workflow:
+ ## Contents
 
- 1. **Initialize your knowledge base**:
- ```sh
- # Create a collection for your notes (adjust path and name as needed)
- gno init ~/my-notes --name notes --pattern "**/*.md"
+ - [Quick Start](#quick-start)
+ - [Installation](#installation)
+ - [Search Modes](#search-modes)
+ - [Agent Integration](#agent-integration)
+ - [How It Works](#how-it-works)
+ - [Local Models](#local-models)
+ - [Architecture](#architecture)
+ - [Development](#development)
 
- # Full index: sync files + generate embeddings
- gno index
- ```
+ ---
 
- 2. **Ask a question**:
- ```sh
- # Get a direct, cited answer from your documents
- gno ask "What are the best practices for API authentication?" --collection notes
+ ## Quick Start
 
- # Search with keywords or natural language
- gno query "Q4 planning meeting summary" --collection notes
- ```
+ ```bash
+ # Initialize with your notes folder
+ gno init ~/notes --name notes
 
- 3. **Explore your data**:
- ```sh
- # Retrieve specific document content
- gno get "notes/2024-01-15.md"
+ # Index documents (BM25 + vectors)
+ gno index
 
- # Get results in a machine-readable format for agents
- gno search "project deadlines" --json -n 10
- ```
+ # Search
+ gno query "authentication best practices"
+ gno ask "summarize the API discussion" --answer
+ ```
 
  ---
 
- ## 🧠 For Humans & AI Agents
+ ## Installation
 
- GNO is built for both worlds:
+ Requires [Bun](https://bun.sh/) >= 1.0.0.
 
- * **For Humans**: A powerful, yet intuitive CLI to quickly find information, get answers, and explore your local knowledge base.
- * **For AI Agents**: Exposes a stable MCP server and structured output formats (`--json`, `--files`) for seamless integration with LLMs and agentic workflows.
-
- ---
-
- ## 🔎 Search Modes
+ ```bash
+ bun install -g @gmickel/gno
+ ```
 
- GNO offers multiple search strategies to suit your needs:
+ **macOS**: Vector search requires Homebrew SQLite:
+ ```bash
+ brew install sqlite3
+ ```
 
- | Command | Mode | Description | Best For |
- | :---------- | :------------- | :-------------------------------------------------------------- | :----------------------------------------------- |
- | `gno search`| **BM25** | Fast, keyword-based full-text search. | Exact phrase matching, known terms. |
- | `gno vsearch`| **Vector** | Semantic search based on meaning, not just keywords. | Natural language queries, conceptual understanding. |
- | `gno query` | **Hybrid** | Combines BM25 and Vector search with LLM reranking and fusion. | Highest accuracy, nuanced understanding. |
- | `gno ask` | **RAG-focused**| Hybrid search providing a synthesized, cited answer from results. | Getting direct answers to complex questions. |
+ Verify:
+ ```bash
+ gno doctor
+ ```
 
  ---
 
- ## 🤖 Agent Integration
-
- GNO is designed to be the knowledge backbone for your AI agents.
+ ## Search Modes
 
- ### CLI Output Formats
+ | Command | Mode | Best For |
+ |:--------|:-----|:---------|
+ | `gno search` | BM25 | Exact phrases, known terms |
+ | `gno vsearch` | Vector | Natural language, concepts |
+ | `gno query` | Hybrid | Highest accuracy (BM25 + Vector + reranking) |
+ | `gno ask` | RAG | Direct answers with citations |
 
- Use `--json` or `--files` for machine-readable output:
-
- ```sh
- # Get JSON results for LLM processing
- gno query "meeting notes on user feedback" --json -n 5
-
- # Get file paths and scores for agent tool use
- gno search "API design" --files --min-score 0.3
- ```
+ ---
 
- ### Skill Installation (Recommended for Claude Code/Codex/OpenCode)
+ ## Agent Integration
 
- Skills integrate via CLI - the agent runs GNO commands directly. No MCP overhead, no context pollution.
+ ### For Claude Code / Codex / OpenCode
 
  ```bash
- gno skill install --scope user # User-wide for Claude Code
- gno skill install --target codex # For Codex
+ gno skill install --scope user # Claude Code
+ gno skill install --target codex # Codex
  gno skill install --target all # Both
  ```
 
- After install, restart your agent. It will detect GNO and can search your indexed documents.
-
- ### MCP Server (For Claude Desktop/Cursor)
-
- Exposes an MCP server for GUI-based AI applications.
+ ### For Claude Desktop / Cursor
 
- **Tools Exposed:**
- * `gno_search` (BM25)
- * `gno_vsearch` (Vector)
- * `gno_query` (Hybrid)
- * `gno_get` (Document retrieval)
- * `gno_multi_get` (Batch retrieval)
- * `gno_status` (Index health)
-
- **Example Claude Desktop Configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
+ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
 
  ```json
  {
  "mcpServers": {
- "gno": {
- "command": "gno",
- "args": ["mcp"]
- }
+ "gno": { "command": "gno", "args": ["mcp"] }
  }
  }
  ```
 
- *(Adjust path and `mcpServers` key based on your agent's configuration.)*
-
- ---
-
- ## ⚙️ How It Works: The GNO Pipeline
+ **MCP Tools**: `gno_search`, `gno_vsearch`, `gno_query`, `gno_get`, `gno_multi_get`, `gno_status`
 
- GNO employs a sophisticated, multi-stage retrieval process for optimal results:
+ ### CLI Output Formats
 
- ```mermaid
- graph TD
- A[User Query] --> B(Query Expansion);
- B --> C{Lexical Variants};
- B --> D{Semantic Variants};
- B --> E{Optional HyDE};
- A --> F[Original Query];
-
- C --> G(BM25 Retrieval);
- D --> H(Vector Search);
- E --> H;
- F --> G;
- F --> H;
-
- G --> I(Ranked List 1);
- H --> J(Ranked List 2);
- I --> K{RRF Fusion + Bonus};
- J --> K;
-
- K --> L(Top Candidates);
- L --> M(LLM Re-ranking);
- M --> N(Position-Aware Blending);
- N --> O(Final Results);
-
- subgraph "Search Stages"
- B; C; D; E; F; G; H; I; J; K; L; M; N; O;
- end
+ ```bash
+ gno query "meeting notes" --json -n 5 # JSON for LLMs
+ gno search "API design" --files # File paths only
  ```
 
- ### Search Pipeline Details:
-
- 1. **Query Expansion**: Generates alternative queries (lexical and semantic) and an optional synthetic "HyDE" document using a local LLM for richer retrieval.
- 2. **Parallel Retrieval**: Executes BM25 (keyword) and Vector (semantic) searches concurrently.
- 3. **Fusion**: Combines results using Reciprocal Rank Fusion (RRF) with a weighted boost for original query matches and a top-rank bonus.
- 4. **Re-ranking**: An LLM-based cross-encoder re-scores the top candidates for final relevance.
- 5. **Blending**: Dynamically adjusts the mix of retrieval vs. reranked scores based on rank position to preserve accuracy.
-
- **Score Normalization**: Raw scores from FTS, vector distance, and reranker are normalized to a 0-1 scale for consistent fusion.
-
  ---
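Editorial aside: the score normalization described in the pipeline notes above (raw FTS, vector-distance, and reranker scores mapped onto a common 0-1 scale before fusion) can be sketched as simple min-max scaling. This is an illustrative assumption about what "normalized to a 0-1 scale" might mean, not GNO's actual code; the function names are hypothetical.

```typescript
// Illustrative min-max normalization onto [0, 1]. An assumption about how
// "normalized to a 0-1 scale" could work — not GNO's actual implementation.
function minMaxNormalize(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  // Degenerate case: every score identical — treat them all as top-ranked.
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

// Vector distances are "lower is better", so invert after normalizing
// to get a "higher is better" similarity on the same 0-1 scale.
function distanceToSimilarity(distances: number[]): number[] {
  return minMaxNormalize(distances).map((d) => 1 - d);
}
```

Putting BM25 scores, vector distances, and reranker logits on one scale like this is what makes their ranked lists comparable during fusion.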
 
- ## 📦 Installation
-
- Requires **Bun** >= 1.0.0.
+ ## How It Works
 
- ```sh
- # Install globally
- bun install -g @gmickel/gno
+ ```mermaid
+ graph TD
+ A[User Query] --> B(Query Expansion)
+ B --> C{Lexical Variants}
+ B --> D{Semantic Variants}
+ B --> E{HyDE Passage}
+ A --> F[Original Query]
+
+ C --> G(BM25 Search)
+ D --> H(Vector Search)
+ E --> H
+ F --> G
+ F --> H
+
+ G --> I(Ranked List 1)
+ H --> J(Ranked List 2)
+ I --> K{RRF Fusion}
+ J --> K
+
+ K --> L(Top Candidates)
+ L --> M(Cross-Encoder Rerank)
+ M --> N[Final Results]
  ```
 
- **macOS users**: For optimal vector search performance, install Homebrew SQLite:
- ```sh
- brew install sqlite3
- ```
+ 1. **Query Expansion** — LLM generates lexical variants, semantic variants, and a [HyDE](https://arxiv.org/abs/2212.10496) passage
+ 2. **Parallel Retrieval** — BM25 + Vector search run concurrently on all variants
+ 3. **Fusion** — Reciprocal Rank Fusion merges results with position-based scoring
+ 4. **Re-ranking** — Cross-encoder rescores top 20, blended with fusion scores
 
- Verify your installation:
- ```sh
- gno doctor
- ```
+ See [How Search Works](https://gno.sh/docs/HOW-SEARCH-WORKS/) for full pipeline details.
 
  ---
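Editorial aside: the Reciprocal Rank Fusion used in the fusion step above can be illustrated with a minimal sketch. This assumes the common k = 60 constant from the RRF literature and omits GNO's extra weighting (original-query boost, top-rank bonus); the function name is hypothetical.

```typescript
// Minimal Reciprocal Rank Fusion (RRF) sketch — illustrative only.
// Each ranked list contributes 1 / (k + rank) per document; documents
// ranked well by several retrievers accumulate the highest fused score.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // rank is 0-based here, so the top document contributes 1 / (k + 1).
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// A document ranked well by both BM25 and vector search beats one that
// appears high in only a single list. (Hypothetical document names.)
const fused = rrfFuse([
  ["api-auth.md", "notes.md", "todo.md"], // BM25 ranking
  ["api-auth.md", "draft.md", "notes.md"], // vector ranking
]);
const best = [...fused.entries()].sort((a, b) => b[1] - a[1])[0][0];
// best === "api-auth.md"
```

Because RRF uses only rank positions, it needs no score calibration between the two retrievers, which is why it pairs well with heterogeneous BM25 and vector scores.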
 
- ## 🏠 Local LLM Models
-
- GNO runs embeddings, reranking, and query expansion locally using GGUF models via `node-llama-cpp`. Models are automatically downloaded and cached on first use in `~/.cache/gno/models/`.
+ ## Local Models
 
- | Model | Purpose | Size (approx.) |
- | :-------------------- | :---------------- | :------------- |
- | `bge-m3` | Multilingual Embeddings | ~500MB |
- | `bge-reranker-v2-m3` | Cross-Encoder Re-ranking | ~700MB |
- | `Qwen-Instruct` | Query Expansion / HyDE | ~600MB |
+ Models auto-download on first use to `~/.cache/gno/models/`.
 
- *(Specific GGUF versions are pinned for stability.)*
+ | Model | Purpose | Size |
+ |:------|:--------|:-----|
+ | bge-m3 | Embeddings | ~500MB |
+ | bge-reranker-v2-m3 | Re-ranking | ~700MB |
+ | Qwen-Instruct | Query expansion | ~600MB |
 
  ---
 
- ## 📜 Architecture Overview
-
- GNO follows a layered, Ports and Adapters architecture for maintainability and testability:
+ ## Architecture
 
  ```
- ┌───────────────────────────────────────────────────────────┐
- GNO CLI / MCP
- ├───────────────────────────────────────────────────────────┤
- Ports: Converter, Store, Embedding, Generation, Rerank...
- ├───────────────────────────────────────────────────────────┤
- Adapters: SQLite, FTS5, sqlite-vec, node-llama-cpp, CLI
- ├───────────────────────────────────────────────────────────┤
- Core Domain: Identity, Mirrors, Chunking, Retrieval
- └───────────────────────────────────────────────────────────┘
+ ┌─────────────────────────────────────────────────┐
+ GNO CLI / MCP
+ ├─────────────────────────────────────────────────┤
+ Ports: Converter, Store, Embedding, Rerank
+ ├─────────────────────────────────────────────────┤
+ Adapters: SQLite, FTS5, sqlite-vec, llama-cpp
+ ├─────────────────────────────────────────────────┤
+ Core: Identity, Mirrors, Chunking, Retrieval
+ └─────────────────────────────────────────────────┘
  ```
 
  ---
 
- ## 💻 Development
+ ## Development
 
  ```bash
- # Clone the repository
- git clone https://github.com/gmickel/gno.git
- cd gno
-
- # Install dependencies
+ git clone https://github.com/gmickel/gno.git && cd gno
  bun install
-
- # Run tests
  bun test
-
- # Lint and format code
  bun run lint
-
- # Type check
  bun run typecheck
  ```
 
+ See [Contributing](.github/CONTRIBUTING.md) for CI matrix, caching, and release process.
+
  ---
 
- ## 📄 License
+ ## License
 
- [MIT License](./LICENSE)
+ [MIT](./LICENSE)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@gmickel/gno",
- "version": "0.3.0",
+ "version": "0.3.4",
  "description": "Local semantic search for your documents. Index Markdown, PDF, and Office files with hybrid BM25 + vector search.",
  "keywords": [
  "search",
@@ -47,6 +47,7 @@
  "test:watch": "bun test --watch",
  "test:coverage": "bun test --coverage",
  "test:coverage:html": "bun test --coverage --html",
+ "test:fixtures": "bun scripts/generate-test-fixtures.ts",
  "typecheck": "tsgo --noEmit",
  "lint:typeaware": "bun x oxlint --type-aware",
  "reset": "bun run src/index.ts reset --confirm",
@@ -79,9 +80,13 @@
  "@typescript/native-preview": "^7.0.0-dev.20251215.1",
  "ajv": "^8.17.1",
  "ajv-formats": "^3.0.1",
+ "docx": "^9.5.1",
  "evalite": "^1.0.0-beta.15",
+ "exceljs": "^4.4.0",
  "lefthook": "^2.0.12",
  "oxlint-tsgolint": "^0.10.0",
+ "pdf-lib": "^1.17.1",
+ "pptxgenjs": "^4.0.1",
  "ultracite": "^6.5.0"
  },
  "peerDependencies": {