pdf-brain 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Joel Hooks
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,137 @@
+ # pdf-library
+
+ Local PDF knowledge base with vector search. Extract, embed, and semantically search your PDFs.
+
+ ## Features
+
+ - **Local-first** - Everything runs on your machine, no API costs
+ - **Vector search** - Semantic search via Ollama embeddings (mxbai-embed-large)
+ - **Hybrid search** - Combine vector similarity with full-text search
+ - **iCloud sync** - Default storage in `~/Documents/.pdf-library/`
+ - **Fast** - SQLite + sqlite-vec for instant queries
+
+ ## Requirements
+
+ - macOS (Apple Silicon recommended)
+ - [Bun](https://bun.sh) runtime
+ - [Ollama](https://ollama.ai) for embeddings
+ - [uv](https://github.com/astral-sh/uv) for PDF extraction
+
+ ## Setup
+
+ ```bash
+ # Clone and set up
+ git clone https://github.com/joelhooks/pdf-library.git
+ cd pdf-library
+ ./scripts/setup.sh
+ ```
+
+ This will:
+
+ 1. Install Ollama if needed
+ 2. Pull the `mxbai-embed-large` embedding model
+ 3. Install dependencies
+ 4. Create the library directory
+
+ ## Usage
+
+ ### CLI
+
+ ```bash
+ # Add a PDF from local path
+ bun run dev add /path/to/document.pdf
+
+ # Add from URL
+ bun run dev add https://example.com/paper.pdf
+
+ # Add with tags
+ bun run dev add /path/to/document.pdf --tags "ai,agents"
+
+ # Add from URL with custom title
+ bun run dev add https://example.com/paper.pdf --title "Research Paper" --tags "research"
+
+ # Search semantically
+ bun run dev search "context engineering patterns"
+
+ # Full-text search
+ bun run dev search "context engineering" --fts
+
+ # List all documents
+ bun run dev list
+
+ # List by tag
+ bun run dev list --tag ai
+
+ # Get document details
+ bun run dev get "document-title"
+
+ # Remove a document
+ bun run dev remove "document-title"
+
+ # Update tags
+ bun run dev tag "document-title" "new,tags,here"
+
+ # Show stats
+ bun run dev stats
+
+ # Check Ollama status
+ bun run dev check
+ ```
+
+ ### As a library
+
+ ```typescript
+ import { PDFLibrary } from "pdf-library";
+
+ const library = new PDFLibrary();
+
+ // Add a PDF
+ await library.add("/path/to/document.pdf", {
+ tags: ["ai", "agents"],
+ });
+
+ // Semantic search
+ const results = await library.search("context engineering patterns");
+
+ // Hybrid search (vector + FTS); named separately to avoid redeclaring `results`
+ const hybridResults = await library.search("context engineering", { hybrid: true });
+
+ // List documents
+ const docs = library.list();
+ ```
+
+ ### OpenCode Tool
+
+ Copy `opencode-tool.ts` to `~/.config/opencode/tool/pdf-library.ts` to use as an OpenCode custom tool.
+
+ ## Configuration
+
+ Environment variables:
+
+ | Variable | Default | Description |
+ | ------------------ | -------------------------- | ------------------------ |
+ | `PDF_LIBRARY_PATH` | `~/Documents/.pdf-library` | Library storage location |
+ | `OLLAMA_HOST` | `http://localhost:11434` | Ollama API endpoint |
+ | `OLLAMA_MODEL` | `mxbai-embed-large` | Embedding model |
+
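Editorial sketch (not part of the published README): the configuration table can be read as a simple fallback rule, where each variable overrides a built-in default. The snippet below illustrates that rule under stated assumptions. `LibraryConfig` and `resolveConfig` are hypothetical names introduced here, not the package's API, and the home directory is passed in explicitly because the diff does not show how the package expands `~`.

```typescript
// Hypothetical shape for the three settings listed in the table.
interface LibraryConfig {
  libraryPath: string;
  ollamaHost: string;
  ollamaModel: string;
}

// A plain map of environment variables, e.g. process.env in Bun or Node.
type Env = Record<string, string | undefined>;

// Assumption: an unset variable falls back to the documented default.
function resolveConfig(env: Env, homeDir: string): LibraryConfig {
  return {
    libraryPath: env.PDF_LIBRARY_PATH ?? `${homeDir}/Documents/.pdf-library`,
    ollamaHost: env.OLLAMA_HOST ?? "http://localhost:11434",
    ollamaModel: env.OLLAMA_MODEL ?? "mxbai-embed-large",
  };
}
```

In a Bun or Node script this would typically be called as `resolveConfig(process.env, os.homedir())`, which keeps the function itself free of runtime-specific imports.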
+ ## How it works
+
+ 1. **Extract** - PDF text extracted via `pypdf` (run through `uv`)
+ 2. **Chunk** - Text split into ~512 token chunks with overlap
+ 3. **Embed** - Each chunk embedded via Ollama (mxbai-embed-large, 1024 dims)
+ 4. **Store** - SQLite database with sqlite-vec for vector search + FTS5 for full-text
+ 5. **Search** - Query embedded, compared against chunks via cosine similarity
+
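Editorial sketch (not part of the published README): the chunk and search steps above can be illustrated in a few lines. This is a minimal approximation, not the package's implementation. `chunkWords` and `cosineSimilarity` are hypothetical helpers, whitespace-separated words stand in for tokens, and the overlap of 64 is an assumed value because the README only says "with overlap". In the package itself the vector comparison is described as going through sqlite-vec rather than application code.

```typescript
// Split text into overlapping windows. Words approximate tokens here (assumption).
function chunkWords(text: string, size = 512, overlap = 64): string[] {
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must satisfy 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `step` words after the previous one
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // the last window reached the end of the text
  }
  return chunks;
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // define similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason for overlapping windows in retrieval pipelines.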
+ ## Storage
+
+ ```
+ ~/Documents/.pdf-library/
+ ├── library.db # SQLite database (vectors, FTS, metadata)
+ ├── downloads/ # PDFs downloaded from URLs
+ ├── extracted/ # Markdown versions of PDFs (optional)
+ └── originals/ # Copy of original PDFs (optional)
+ ```
+
+ ## License
+
+ MIT