vector-inspector 0.2.0__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/PKG-INFO +26 -162
  2. vector_inspector-0.2.1/README.md +225 -0
  3. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/pyproject.toml +1 -1
  4. vector_inspector-0.2.0/README.md +0 -361
  5. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/__init__.py +0 -0
  6. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/__main__.py +0 -0
  7. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/__init__.py +0 -0
  8. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/connections/__init__.py +0 -0
  9. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/connections/base_connection.py +0 -0
  10. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/connections/chroma_connection.py +0 -0
  11. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/connections/qdrant_connection.py +0 -0
  12. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/core/connections/template_connection.py +0 -0
  13. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/main.py +0 -0
  14. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/__init__.py +0 -0
  15. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/backup_restore_service.py +0 -0
  16. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/filter_service.py +0 -0
  17. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/import_export_service.py +0 -0
  18. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/settings_service.py +0 -0
  19. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/services/visualization_service.py +0 -0
  20. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/__init__.py +0 -0
  21. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/components/__init__.py +0 -0
  22. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/components/backup_restore_dialog.py +0 -0
  23. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/components/filter_builder.py +0 -0
  24. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/components/item_dialog.py +0 -0
  25. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/components/loading_dialog.py +0 -0
  26. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/main_window.py +0 -0
  27. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/__init__.py +0 -0
  28. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/collection_browser.py +0 -0
  29. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/connection_view.py +0 -0
  30. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/metadata_view.py +0 -0
  31. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/search_view.py +0 -0
  32. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/src/vector_inspector/ui/views/visualization_view.py +0 -0
  33. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/tests/test_connections.py +0 -0
  34. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/tests/test_filter_service.py +0 -0
  35. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/tests/test_settings_service.py +0 -0
  36. {vector_inspector-0.2.0 → vector_inspector-0.2.1}/tests/vector_inspector.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: vector-inspector
- Version: 0.2.0
+ Version: 0.2.1
  Summary: A comprehensive desktop application for visualizing, querying, and managing vector database data
  Author-Email: Anthony Dawson <anthonypdawson+github@gmail.com>
  License: MIT
@@ -33,9 +33,9 @@ A comprehensive desktop application for visualizing, querying, and managing vect
  - [Architecture](#architecture)
  - [Application Structure](#application-structure)
  - [Use Cases](#use-cases)
- - [Feature Access (Free vs Pro)](#feature-access-free-vs-pro)
- - [Planned Roadmap](#planned-roadmap)
- - [Installation (Planned)](#installation-planned)
+ - [Feature Access](#feature-access)
+ - [Roadmap](#roadmap)
+ - [Installation](#installation)
  - [Configuration](#configuration)
  - [Development Setup](#development-setup)
  - [Contributing](#contributing)
@@ -119,58 +119,9 @@ Vector Inspector bridges the gap between vector databases and user-friendly data
 
  ## Architecture
 
- ### Technology Stack
-
- #### Frontend (GUI)
- - **Framework**: PySide6 (Qt for Python) - native desktop application
- - **UI Components**: Qt Widgets for forms, dialogs, and application structure
- - **Visualization**:
-   - Plotly for interactive charts (embedded via QWebEngineView)
-   - matplotlib for static visualizations
- - **Data Grid**: QTableView with custom models for high-performance data display
-
- #### Backend
- - **Language**: Python 3.12
- - **Core Libraries**:
-   - Vector DB clients: `chromadb`, `qdrant-client` (implemented), `pinecone-client`, `weaviate-client`, `pymilvus` (planned)
-   - Embeddings: `sentence-transformers`, `fastembed` (implemented), `openai`, `cohere` (planned)
-   - Data processing: `pandas`, `numpy`
-   - Dimensionality reduction: `scikit-learn`, `umap-learn`
- - **API Layer**: FastAPI (planned for programmatic access) or direct Python integration
-
- #### Data Layer
- - **Connection Management**: Provider-specific connection classes with unified interface
- - **Query Abstraction**: Base connection interface that each provider implements
- - **Storage Modes**:
-   - ChromaDB: Persistent local storage
-   - Qdrant Remote: Connect via host/port (e.g., localhost:6333)
-   - Qdrant Embedded: Local path storage without separate server
- - **Caching**: Redis or in-memory cache for frequently accessed data (planned)
- - **Settings Persistence**: User settings saved to ~/.vector-viewer/settings.json
-
- ### Application Structure
+ Vector Inspector is built with PySide6 (Qt for Python) for the GUI, providing a native desktop experience. The backend uses Python with support for multiple vector database providers through a unified interface.
 
- ```
- vector-viewer/
- ├── src/
- │   └── vector_viewer/
- │       ├── core/
- │       │   └── connections/ # Connection managers for each provider
- │       ├── ui/
- │       │   ├── components/ # Reusable UI components
- │       │   └── views/ # Main application views
- │       ├── services/ # Business logic services
- │       └── main.py # Application entry point
- ├── tests/
- ├── docs/
- ├── data/ # Local database storage
- │   ├── chroma_db/
- │   └── qdrant/
- ├── run.sh / run.bat # Launch scripts
- └── pyproject.toml
- ```
-
- User settings are saved to `~/.vector-viewer/settings.json`
+ For detailed architecture information, see [docs/architecture.md](docs/architecture.md).
 
  ## Use Cases
 
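The Data Layer notes removed in this hunk describe provider-specific connection classes behind a base interface that each provider implements. The file list shows `base_connection.py`, `chroma_connection.py`, and `qdrant_connection.py`. The sketch below illustrates that pattern only. The names `VectorConnection`, `InMemoryConnection`, `collection_sizes`, and the method signatures are hypothetical and are not taken from the package source.

```python
from abc import ABC, abstractmethod
from typing import Any


class VectorConnection(ABC):
    """Hypothetical shared interface; each provider subclass implements it."""

    @abstractmethod
    def connect(self) -> bool:
        """Open the connection and report success."""

    @abstractmethod
    def list_collections(self) -> list[str]:
        """Return the names of all collections."""

    @abstractmethod
    def get_items(self, collection: str, limit: int = 100) -> list[dict[str, Any]]:
        """Return up to `limit` items from a collection."""


class InMemoryConnection(VectorConnection):
    """Toy provider backed by a dict, standing in for a Chroma or Qdrant class."""

    def __init__(self, data: dict[str, list[dict[str, Any]]]) -> None:
        self._data = data
        self.connected = False

    def connect(self) -> bool:
        self.connected = True
        return True

    def list_collections(self) -> list[str]:
        return sorted(self._data)

    def get_items(self, collection: str, limit: int = 100) -> list[dict[str, Any]]:
        return self._data.get(collection, [])[:limit]


def collection_sizes(conn: VectorConnection) -> dict[str, int]:
    """UI-side helper that depends only on the base interface, never on a provider."""
    conn.connect()
    return {name: len(conn.get_items(name)) for name in conn.list_collections()}


if __name__ == "__main__":
    conn = InMemoryConnection({"docs": [{"id": "a"}, {"id": "b"}], "images": []})
    print(collection_sizes(conn))  # {'docs': 2, 'images': 0}
```

Because `collection_sizes` accepts any `VectorConnection`, adding a new provider would mean writing one subclass without touching the calling code. That is the benefit the "unified interface" wording points to.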
@@ -181,116 +132,29 @@ User settings are saved to `~/.vector-viewer/settings.json`
  5. **Data Migration**: Transfer data between vector database providers
  6. **Education**: Learn and experiment with vector databases interactively
 
- ## Feature Access (Free vs Pro)
-
- | Feature | Access |
- |----------------------------------------------|----------|
- | Connection to ChromaDB | Free |
- | Basic metadata browsing and filtering | Free |
- | Simple similarity search interface | Free |
- | 2D vector visualization (PCA/t-SNE) | Free |
- | Basic CRUD operations | Free |
- | Metadata filtering (advanced) | Free |
- | Item editing | Free |
- | Import/export (CSV, JSON, Parquet) | Free |
- | Provider abstraction layer | Free |
- | Pinecone support | Free |
- | Weaviate support | Free |
- | Qdrant support (basic/experimental) | Free |
- | Milvus support | Pro |
- | ChromaDB advanced support | Pro |
- | FAISS (local files) support | Pro |
- | pgvector (PostgreSQL extension) support | Pro |
- | Elasticsearch with vector search support | Pro |
- | Advanced query builder | Free |
- | 3D visualization | Free |
- | Embedding model integration (basic) | Free |
- | Query history and saved queries | Free |
- | Model Comparison Mode | Pro |
- | Cluster Explorer | Pro |
- | Embedding Inspector | Pro |
- | Embedding Provenance Graph | Pro |
- | Semantic Drift Timeline | Pro |
- | Cross-Collection Similarity | Pro |
- | Vector Surgery | Pro |
- | Custom plugin system | Pro |
- | Team collaboration features | Pro |
-
- > **Note:** Qdrant support is available for free users in the open source version (basic/experimental). Advanced Qdrant features (e.g., payload filtering, geo, cloud auth) may be reserved for Pro in the future.
-
- ## Planned Roadmap
-
- ### Phase 1: Foundation (MVP)
- - [x] Connection to ChromaDB
- - [x] Basic metadata browsing and filtering
- - [x] Simple similarity search interface
- - [x] 2D vector visualization (PCA/t-SNE)
- - [x] Basic CRUD operations
-
- ### Phase 2: Core Features
- - [x] Metadata filtering (advanced filtering, combine with search)
- - [x] Item editing (update metadata and documents)
- - [x] Import/export (CSV, JSON, Parquet, backup/restore)
- - [x] Provider abstraction layer (unified interface for all supported vector DBs)
- - [x] Qdrant support (basic/experimental, free)
-
- ### Phase 3: UX & Professional Polish
- - [ ] **Unified Information Panel** (new "Info" tab as default view)
- - [ ] Database and collection metadata display
- - [ ] Connection health and version information
- - [ ] Schema visualization and index configuration display
-
- ### Phase 4: Modular/Plugin System & Hybrid Model
- - [ ] Implement modular/plugin system for feature extensions
- - [ ] Migrate paid/advanced features to commercial modules
- - [ ] Add licensing/access control for commercial features
-
- ### Phase 5: Provider Expansion (Incremental)
- - [ ] Pinecone support (free)
- - [ ] Weaviate support (free)
- - [ ] Qdrant support (paid)
-
- #### Future/Backlog Providers
- - [ ] Milvus support (paid)
- - [ ] ChromaDB advanced support (paid)
- - [ ] FAISS (local files) support (paid)
- - [ ] pgvector (PostgreSQL extension) support (paid)
- - [ ] Elasticsearch with vector search support (paid)
-
-
- ### Phase 6A: Advanced Usability & Visualization
- - [ ] Advanced query builder (free)
- - [ ] 3D visualization (free)
- - [ ] Embedding model integration (free)
- - [ ] Query history and saved queries (free)
- - [ ] Metadata Type Detection & Rich Media Preview (free)
-
- ### Phase 6B: Analytical & Comparison Tools
- - [ ] Model Comparison Mode (paid)
- - [ ] Cluster Explorer (paid)
- - [ ] Embedding Inspector (paid)
- - [ ] Embedding Provenance Graph (paid)
-
- ### Phase 6C: Temporal & Cross-Collection Analytics
- - [ ] Semantic Drift Timeline (paid)
- - [ ] Cross-Collection Similarity (paid)
-
- ### Phase 6D: Experimental & Power Features
- - [ ] Vector Surgery (paid)
- - [ ] Custom plugin system (paid)
- - [ ] Team collaboration features (paid)
-
- ### Phase 7: Enterprise Features
- - [ ] Multi-user support with auth
- - [ ] Audit logging
- - [ ] Advanced security features
- - [ ] Custom reporting
- - [ ] API for programmatic access (FastAPI backend)
- - [ ] Caching layer (Redis/in-memory) for performance
- - [ ] Connection pooling and optimization
+ ## Feature Access
+
+ Vector Inspector is available in both free (open source) and Pro versions. The free version includes all core features for ChromaDB and basic Qdrant support, while Pro adds advanced analytics and additional providers.
+
+ See [FEATURES.md](FEATURES.md) for a complete feature comparison.
+
+ ## Roadmap
+
+ **Current Status**: Phase 2 Complete
+
+ See [ROADMAP.md](ROADMAP.md) for the complete development roadmap and planned features.
 
  ## Installation
 
+ ### From PyPI (Recommended)
+
+ ```bash
+ pip install vector-inspector
+ vector-inspector
+ ```
+
+ ### From Source
+
  ```bash
  # Clone the repository
  git clone https://github.com/anthonypdawson/vector-viewer.git
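The Phase 2 roadmap items removed above include advanced metadata filtering that can be combined with search. The new README also describes filter rules with AND/OR logic. As an illustration only, the sketch below evaluates such rules against one item's metadata dict in plain Python. The rule format (`(field, operator, value)` tuples), the operator names, and the `matches` function are assumptions, not the API of the package's `filter_service.py`.

```python
import operator
from typing import Any

# Hypothetical operator table, chosen for illustration only.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, options: field_value in options,
    "contains": lambda field_value, text: text in str(field_value),
}

Rule = tuple[str, str, Any]  # (field, operator, value)


def matches(metadata: dict[str, Any], rules: list[Rule], logic: str = "AND") -> bool:
    """Check one item's metadata against a list of rules joined by AND or OR."""
    if logic not in ("AND", "OR"):
        raise ValueError(f"unsupported logic: {logic}")
    results = [
        field in metadata and OPS[op](metadata[field], value)  # missing field -> False
        for field, op, value in rules
    ]
    return all(results) if logic == "AND" else any(results)


if __name__ == "__main__":
    items = [
        {"id": "1", "meta": {"source": "wiki", "year": 2021}},
        {"id": "2", "meta": {"source": "news", "year": 2019}},
        {"id": "3", "meta": {"source": "blog", "year": 2023}},
    ]
    rules = [("source", "in", ["wiki", "blog"]), ("year", ">=", 2022)]
    print([i["id"] for i in items if matches(i["meta"], rules, "AND")])  # ['3']
    print([i["id"] for i in items if matches(i["meta"], rules, "OR")])  # ['1', '3']
```

Combining filters with search, as the roadmap item describes, could then mean applying `matches` to the candidates returned by a similarity query. A real provider would more likely push the filter down into the database query itself.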
@@ -0,0 +1,225 @@
+ # Vector Inspector
+
+
+ A comprehensive desktop application for visualizing, querying, and managing vector database data. Similar to SQL database viewers, Vector Inspector provides an intuitive GUI for exploring vector embeddings, metadata, and performing similarity searches across multiple vector database providers.
+
+ ## Overview
+
+ ## Table of Contents
+
+ - [Overview](#overview)
+ - [Key Features](#key-features)
+ - [Architecture](#architecture)
+ - [Application Structure](#application-structure)
+ - [Use Cases](#use-cases)
+ - [Feature Access](#feature-access)
+ - [Roadmap](#roadmap)
+ - [Installation](#installation)
+ - [Configuration](#configuration)
+ - [Development Setup](#development-setup)
+ - [Contributing](#contributing)
+ - [License](#license)
+ - [Acknowledgments](#acknowledgments)
+
+ Vector Inspector bridges the gap between vector databases and user-friendly data exploration tools. While vector databases are powerful for semantic search and AI applications, they often lack the intuitive inspection and management tools that traditional SQL databases have. This project aims to provide that missing layer.
+
+ ## Key Features
+
+ ### 1. **Multi-Provider Support**
+ - Connect to vector databases:
+   - ChromaDB (persistent local storage)
+   - Qdrant (remote server or embedded local)
+ - Unified interface regardless of backend provider
+ - Automatically saves last connection configuration
+
+ ### 2. **Data Visualization**
+ - **Metadata Explorer**: Browse and filter vector entries by metadata fields
+ - **Vector Dimensionality Reduction**: Visualize high-dimensional vectors in 2D/3D using:
+   - t-SNE
+   - UMAP
+   - PCA
+ - **Cluster Visualization**: Color-code vectors by metadata categories or clustering results
+ - **Interactive Plots**: Zoom, pan, and select vectors for detailed inspection
+ - **Data Distribution Charts**: Histograms and statistics for metadata fields
+
+ ### 3. **Search & Query Interface**
+ - **Similarity Search**:
+   - Text-to-vector search (with embedding model integration)
+   - Vector-to-vector search
+   - Find similar items to selected entries
+   - Adjustable top-k results and similarity thresholds
+ - **Metadata Filtering**:
+   - SQL-like query builder for metadata
+   - Combine vector similarity with metadata filters
+   - Advanced filtering: ranges, IN clauses, pattern matching
+ - **Hybrid Search**: Combine semantic search with keyword search
+ - **Query History**: Save and reuse frequent queries
+
+ ### 4. **Data Management**
+ - **Browse Collections/Indexes**: View all available collections with statistics
+ - **CRUD Operations**:
+   - View individual vectors and their metadata
+   - Add new vectors (with auto-embedding options)
+   - Update metadata fields
+   - Delete vectors (single or batch)
+ - **Bulk Import/Export**:
+   - Import from CSV, JSON, Parquet
+   - Export query results to various formats
+   - Backup and restore collections
+ - **Schema Inspector**: View collection configuration, vector dimensions, metadata schema
+
+ ### 5. **SQL-Like Experience**
+ - **Query Console**: Write queries in a familiar SQL-like syntax (where supported)
+ - **Results Grid**:
+   - Sortable, filterable table view
+   - Pagination for large result sets
+   - Column customization
+ - **Data Inspector**: Click any row to see full details including raw vector
+ - **Query Execution Plans**: Understand how queries are executed
+ - **Auto-completion**: Intelligent suggestions for collection names, fields, and operations
+
+ ### 6. **Advanced Features**
+ - **Embedding Model Integration**:
+   - Use OpenAI, Cohere, HuggingFace models for text-to-vector conversion
+   - Local model support (sentence-transformers)
+   - Custom model integration
+ - **Vector Analysis**:
+   - Compute similarity matrices
+   - Identify outliers and anomalies
+   - Cluster analysis with k-means, DBSCAN
+ - **Embedding Inspector**:
+   - For similar collections or items, automatically identify which vector dimensions (activations) most contribute to the similarity
+   - Map key activations to interpretable concepts (e.g., 'humor', 'sadness', 'anger') using metadata or labels
+   - Generate human-readable explanations for why items are similar
+ - **Performance Monitoring**:
+   - Query latency tracking
+   - Index performance metrics
+   - Connection health monitoring
+
+ ## Architecture
+
+ Vector Inspector is built with PySide6 (Qt for Python) for the GUI, providing a native desktop experience. The backend uses Python with support for multiple vector database providers through a unified interface.
+
+ For detailed architecture information, see [docs/architecture.md](docs/architecture.md).
+
+ ## Use Cases
+
+ 1. **AI/ML Development**: Inspect embeddings generated during model development
+ 2. **RAG System Debugging**: Verify what documents are being retrieved
+ 3. **Data Quality Assurance**: Identify poorly embedded or outlier vectors
+ 4. **Production Monitoring**: Check vector database health and data consistency
+ 5. **Data Migration**: Transfer data between vector database providers
+ 6. **Education**: Learn and experiment with vector databases interactively
+
+ ## Feature Access
+
+ Vector Inspector is available in both free (open source) and Pro versions. The free version includes all core features for ChromaDB and basic Qdrant support, while Pro adds advanced analytics and additional providers.
+
+ See [FEATURES.md](FEATURES.md) for a complete feature comparison.
+
+ ## Roadmap
+
+ **Current Status**: ✅ Phase 2 Complete
+
+ See [ROADMAP.md](ROADMAP.md) for the complete development roadmap and planned features.
+
+ ## Installation
+
+ ### From PyPI (Recommended)
+
+ ```bash
+ pip install vector-inspector
+ vector-inspector
+ ```
+
+ ### From Source
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/anthonypdawson/vector-viewer.git
+ cd vector-viewer
+
+ # Install dependencies using PDM
+ pdm install
+
+ # Launch application
+ ./run.sh # Linux/macOS
+ ./run.bat # Windows
+ ```
+
+ ## Configuration
+
+ Paths are resolved relative to the project root (where `pyproject.toml` is). For example, entering `./data/chroma_db` will use the absolute path resolved from the project root.
+
+ The application automatically saves your last connection configuration to `~/.vector-viewer/settings.json`. The next time you launch the application, it will attempt to reconnect using the last saved settings.
+
+ Example settings structure:
+ ```json
+ {
+   "last_connection": {
+     "provider": "chromadb",
+     "connection_type": "persistent",
+     "path": "./data/chroma_db"
+   }
+ }
+ ```
+
+ ## Development Setup
+
+ ```bash
+ # Install PDM if you haven't already
+ pip install pdm
+
+ # Install dependencies with development tools (PDM will create venv automatically)
+ pdm install -d
+
+ # Run tests
+ pdm run pytest
+
+ # Run application in development mode
+ ./run.sh # Linux/macOS
+ ./run.bat # Windows
+
+ # Or use Python module directly from src directory:
+ cd src
+ pdm run python -m vector_viewer
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Areas where help is needed:
+ - Additional vector database provider integrations
+ - UI/UX improvements
+ - Performance optimizations
+ - Documentation
+ - Test coverage
+
+ Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+
+ ## License
+
+ MIT License - See [LICENSE](LICENSE) file for details.
+
+ ## Acknowledgments
+
+ This project draws inspiration from:
+ - DBeaver (SQL database viewer)
+ - MongoDB Compass (NoSQL database GUI)
+ - Pinecone Console
+ - Various vector database management tools
+
+ ---
+
+ **Status**: ✅ Phase 2 Complete - Advanced Features Implemented!
+
+ **What's New in Phase 2:**
+ - 🔍 Advanced metadata filtering with customizable filter rules (AND/OR logic)
+ - ✏️ Double-click to edit items directly in the data browser
+ - 📥 Import data from CSV, JSON, and Parquet files
+ - 📤 Export filtered data to CSV, JSON, and Parquet formats
+ - 💾 Comprehensive backup and restore system for collections
+ - 🔄 Metadata filters integrated with search for powerful queries
+
+ See [GETTING_STARTED.md](GETTING_STARTED.md) for usage instructions and [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md) for technical details.
+
+ **Contact**: Anthony Dawson
@@ -1,6 +1,6 @@
  [project]
  name = "vector-inspector"
- version = "0.2.0"
+ version = "0.2.1"
  description = "A comprehensive desktop application for visualizing, querying, and managing vector database data"
  authors = [
  { name = "Anthony Dawson", email = "anthonypdawson+github@gmail.com" },
@@ -1,361 +0,0 @@
1
- # Vector Inspector
2
-
3
-
4
- A comprehensive desktop application for visualizing, querying, and managing vector database data. Similar to SQL database viewers, Vector Inspector provides an intuitive GUI for exploring vector embeddings, metadata, and performing similarity searches across multiple vector database providers.
5
-
6
- ## Overview
7
-
8
- ## Table of Contents
9
-
10
- - [Overview](#overview)
11
- - [Key Features](#key-features)
12
- - [Architecture](#architecture)
13
- - [Application Structure](#application-structure)
14
- - [Use Cases](#use-cases)
15
- - [Feature Access (Free vs Pro)](#feature-access-free-vs-pro)
16
- - [Planned Roadmap](#planned-roadmap)
17
- - [Installation (Planned)](#installation-planned)
18
- - [Configuration](#configuration)
19
- - [Development Setup](#development-setup)
20
- - [Contributing](#contributing)
21
- - [License](#license)
22
- - [Acknowledgments](#acknowledgments)
23
-
24
- Vector Inspector bridges the gap between vector databases and user-friendly data exploration tools. While vector databases are powerful for semantic search and AI applications, they often lack the intuitive inspection and management tools that traditional SQL databases have. This project aims to provide that missing layer.
25
-
26
- ## Key Features
27
-
28
- ### 1. **Multi-Provider Support**
29
- - Connect to vector databases:
30
- - ChromaDB (persistent local storage)
31
- - Qdrant (remote server or embedded local)
32
- - Unified interface regardless of backend provider
33
- - Automatically saves last connection configuration
34
-
35
- ### 2. **Data Visualization**
36
- - **Metadata Explorer**: Browse and filter vector entries by metadata fields
37
- - **Vector Dimensionality Reduction**: Visualize high-dimensional vectors in 2D/3D using:
38
- - t-SNE
39
- - UMAP
40
- - PCA
41
- - **Cluster Visualization**: Color-code vectors by metadata categories or clustering results
42
- - **Interactive Plots**: Zoom, pan, and select vectors for detailed inspection
43
- - **Data Distribution Charts**: Histograms and statistics for metadata fields
44
-
45
- ### 3. **Search & Query Interface**
46
- - **Similarity Search**:
47
- - Text-to-vector search (with embedding model integration)
48
- - Vector-to-vector search
49
- - Find similar items to selected entries
50
- - Adjustable top-k results and similarity thresholds
51
- - **Metadata Filtering**:
52
- - SQL-like query builder for metadata
53
- - Combine vector similarity with metadata filters
54
- - Advanced filtering: ranges, IN clauses, pattern matching
55
- - **Hybrid Search**: Combine semantic search with keyword search
56
- - **Query History**: Save and reuse frequent queries
57
-
58
- ### 4. **Data Management**
59
- - **Browse Collections/Indexes**: View all available collections with statistics
60
- - **CRUD Operations**:
61
- - View individual vectors and their metadata
62
- - Add new vectors (with auto-embedding options)
63
- - Update metadata fields
64
- - Delete vectors (single or batch)
65
- - **Bulk Import/Export**:
66
- - Import from CSV, JSON, Parquet
67
- - Export query results to various formats
68
- - Backup and restore collections
69
- - **Schema Inspector**: View collection configuration, vector dimensions, metadata schema
70
-
71
- ### 5. **SQL-Like Experience**
72
- - **Query Console**: Write queries in a familiar SQL-like syntax (where supported)
73
- - **Results Grid**:
74
- - Sortable, filterable table view
75
- - Pagination for large result sets
76
- - Column customization
77
- - **Data Inspector**: Click any row to see full details including raw vector
78
- - **Query Execution Plans**: Understand how queries are executed
79
- - **Auto-completion**: Intelligent suggestions for collection names, fields, and operations
80
-
81
- ### 6. **Advanced Features**
82
- - **Embedding Model Integration**:
83
- - Use OpenAI, Cohere, HuggingFace models for text-to-vector conversion
84
- - Local model support (sentence-transformers)
85
- - Custom model integration
86
- - **Vector Analysis**:
87
- - Compute similarity matrices
88
- - Identify outliers and anomalies
89
- - Cluster analysis with k-means, DBSCAN
90
- - **Embedding Inspector**:
91
- - For similar collections or items, automatically identify which vector dimensions (activations) most contribute to the similarity
92
- - Map key activations to interpretable concepts (e.g., 'humor', 'sadness', 'anger') using metadata or labels
93
- - Generate human-readable explanations for why items are similar
94
- - **Performance Monitoring**:
95
- - Query latency tracking
96
- - Index performance metrics
97
- - Connection health monitoring
98
-
99
- ## Architecture
100
-
101
- ### Technology Stack
102
-
103
- #### Frontend (GUI)
104
- - **Framework**: PySide6 (Qt for Python) - native desktop application
105
- - **UI Components**: Qt Widgets for forms, dialogs, and application structure
106
- - **Visualization**:
107
- - Plotly for interactive charts (embedded via QWebEngineView)
108
- - matplotlib for static visualizations
109
- - **Data Grid**: QTableView with custom models for high-performance data display
110
-
111
- #### Backend
112
- - **Language**: Python 3.12
113
- - **Core Libraries**:
114
- - Vector DB clients: `chromadb`, `qdrant-client` (implemented), `pinecone-client`, `weaviate-client`, `pymilvus` (planned)
115
- - Embeddings: `sentence-transformers`, `fastembed` (implemented), `openai`, `cohere` (planned)
116
- - Data processing: `pandas`, `numpy`
117
- - Dimensionality reduction: `scikit-learn`, `umap-learn`
118
- - **API Layer**: FastAPI (planned for programmatic access) or direct Python integration
119
-
120
- #### Data Layer
121
- - **Connection Management**: Provider-specific connection classes with unified interface
122
- - **Query Abstraction**: Base connection interface that each provider implements
123
- - **Storage Modes**:
124
- - ChromaDB: Persistent local storage
125
- - Qdrant Remote: Connect via host/port (e.g., localhost:6333)
126
- - Qdrant Embedded: Local path storage without separate server
127
- - **Caching**: Redis or in-memory cache for frequently accessed data (planned)
128
- - **Settings Persistence**: User settings saved to ~/.vector-viewer/settings.json
129
-
130
- ### Application Structure
131
-
132
- ```
133
- vector-viewer/
134
- ├── src/
135
- │ └── vector_viewer/
136
- │ ├── core/
137
- │ │ └── connections/ # Connection managers for each provider
138
- │ ├── ui/
139
- │ │ ├── components/ # Reusable UI components
140
- │ │ └── views/ # Main application views
141
- │ ├── services/ # Business logic services
142
- │ └── main.py # Application entry point
143
- ├── tests/
144
- ├── docs/
145
- ├── data/ # Local database storage
146
- │ ├── chroma_db/
147
- │ └── qdrant/
148
- ├── run.sh / run.bat # Launch scripts
149
- └── pyproject.toml
150
- ```
151
-
152
- User settings are saved to `~/.vector-viewer/settings.json`
153
-
154
- ## Use Cases
155
-
156
- 1. **AI/ML Development**: Inspect embeddings generated during model development
157
- 2. **RAG System Debugging**: Verify what documents are being retrieved
158
- 3. **Data Quality Assurance**: Identify poorly embedded or outlier vectors
159
- 4. **Production Monitoring**: Check vector database health and data consistency
160
- 5. **Data Migration**: Transfer data between vector database providers
161
- 6. **Education**: Learn and experiment with vector databases interactively
162
-
163
- ## Feature Access (Free vs Pro)
164
-
165
- | Feature | Access |
166
- |----------------------------------------------|----------|
167
- | Connection to ChromaDB | Free |
168
- | Basic metadata browsing and filtering | Free |
169
- | Simple similarity search interface | Free |
170
- | 2D vector visualization (PCA/t-SNE) | Free |
171
- | Basic CRUD operations | Free |
172
- | Metadata filtering (advanced) | Free |
173
- | Item editing | Free |
174
- | Import/export (CSV, JSON, Parquet) | Free |
175
- | Provider abstraction layer | Free |
176
- | Pinecone support | Free |
177
- | Weaviate support | Free |
178
- | Qdrant support (basic/experimental) | Free |
179
- | Milvus support | Pro |
180
- | ChromaDB advanced support | Pro |
181
- | FAISS (local files) support | Pro |
182
- | pgvector (PostgreSQL extension) support | Pro |
183
- | Elasticsearch with vector search support | Pro |
184
- | Advanced query builder | Free |
185
- | 3D visualization | Free |
186
- | Embedding model integration (basic) | Free |
187
- | Query history and saved queries | Free |
188
- | Model Comparison Mode | Pro |
189
- | Cluster Explorer | Pro |
190
- | Embedding Inspector | Pro |
191
- | Embedding Provenance Graph | Pro |
192
- | Semantic Drift Timeline | Pro |
193
- | Cross-Collection Similarity | Pro |
194
- | Vector Surgery | Pro |
195
- | Custom plugin system | Pro |
196
- | Team collaboration features | Pro |
197
-
198
- > **Note:** Qdrant support is available for free users in the open source version (basic/experimental). Advanced Qdrant features (e.g., payload filtering, geo, cloud auth) may be reserved for Pro in the future.

## Planned Roadmap

### Phase 1: Foundation (MVP)
- [x] Connection to ChromaDB
- [x] Basic metadata browsing and filtering
- [x] Simple similarity search interface
- [x] 2D vector visualization (PCA/t-SNE)
- [x] Basic CRUD operations

### Phase 2: Core Features
- [x] Metadata filtering (advanced filtering, combine with search)
- [x] Item editing (update metadata and documents)
- [x] Import/export (CSV, JSON, Parquet, backup/restore)
- [x] Provider abstraction layer (unified interface for all supported vector DBs)
- [x] Qdrant support (basic/experimental, free)
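
The Phase 2 import/export item moves collection items through flat files. Below is a minimal Python sketch of the CSV and JSON side. It assumes each item is a dict with `id`, `document`, and `metadata` keys. `export_items_csv`, `export_items_json`, and `import_items_json` are hypothetical helpers and are not taken from the project's `import_export_service.py`. Parquet is left out because it needs a third-party library such as pyarrow.

```python
import csv
import io
import json
from typing import Any

# Hypothetical item shape: {"id": str, "document": str, "metadata": dict}


def export_items_csv(items: list[dict[str, Any]]) -> str:
    """Write items as CSV text with one column per metadata key."""
    # Union of metadata keys across all items, so rows with missing keys still line up.
    meta_keys = sorted({key for item in items for key in item.get("metadata", {})})
    fieldnames = ["id", "document"] + [f"metadata.{key}" for key in meta_keys]
    buffer = io.StringIO()
    # Missing metadata values are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for item in items:
        row = {"id": item["id"], "document": item.get("document", "")}
        # Prefix metadata columns so a key like "id" cannot clobber the fixed columns.
        row.update({f"metadata.{k}": v for k, v in item.get("metadata", {}).items()})
        writer.writerow(row)
    return buffer.getvalue()


def export_items_json(items: list[dict[str, Any]]) -> str:
    """Serialize items as a JSON array, keeping metadata nested."""
    return json.dumps(items, indent=2, ensure_ascii=False)


def import_items_json(text: str) -> list[dict[str, Any]]:
    """Parse the output of export_items_json back into item dicts."""
    return json.loads(text)
```

CSV flattens every value to text, so types are lost on re-import. JSON keeps metadata nested and round-trips without a schema, which makes it the safer choice of the two for backups.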

### Phase 3: UX & Professional Polish
- [ ] **Unified Information Panel** (new "Info" tab as default view)
  - [ ] Database and collection metadata display
  - [ ] Connection health and version information
  - [ ] Schema visualization and index configuration display

### Phase 4: Modular/Plugin System & Hybrid Model
- [ ] Implement modular/plugin system for feature extensions
- [ ] Migrate paid/advanced features to commercial modules
- [ ] Add licensing/access control for commercial features

### Phase 5: Provider Expansion (Incremental)
- [ ] Pinecone support (free)
- [ ] Weaviate support (free)
- [ ] Advanced Qdrant support (paid)

#### Future/Backlog Providers
- [ ] Milvus support (paid)
- [ ] ChromaDB advanced support (paid)
- [ ] FAISS (local files) support (paid)
- [ ] pgvector (PostgreSQL extension) support (paid)
- [ ] Elasticsearch with vector search support (paid)

### Phase 6A: Advanced Usability & Visualization
- [ ] Advanced query builder (free)
- [ ] 3D visualization (free)
- [ ] Embedding model integration (free)
- [ ] Query history and saved queries (free)
- [ ] Metadata Type Detection & Rich Media Preview (free)

### Phase 6B: Analytical & Comparison Tools
- [ ] Model Comparison Mode (paid)
- [ ] Cluster Explorer (paid)
- [ ] Embedding Inspector (paid)
- [ ] Embedding Provenance Graph (paid)

### Phase 6C: Temporal & Cross-Collection Analytics
- [ ] Semantic Drift Timeline (paid)
- [ ] Cross-Collection Similarity (paid)

### Phase 6D: Experimental & Power Features
- [ ] Vector Surgery (paid)
- [ ] Custom plugin system (paid)
- [ ] Team collaboration features (paid)

### Phase 7: Enterprise Features
- [ ] Multi-user support with auth
- [ ] Audit logging
- [ ] Advanced security features
- [ ] Custom reporting
- [ ] API for programmatic access (FastAPI backend)
- [ ] Caching layer (Redis/in-memory) for performance
- [ ] Connection pooling and optimization

## Installation

```bash
# Clone the repository
git clone https://github.com/anthonypdawson/vector-viewer.git
cd vector-viewer

# Install dependencies using PDM
pdm install

# Launch application
./run.sh    # Linux/macOS
./run.bat   # Windows
```

## Configuration

Paths are resolved relative to the project root (the directory containing `pyproject.toml`). For example, entering `./data/chroma_db` resolves to `<project root>/data/chroma_db`, and that absolute path is what the application uses.
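
The path rule above can be sketched in Python as follows. The sketch assumes the project root is found by walking up from a starting directory to the nearest `pyproject.toml`. `find_project_root` and `resolve_user_path` are hypothetical names, not the application's own functions.

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Return the nearest directory at or above `start` that contains pyproject.toml."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    raise FileNotFoundError(f"no pyproject.toml found at or above {start}")


def resolve_user_path(entered: str, project_root: Path) -> Path:
    """Turn a user-entered path into an absolute one; relative paths hang off the project root."""
    path = Path(entered).expanduser()
    if not path.is_absolute():
        path = project_root / path
    # resolve() makes the path absolute and collapses "." and ".." segments.
    return path.resolve()
```

Anchoring at `pyproject.toml` rather than at the current working directory keeps `./data/chroma_db` pointing at the same folder whether the app is started with `./run.sh` or from inside the `src` directory.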

The application automatically saves your last connection configuration to `~/.vector-viewer/settings.json`. The next time you launch the application, it will attempt to reconnect using the last saved settings.

Example settings structure:
```json
{
  "last_connection": {
    "provider": "chromadb",
    "connection_type": "persistent",
    "path": "./data/chroma_db"
  }
}
```
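
Here is a minimal sketch of how the saved connection could be written and read back. It uses only the file location and `last_connection` key documented above. Several details are assumptions rather than the application's actual logic:

- the function names,
- merging with any other keys already in the file,
- treating a missing or unreadable file as "no saved connection".

```python
import json
from pathlib import Path
from typing import Any, Optional

# Location documented in this README; the functions accept another path (e.g. for tests).
SETTINGS_PATH = Path.home() / ".vector-viewer" / "settings.json"


def save_last_connection(config: dict[str, Any], path: Path = SETTINGS_PATH) -> None:
    """Store `config` under "last_connection", keeping any other keys already in the file."""
    settings: dict[str, Any] = {}
    if path.is_file():
        try:
            loaded = json.loads(path.read_text(encoding="utf-8"))
            if isinstance(loaded, dict):
                settings = loaded
        except json.JSONDecodeError:
            pass  # A corrupt file is simply overwritten.
    settings["last_connection"] = config
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")


def load_last_connection(path: Path = SETTINGS_PATH) -> Optional[dict[str, Any]]:
    """Return the saved connection config, or None when there is nothing usable to reconnect to."""
    try:
        settings = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    last = settings.get("last_connection") if isinstance(settings, dict) else None
    return last if isinstance(last, dict) else None
```

Returning `None` instead of raising lets a startup routine fall back to an empty connection dialog when the settings file is absent or damaged.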

## Development Setup

```bash
# Install PDM if you haven't already
pip install pdm

# Install dependencies with development tools (PDM creates a virtual environment automatically)
pdm install -d

# Run tests
pdm run pytest

# Run application in development mode
./run.sh    # Linux/macOS
./run.bat   # Windows

# Or run the package as a module from the src directory:
cd src
pdm run python -m vector_inspector
```

## Contributing

Contributions are welcome! Areas where help is needed:
- Additional vector database provider integrations
- UI/UX improvements
- Performance optimizations
- Documentation
- Test coverage

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License - See [LICENSE](LICENSE) file for details.

## Acknowledgments

This project draws inspiration from:
- DBeaver (SQL database viewer)
- MongoDB Compass (NoSQL database GUI)
- Pinecone Console
- Various vector database management tools

---

**Status**: ✅ Phase 2 Complete - Advanced Features Implemented!

**What's New in Phase 2:**
- 🔍 Advanced metadata filtering with customizable filter rules (AND/OR logic)
- ✏️ Double-click to edit items directly in the data browser
- 📥 Import data from CSV, JSON, and Parquet files
- 📤 Export filtered data to CSV, JSON, and Parquet formats
- 💾 Comprehensive backup and restore system for collections
- 🔄 Metadata filters integrated with search for powerful queries
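
AND/OR filter rules and their combination with similarity search can be illustrated with a pure-Python sketch. The rule dictionary format, the operator names, and brute-force cosine ranking are all assumptions for illustration. None of this is taken from the project's source.

```python
import math
from typing import Any, Callable

# Hypothetical rule format: {"field": <metadata key>, "op": <operator name>, "value": <operand>}
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
    "gt": lambda actual, expected: actual is not None and actual > expected,
    "lt": lambda actual, expected: actual is not None and actual < expected,
    "in": lambda actual, expected: actual in expected,
}


def matches(metadata: dict[str, Any], rules: list[dict[str, Any]], logic: str = "AND") -> bool:
    """Evaluate every rule against one item's metadata and combine them with AND or OR."""
    if not rules:
        return True  # No filter means every item passes.
    results = [OPERATORS[r["op"]](metadata.get(r["field"]), r["value"]) for r in rules]
    return all(results) if logic == "AND" else any(results)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filtered_search(
    query: list[float],
    items: list[dict[str, Any]],
    rules: list[dict[str, Any]],
    logic: str = "AND",
    k: int = 3,
) -> list[str]:
    """Apply the metadata filter first, then rank the survivors by similarity to `query`."""
    candidates = [item for item in items if matches(item["metadata"], rules, logic)]
    candidates.sort(key=lambda item: cosine(query, item["embedding"]), reverse=True)
    return [item["id"] for item in candidates[:k]]
```

Filtering before ranking keeps the sketch simple. A real vector database would normally push both steps to the server, for example through the `where` argument of ChromaDB's `collection.query`.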

See [GETTING_STARTED.md](GETTING_STARTED.md) for usage instructions and [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md) for technical details.

**Contact**: Anthony Dawson