feather-db 0.1.0__tar.gz

@@ -0,0 +1,70 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.1.0] - 2025-11-16
11
+
12
+ ### Added
13
+ - Initial release of Feather DB
14
+ - Python API with NumPy integration and pybind11 bindings
15
+ - C++ core implementation with HNSW algorithm
16
+ - Rust CLI for command-line operations
17
+ - Binary file format with magic number validation for persistence
18
+ - Support for L2 (Euclidean) distance metric
19
+ - SIMD optimizations (AVX512/AVX/SSE) for distance calculations
20
+ - Comprehensive documentation:
21
+   - HOW_TO_USE.md - Beginner-friendly guide
22
+   - USAGE_GUIDE.md - Complete API reference
23
+   - Architecture diagrams and internals documentation
24
+ - Working examples for common use cases:
25
+   - Basic Python usage
26
+   - Semantic search implementation
27
+   - Batch processing for large datasets
28
+ - Batch processing capabilities with periodic saves
29
+ - Configurable k parameter for search results
30
+ - Automatic database save on destruction
31
+ - Memory-efficient vector storage
32
+
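The "batch processing capabilities with periodic saves" entry can be illustrated with a small loop that adds vectors and saves at a fixed interval. This is only a sketch: `db.add(id=..., vec=...)` mirrors the call shown in CONTRIBUTING.md, while the `save()` method name, the `FakeDB` stand-in, and the `save_every` parameter are assumptions made for this example.

```python
class FakeDB:
    """Hypothetical stand-in for a database handle; it only counts calls."""

    def __init__(self):
        self.added = 0
        self.saves = 0

    def add(self, id, vec):
        self.added += 1

    def save(self):  # assumed method name, not confirmed by the changelog
        self.saves += 1


def add_with_periodic_saves(db, items, save_every):
    """Add (id, vector) pairs, saving after every `save_every` additions."""
    if save_every <= 0:
        raise ValueError("save_every must be positive")
    added = 0
    for vec_id, vec in items:
        db.add(id=vec_id, vec=vec)
        added += 1
        if added % save_every == 0:
            db.save()
    # Flush the final partial batch, if there is one.
    if added % save_every != 0:
        db.save()
    return added


db = FakeDB()
items = ((i, [0.0, 1.0, 2.0]) for i in range(2500))
total = add_with_periodic_saves(db, items, save_every=1000)
print(total, db.saves)
```

Saving at an interval bounds how much work is lost if a long import is interrupted, at the cost of extra writes.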
33
+ ### Features
34
+ - **Fast Search**: Approximate nearest neighbor search using HNSW algorithm
35
+ - **Multi-Language**: Python, C++, and Rust APIs
36
+ - **Persistent Storage**: Custom binary format with header validation
37
+ - **Scalable**: Supports up to 1 million vectors (configurable)
38
+ - **Easy to Use**: Simple, intuitive APIs across all languages
39
+ - **Production Ready**: Tested with a comprehensive test suite
40
+
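The changelog mentions header validation but does not specify the file layout. As a rough illustration of the general technique only, the sketch below writes and checks a header; the magic bytes `b"FTHR"`, the version field, and the field order are invented for this example and are not Feather DB's actual on-disk format.

```python
import io
import struct

# Hypothetical header layout (NOT Feather DB's real format):
# 4-byte magic, uint32 version, uint32 dimension, little-endian.
MAGIC = b"FTHR"
HEADER = struct.Struct("<4sII")


def write_header(stream, version, dim):
    """Write the illustrative header to a binary stream."""
    stream.write(HEADER.pack(MAGIC, version, dim))


def read_header(stream):
    """Read the header and reject data whose magic bytes do not match."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version, dim = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic!r}")
    return version, dim


buf = io.BytesIO()
write_header(buf, version=1, dim=768)
buf.seek(0)
print(read_header(buf))
```

Checking the magic bytes first lets a reader fail fast on unrelated or truncated files before interpreting any other field.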
41
+ ### Performance
42
+ - Add rate: 2,000-5,000 vectors/second (depending on dimension)
43
+ - Search time: 0.5-1.5ms per query (k=10)
44
+ - Memory usage: ~4 bytes per dimension per vector + index overhead
45
+ - Tested with up to 10,000 vectors
46
+
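The "~4 bytes per dimension per vector" figure is consistent with storing each component as a 32-bit float. A back-of-the-envelope estimate of raw vector storage can be sketched as follows; it deliberately excludes the index overhead, which the changelog does not quantify, and the function name is illustrative.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Estimate raw vector storage in bytes, excluding any index overhead."""
    return num_vectors * dim * bytes_per_component


# Compare the tested size (10,000) with the stated maximum (1 million).
for n in (10_000, 1_000_000):
    size = raw_vector_bytes(n, 768)
    print(f"{n:>9,} vectors x 768 dims: {size / 2**20:,.1f} MiB")
```

For 768-dimensional vectors this works out to about 29 MiB at 10,000 vectors and roughly 2.9 GiB at 1 million, before index overhead.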
47
+ ### Documentation
48
+ - Complete usage guide with real-world examples
49
+ - Beginner-friendly how-to guide
50
+ - Architecture documentation with visual diagrams
51
+ - Performance benchmarks and optimization tips
52
+ - Troubleshooting guide
53
+ - API reference for all three languages
54
+
55
+ ### Testing
56
+ - Automated test suite for Rust CLI
57
+ - Test data generation scripts
58
+ - Validation of search accuracy
59
+ - Binary format verification
60
+ - Memory leak testing
61
+
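One common way to validate approximate search accuracy is to compare results against an exact brute-force L2 search and report recall@k. The changelog does not describe how the project performs this check, so the sketch below is a generic, dependency-free illustration; `approx_ids` stands in for whatever IDs an index returns.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, vectors, k):
    """Return the IDs of the k nearest vectors by exact L2 distance.

    `vectors` maps an integer ID to its vector.
    """
    return heapq.nsmallest(k, vectors, key=lambda vid: l2_distance(query, vectors[vid]))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0], 3: [5.0, 5.0]}
exact = exact_top_k([0.1, 0.0], vectors, k=2)
approx_ids = [0, 2]  # pretend output from an approximate index
print(exact, recall_at_k(approx_ids, exact))
```

Brute force is only practical for small reference sets, which is usually enough for an accuracy test.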
62
+ ### Known Limitations
63
+ - Maximum 1 million vectors (configurable in C++ code)
64
+ - Only L2 distance metric supported
65
+ - No vector deletion functionality
66
+ - No metadata storage with vectors
67
+ - Single-threaded operations
68
+
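Because only L2 is supported, a common workaround for cosine-style ranking is to normalize vectors to unit length before inserting and querying: for unit vectors, the squared L2 distance equals 2 - 2 * cos(a, b), so ordering by L2 distance matches ordering by cosine similarity. The sketch below shows the normalization step and checks that identity numerically. It is a general technique, not a documented Feather DB feature, and the helper names are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([1.0, 0.0])
# Both values should agree up to floating-point error.
print(squared_l2(a, b), 2 - 2 * cosine(a, b))
```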
69
+ [Unreleased]: https://github.com/yourusername/feather-db/compare/v0.1.0...HEAD
70
+ [0.1.0]: https://github.com/yourusername/feather-db/releases/tag/v0.1.0
@@ -0,0 +1,341 @@
1
+ # Contributing to Feather DB
2
+
3
+ Thank you for your interest in contributing to Feather DB! We welcome contributions from the community.
4
+
5
+ ## 🤝 How to Contribute
6
+
7
+ ### Reporting Bugs
8
+
9
+ If you find a bug, please open an issue on GitHub with:
10
+
11
+ - **Clear description** of the problem
12
+ - **Steps to reproduce** the issue
13
+ - **Expected behavior** vs **actual behavior**
14
+ - **Environment details**:
15
+   - Operating system and version
16
+   - Python version (if using Python API)
17
+   - Rust version (if using CLI)
18
+   - Compiler version (if building from source)
19
+ - **Code sample** that demonstrates the issue (if applicable)
20
+ - **Error messages** or stack traces
21
+
22
+ ### Suggesting Features
23
+
24
+ Feature requests are welcome! Please open an issue with:
25
+
26
+ - **Clear description** of the proposed feature
27
+ - **Use case** - why is this feature needed?
28
+ - **Examples** of how it would be used
29
+ - **Alternatives** you've considered
30
+
31
+ ### Asking Questions
32
+
33
+ For questions about usage:
34
+ - Check the [documentation](HOW_TO_USE.md) first
35
+ - Search existing issues
36
+ - Open a new issue with the "question" label
37
+
38
+ ## 🔧 Development Setup
39
+
40
+ ### Prerequisites
41
+
42
+ - **C++ compiler** with C++17 support (GCC, Clang, or MSVC)
43
+ - **Python 3.8+** with pip
44
+ - **Rust 1.70+** (for CLI development)
45
+ - **Git**
46
+
47
+ ### Setting Up Development Environment
48
+
49
+ ```bash
50
+ # 1. Fork and clone the repository
51
+ git clone https://github.com/yourusername/feather-db.git
52
+ cd feather-db
53
+
54
+ # 2. Build C++ core
55
+ g++ -O3 -std=c++17 -fPIC -c src/feather_core.cpp -o feather_core.o
56
+ ar rcs libfeather.a feather_core.o
57
+
58
+ # 3. Install Python dependencies
59
+ pip install pybind11 numpy
60
+
61
+ # 4. Build Python bindings
62
+ python setup.py build_ext --inplace
63
+
64
+ # 5. Install in development mode
65
+ pip install -e .
66
+
67
+ # 6. Build Rust CLI (optional)
68
+ cd feather-cli
69
+ cargo build --release
70
+ cd ..
71
+ ```
72
+
73
+ ### Running Tests
74
+
75
+ ```bash
76
+ # Run Python examples
77
+ python3 examples/basic_python_example.py
78
+ python3 examples/semantic_search_example.py
79
+ python3 examples/batch_processing_example.py
80
+
81
+ # Run Rust CLI tests
82
+ ./p-test/run_tests.sh
83
+
84
+ # Generate test data
85
+ python3 p-test/test_rust_cli.py
86
+ ```
87
+
88
+ ## 📝 Pull Request Process
89
+
90
+ ### 1. Create a Branch
91
+
92
+ ```bash
93
+ git checkout -b feature/your-feature-name
94
+ # or
95
+ git checkout -b fix/your-bug-fix
96
+ ```
97
+
98
+ ### 2. Make Your Changes
99
+
100
+ - Write clear, readable code
101
+ - Follow existing code style
102
+ - Add comments for complex logic
103
+ - Update documentation if needed
104
+
105
+ ### 3. Test Your Changes
106
+
107
+ - Ensure all existing tests pass
108
+ - Add new tests for new features
109
+ - Test on multiple platforms if possible
110
+ - Check for memory leaks (C++ changes)
111
+
112
+ ### 4. Commit Your Changes
113
+
114
+ ```bash
115
+ git add .
116
+ git commit -m "Add feature: brief description"
117
+ ```
118
+
119
+ **Commit message guidelines:**
120
+ - Use present tense ("Add feature" not "Added feature")
121
+ - Use imperative mood ("Move cursor to..." not "Moves cursor to...")
122
+ - First line should be 50 characters or less
123
+ - Reference issues and pull requests when relevant
124
+
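The length rule above is easy to check mechanically, for example from a local commit hook. The sketch below is a hypothetical helper, not part of the repository; it checks only the subject-line length, since tense and mood cannot be verified reliably by a simple script.

```python
from typing import List

MAX_SUBJECT_LENGTH = 50  # limit taken from the guideline above


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        problems.append("subject line is empty")
    elif len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject line is {len(subject)} characters; "
            f"the limit is {MAX_SUBJECT_LENGTH}"
        )
    return problems


print(check_commit_message("Add feature: brief description"))
```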
125
+ ### 5. Push and Create Pull Request
126
+
127
+ ```bash
128
+ git push origin feature/your-feature-name
129
+ ```
130
+
131
+ Then open a pull request on GitHub with:
132
+ - **Clear title** describing the change
133
+ - **Description** of what changed and why
134
+ - **Related issues** (if any)
135
+ - **Testing done** to verify the changes
136
+ - **Screenshots** (if applicable)
137
+
138
+ ### 6. Code Review
139
+
140
+ - Respond to feedback promptly
141
+ - Make requested changes
142
+ - Keep the discussion focused and professional
143
+
144
+ ## 💻 Code Style Guidelines
145
+
146
+ ### Python
147
+
148
+ - Follow [PEP 8](https://pep8.org/)
149
+ - Use meaningful variable names
150
+ - Add docstrings to functions
151
+ - Type hints are encouraged
152
+
153
+ ```python
154
+ def add_vector(db: feather_py.DB, id: int, vector: np.ndarray) -> None:
155
+     """
156
+     Add a vector to the database.
157
+
158
+     Args:
159
+         db: Feather database instance
160
+         id: Unique identifier for the vector
161
+         vector: NumPy array of floats
162
+     """
163
+     db.add(id=id, vec=vector)
164
+ ```
165
+
166
+ ### C++
167
+
168
+ - Use C++17 standard
169
+ - Follow existing naming conventions
170
+ - Use smart pointers for memory management
171
+ - Add comments for complex algorithms
172
+
173
+ ```cpp
174
+ // Good: Clear naming and smart pointers
175
+ std::unique_ptr<DB> db = DB::open("db.feather", 768);
176
+
177
+ // Bad: Raw pointers and unclear names
178
+ DB* d = new DB();
179
+ ```
180
+
181
+ ### Rust
182
+
183
+ - Run `cargo fmt` before committing
184
+ - Run `cargo clippy` and fix warnings
185
+ - Follow Rust naming conventions
186
+ - Add documentation comments
187
+
188
+ ```rust
189
+ /// Opens a database at the specified path
190
+ pub fn open(path: &Path, dim: usize) -> Option<Self> {
191
+     todo!() // Implementation elided
192
+ }
193
+ ```
194
+
195
+ ## 🧪 Testing Guidelines
196
+
197
+ ### Adding Tests
198
+
199
+ When adding new features:
200
+ 1. Add unit tests for core functionality
201
+ 2. Add integration tests for API changes
202
+ 3. Update example code if relevant
203
+ 4. Test edge cases and error conditions
204
+
205
+ ### Test Coverage
206
+
207
+ - Aim for high test coverage
208
+ - Test both success and failure cases
209
+ - Test with different dimensions
210
+ - Test with large datasets
211
+
212
+ ## 📚 Documentation Guidelines
213
+
214
+ ### Updating Documentation
215
+
216
+ When making changes that affect users:
217
+ - Update relevant markdown files
218
+ - Add examples for new features
219
+ - Update API reference
220
+ - Keep documentation clear and concise
221
+
222
+ ### Documentation Files
223
+
224
+ - `README.md` - Project overview and quick start
225
+ - `HOW_TO_USE.md` - Beginner-friendly guide
226
+ - `USAGE_GUIDE.md` - Complete API reference
227
+ - `examples/` - Working code examples
228
+ - `CHANGELOG.md` - Version history
229
+
230
+ ## 🐛 Debugging Tips
231
+
232
+ ### Python Issues
233
+
234
+ ```python
235
+ # Enable verbose output
236
+ import logging
237
+ logging.basicConfig(level=logging.DEBUG)
238
+
239
+ # Check vector dimensions
240
+ print(f"Vector shape: {vector.shape}")
241
+ print(f"Database dimension: {db.dim()}")
242
+ ```
243
+
244
+ ### C++ Issues
245
+
246
+ ```bash
247
+ # Compile with debug symbols
248
+ g++ -g -std=c++17 src/feather_core.cpp -o test
249
+
250
+ # Use valgrind for memory leaks
251
+ valgrind --leak-check=full ./test
252
+ ```
253
+
254
+ ### Rust Issues
255
+
256
+ ```bash
257
+ # Run with backtrace
258
+ RUST_BACKTRACE=1 cargo run
259
+
260
+ # Check for common issues
261
+ cargo clippy
262
+ ```
263
+
264
+ ## 🌟 Areas for Contribution
265
+
266
+ We especially welcome contributions in these areas:
267
+
268
+ ### High Priority
269
+ - [ ] Add vector deletion functionality
270
+ - [ ] Support for cosine similarity
271
+ - [ ] Metadata storage with vectors
272
+ - [ ] Multi-threaded operations
273
+ - [ ] Python type stubs (.pyi files)
274
+
275
+ ### Medium Priority
276
+ - [ ] Additional distance metrics (Manhattan, Hamming)
277
+ - [ ] Batch search operations
278
+ - [ ] Progress callbacks for long operations
279
+ - [ ] Compression for storage
280
+ - [ ] Python async API
281
+
282
+ ### Low Priority
283
+ - [ ] Web API/REST interface
284
+ - [ ] Docker container
285
+ - [ ] Benchmarking suite
286
+ - [ ] Additional language bindings (Go, Java)
287
+ - [ ] GUI tool
288
+
289
+ ## 📜 Code of Conduct
290
+
291
+ ### Our Standards
292
+
293
+ - Be respectful and inclusive
294
+ - Welcome newcomers
295
+ - Accept constructive criticism
296
+ - Focus on what's best for the community
297
+ - Show empathy towards others
298
+
299
+ ### Unacceptable Behavior
300
+
301
+ - Harassment or discriminatory language
302
+ - Trolling or insulting comments
303
+ - Personal or political attacks
304
+ - Publishing others' private information
305
+ - Other unprofessional conduct
306
+
307
+ ### Enforcement
308
+
309
+ Violations may result in:
310
+ 1. Warning
311
+ 2. Temporary ban
312
+ 3. Permanent ban
313
+
314
+ Report issues to: [your.email@example.com]
315
+
316
+ ## 🎓 Learning Resources
317
+
318
+ ### Vector Databases
319
+ - [HNSW Algorithm Paper](https://arxiv.org/abs/1603.09320)
320
+ - [Vector Database Basics](https://www.pinecone.io/learn/vector-database/)
321
+
322
+ ### Development Tools
323
+ - [pybind11 Documentation](https://pybind11.readthedocs.io/)
324
+ - [Rust Book](https://doc.rust-lang.org/book/)
325
+ - [C++ Reference](https://en.cppreference.com/)
326
+
327
+ ## 💬 Communication
328
+
329
+ - **GitHub Issues**: Bug reports and feature requests
330
+ - **Pull Requests**: Code contributions
331
+ - **Discussions**: General questions and ideas
332
+ - **Email**: [your.email@example.com]
333
+
334
+ ## 🙏 Recognition
335
+
336
+ Contributors will be:
337
+ - Listed in CONTRIBUTORS.md
338
+ - Mentioned in release notes
339
+ - Credited in documentation
340
+
341
+ Thank you for contributing to Feather DB! 🚀
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Feather DB Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,29 @@
1
+ # Include documentation
2
+ include README.md
3
+ include LICENSE
4
+ include CHANGELOG.md
5
+ include HOW_TO_USE.md
6
+ include USAGE_GUIDE.md
7
+ include CONTRIBUTING.md
8
+
9
+ # Include C++ source and headers
10
+ recursive-include include *.h
11
+ recursive-include src *.cpp
12
+ recursive-include bindings *.cpp
13
+
14
+ # Include examples
15
+ recursive-include examples *.py *.md
16
+
17
+ # Exclude build artifacts
18
+ global-exclude *.pyc
19
+ global-exclude *.pyo
20
+ global-exclude *.o
21
+ global-exclude *.a
22
+ global-exclude __pycache__
23
+ global-exclude *.feather
24
+ global-exclude *.npy
25
+
26
+ # Exclude test files
27
+ prune p-test
28
+ prune feather-cli/target
29
+ exclude feather-cli/Cargo.lock