abloom 0.1.0__tar.gz

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
abloom-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 ampribe
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,6 @@
+ include abloom/*.c
+ include abloom/*.h
+ include abloom/*.pyi
+ include abloom/py.typed
+ include LICENSE
+ include README.md
abloom-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,94 @@
+ Metadata-Version: 2.4
+ Name: abloom
+ Version: 0.1.0
+ Summary: High-performance Bloom filter for Python
+ Author-email: Andrew Pribe <andrewpribe@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/ampribe/abloom
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: C
+ Classifier: Topic :: Software Development :: Libraries
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Dynamic: license-file
+
+ # abloom
+ [![Tests](https://github.com/ampribe/abloom/actions/workflows/test.yml/badge.svg)](https://github.com/ampribe/abloom/actions/workflows/test.yml)
+
+ `abloom` is a high-performance Bloom filter implementation for Python, written in C.
+
+ Why use `abloom`?
+ - `abloom` significantly outperforms all other Python Bloom filter libraries: 2.77x faster on `add`, 2.43x faster on `update`, and 1.34x faster on lookups than the next-fastest implementation, `rbloom` (1M ints, 1% FPR). Complete benchmark results are in [BENCHMARK.md](BENCHMARK.md).
+ - `abloom` is rigorously tested on Python >= 3.8 on Ubuntu, Windows, and macOS.
+
+ ## Usage
+ Install with `pip install abloom`.
+
+ ```python
+ from abloom import BloomFilter
+
+ bf = BloomFilter(1_000_000, 0.01)
+ bf.add(1)
+ bf.add(("arbitrary", "object", "that", "implements", "hash"))
+ bf.update([2, 3, 4])
+
+ assert 1 in bf
+ assert ("arbitrary", "object", "that", "implements", "hash") in bf
+ assert 5 not in bf
+ repr(bf)  # '<BloomFilter capacity=1000000 items=5 fp_rate=0.01>'
+ ```
+
+ `abloom` relies on Python's built-in hash function, so stored objects must implement `__hash__`. Python seeds its hash function differently in each process, so a Bloom filter cannot be transferred between processes.
+
+ `abloom` implements a split block Bloom filter with 512-bit blocks and power-of-2 rounding of the block count. This costs roughly 1.5-2x the memory of a standard Bloom filter and can reduce performance at very high capacities or very low false positive rates; the 10M ints, 0.1% FPR benchmark shows this effect, though `abloom` is still significantly faster than the alternative libraries. See [IMPLEMENTATION.md](IMPLEMENTATION.md) for implementation and memory-usage details.
+
+ ## Testing
+
+ ```bash
+ # Install dev dependencies
+ pip install -e ".[test]"
+
+ # Run unit tests
+ pytest tests/ --ignore=tests/test_benchmark.py --ignore=tests/test_fpr.py -v
+
+ # Run all tests including slow FPR validation
+ pytest tests/ --ignore=tests/test_benchmark.py -v
+
+ # Cross-version testing (requires tox and multiple Python versions)
+ pip install tox
+ tox
+ ```
+
+ ## Benchmarking
+
+ See [BENCHMARK.md](BENCHMARK.md) for detailed results and filtering options.
+
+ ```bash
+ # Install benchmark dependencies
+ pip install -e ".[benchmark]"
+
+ # Run all benchmarks
+ pytest tests/test_benchmark.py --benchmark-only
+
+ # Run canonical benchmark (1M ints, 1% FPR)
+ pytest tests/test_benchmark.py -k "int_1000000_0.01" --benchmark-only -v
+
+ # Filter by operation, library, or data type
+ pytest tests/test_benchmark.py -k "add" --benchmark-only     # Add only
+ pytest tests/test_benchmark.py -k "abloom" --benchmark-only  # abloom only
+ pytest tests/test_benchmark.py -k "uuid" --benchmark-only    # UUIDs only
+
+ # Save results for report generation
+ pytest tests/test_benchmark.py --benchmark-only --benchmark-json=results.json
+ python scripts/generate_benchmark_report.py results.json
+ ```
abloom-0.1.0/README.md ADDED
@@ -0,0 +1,70 @@
+ # abloom
+ [![Tests](https://github.com/ampribe/abloom/actions/workflows/test.yml/badge.svg)](https://github.com/ampribe/abloom/actions/workflows/test.yml)
+
+ `abloom` is a high-performance Bloom filter implementation for Python, written in C.
+
+ Why use `abloom`?
+ - `abloom` significantly outperforms all other Python Bloom filter libraries: 2.77x faster on `add`, 2.43x faster on `update`, and 1.34x faster on lookups than the next-fastest implementation, `rbloom` (1M ints, 1% FPR). Complete benchmark results are in [BENCHMARK.md](BENCHMARK.md).
+ - `abloom` is rigorously tested on Python >= 3.8 on Ubuntu, Windows, and macOS.
+
+ ## Usage
+ Install with `pip install abloom`.
+
+ ```python
+ from abloom import BloomFilter
+
+ bf = BloomFilter(1_000_000, 0.01)
+ bf.add(1)
+ bf.add(("arbitrary", "object", "that", "implements", "hash"))
+ bf.update([2, 3, 4])
+
+ assert 1 in bf
+ assert ("arbitrary", "object", "that", "implements", "hash") in bf
+ assert 5 not in bf
+ repr(bf)  # '<BloomFilter capacity=1000000 items=5 fp_rate=0.01>'
+ ```
+
+ `abloom` relies on Python's built-in hash function, so stored objects must implement `__hash__`. Python seeds its hash function differently in each process, so a Bloom filter cannot be transferred between processes.
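The hashability requirement above can be seen without `abloom` at all; this is a small sketch using only Python built-ins (the `BloomFilter` itself is not imported here):

```python
# add()/`in` call Python's hash(), so arguments must implement __hash__.
key = ("arbitrary", "object", "that", "implements", "hash")
assert isinstance(hash(key), int)  # tuples of hashables are hashable

# Mutable built-ins set __hash__ = None, so they are rejected up front.
try:
    hash([2, 3, 4])
    raised = False
except TypeError:
    raised = True
assert raised

# str/bytes hashes are additionally salted per process (PYTHONHASHSEED),
# which is why a filter's bit pattern only makes sense inside the process
# that built it.
assert isinstance(hash("abloom"), int)
```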
+
+ `abloom` implements a split block Bloom filter with 512-bit blocks and power-of-2 rounding of the block count. This costs roughly 1.5-2x the memory of a standard Bloom filter and can reduce performance at very high capacities or very low false positive rates; the 10M ints, 0.1% FPR benchmark shows this effect, though `abloom` is still significantly faster than the alternative libraries. See [IMPLEMENTATION.md](IMPLEMENTATION.md) for implementation and memory-usage details.
+
+ ## Testing
+
+ ```bash
+ # Install dev dependencies
+ pip install -e ".[test]"
+
+ # Run unit tests
+ pytest tests/ --ignore=tests/test_benchmark.py --ignore=tests/test_fpr.py -v
+
+ # Run all tests including slow FPR validation
+ pytest tests/ --ignore=tests/test_benchmark.py -v
+
+ # Cross-version testing (requires tox and multiple Python versions)
+ pip install tox
+ tox
+ ```
+
+ ## Benchmarking
+
+ See [BENCHMARK.md](BENCHMARK.md) for detailed results and filtering options.
+
+ ```bash
+ # Install benchmark dependencies
+ pip install -e ".[benchmark]"
+
+ # Run all benchmarks
+ pytest tests/test_benchmark.py --benchmark-only
+
+ # Run canonical benchmark (1M ints, 1% FPR)
+ pytest tests/test_benchmark.py -k "int_1000000_0.01" --benchmark-only -v
+
+ # Filter by operation, library, or data type
+ pytest tests/test_benchmark.py -k "add" --benchmark-only     # Add only
+ pytest tests/test_benchmark.py -k "abloom" --benchmark-only  # abloom only
+ pytest tests/test_benchmark.py -k "uuid" --benchmark-only    # UUIDs only
+
+ # Save results for report generation
+ pytest tests/test_benchmark.py --benchmark-only --benchmark-json=results.json
+ python scripts/generate_benchmark_report.py results.json
+ ```
@@ -0,0 +1,8 @@
+ """
+ abloom - A high-performance Bloom filter for Python
+ """
+
+ from abloom._abloom import BloomFilter
+
+ __version__ = '0.1.0'
+ __all__ = ['BloomFilter']
@@ -0,0 +1,348 @@
+ #define PY_SSIZE_T_CLEAN
+ #include <Python.h>
+ #include <math.h>
+ #include <string.h>
+
+ // SBBF constants: 512-bit blocks (8 x 64-bit words)
+ #define BLOCK_BITS 512
+ #define BLOCK_BYTES 64
+ #define BLOCK_WORDS 8
+ #define BITS_PER_WORD 64
+
+ // Salt constants from the Parquet split block Bloom filter spec
+ static const uint32_t SALT[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU,
+                                  0xa2b7289dU, 0x705495c7U, 0x2df1424bU,
+                                  0x9efc4947U, 0x5c6bfb31U};
+
+ typedef struct {
+   PyObject_HEAD
+   uint64_t *blocks;
+   uint64_t block_count;
+   uint64_t block_mask;
+   uint64_t item_count;
+   uint64_t capacity;
+   double fp_rate;
+ } BloomFilter;
+
+ // Bits per item for x = -log2(fp_rate), tabulated from 1.0 to 20.0
+ // in steps of 0.5
+ static const float SBBF512_LUT[] = {
+     3.2304f,  3.8302f,  4.3978f,  4.9555f,  5.5148f,  6.0828f,  6.6644f,
+     7.2634f,  7.8830f,  8.5260f,  9.1952f,  9.8929f,  10.6217f, 11.3841f,
+     12.1826f, 13.0199f, 13.8988f, 14.8220f, 15.7926f, 16.8139f, 17.8892f,
+     19.0222f, 20.2168f, 21.4771f, 22.8076f, 24.2130f, 25.6984f, 27.2693f,
+     28.9318f, 30.6921f, 32.5573f, 34.5347f, 36.6325f, 38.8595f, 41.2251f,
+     43.7396f, 46.4143f, 49.2614f, 52.2942f};
+ #define SBBF512_LUT_SIZE 39
+
+ // 64-bit finalizer from MurmurHash3 (fmix64): spreads entropy across all bits
+ static inline uint64_t mix64(uint64_t x) {
+   x ^= x >> 33;
+   x *= 0xff51afd7ed558ccdULL;
+   x ^= x >> 33;
+   x *= 0xc4ceb9fe1a85ec53ULL;
+   x ^= x >> 33;
+   return x;
+ }
+
+ static uint64_t next_power_of_2(uint64_t n) {
+   if (n == 0)
+     return 1;
+   n--;
+   n |= n >> 1;
+   n |= n >> 2;
+   n |= n >> 4;
+   n |= n >> 8;
+   n |= n >> 16;
+   n |= n >> 32;
+   return n + 1;
+ }
+
+ static uint64_t calculate_block_count(uint64_t capacity, double fp_rate) {
+   if (capacity == 0)
+     capacity = 1;
+
+   double x = -log2(fp_rate);
+
+   double bits_per_item;
+   if (x <= 1.0) {
+     bits_per_item = SBBF512_LUT[0];
+   } else if (x >= 20.0) {
+     // Extrapolate past the end of the table
+     double slope = (SBBF512_LUT[38] - SBBF512_LUT[37]) / 0.5;
+     bits_per_item = SBBF512_LUT[38] + slope * (x - 20.0);
+   } else {
+     // Linear interpolation between adjacent table entries
+     int idx = (int)((x - 1.0) / 0.5);
+     if (idx >= SBBF512_LUT_SIZE - 1)
+       idx = SBBF512_LUT_SIZE - 2;
+     double t = (x - 1.0 - idx * 0.5) / 0.5;
+     bits_per_item = SBBF512_LUT[idx] * (1.0 - t) + SBBF512_LUT[idx + 1] * t;
+   }
+
+   if (bits_per_item < 8.0)
+     bits_per_item = 8.0;
+
+   uint64_t total_bits = (uint64_t)ceil(capacity * bits_per_item);
+   uint64_t min_blocks = (total_bits + BLOCK_BITS - 1) / BLOCK_BITS;
+
+   return next_power_of_2(min_blocks);
+ }
+
+ static inline void bloom_insert(BloomFilter *bf, uint64_t hash) {
+   // Upper 32 bits select the block
+   uint64_t block_idx = (hash >> 32) & bf->block_mask;
+   uint32_t h_low = (uint32_t)hash;
+
+   uint64_t *block = &bf->blocks[block_idx * BLOCK_WORDS];
+
+   // Each salt maps the low 32 bits to one bit position (the top 6 bits of
+   // the 32-bit product) in the corresponding word
+   uint32_t p0 = (h_low * SALT[0]) >> 26;
+   uint32_t p1 = (h_low * SALT[1]) >> 26;
+   uint32_t p2 = (h_low * SALT[2]) >> 26;
+   uint32_t p3 = (h_low * SALT[3]) >> 26;
+   uint32_t p4 = (h_low * SALT[4]) >> 26;
+   uint32_t p5 = (h_low * SALT[5]) >> 26;
+   uint32_t p6 = (h_low * SALT[6]) >> 26;
+   uint32_t p7 = (h_low * SALT[7]) >> 26;
+
+   block[0] |= (1ULL << p0);
+   block[1] |= (1ULL << p1);
+   block[2] |= (1ULL << p2);
+   block[3] |= (1ULL << p3);
+   block[4] |= (1ULL << p4);
+   block[5] |= (1ULL << p5);
+   block[6] |= (1ULL << p6);
+   block[7] |= (1ULL << p7);
+ }
+
+ static inline int bloom_check(BloomFilter *bf, uint64_t hash) {
+   uint64_t block_idx = (hash >> 32) & bf->block_mask;
+   uint32_t h_low = (uint32_t)hash;
+   uint64_t *block = &bf->blocks[block_idx * BLOCK_WORDS];
+
+ #define CHECK_WORD(i) \
+   if (!(block[i] & (1ULL << ((h_low * SALT[i]) >> 26)))) \
+     return 0
+
+   CHECK_WORD(0);
+   CHECK_WORD(1);
+   CHECK_WORD(2);
+   CHECK_WORD(3);
+   CHECK_WORD(4);
+   CHECK_WORD(5);
+   CHECK_WORD(6);
+   CHECK_WORD(7);
+
+   return 1;
+ #undef CHECK_WORD
+ }
+
+ static int get_hash(PyObject *item, uint64_t *out_hash) {
+   Py_hash_t py_hash = PyObject_Hash(item);
+   if (py_hash == -1 && PyErr_Occurred())
+     return -1;
+
+   // Mix the Python hash so both 32-bit halves are well distributed
+   *out_hash = mix64((uint64_t)py_hash);
+   return 0;
+ }
+
+ static PyObject *BloomFilter_update(BloomFilter *self, PyObject *iterable) {
+   PyObject *iter = PyObject_GetIter(iterable);
+   if (iter == NULL)
+     return NULL;
+
+   PyObject *item;
+   while ((item = PyIter_Next(iter)) != NULL) {
+     uint64_t hash;
+     if (get_hash(item, &hash) < 0) {
+       Py_DECREF(item);
+       Py_DECREF(iter);
+       return NULL;
+     }
+     bloom_insert(self, hash);
+     self->item_count++;
+     Py_DECREF(item);
+   }
+   Py_DECREF(iter);
+
+   if (PyErr_Occurred())
+     return NULL;
+   Py_RETURN_NONE;
+ }
+
+ static PyObject *BloomFilter_add(BloomFilter *self, PyObject *item) {
+   uint64_t hash;
+   if (get_hash(item, &hash) < 0)
+     return NULL;
+
+   bloom_insert(self, hash);
+   self->item_count++;
+
+   Py_RETURN_NONE;
+ }
+
+ static int BloomFilter_contains(BloomFilter *self, PyObject *item) {
+   uint64_t hash;
+   if (get_hash(item, &hash) < 0)
+     return -1;
+
+   return bloom_check(self, hash);
+ }
+
+ static Py_ssize_t BloomFilter_len(BloomFilter *self) {
+   return (Py_ssize_t)self->item_count;
+ }
+
+ static PyObject *BloomFilter_get_capacity(BloomFilter *self, void *closure) {
+   return PyLong_FromUnsignedLongLong(self->capacity);
+ }
+
+ static PyObject *BloomFilter_get_fp_rate(BloomFilter *self, void *closure) {
+   return PyFloat_FromDouble(self->fp_rate);
+ }
+
+ static PyObject *BloomFilter_get_k(BloomFilter *self, void *closure) {
+   return PyLong_FromLong(BLOCK_WORDS); // Always 8 for SBBF
+ }
+
+ static PyObject *BloomFilter_get_byte_count(BloomFilter *self, void *closure) {
+   uint64_t bytes = self->block_count * BLOCK_BYTES;
+   return PyLong_FromUnsignedLongLong(bytes);
+ }
+
+ static PyObject *BloomFilter_get_bit_count(BloomFilter *self, void *closure) {
+   uint64_t bits = self->block_count * BLOCK_BITS;
+   return PyLong_FromUnsignedLongLong(bits);
+ }
+
+ static void BloomFilter_dealloc(BloomFilter *self) {
+   if (self->blocks) {
+     PyMem_Free(self->blocks);
+   }
+   Py_TYPE(self)->tp_free((PyObject *)self);
+ }
+
+ static int BloomFilter_init(BloomFilter *self, PyObject *args, PyObject *kwds) {
+   static char *kwlist[] = {"capacity", "fp_rate", NULL};
+   unsigned long long capacity;
+   double fp_rate = 0.01;
+
+   if (!PyArg_ParseTupleAndKeywords(args, kwds, "K|d", kwlist, &capacity,
+                                    &fp_rate)) {
+     return -1;
+   }
+
+   if (capacity == 0) {
+     PyErr_SetString(PyExc_ValueError, "Capacity must be greater than 0");
+     return -1;
+   }
+
+   if (fp_rate <= 0.0 || fp_rate >= 1.0) {
+     PyErr_SetString(PyExc_ValueError,
+                     "False positive rate must be between 0.0 and 1.0");
+     return -1;
+   }
+
+   self->capacity = capacity;
+   self->fp_rate = fp_rate;
+   self->block_count = calculate_block_count(capacity, fp_rate);
+   self->block_mask = self->block_count - 1;
+   self->item_count = 0;
+
+   size_t num_bytes = self->block_count * BLOCK_BYTES;
+   self->blocks = PyMem_Calloc(num_bytes, 1);
+   if (self->blocks == NULL) {
+     PyErr_NoMemory();
+     return -1;
+   }
+
+   return 0;
+ }
+
+ static PyObject *BloomFilter_new(PyTypeObject *type, PyObject *args,
+                                  PyObject *kwds) {
+   BloomFilter *self = (BloomFilter *)type->tp_alloc(type, 0);
+   if (self != NULL) {
+     self->blocks = NULL;
+     self->block_count = 0;
+     self->block_mask = 0;
+     self->item_count = 0;
+     self->capacity = 0;
+     self->fp_rate = 0.0;
+   }
+   return (PyObject *)self;
+ }
+
+ static PyMethodDef BloomFilter_methods[] = {
+     {"add", (PyCFunction)BloomFilter_add, METH_O,
+      "Add an item to the bloom filter"},
+     {"update", (PyCFunction)BloomFilter_update, METH_O,
+      "Add items from an iterable to the bloom filter"},
+     {NULL}};
+
+ static PyGetSetDef BloomFilter_getsetters[] = {
+     {"capacity", (getter)BloomFilter_get_capacity, NULL,
+      "Expected number of items", NULL},
+     {"fp_rate", (getter)BloomFilter_get_fp_rate, NULL,
+      "Target false positive rate", NULL},
+     {"k", (getter)BloomFilter_get_k, NULL,
+      "Number of hash functions (always 8 for SBBF)", NULL},
+     {"byte_count", (getter)BloomFilter_get_byte_count, NULL,
+      "Memory usage in bytes", NULL},
+     {"bit_count", (getter)BloomFilter_get_bit_count, NULL,
+      "Total bits in filter", NULL},
+     {NULL}};
+
+ static PySequenceMethods BloomFilter_as_sequence = {
+     .sq_length = (lenfunc)BloomFilter_len,
+     .sq_contains = (objobjproc)BloomFilter_contains,
+ };
+
+ static PyObject *BloomFilter_repr(BloomFilter *self) {
+   PyObject *fp_obj = PyFloat_FromDouble(self->fp_rate);
+   if (!fp_obj)
+     return NULL;
+
+   PyObject *repr =
+       PyUnicode_FromFormat("<BloomFilter capacity=%llu items=%llu fp_rate=%R>",
+                            self->capacity, self->item_count, fp_obj);
+
+   Py_DECREF(fp_obj);
+   return repr;
+ }
+
+ static PyTypeObject BloomFilterType = {
+     PyVarObject_HEAD_INIT(NULL, 0)
+     .tp_name = "_abloom.BloomFilter",
+     .tp_doc = "High-performance Split Block Bloom Filter",
+     .tp_basicsize = sizeof(BloomFilter),
+     .tp_itemsize = 0,
+     .tp_flags = Py_TPFLAGS_DEFAULT,
+     .tp_new = BloomFilter_new,
+     .tp_init = (initproc)BloomFilter_init,
+     .tp_dealloc = (destructor)BloomFilter_dealloc,
+     .tp_repr = (reprfunc)BloomFilter_repr,
+     .tp_methods = BloomFilter_methods,
+     .tp_getset = BloomFilter_getsetters,
+     .tp_as_sequence = &BloomFilter_as_sequence,
+ };
+
+ static PyModuleDef abloommodule = {
+     PyModuleDef_HEAD_INIT,
+     .m_name = "_abloom",
+     .m_doc = "High-performance Split Block Bloom Filter for Python",
+     .m_size = -1,
+ };
+
+ PyMODINIT_FUNC PyInit__abloom(void) {
+   PyObject *m;
+
+   if (PyType_Ready(&BloomFilterType) < 0)
+     return NULL;
+
+   m = PyModule_Create(&abloommodule);
+   if (m == NULL)
+     return NULL;
+
+   Py_INCREF(&BloomFilterType);
+   if (PyModule_AddObject(m, "BloomFilter", (PyObject *)&BloomFilterType) < 0) {
+     Py_DECREF(&BloomFilterType);
+     Py_DECREF(m);
+     return NULL;
+   }
+
+   return m;
+ }
@@ -0,0 +1,14 @@
+ from typing import Iterable
+
+ class BloomFilter:
+     capacity: int
+     fp_rate: float
+     k: int
+     byte_count: int
+     bit_count: int
+
+     def __init__(self, capacity: int, fp_rate: float = 0.01) -> None: ...
+     def add(self, item: object) -> None: ...
+     def update(self, items: Iterable[object]) -> None: ...
+     def __contains__(self, item: object) -> bool: ...
+     def __len__(self) -> int: ...
File without changes
@@ -0,0 +1,94 @@
+ Metadata-Version: 2.4
+ Name: abloom
+ Version: 0.1.0
+ Summary: High-performance Bloom filter for Python
+ Author-email: Andrew Pribe <andrewpribe@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/ampribe/abloom
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: C
+ Classifier: Topic :: Software Development :: Libraries
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Dynamic: license-file
+
+ # abloom
+ [![Tests](https://github.com/ampribe/abloom/actions/workflows/test.yml/badge.svg)](https://github.com/ampribe/abloom/actions/workflows/test.yml)
+
+ `abloom` is a high-performance Bloom filter implementation for Python, written in C.
+
+ Why use `abloom`?
+ - `abloom` significantly outperforms all other Python Bloom filter libraries: 2.77x faster on `add`, 2.43x faster on `update`, and 1.34x faster on lookups than the next-fastest implementation, `rbloom` (1M ints, 1% FPR). Complete benchmark results are in [BENCHMARK.md](BENCHMARK.md).
+ - `abloom` is rigorously tested on Python >= 3.8 on Ubuntu, Windows, and macOS.
+
+ ## Usage
+ Install with `pip install abloom`.
+
+ ```python
+ from abloom import BloomFilter
+
+ bf = BloomFilter(1_000_000, 0.01)
+ bf.add(1)
+ bf.add(("arbitrary", "object", "that", "implements", "hash"))
+ bf.update([2, 3, 4])
+
+ assert 1 in bf
+ assert ("arbitrary", "object", "that", "implements", "hash") in bf
+ assert 5 not in bf
+ repr(bf)  # '<BloomFilter capacity=1000000 items=5 fp_rate=0.01>'
+ ```
+
+ `abloom` relies on Python's built-in hash function, so stored objects must implement `__hash__`. Python seeds its hash function differently in each process, so a Bloom filter cannot be transferred between processes.
+
+ `abloom` implements a split block Bloom filter with 512-bit blocks and power-of-2 rounding of the block count. This costs roughly 1.5-2x the memory of a standard Bloom filter and can reduce performance at very high capacities or very low false positive rates; the 10M ints, 0.1% FPR benchmark shows this effect, though `abloom` is still significantly faster than the alternative libraries. See [IMPLEMENTATION.md](IMPLEMENTATION.md) for implementation and memory-usage details.
+
+ ## Testing
+
+ ```bash
+ # Install dev dependencies
+ pip install -e ".[test]"
+
+ # Run unit tests
+ pytest tests/ --ignore=tests/test_benchmark.py --ignore=tests/test_fpr.py -v
+
+ # Run all tests including slow FPR validation
+ pytest tests/ --ignore=tests/test_benchmark.py -v
+
+ # Cross-version testing (requires tox and multiple Python versions)
+ pip install tox
+ tox
+ ```
+
+ ## Benchmarking
+
+ See [BENCHMARK.md](BENCHMARK.md) for detailed results and filtering options.
+
+ ```bash
+ # Install benchmark dependencies
+ pip install -e ".[benchmark]"
+
+ # Run all benchmarks
+ pytest tests/test_benchmark.py --benchmark-only
+
+ # Run canonical benchmark (1M ints, 1% FPR)
+ pytest tests/test_benchmark.py -k "int_1000000_0.01" --benchmark-only -v
+
+ # Filter by operation, library, or data type
+ pytest tests/test_benchmark.py -k "add" --benchmark-only     # Add only
+ pytest tests/test_benchmark.py -k "abloom" --benchmark-only  # abloom only
+ pytest tests/test_benchmark.py -k "uuid" --benchmark-only    # UUIDs only
+
+ # Save results for report generation
+ pytest tests/test_benchmark.py --benchmark-only --benchmark-json=results.json
+ python scripts/generate_benchmark_report.py results.json
+ ```
@@ -0,0 +1,19 @@
+ LICENSE
+ MANIFEST.in
+ README.md
+ pyproject.toml
+ setup.py
+ abloom/__init__.py
+ abloom/_abloom.c
+ abloom/_abloom.pyi
+ abloom/py.typed
+ abloom.egg-info/PKG-INFO
+ abloom.egg-info/SOURCES.txt
+ abloom.egg-info/dependency_links.txt
+ abloom.egg-info/entry_points.txt
+ abloom.egg-info/top_level.txt
+ tests/test_benchmark.py
+ tests/test_edge_cases.py
+ tests/test_fpr.py
+ tests/test_functionality.py
+ tests/test_properties.py
@@ -0,0 +1,2 @@
+ [console_scripts]
+ benchmark-report = analysis.generate_benchmark_report:main
@@ -0,0 +1 @@
+ abloom