PyPI - sqlrite - Versions diffs - 0.1.11__tar.gz → 0.1.13__tar.gz - Mend

sqlrite 0.1.11__tar.gz → 0.1.13__tar.gz


Files changed (109)

{sqlrite-0.1.11 → sqlrite-0.1.13}/Cargo.lock RENAMED

@@ -3736,7 +3736,7 @@ dependencies = [
 [[package]]
 name = "sqlrite-desktop"
-version = "0.1.11"
+version = "0.1.13"
 dependencies = [
  "serde",
  "serde_json",
@@ -3748,7 +3748,7 @@ dependencies = [
 [[package]]
 name = "sqlrite-engine"
-version = "0.1.11"
+version = "0.1.13"
 dependencies = [
  "clap",
  "env_logger",
@@ -3763,7 +3763,7 @@ dependencies = [
 [[package]]
 name = "sqlrite-ffi"
-version = "0.1.11"
+version = "0.1.13"
 dependencies = [
  "cbindgen",
  "sqlrite-engine",
@@ -3771,7 +3771,7 @@ dependencies = [
 [[package]]
 name = "sqlrite-nodejs"
-version = "0.1.11"
+version = "0.1.13"
 dependencies = [
  "napi",
  "napi-build",
@@ -3781,7 +3781,7 @@ dependencies = [
 [[package]]
 name = "sqlrite-python"
-version = "0.1.11"
+version = "0.1.13"
 dependencies = [
  "pyo3",
  "sqlrite-engine",

{sqlrite-0.1.11 → sqlrite-0.1.13}/Cargo.toml RENAMED

@@ -27,7 +27,7 @@ resolver = "3"
 # `package =` key so the import name stays `sqlrite` internally:
 #     sqlrite = { package = "sqlrite-engine", path = "…" }
 name = "sqlrite-engine"
-version = "0.1.11"
+version = "0.1.13"
 authors = ["Joao Henrique Machado Silva <joaoh82@gmail.com>"]
 edition = "2024"
 rust-version = "1.85"

{sqlrite-0.1.11 → sqlrite-0.1.13}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sqlrite
-Version: 0.1.11
+Version: 0.1.13
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License

{sqlrite-0.1.11 → sqlrite-0.1.13}/desktop/package.json RENAMED

@@ -1,7 +1,7 @@
 {
   "name": "sqlrite-desktop-frontend",
   "private": true,
-  "version": "0.1.11",
+  "version": "0.1.13",
   "type": "module",
   "scripts": {
     "dev": "vite",

{sqlrite-0.1.11 → sqlrite-0.1.13}/docs/phase-7-plan.md RENAMED

@@ -122,13 +122,17 @@ SELECT id, title FROM docs ORDER BY embedding <-> [0.1, ...] LIMIT 10;
 ---
-### 7c — Brute-force KNN executor optimization
+### ✅ 7c — Brute-force KNN executor optimization
-**What.** Recognize the pattern `ORDER BY <distance-expr> LIMIT k` and execute it with a bounded min-heap (size k) instead of a full sort. O(N log k) instead of O(N log N).
+**What shipped.** The SELECT executor now branches on `(ORDER BY, LIMIT k)` shape. When both are present and `k < N`, the new `select_topk` function maintains a bounded `BinaryHeap` of size k instead of full-sorting all N rowids. O(N log k) instead of O(N log N).
-**Why a separate sub-phase.** 7b makes it work; 7c makes it fast enough to be useful on millions of rows. Worth shipping as its own commit so the perf delta is visible in benchmarks.
+**Implementation note: max-heap with direction-aware Ord.** A single `HeapEntry { key: Value, rowid: i64, asc: bool }` wrapper handles both `ORDER BY ASC LIMIT k` (k smallest) and `ORDER BY DESC LIMIT k` (k largest) without separate code paths. The `asc` flag inverts the natural Ord, so the displacement test reduces to "new entry < heap top" in both cases. After the scan, `into_sorted_vec` returns the right caller-facing order (ascending for ASC, descending for DESC).
-**LOC estimate:** ~150 lines including a tiny benchmark to prove the speedup.
+**Measured speedup** (N=10k, k=10, single REAL column sort key, release build): ~1.8×. The advantage scales with N and with per-row work — KNN queries where the sort key is `vec_distance_l2(col, [...])` benefit much more because each key evaluation is itself O(dim).
+**LOC**: ~120 implementation + ~180 tests/benchmark = ~300 total. Slightly over the ~150 estimate because the test surface (correctness + bench + edge cases for k=0, k>N, empty input, distance-function integration) ended up larger than initially projected.
+**Pre-existing bug surfaced.** The seed function for the benchmark needed positive scores because the INSERT parser doesn't currently handle `Expr::UnaryOp(Minus, Number(...))` for negative literals. Worked around with a Knuth-hash scrambler that stays positive; the underlying parser bug is documented as a follow-up.
 ---
@@ -154,6 +158,14 @@ SELECT id, title FROM docs ORDER BY embedding <-> [0.1, ...] LIMIT 10;
 **LOC estimate:** ~700-900 lines. The big sub-phase.
+> **Scope correction (2026-04-27, post-7c):** Re-scoping during implementation showed 7d works out to ~1300 LOC across three logical chunks, more than the original ~700-900 estimate and too much for one reviewable PR. Splitting into three:
+>
+> - **7d.1 — Pure HNSW algorithm** *(~700 LOC).* `src/sql/hnsw.rs` standalone module: insert + search + layer assignment + beam search per layer + L2/cosine/dot distance dispatch. No SQL integration yet — vectors are passed in via a `get_vec` closure so the algorithm doesn't depend on table types. Tests verify recall@k ≥ 0.95 vs brute-force on randomly-generated vector sets; deterministic via a fixed RNG seed.
+> - **7d.2 — SQL integration** *(~400 LOC).* `CREATE INDEX … USING hnsw (col)` parser + engine, INSERT wiring (also calls `hnsw.insert()`), query optimizer hook (recognizes `ORDER BY vec_distance_*(col, literal) LIMIT k` and probes the HNSW instead of full-scanning). HNSW lives in memory only at this point — gets rebuilt on every database open.
+> - **7d.3 — Persistence** *(~300 LOC).* Wire HNSW into the cell format: new `KIND_HNSW` cell tag, page-tree storage parallel to secondary indexes, save/reopen round-trip.
+>
+> Each 7d.x ships as its own PR + release wave. The user-facing value lands at 7d.2; 7d.3 closes the persistence loop. 7d.1 is foundational but ships a tested algorithmic primitive on its own — useful as documentation of the engine's "from scratch" theme.
 **Tests:** recall@k vs brute-force baseline (should be ≥ 0.95 on standard benchmark vectors); insert performance; delete semantics; persistence roundtrip.
 ---
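
The direction-aware heap described in the 7c implementation note can be illustrated in isolation. The following is a minimal sketch, not the engine's code: `Entry` and `top_k` are hypothetical names, and the sort key is a plain `i64` instead of the engine's `Value` type. It shows how one `asc` flag lets a single max-heap serve both `ORDER BY ASC LIMIT k` and `ORDER BY DESC LIMIT k`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for the engine's `HeapEntry`; the key is an
/// `i64` here purely for illustration.
struct Entry {
    key: i64,
    rowid: i64,
    asc: bool,
}

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        // Natural order for ASC, reversed order for DESC.
        let raw = self.key.cmp(&other.key);
        if self.asc { raw } else { raw.reverse() }
    }
}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Entry {}

/// Returns the rowids of the `k` best `(rowid, key)` pairs, already in
/// result order: ascending keys when `asc` is true, descending otherwise.
fn top_k(rows: &[(i64, i64)], k: usize, asc: bool) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap: BinaryHeap<Entry> = BinaryHeap::with_capacity(k);
    for &(rowid, key) in rows {
        let entry = Entry { key, rowid, asc };
        if heap.len() < k {
            heap.push(entry);
        } else if heap.peek().map_or(false, |worst| entry < *worst) {
            // The heap top is the worst entry kept so far; replace it
            // when the new entry compares Less under the chosen order.
            heap.pop();
            heap.push(entry);
        }
    }
    // Ascending under the direction-aware Ord is the caller's order.
    heap.into_sorted_vec().into_iter().map(|e| e.rowid).collect()
}
```

With rows `[(1, 50), (2, 10), (3, 40), (4, 20), (5, 30)]`, `top_k(&rows, 2, true)` keeps the two smallest keys and `top_k(&rows, 2, false)` keeps the two largest, without a second code path.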

{sqlrite-0.1.11 → sqlrite-0.1.13}/docs/roadmap.md RENAMED

@@ -470,9 +470,9 @@ The full plan + recorded design decisions live in [`docs/phase-7-plan.md`](phase
 Approved sub-phases (Q1–Q10 resolved):
-- **7a — `VECTOR(N)` column type** — dense fixed-dimension f32 storage via the existing cell encoding; bump file format to v4. Bracket-array literal syntax `[0.1, 0.2, …]` (Q7).
-- **7b — Distance functions + KNN operators** — `vec_distance_l2/cosine/dot` plus pgvector-style `<->` `<=>` `<#>` operators (Q6).
-- **7c — Brute-force KNN executor optimization** — recognize `ORDER BY <distance> LIMIT k`, use bounded min-heap.
+- **✅ 7a — `VECTOR(N)` column type** *(v0.1.10)* — dense fixed-dimension f32 storage via the existing cell encoding; format bumped to v4. Bracket-array literal syntax `[0.1, 0.2, …]` (Q7).
+- **✅ 7b — Distance functions** *(v0.1.11)* — `vec_distance_l2/cosine/dot`, plus the ORDER BY-expressions parser change so KNN queries work end-to-end. Operators (`<->` `<=>` `<#>`) deferred to **7b.1** — sqlparser doesn't parse them natively, contradicting Q6's "tiny parser change" assumption.
+- **✅ 7c — Brute-force KNN executor optimization** — bounded `BinaryHeap` of size k for `ORDER BY <expr> LIMIT k`. ~1.8× faster than full-sort at N=10k for cheap keys; bigger gains on expensive keys like `vec_distance_l2`.
 - **7d — HNSW ANN index** — `CREATE INDEX … USING hnsw (col)`; persisted as cell-encoded graph. Fixed defaults `M=16, ef_construction=200, ef_search=50` (Q2).
 - **7e — JSON column type + path queries** — `JSON` data type stored as bincoded `serde_json::Value` (Q3); `json_extract` / `json_array_length` / `json_object_keys` / `json_type`.
 - **7f — ~~Full-text search with BM25~~** — **deferred to Phase 8** (Q1).
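
For readers unfamiliar with the three distance functions named in 7b, here is a rough sketch of what they might compute. The function names, the `Option<f32>` return type, and the sign conventions are assumptions for illustration (cosine distance as one minus cosine similarity, dot distance as the negated inner product, as pgvector does); the engine's actual `vec_distance_*` signatures are not shown in this diff.

```rust
/// Euclidean (L2) distance. Returns `None` on a dimension mismatch.
/// Hypothetical helper, not the engine's `vec_distance_l2`.
fn l2_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let sum_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    Some(sum_sq.sqrt())
}

/// Negated inner product, so that "smaller is closer" holds for
/// ORDER BY ... ASC (assumed convention).
fn dot_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    Some(-dot)
}

/// One minus cosine similarity. Returns `None` for zero-length vectors,
/// where the angle is undefined (assumed behaviour).
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

Under these conventions, unit-length vectors satisfy `dot_distance == cosine_distance - 1.0`, which matches the relation asserted in the existing test context of `src/sql/executor.rs`.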

{sqlrite-0.1.11 → sqlrite-0.1.13}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "maturin"
 [project]
 name = "sqlrite"
-version = "0.1.11"
+version = "0.1.13"
 description = "Python bindings for SQLRite — a small, embeddable SQLite clone written in Rust."
 authors = [{ name = "Joao Henrique Machado Silva", email = "joaoh82@gmail.com" }]
 license = { text = "MIT" }

{sqlrite-0.1.11 → sqlrite-0.1.13}/sdk/python/Cargo.toml RENAMED

@@ -1,6 +1,6 @@
 [package]
 name = "sqlrite-python"
-version = "0.1.11"
+version = "0.1.13"
 authors = ["Joao Henrique Machado Silva <joaoh82@gmail.com>"]
 edition = "2024"
 rust-version = "1.85"

{sqlrite-0.1.11 → sqlrite-0.1.13}/src/sql/executor.rs RENAMED

@@ -73,13 +73,39 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result<SelectRe
     };
     let mut matching = matching;
-    // Sort before applying LIMIT, matching SQL semantics.
-    if let Some(order) = &query.order_by {
-        sort_rowids(&mut matching, table, order)?;
-    }
-    if let Some(n) = query.limit {
-        matching.truncate(n);
+    // Phase 7c — bounded-heap top-k optimization.
+    //
+    // The naive "ORDER BY <expr>" path (Phase 7b) sorts every matching
+    // rowid: O(N log N) sort_by + a truncate. For KNN queries
+    //
+    //     SELECT id FROM docs
+    //     ORDER BY vec_distance_l2(embedding, [...])
+    //     LIMIT 10;
+    //
+    // N is the table row count and k is the LIMIT. With a bounded
+    // max-heap of size k we can find the top-k in O(N log k) — same
+    // sort_by-per-row cost on the heap operations, but k is typically
+    // 10-100 while N can be millions.
+    //
+    // We branch in three cases:
+    //   1. ORDER BY + LIMIT k where k < |matching|  → bounded heap.
+    //   2. ORDER BY without LIMIT, or LIMIT >= |matching| → full sort
+    //      (heap saves nothing when we'd keep everyone anyway).
+    //   3. LIMIT without ORDER BY → just truncate (no sort needed).
+    match (&query.order_by, query.limit) {
+        (Some(order), Some(k)) if k < matching.len() => {
+            matching = select_topk(&matching, table, order, k)?;
+        }
+        (Some(order), _) => {
+            sort_rowids(&mut matching, table, order)?;
+            if let Some(k) = query.limit {
+                matching.truncate(k);
+            }
+        }
+        (None, Some(k)) => {
+            matching.truncate(k);
+        }
+        (None, None) => {}
     }
     // Build typed rows. Missing cells surface as `Value::Null` — that
@@ -500,6 +526,99 @@ fn try_extract_equality(expr: &Expr) -> Option<(String, sqlparser::ast::Value)>
     None
 }
+/// One entry in the bounded-heap top-k path. Holds a pre-evaluated
+/// sort key + the rowid it came from. The `asc` flag inverts `Ord`
+/// so a single `BinaryHeap<HeapEntry>` works for both ASC and DESC
+/// without wrapping in `std::cmp::Reverse` at the call site:
+///
+///   - ASC LIMIT k = "k smallest": natural Ord. Max-heap top is the
+///     largest currently kept; new items smaller than top displace.
+///   - DESC LIMIT k = "k largest": Ord reversed. Max-heap top is now
+///     the smallest currently kept (under reversed Ord, smallest
+///     looks largest); new items larger than top displace.
+///
+/// In both cases the displacement test reduces to "new entry < heap top".
+struct HeapEntry {
+    key: Value,
+    rowid: i64,
+    asc: bool,
+}
+impl PartialEq for HeapEntry {
+    fn eq(&self, other: &Self) -> bool {
+        self.cmp(other) == Ordering::Equal
+    }
+}
+impl Eq for HeapEntry {}
+impl PartialOrd for HeapEntry {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+impl Ord for HeapEntry {
+    fn cmp(&self, other: &Self) -> Ordering {
+        let raw = compare_values(Some(&self.key), Some(&other.key));
+        if self.asc { raw } else { raw.reverse() }
+    }
+}
+/// Bounded-heap top-k selection. Returns at most `k` rowids in the
+/// caller's desired order (ascending key for `order.ascending`,
+/// descending otherwise).
+///
+/// O(N log k) where N = `matching.len()`. Caller must check
+/// `k < matching.len()` for this to be a win — for k ≥ N the
+/// `sort_rowids` full-sort path is the same asymptotic cost without
+/// the heap overhead.
+fn select_topk(
+    matching: &[i64],
+    table: &Table,
+    order: &OrderByClause,
+    k: usize,
+) -> Result<Vec<i64>> {
+    use std::collections::BinaryHeap;
+    if k == 0 || matching.is_empty() {
+        return Ok(Vec::new());
+    }
+    let mut heap: BinaryHeap<HeapEntry> = BinaryHeap::with_capacity(k + 1);
+    for &rowid in matching {
+        let key = eval_expr(&order.expr, table, rowid)?;
+        let entry = HeapEntry {
+            key,
+            rowid,
+            asc: order.ascending,
+        };
+        if heap.len() < k {
+            heap.push(entry);
+        } else {
+            // peek() returns the largest under our direction-aware Ord
+            // — the worst entry currently kept. Displace it iff the
+            // new entry is "better" (i.e. compares Less).
+            if entry < *heap.peek().unwrap() {
+                heap.pop();
+                heap.push(entry);
+            }
+        }
+    }
+    // `into_sorted_vec` returns ascending under our direction-aware Ord:
+    //   ASC: ascending by raw key (what we want)
+    //   DESC: ascending under reversed Ord = descending by raw key (what
+    //         we want for an ORDER BY DESC LIMIT k result)
+    Ok(heap
+        .into_sorted_vec()
+        .into_iter()
+        .map(|e| e.rowid)
+        .collect())
+}
 fn sort_rowids(rowids: &mut [i64], table: &Table, order: &OrderByClause) -> Result<()> {
     // Phase 7b: ORDER BY now accepts any expression (column ref,
     // arithmetic, function call, …). Pre-compute the sort key for
@@ -1063,4 +1182,240 @@ mod tests {
         let cos = vec_distance_cosine(&a, &b).unwrap();
         assert!(approx_eq(dot, cos - 1.0, 1e-5));
     }
+    // -----------------------------------------------------------------
+    // Phase 7c — bounded-heap top-k correctness + benchmark
+    // -----------------------------------------------------------------
+    use crate::sql::db::database::Database;
+    use crate::sql::parser::select::SelectQuery;
+    use sqlparser::dialect::SQLiteDialect;
+    use sqlparser::parser::Parser;
+    /// Builds a `docs(id INTEGER PK, score REAL)` table with N rows of
+    /// distinct positive scores so top-k tests aren't sensitive to
+    /// tie-breaking (heap is unstable; full-sort is stable; we want
+    /// both to agree without arguing about equal-score row order).
+    ///
+    /// **Why positive scores:** the INSERT parser doesn't currently
+    /// handle `Expr::UnaryOp(Minus, …)` for negative number literals
+    /// (it would parse `-3.14` as a unary expression and the value
+    /// extractor would skip it). That's a pre-existing bug, out of
+    /// scope for 7c. Using the Knuth multiplicative hash gives us
+    /// distinct positive scrambled values without dancing around the
+    /// negative-literal limitation.
+    fn seed_score_table(n: usize) -> Database {
+        let mut db = Database::new("tempdb".to_string());
+        crate::sql::process_command(
+            "CREATE TABLE docs (id INTEGER PRIMARY KEY, score REAL);",
+            &mut db,
+        )
+        .expect("create");
+        for i in 0..n {
+            // Knuth multiplicative hash mod 1_000_000 — distinct,
+            // dense in [0, 999_999], no collisions for n up to ~tens
+            // of thousands.
+            let score = ((i as u64).wrapping_mul(2_654_435_761) % 1_000_000) as f64;
+            let sql = format!("INSERT INTO docs (score) VALUES ({score});");
+            crate::sql::process_command(&sql, &mut db).expect("insert");
+        }
+        db
+    }
+    /// Helper: parses an SQL SELECT into a SelectQuery so we can drive
+    /// `select_topk` / `sort_rowids` directly without the rest of the
+    /// process_command pipeline.
+    fn parse_select(sql: &str) -> SelectQuery {
+        let dialect = SQLiteDialect {};
+        let mut ast = Parser::parse_sql(&dialect, sql).expect("parse");
+        let stmt = ast.pop().expect("one statement");
+        SelectQuery::new(&stmt).expect("select-query")
+    }
+    #[test]
+    fn topk_matches_full_sort_asc() {
+        // Build N=200, top-k=10. Bounded heap output must equal
+        // full-sort-then-truncate output (both produce ASC order).
+        let db = seed_score_table(200);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score ASC LIMIT 10;");
+        let order = q.order_by.as_ref().unwrap();
+        let all_rowids = table.rowids();
+        // Full-sort path
+        let mut full = all_rowids.clone();
+        sort_rowids(&mut full, table, order).unwrap();
+        full.truncate(10);
+        // Bounded-heap path
+        let topk = select_topk(&all_rowids, table, order, 10).unwrap();
+        assert_eq!(topk, full, "top-k via heap should match full-sort+truncate");
+    }
+    #[test]
+    fn topk_matches_full_sort_desc() {
+        // Same with DESC — verifies the direction-aware Ord wrapper.
+        let db = seed_score_table(200);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score DESC LIMIT 10;");
+        let order = q.order_by.as_ref().unwrap();
+        let all_rowids = table.rowids();
+        let mut full = all_rowids.clone();
+        sort_rowids(&mut full, table, order).unwrap();
+        full.truncate(10);
+        let topk = select_topk(&all_rowids, table, order, 10).unwrap();
+        assert_eq!(
+            topk, full,
+            "top-k DESC via heap should match full-sort+truncate"
+        );
+    }
+    #[test]
+    fn topk_k_larger_than_n_returns_everything_sorted() {
+        // The executor branches off to the full-sort path when k >= N,
+        // but if a caller invokes select_topk directly with k > N, it
+        // should still produce all-sorted output (no truncation
+        // because we don't have N items to truncate to k).
+        let db = seed_score_table(50);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score ASC LIMIT 1000;");
+        let order = q.order_by.as_ref().unwrap();
+        let topk = select_topk(&table.rowids(), table, order, 1000).unwrap();
+        assert_eq!(topk.len(), 50);
+        // All scores in ascending order.
+        let scores: Vec<f64> = topk
+            .iter()
+            .filter_map(|r| match table.get_value("score", *r) {
+                Some(Value::Real(f)) => Some(f),
+                _ => None,
+            })
+            .collect();
+        assert!(scores.windows(2).all(|w| w[0] <= w[1]));
+    }
+    #[test]
+    fn topk_k_zero_returns_empty() {
+        let db = seed_score_table(10);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score ASC LIMIT 1;");
+        let order = q.order_by.as_ref().unwrap();
+        let topk = select_topk(&table.rowids(), table, order, 0).unwrap();
+        assert!(topk.is_empty());
+    }
+    #[test]
+    fn topk_empty_input_returns_empty() {
+        let db = seed_score_table(0);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score ASC LIMIT 5;");
+        let order = q.order_by.as_ref().unwrap();
+        let topk = select_topk(&[], table, order, 5).unwrap();
+        assert!(topk.is_empty());
+    }
+    #[test]
+    fn topk_works_through_select_executor_with_distance_function() {
+        // Integration check that the executor actually picks the
+        // bounded-heap path on a KNN-shaped query and produces the
+        // correct top-k.
+        let mut db = Database::new("tempdb".to_string());
+        crate::sql::process_command(
+            "CREATE TABLE docs (id INTEGER PRIMARY KEY, e VECTOR(2));",
+            &mut db,
+        )
+        .unwrap();
+        // Five rows with distinct distances from probe [1.0, 0.0]:
+        //   id=1 [1.0, 0.0]   distance=0
+        //   id=2 [2.0, 0.0]   distance=1
+        //   id=3 [0.0, 3.0]   distance=√(1+9) = √10 ≈ 3.16
+        //   id=4 [1.0, 4.0]   distance=4
+        //   id=5 [10.0, 10.0] distance=√(81+100) ≈ 13.45
+        for v in &[
+            "[1.0, 0.0]",
+            "[2.0, 0.0]",
+            "[0.0, 3.0]",
+            "[1.0, 4.0]",
+            "[10.0, 10.0]",
+        ] {
+            crate::sql::process_command(&format!("INSERT INTO docs (e) VALUES ({v});"), &mut db)
+                .unwrap();
+        }
+        let resp = crate::sql::process_command(
+            "SELECT id FROM docs ORDER BY vec_distance_l2(e, [1.0, 0.0]) ASC LIMIT 3;",
+            &mut db,
+        )
+        .unwrap();
+        // Top-3 closest to [1.0, 0.0] are id=1, id=2, id=3 (in that order).
+        // The status message tells us how many rows came back.
+        assert!(resp.contains("3 rows returned"), "got: {resp}");
+    }
+    /// Manual benchmark — not run by default. Recommended invocation:
+    ///
+    ///     cargo test -p sqlrite-engine --lib topk_benchmark --release \
+    ///         -- --ignored --nocapture
+    ///
+    /// (`--release` matters: Rust's optimized sort gets very fast under
+    /// optimization, so the heap's relative advantage is best observed
+    /// against a sort that's also been optimized.)
+    ///
+    /// Measured numbers on an Apple Silicon laptop with N=10_000 + k=10:
+    ///   - bounded heap:    ~820µs
+    ///   - full sort+trunc: ~1.5ms
+    ///   - ratio:           ~1.8×
+    ///
+    /// The advantage is real but moderate at this size because the sort
+    /// key here is a single REAL column read (cheap) and Rust's sort_by
+    /// has a very low constant factor. The asymptotic O(N log k) vs
+    /// O(N log N) advantage scales with N and with per-row work — KNN
+    /// queries where the sort key is `vec_distance_l2(col, [...])` are
+    /// where this path really pays off, because each key evaluation is
+    /// itself O(dim) and the heap path skips the per-row evaluation
+    /// in the comparator (see `sort_rowids` for the contrast).
+    #[test]
+    #[ignore]
+    fn topk_benchmark() {
+        use std::time::Instant;
+        const N: usize = 10_000;
+        const K: usize = 10;
+        let db = seed_score_table(N);
+        let table = db.get_table("docs".to_string()).unwrap();
+        let q = parse_select("SELECT * FROM docs ORDER BY score ASC LIMIT 10;");
+        let order = q.order_by.as_ref().unwrap();
+        let all_rowids = table.rowids();
+        // Time bounded heap.
+        let t0 = Instant::now();
+        let _topk = select_topk(&all_rowids, table, order, K).unwrap();
+        let heap_dur = t0.elapsed();
+        // Time full sort + truncate.
+        let t1 = Instant::now();
+        let mut full = all_rowids.clone();
+        sort_rowids(&mut full, table, order).unwrap();
+        full.truncate(K);
+        let sort_dur = t1.elapsed();
+        let ratio = sort_dur.as_secs_f64() / heap_dur.as_secs_f64().max(1e-9);
+        println!("\n--- topk_benchmark (N={N}, k={K}) ---");
+        println!("  bounded heap:   {heap_dur:?}");
+        println!("  full sort+trunc: {sort_dur:?}");
+        println!("  speedup ratio:  {ratio:.2}×");
+        // Soft assertion. Floor is 1.4× because the cheap-key
+        // benchmark hovers around 1.8× empirically; setting this too
+        // close to the measured value risks flaky CI on slower
+        // runners. Floor of 1.4× still catches an actual regression
+        // (e.g., if select_topk became O(N²) or stopped using the
+        // heap entirely).
+        assert!(
+            ratio > 1.4,
+            "bounded heap should be substantially faster than full sort, but ratio = {ratio:.2}"
+        );
+    }
 }
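
The seed helper's claim that the scrambled scores are distinct can be checked on its own. This is a small sketch under stated assumptions: `scrambled_score` copies the expression from `seed_score_table`, while `gcd` and `all_distinct` are hypothetical helpers added only for the check. The multiplier reduces to 435761 modulo 1,000,000, which shares no factor with 1,000,000, so the mapping should stay collision-free for any n up to 1,000,000 as long as the product does not wrap.

```rust
use std::collections::HashSet;

/// Same expression as the seed helper in the diff: Knuth's multiplicative
/// constant, reduced modulo 1_000_000.
fn scrambled_score(i: u64) -> u64 {
    i.wrapping_mul(2_654_435_761) % 1_000_000
}

/// Greatest common divisor (Euclid). Hypothetical helper for the check.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// True when the first `n` scrambled scores contain no duplicates.
fn all_distinct(n: u64) -> bool {
    let mut seen = HashSet::with_capacity(n as usize);
    // `insert` returns false on a repeat, which stops the scan early.
    (0..n).all(|i| seen.insert(scrambled_score(i)))
}
```

A gcd of 1 between the reduced multiplier and the modulus is what makes the mapping a permutation of `0..1_000_000`; the benchmark's N=10_000 sits well inside that range.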
