qql-cli 2.0.0.tar.gz → 2.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

{qql_cli-2.0.0 → qql_cli-2.1.0}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: qql-cli
-Version: 2.0.0
-Summary: QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), WHERE clause filters, script execution, and collection dump/restore.
+Version: 2.1.0
+Summary: QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), WHERE clause filters, script execution, and collection dump/restore.
 Project-URL: Homepage, https://github.com/pavanjava/qql
 Project-URL: Repository, https://github.com/pavanjava/qql
 Project-URL: Documentation, https://pavanjava.github.io/qql
@@ -45,7 +45,7 @@ Classifier: Topic :: Utilities
 Requires-Python: >=3.12
 Requires-Dist: click>=8.1.0
 Requires-Dist: prompt-toolkit>=3.0.0
-Requires-Dist: qdrant-client[fastembed]>=1.13.0
+Requires-Dist: qdrant-client[fastembed]>=1.18.0
 Requires-Dist: rich>=13.0.0
 Description-Content-Type: text/markdown
@@ -58,7 +58,7 @@ Description-Content-Type: text/markdown
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
 [![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen)](tests/)
-Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
+Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
 ```
 qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -135,7 +135,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 | [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
 | [SEARCH / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, hybrid, reranking, recommendations |
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
-| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/binary/product), CREATE INDEX |
+| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
 | [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
 | [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library |
 | [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
@@ -162,6 +162,9 @@ RECOMMEND FROM articles POSITIVE IDS (1001, 1002) LIMIT 5
 CREATE COLLECTION articles
 CREATE COLLECTION articles HYBRID
 CREATE COLLECTION articles QUANTIZE SCALAR
+CREATE COLLECTION articles QUANTIZE TURBO
+CREATE COLLECTION articles QUANTIZE TURBO BITS 2
+CREATE COLLECTION articles QUANTIZE TURBO BITS 1.5 ALWAYS RAM
 CREATE INDEX ON COLLECTION articles FOR year TYPE integer
 SHOW COLLECTIONS
 DROP COLLECTION articles

{qql_cli-2.0.0 → qql_cli-2.1.0}/README.md RENAMED

@@ -7,7 +7,7 @@
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
 [![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen)](tests/)
-Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
+Write `INSERT`, `SEARCH`, `RECOMMEND`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
 ```
 qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -84,7 +84,7 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 | [INSERT / INSERT BULK](docs/insert.md) | Adding documents, batch inserts, payload types |
 | [SEARCH / RECOMMEND / Hybrid / RERANK](docs/search.md) | Semantic search, hybrid, reranking, recommendations |
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
-| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/binary/product), CREATE INDEX |
+| [Collections & Quantization](docs/collections.md) | CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX |
 | [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
 | [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library |
 | [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
@@ -111,6 +111,9 @@ RECOMMEND FROM articles POSITIVE IDS (1001, 1002) LIMIT 5
 CREATE COLLECTION articles
 CREATE COLLECTION articles HYBRID
 CREATE COLLECTION articles QUANTIZE SCALAR
+CREATE COLLECTION articles QUANTIZE TURBO
+CREATE COLLECTION articles QUANTIZE TURBO BITS 2
+CREATE COLLECTION articles QUANTIZE TURBO BITS 1.5 ALWAYS RAM
 CREATE INDEX ON COLLECTION articles FOR year TYPE integer
 SHOW COLLECTIONS
 DROP COLLECTION articles

{qql_cli-2.0.0 → qql_cli-2.1.0}/docs/collections.md RENAMED

@@ -67,27 +67,38 @@ When `USING MODEL` is omitted, the collection uses the **default embedding model
 ## Quantization — QUANTIZE clause
-Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all three Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.
+Quantization reduces the memory footprint of vector collections and speeds up search at the cost of a small, controllable accuracy loss. QQL supports all four Qdrant quantization strategies via an optional `QUANTIZE` clause appended to `CREATE COLLECTION`.
-**Three strategies:**
+**Four strategies:**
-| Type | Compression | Accuracy Loss | Best For |
+| Type | Compression | Accuracy | Best For |
 |---|---|---|---|
-| `SCALAR` | 4× (float32 → int8) | < 1% | Most collections — best balance |
-| `BINARY` | 32× (float32 → 1-bit) | Higher | High-dimensional vectors (768+), speed priority |
+| `SCALAR` | 4× (float32 → int8) | < 1% loss | Most collections — best balance |
+| `TURBO` | 8–32× (4-bit to 1-bit) | Low–medium | Better recall than BINARY at same storage budget |
+| `BINARY` | 32× (float32 → 1-bit) | Higher loss | Speed priority; centered distributions only |
 | `PRODUCT` | 4× (configurable) | Variable | Memory-constrained deployments |
 **Full syntax:**
 ```
 CREATE COLLECTION <name> ... QUANTIZE SCALAR [QUANTILE <0.0–1.0>] [ALWAYS RAM]
+CREATE COLLECTION <name> ... QUANTIZE TURBO  [BITS <1|1.5|2|4>]   [ALWAYS RAM]
 CREATE COLLECTION <name> ... QUANTIZE BINARY  [ALWAYS RAM]
 CREATE COLLECTION <name> ... QUANTIZE PRODUCT [ALWAYS RAM]
 ```
-- **`QUANTILE <float>`** — (scalar only) calibration quantile for the INT8 conversion; defaults to Qdrant's built-in default (0.99) when omitted.
-- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all three quantization types.
+- **`QUANTILE <float>`** — (SCALAR only) calibration quantile for the INT8 conversion; defaults to Qdrant's built-in default (0.99) when omitted.
+- **`BITS <depth>`** — (TURBO only) bit depth passed to the Qdrant SDK:
+  - `4` — 4-bit (default when `BITS` is omitted; server applies its own default)
+  - `2` — 2-bit
+  - `1.5` — 1.5-bit
+  - `1` — 1-bit
+  > Compression ratios (8×, 16×, 24×, 32×) and recall characteristics are
+  > Qdrant server-side behaviors. QQL maps the `BITS` value to the SDK model and
+  > passes it to Qdrant; actual results depend on your Qdrant server version.
+- **`ALWAYS RAM`** — keep the **quantized** vectors in RAM at all times, regardless of the collection's `on_disk` setting. Improves search throughput at the cost of higher RAM usage for the compressed index. The original full-precision vectors are stored and managed independently of this flag. Supported by all four quantization types.
 - **`QUANTIZE`** always appears **after** all other clauses (`HYBRID`, `USING MODEL`, etc.).
 - For `PRODUCT`, the compression ratio is fixed at **4×** in this version.
+- For `TURBO`, Cosine, Dot, and Euclidean distance are supported by the Qdrant server when TurboQuant is enabled.
 - When used with `HYBRID` collections, quantization applies only to the **dense** vector.
 **Examples:**
@@ -102,6 +113,26 @@ Scalar with explicit calibration and quantized vectors pinned to RAM:
 CREATE COLLECTION research_papers QUANTIZE SCALAR QUANTILE 0.95 ALWAYS RAM
 ```
+TurboQuant — default 4-bit (8× compression, good recall):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO
+```
+TurboQuant — 2-bit (16× compression):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 2
+```
+TurboQuant — 1.5-bit (24× compression) with quantized vectors pinned to RAM:
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 1.5 ALWAYS RAM
+```
+TurboQuant — 1-bit (32× compression, same ratio as BINARY but better recall):
+```sql
+CREATE COLLECTION research_papers QUANTIZE TURBO BITS 1
+```
 Binary quantization for large high-dimensional embeddings:
 ```sql
 CREATE COLLECTION research_papers QUANTIZE BINARY
@@ -115,22 +146,29 @@ CREATE COLLECTION research_papers QUANTIZE PRODUCT ALWAYS RAM
 Combined with hybrid collection:
 ```sql
 CREATE COLLECTION research_papers HYBRID QUANTIZE SCALAR
+CREATE COLLECTION research_papers HYBRID QUANTIZE TURBO BITS 2
 ```
 Combined with a pinned model:
 ```sql
 CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE SCALAR QUANTILE 0.99
+CREATE COLLECTION research_papers USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE TURBO BITS 2
+```
+Combined with hybrid + dense model:
+```sql
+CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE TURBO
 ```
 **Valid combinations:**
-| Base form | + QUANTIZE SCALAR | + QUANTIZE BINARY | + QUANTIZE PRODUCT |
-|---|---|---|---|
-| `CREATE COLLECTION name` | ✓ | ✓ | ✓ |
-| `... HYBRID` | ✓ | ✓ | ✓ |
-| `... USING MODEL 'x'` | ✓ | ✓ | ✓ |
-| `... USING HYBRID` | ✓ | ✓ | ✓ |
-| `... USING HYBRID DENSE MODEL 'x'` | ✓ | ✓ | ✓ |
+| Base form | + SCALAR | + TURBO | + BINARY | + PRODUCT |
+|---|---|---|---|---|
+| `CREATE COLLECTION name` | ✓ | ✓ | ✓ | ✓ |
+| `... HYBRID` | ✓ | ✓ | ✓ | ✓ |
+| `... USING MODEL 'x'` | ✓ | ✓ | ✓ | ✓ |
+| `... USING HYBRID` | ✓ | ✓ | ✓ | ✓ |
+| `... USING HYBRID DENSE MODEL 'x'` | ✓ | ✓ | ✓ | ✓ |
 > INSERT and SEARCH on quantized collections work exactly the same as on non-quantized ones — no changes to INSERT or SEARCH syntax are needed.

{qql_cli-2.0.0 → qql_cli-2.1.0}/docs/scripts.md RENAMED

@@ -79,6 +79,9 @@ Export every point in a collection to a `.qql` script file. The generated file i
 **CLI usage:**
 ```bash
 qql dump <collection_name> <output.qql>
+# Override the default 50 points/INSERT BULK batch
+qql dump <collection_name> <output.qql> --batch-size 200
 ```
 **In-shell usage (inside the QQL REPL):**

{qql_cli-2.0.0 → qql_cli-2.1.0}/pyproject.toml RENAMED

@@ -1,7 +1,7 @@
 [project]
 name = "qql-cli"
-version = "2.0.0"
-description = "QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, binary, product), WHERE clause filters, script execution, and collection dump/restore."
+version = "2.1.0"
+description = "QQL is a SQL-like query language and CLI for Qdrant vector database. Write INSERT, SEARCH, RECOMMEND, DELETE, and CREATE COLLECTION statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, cross-encoder reranking, quantization (scalar, turbo, binary, product), WHERE clause filters, script execution, and collection dump/restore."
 readme = "README.md"
 license = { file = "LICENSE" }
 requires-python = ">=3.12"
@@ -37,7 +37,7 @@ classifiers = [
     "Topic :: Text Processing :: Indexing",
 ]
 dependencies = [
-    "qdrant-client[fastembed]>=1.13.0",
+    "qdrant-client[fastembed]>=1.18.0",
     "click>=8.1.0",
     "rich>=13.0.0",
     "prompt_toolkit>=3.0.0",

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/ast_nodes.py RENAMED

@@ -9,14 +9,16 @@ class QuantizationType(Enum):
     SCALAR  = "scalar"
     BINARY  = "binary"
     PRODUCT = "product"
+    TURBO   = "turbo"
 @dataclass(frozen=True)
 class QuantizationConfig:
     """Quantization settings parsed from a QUANTIZE clause."""
     type: QuantizationType
-    quantile: float | None = None   # SCALAR only; None → Qdrant default (0.99)
-    always_ram: bool = False        # all types; default False
+    quantile: float | None = None    # SCALAR only; None → Qdrant default (0.99)
+    always_ram: bool = False         # all types; default False
+    turbo_bits: float | None = None  # TURBO only; None → bits4 (Qdrant default 4-bit, 8×)
 @dataclass(frozen=True)

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/cli.py RENAMED

@@ -201,7 +201,14 @@ def execute(file: str, stop_on_error: bool) -> None:
 @main.command()
 @click.argument("collection")
 @click.argument("output", type=click.Path())
-def dump(collection: str, output: str) -> None:
+@click.option(
+    "--batch-size",
+    type=click.IntRange(min=1),
+    default=50,
+    show_default=True,
+    help="Points per INSERT BULK batch in the generated script.",
+)
+def dump(collection: str, output: str, batch_size: int) -> None:
     """Dump a collection to a .qql script file.
     OUTPUT is the path for the generated .qql file.
@@ -230,7 +237,9 @@ def dump(collection: str, output: str) -> None:
     console.print(
         f"[bold cyan]Dumping:[/bold cyan] '{collection}'  ->  {output}\n"
     )
-    written, skipped = dump_collection(collection, output, client, console, err_console)
+    written, skipped = dump_collection(
+        collection, output, client, console, err_console, batch_size=batch_size
+    )
     if written == 0 and skipped == 0:
         # collection not found — error already printed by dump_collection

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/dumper.py RENAMED

@@ -3,7 +3,8 @@
 The generated file contains:
   1. A header comment with metadata
   2. CREATE COLLECTION <name> [HYBRID]
-  3. One INSERT BULK statement per batch of _DUMP_BATCH_SIZE points
+  3. One INSERT BULK statement per batch of *batch_size* points
+     (default _DEFAULT_DUMP_BATCH_SIZE = 50, overridable via the CLI flag)
   4. A footer comment with totals
 The file is valid QQL and can be re-executed with ``qql execute <file>``.
@@ -20,7 +21,7 @@ from typing import Any
 from qdrant_client import QdrantClient
 from rich.console import Console
-_DUMP_BATCH_SIZE = 50
+_DEFAULT_DUMP_BATCH_SIZE = 50
 # ── Value serializer ──────────────────────────────────────────────────────────
@@ -81,12 +82,16 @@ def dump_collection(
     client: QdrantClient,
     console: Console,
     err_console: Console,
+    batch_size: int = _DEFAULT_DUMP_BATCH_SIZE,
 ) -> tuple[int, int]:
     """Export every point in *collection* to a .qql script at *output_path*.
     Returns ``(points_written, points_skipped)`` counts.
     Points without a ``'text'`` key are skipped and counted in *points_skipped*.
     """
+    if batch_size <= 0:
+        raise ValueError(f"batch_size must be a positive integer, got {batch_size}")
     if not client.collection_exists(collection):
         err_console.print(
             f"[bold red]Error:[/bold red] Collection '{collection}' does not exist."
@@ -100,13 +105,13 @@ def dump_collection(
     # ── First pass: count total points for the header ─────────────────────
     count_info = client.count(collection_name=collection, exact=True)
     total_points = count_info.count
-    total_batches = max(1, math.ceil(total_points / _DUMP_BATCH_SIZE))
+    total_batches = max(1, math.ceil(total_points / batch_size))
     console.print(
         f"  Collection type : [cyan]{col_type}[/cyan]\n"
         f"  Points          : [cyan]{total_points}[/cyan]\n"
         f"  Batches         : [cyan]{total_batches}[/cyan] "
-        f"([dim]{_DUMP_BATCH_SIZE} points/batch[/dim])\n"
+        f"([dim]{batch_size} points/batch[/dim])\n"
     )
     out = Path(output_path)
@@ -140,7 +145,7 @@ def dump_collection(
         while True:
             records, next_offset = client.scroll(
                 collection_name=collection,
-                limit=_DUMP_BATCH_SIZE,
+                limit=batch_size,
                 offset=offset,
                 with_payload=True,
                 with_vectors=False,
@@ -150,7 +155,7 @@ def dump_collection(
                 break
             batch_num += 1
-            batch_start = (batch_num - 1) * _DUMP_BATCH_SIZE + 1
+            batch_start = (batch_num - 1) * batch_size + 1
             batch_end = batch_start + len(records) - 1
             # Filter points that have a 'text' field
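
The batch arithmetic in this hunk can be checked in isolation. The following is a minimal sketch, not code from the package: `total_batches` and `batch_ranges` are hypothetical helper names, and the sketch assumes every scroll page is full except possibly the last one (the real code derives `batch_end` from `len(records)`).

```python
import math


def total_batches(total_points: int, batch_size: int) -> int:
    """Batch count shown in the dump header; never reported as less than 1."""
    if batch_size <= 0:
        raise ValueError(f"batch_size must be a positive integer, got {batch_size}")
    return max(1, math.ceil(total_points / batch_size))


def batch_ranges(total_points: int, batch_size: int) -> list[tuple[int, int]]:
    """1-based (start, end) point positions covered by each batch.

    Assumption: each page holds exactly batch_size points except the last.
    """
    if batch_size <= 0:
        raise ValueError(f"batch_size must be a positive integer, got {batch_size}")
    ranges = []
    for batch_num in range(1, math.ceil(total_points / batch_size) + 1):
        # Same formula as the diff: (batch_num - 1) * batch_size + 1
        start = (batch_num - 1) * batch_size + 1
        end = min(start + batch_size - 1, total_points)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    print(total_batches(3, 2))
    print(batch_ranges(3, 2))
```

With 3 points and a batch size of 2 this matches the new `test_custom_batch_size_splits_pages` case: two batches, the second one partial. An empty collection still reports one batch in the header because of the `max(1, ...)` guard, while no ranges are produced.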

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/executor.py RENAMED

@@ -41,6 +41,9 @@ from qdrant_client.models import (
     ScalarQuantization,
     ScalarQuantizationConfig,
     ScalarType,
+    TurboQuantBitSize,
+    TurboQuantization,
+    TurboQuantQuantizationConfig,
     SearchParams,
     SparseVector,
     SparseVectorParams,
@@ -81,6 +84,7 @@ from .config import QQLConfig
 from .embedder import CrossEncoderEmbedder, Embedder, SparseEmbedder
 _RERANK_FETCH_MULTIPLIER = 4
+_HYBRID_PREFETCH_MULTIPLIER = 4
 _COLLECTION_VISIBILITY_TIMEOUT_SECONDS = 5.0
 _COLLECTION_VISIBILITY_POLL_SECONDS = 0.05
 from .exceptions import QQLRuntimeError
@@ -446,13 +450,13 @@ class Executor:
                         Prefetch(
                             query=dense_vector,
                             using="dense",
-                            limit=node.limit * 4,
+                            limit=node.limit * _HYBRID_PREFETCH_MULTIPLIER,
                             params=search_params,
                         ),
                         Prefetch(
                             query=sparse_vector,
                             using="sparse",
-                            limit=node.limit * 4,
+                            limit=node.limit * _HYBRID_PREFETCH_MULTIPLIER,
                             params=search_params,
                         ),
                     ],
@@ -846,7 +850,7 @@ class Executor:
     def _build_quantization_config(
         self, qc: QuantizationConfig
-    ) -> ScalarQuantization | BinaryQuantization | ProductQuantization:
+    ) -> ScalarQuantization | BinaryQuantization | ProductQuantization | TurboQuantization:
         """Convert a parsed QuantizationConfig to a Qdrant SDK quantization object."""
         if qc.type == QuantizationType.SCALAR:
             return ScalarQuantization(
@@ -867,6 +871,28 @@ class Executor:
                     always_ram=qc.always_ram,
                 )
             )
+        if qc.type == QuantizationType.TURBO:
+            _BITS_MAP: dict[float, TurboQuantBitSize] = {
+                4.0: TurboQuantBitSize.BITS4,
+                2.0: TurboQuantBitSize.BITS2,
+                1.5: TurboQuantBitSize.BITS1_5,
+                1.0: TurboQuantBitSize.BITS1,
+            }
+            if qc.turbo_bits is None:
+                bits_enum = None           # user omitted BITS → preserve None, server applies default
+            elif qc.turbo_bits in _BITS_MAP:
+                bits_enum = _BITS_MAP[qc.turbo_bits]
+            else:
+                raise QQLRuntimeError(
+                    f"Unsupported TURBO bit depth: {qc.turbo_bits}. "
+                    f"Valid values: 1, 1.5, 2, 4"
+                )
+            return TurboQuantization(
+                turbo=TurboQuantQuantizationConfig(
+                    bits=bits_enum,
+                    always_ram=qc.always_ram,
+                )
+            )
         raise QQLRuntimeError(f"Unknown quantization type: {qc.type}")
     def _collection_is_hybrid(self, name: str) -> bool:
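
The three-way handling of `turbo_bits` in the TURBO branch (omitted, valid, invalid) can be illustrated without the Qdrant SDK. This is a sketch under stated assumptions: `BitSize` is a hypothetical stand-in for `TurboQuantBitSize` with placeholder values, `resolve_bits` is an illustrative name, and `ValueError` stands in for `QQLRuntimeError`.

```python
from __future__ import annotations

from enum import Enum


class BitSize(Enum):
    """Hypothetical stand-in for the SDK enum; the values are placeholders."""

    BITS4 = "bits4"
    BITS2 = "bits2"
    BITS1_5 = "bits1_5"
    BITS1 = "bits1"


_BITS_MAP: dict[float, BitSize] = {
    4.0: BitSize.BITS4,
    2.0: BitSize.BITS2,
    1.5: BitSize.BITS1_5,
    1.0: BitSize.BITS1,
}


def resolve_bits(turbo_bits: float | None) -> BitSize | None:
    """Map a parsed BITS value to an enum member.

    None is passed through unchanged so the server can apply its own default.
    Anything outside the map raises instead of being coerced to a default.
    """
    if turbo_bits is None:
        return None
    try:
        return _BITS_MAP[turbo_bits]
    except KeyError:
        raise ValueError(
            f"Unsupported TURBO bit depth: {turbo_bits}. Valid values: 1, 1.5, 2, 4"
        ) from None


if __name__ == "__main__":
    print(resolve_bits(None), resolve_bits(1.5))
```

Keeping `None` distinct from `BITS4` is the design choice the new `test_turbo_default_bits_is_none` test pins down: an omitted `BITS` clause is forwarded as an omission rather than as an explicit 4-bit request.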

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/lexer.py RENAMED

@@ -27,6 +27,8 @@ class TokenKind(Enum):
     QUANTILE = auto()
     ALWAYS   = auto()
     RAM      = auto()
+    TURBO    = auto()
+    BITS     = auto()
     CREATE = auto()
     INDEX = auto()
     ON = auto()
@@ -113,6 +115,8 @@ _KEYWORDS: dict[str, TokenKind] = {
     "QUANTILE": TokenKind.QUANTILE,
     "ALWAYS":   TokenKind.ALWAYS,
     "RAM":      TokenKind.RAM,
+    "TURBO":    TokenKind.TURBO,
+    "BITS":     TokenKind.BITS,
     "CREATE": TokenKind.CREATE,
     "INDEX": TokenKind.INDEX,
     "ON": TokenKind.ON,
@@ -209,7 +213,7 @@ class Lexer:
                     tokens.append(Token(TokenKind.NOT_EQUALS, "!=", i))
                     i += 2
                 else:
-                    raise QQLSyntaxError(f"Unexpected character '!'", i)
+                    raise QQLSyntaxError("Unexpected character '!'", i)
             elif ch == ">":
                 if i + 1 < n and query[i + 1] == "=":
                     tokens.append(Token(TokenKind.GTE, ">=", i))

{qql_cli-2.0.0 → qql_cli-2.1.0}/src/qql/parser.py RENAMED

@@ -248,8 +248,32 @@ class Parser:
                 always_ram = True
             return QuantizationConfig(type=QuantizationType.PRODUCT, always_ram=always_ram)
+        if tok.kind == TokenKind.TURBO:
+            self._advance()
+            turbo_bits: float | None = None
+            always_ram = False
+            if self._peek().kind == TokenKind.BITS:
+                self._advance()
+                bits_tok = self._peek()
+                raw = float(self._parse_number())
+                if raw not in (1.0, 1.5, 2.0, 4.0):
+                    raise QQLSyntaxError(
+                        f"BITS must be one of 1, 1.5, 2, or 4 for TURBO quantization, got {raw}",
+                        bits_tok.pos,
+                    )
+                turbo_bits = raw
+            if self._peek().kind == TokenKind.ALWAYS:
+                self._advance()
+                self._expect(TokenKind.RAM)
+                always_ram = True
+            return QuantizationConfig(
+                type=QuantizationType.TURBO,
+                turbo_bits=turbo_bits,
+                always_ram=always_ram,
+            )
         raise QQLSyntaxError(
-            f"Expected SCALAR, BINARY, or PRODUCT after QUANTIZE, got '{tok.value}'",
+            f"Expected SCALAR, BINARY, PRODUCT, or TURBO after QUANTIZE, got '{tok.value}'",
             tok.pos,
         )
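
The clause order enforced by the new branch (`TURBO`, then optional `BITS <n>`, then optional `ALWAYS RAM`) can be sketched over a plain list of strings. This is an illustration only: `parse_turbo_clause` is a hypothetical function, it takes the tokens that follow `TURBO` rather than real `Token` objects, and it raises `ValueError` where the package raises `QQLSyntaxError`.

```python
from __future__ import annotations

_VALID_BITS = (1.0, 1.5, 2.0, 4.0)


def parse_turbo_clause(tokens: list[str]) -> tuple[float | None, bool]:
    """Parse the tokens after TURBO into (turbo_bits, always_ram)."""
    pos = 0
    turbo_bits: float | None = None
    always_ram = False

    # Optional: BITS <number>, restricted to the four supported depths.
    if pos < len(tokens) and tokens[pos].upper() == "BITS":
        pos += 1
        if pos >= len(tokens):
            raise ValueError("BITS requires a value")
        try:
            raw = float(tokens[pos])
        except ValueError:
            raise ValueError(f"BITS expects a number, got {tokens[pos]!r}") from None
        if raw not in _VALID_BITS:
            raise ValueError(
                f"BITS must be one of 1, 1.5, 2, or 4 for TURBO quantization, got {raw}"
            )
        turbo_bits = raw
        pos += 1

    # Optional: ALWAYS RAM, which must come after BITS when both are present.
    if pos < len(tokens) and tokens[pos].upper() == "ALWAYS":
        pos += 1
        if pos >= len(tokens) or tokens[pos].upper() != "RAM":
            raise ValueError("Expected RAM after ALWAYS")
        always_ram = True
        pos += 1

    return turbo_bits, always_ram


if __name__ == "__main__":
    print(parse_turbo_clause(["BITS", "1.5", "ALWAYS", "RAM"]))
```

Because the bit depth is validated at parse time, an unsupported value such as `BITS 3` is rejected before any request is built; the executor check shown earlier in this diff acts as a second guard for configs constructed programmatically.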

{qql_cli-2.0.0 → qql_cli-2.1.0}/tests/test_dumper.py RENAMED

@@ -5,7 +5,7 @@ import pytest
 from rich.console import Console
 from qql.dumper import (
-    _DUMP_BATCH_SIZE,
+    _DEFAULT_DUMP_BATCH_SIZE,
     _is_hybrid,
     _serialize_dict,
     _serialize_value,
@@ -32,7 +32,7 @@ def _make_client(mocker, *, exists=True, hybrid=False, points=None, total=None):
     """Build a mock QdrantClient for dump tests.
     *points* is a list of payload dicts.  scroll() returns them all in one
-    batch when len(points) <= _DUMP_BATCH_SIZE, else two batches.
+    batch when len(points) <= _DEFAULT_DUMP_BATCH_SIZE, else two batches.
     """
     points = points or []
     client = mocker.MagicMock()
@@ -202,10 +202,10 @@ class TestDumpCollection:
         client.collection_exists.return_value = True
         client.get_collection.return_value.config.params.vectors = mocker.MagicMock(spec=[])
         cnt = mocker.MagicMock()
-        cnt.count = _DUMP_BATCH_SIZE + 1
+        cnt.count = _DEFAULT_DUMP_BATCH_SIZE + 1
         client.count.return_value = cnt
-        batch1 = [_make_record(mocker, {"text": f"doc {i}"}, f"id-{i}") for i in range(_DUMP_BATCH_SIZE)]
+        batch1 = [_make_record(mocker, {"text": f"doc {i}"}, f"id-{i}") for i in range(_DEFAULT_DUMP_BATCH_SIZE)]
         batch2 = [_make_record(mocker, {"text": "last doc"}, "id-last")]
         # First scroll call returns batch1 with a non-None offset; second returns batch2 + None
         client.scroll.side_effect = [
@@ -215,7 +215,7 @@ class TestDumpCollection:
         written, skipped = dump_collection("col", out, client, null_console(), null_console())
         content = (tmp_path / "dump.qql").read_text()
-        assert written == _DUMP_BATCH_SIZE + 1
+        assert written == _DEFAULT_DUMP_BATCH_SIZE + 1
         assert content.count("INSERT BULK") == 2
     def test_header_contains_collection_name(self, tmp_path, mocker):
@@ -230,3 +230,37 @@ class TestDumpCollection:
         client = _make_client(mocker, points=[{"text": "x"}])
         dump_collection("col", out, client, null_console(), null_console())
         assert (tmp_path / "sub" / "dir" / "dump.qql").exists()
+    def test_custom_batch_size_splits_pages(self, tmp_path, mocker):
+        """A batch_size of 2 over 3 points should produce two INSERT BULK blocks."""
+        out = str(tmp_path / "dump.qql")
+        client = mocker.MagicMock()
+        client.collection_exists.return_value = True
+        client.get_collection.return_value.config.params.vectors = mocker.MagicMock(spec=[])
+        cnt = mocker.MagicMock()
+        cnt.count = 3
+        client.count.return_value = cnt
+        batch1 = [_make_record(mocker, {"text": f"doc {i}"}, f"id-{i}") for i in range(2)]
+        batch2 = [_make_record(mocker, {"text": "last"}, "id-last")]
+        client.scroll.side_effect = [
+            (batch1, "offset-1"),
+            (batch2, None),
+        ]
+        written, _ = dump_collection(
+            "col", out, client, null_console(), null_console(), batch_size=2
+        )
+        content = (tmp_path / "dump.qql").read_text()
+        assert written == 3
+        assert content.count("INSERT BULK") == 2
+        # client.scroll should have been called with limit=2
+        assert client.scroll.call_args_list[0].kwargs["limit"] == 2
+    def test_invalid_batch_size_raises(self, tmp_path, mocker):
+        out = str(tmp_path / "dump.qql")
+        client = _make_client(mocker, points=[{"text": "x"}])
+        with pytest.raises(ValueError):
+            dump_collection(
+                "col", out, client, null_console(), null_console(), batch_size=0
+            )

{qql_cli-2.0.0 → qql_cli-2.1.0}/tests/test_executor.py RENAMED

@@ -1640,3 +1640,110 @@ class TestQuantizeCreate:
         node = CreateCollectionStmt(collection="articles")
         result = executor.execute(node)
         assert "quantization" not in result.message
+class TestTurboQuantCreate:
+    """Executor tests for QUANTIZE TURBO — verifies correct SDK objects are built."""
+    @pytest.fixture
+    def executor(self, cfg, mock_client):
+        return Executor(mock_client, cfg)
+    # ── TurboQuantization object is produced ──────────────────────────────
+    def test_turbo_passes_turbo_quantization(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantization
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert isinstance(kw.get("quantization_config"), TurboQuantization)
+    def test_turbo_default_bits_is_none(self, executor, mock_client):
+        """When BITS is omitted, bits must be None — preserving omission so the
+        SDK/server applies its own default rather than QQL forcing BITS4."""
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits is None
+    def test_turbo_bits2(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=2.0),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS2
+    def test_turbo_bits1_5(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=1.5),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS1_5
+    def test_turbo_bits1(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantBitSize
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=1.0),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.bits == TurboQuantBitSize.BITS1
+    def test_turbo_always_ram_true(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO, always_ram=True),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.always_ram is True
+    def test_turbo_always_ram_false_by_default(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert kw["quantization_config"].turbo.always_ram is False
+    def test_turbo_hybrid_collection_has_both_configs(self, executor, mock_client):
+        from qdrant_client.models import TurboQuantization
+        node = CreateCollectionStmt(
+            collection="articles",
+            hybrid=True,
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert isinstance(kw.get("quantization_config"), TurboQuantization)
+        assert "sparse_vectors_config" in kw
+    def test_turbo_result_message_includes_turbo(self, executor, mock_client):
+        node = CreateCollectionStmt(
+            collection="articles",
+            quantization=QuantizationConfig(type=QuantizationType.TURBO),
+        )
+        result = executor.execute(node)
+        assert "turbo" in result.message
+    def test_turbo_invalid_bits_at_executor_raises(self, executor, mock_client):
+        """An unexpected turbo_bits value that bypasses parser validation must
+        raise QQLRuntimeError explicitly instead of silently coercing to BITS4."""
+        from qql.exceptions import QQLRuntimeError as QQLErr
+        qc = QuantizationConfig(type=QuantizationType.TURBO, turbo_bits=3.0)
+        with pytest.raises(QQLErr, match="Unsupported TURBO bit depth"):
+            executor._build_quantization_config(qc)
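The last test above expects the executor to reject a bit depth that slipped past parser validation, rather than fall back to a default. A minimal sketch of that kind of lookup follows. It is not the package's actual `_build_quantization_config`; `TurboBits`, `resolve_turbo_bits`, and the local `QQLRuntimeError` are hypothetical stand-ins so the block runs without `qdrant-client` or `qql` installed, and returning `None` for an unset depth is an assumption.

```python
from enum import Enum
from typing import Optional


class TurboBits(Enum):
    """Hypothetical stand-in for qdrant_client.models.TurboQuantBitSize."""
    BITS1 = "bits1"
    BITS1_5 = "bits1_5"
    BITS2 = "bits2"
    BITS4 = "bits4"


class QQLRuntimeError(Exception):
    """Hypothetical stand-in for qql.exceptions.QQLRuntimeError."""


# Bit depths exercised as valid by the tests in this diff: 1, 1.5, 2, 4.
_BITS_BY_DEPTH = {
    1.0: TurboBits.BITS1,
    1.5: TurboBits.BITS1_5,
    2.0: TurboBits.BITS2,
    4.0: TurboBits.BITS4,
}


def resolve_turbo_bits(turbo_bits: Optional[float]) -> Optional[TurboBits]:
    """Map a numeric bit depth to an enum member, failing loudly on unknowns."""
    if turbo_bits is None:
        # Assumption: an unset depth is passed through so a default can
        # be chosen elsewhere; the real executor may behave differently.
        return None
    try:
        return _BITS_BY_DEPTH[float(turbo_bits)]
    except KeyError:
        # Explicit error instead of silently coercing to some default.
        raise QQLRuntimeError(f"Unsupported TURBO bit depth: {turbo_bits}") from None
```

With this shape, `resolve_turbo_bits(3.0)` raises with a message that matches the `"Unsupported TURBO bit depth"` pattern used in the test.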

{qql_cli-2.0.0 → qql_cli-2.1.0}/tests/test_parser.py RENAMED

@@ -1031,3 +1031,83 @@ class TestQuantizeCreate:
     def test_scalar_quantile_integer_above_one_raises(self):
         with pytest.raises(QQLSyntaxError):
             parse("CREATE COLLECTION articles QUANTIZE SCALAR QUANTILE 2")
+class TestTurboQuantCreate:
+    """Parser tests for QUANTIZE TURBO [BITS n] [ALWAYS RAM]."""
+    # ── Default / no options ──────────────────────────────────────────────
+    def test_turbo_no_options(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO")
+        assert node.quantization is not None
+        assert node.quantization.type == QuantizationType.TURBO
+        assert node.quantization.turbo_bits is None
+        assert node.quantization.always_ram is False
+    # ── BITS variants ─────────────────────────────────────────────────────
+    def test_turbo_bits4(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 4")
+        assert node.quantization.type == QuantizationType.TURBO
+        assert node.quantization.turbo_bits == 4.0
+    def test_turbo_bits2(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 2")
+        assert node.quantization.turbo_bits == 2.0
+    def test_turbo_bits1_5(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 1.5")
+        assert node.quantization.turbo_bits == 1.5
+    def test_turbo_bits1(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 1")
+        assert node.quantization.turbo_bits == 1.0
+    # ── ALWAYS RAM ────────────────────────────────────────────────────────
+    def test_turbo_always_ram_no_bits(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO ALWAYS RAM")
+        assert node.quantization.type == QuantizationType.TURBO
+        assert node.quantization.always_ram is True
+        assert node.quantization.turbo_bits is None
+    def test_turbo_bits_and_always_ram(self):
+        node = parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 2 ALWAYS RAM")
+        assert node.quantization.turbo_bits == 2.0
+        assert node.quantization.always_ram is True
+    # ── Composed with other clauses ───────────────────────────────────────
+    def test_turbo_with_hybrid_shorthand(self):
+        node = parse("CREATE COLLECTION articles HYBRID QUANTIZE TURBO")
+        assert node.hybrid is True
+        assert node.quantization.type == QuantizationType.TURBO
+    def test_turbo_with_using_hybrid(self):
+        node = parse("CREATE COLLECTION articles USING HYBRID QUANTIZE TURBO BITS 2")
+        assert node.hybrid is True
+        assert node.quantization.turbo_bits == 2.0
+    def test_turbo_with_model(self):
+        node = parse("CREATE COLLECTION articles USING MODEL 'BAAI/bge-base-en-v1.5' QUANTIZE TURBO BITS 1.5")
+        assert node.model == "BAAI/bge-base-en-v1.5"
+        assert node.quantization.type == QuantizationType.TURBO
+        assert node.quantization.turbo_bits == 1.5
+    def test_turbo_with_hybrid_dense_model(self):
+        node = parse("CREATE COLLECTION articles USING HYBRID DENSE MODEL 'x' QUANTIZE TURBO BITS 1 ALWAYS RAM")
+        assert node.hybrid is True
+        assert node.model == "x"
+        assert node.quantization.turbo_bits == 1.0
+        assert node.quantization.always_ram is True
+    # ── Error cases ───────────────────────────────────────────────────────
+    def test_turbo_invalid_bits_raises(self):
+        with pytest.raises(QQLSyntaxError):
+            parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 3")
+    def test_turbo_invalid_bits_float_raises(self):
+        with pytest.raises(QQLSyntaxError):
+            parse("CREATE COLLECTION articles QUANTIZE TURBO BITS 0.5")
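The parser tests above pin down the grammar `QUANTIZE TURBO [BITS n] [ALWAYS RAM]`, with 1, 1.5, 2, and 4 accepted and 3 and 0.5 rejected. The sketch below shows one way such an option tail could be validated. It is not the package's parser: `parse_turbo_options`, `TurboOptions`, and the local `QQLSyntaxError` are hypothetical names, the input is only the text following `TURBO`, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Optional

# Accepted depths, inferred from the tests in this diff.
ALLOWED_BITS = (1.0, 1.5, 2.0, 4.0)


class QQLSyntaxError(Exception):
    """Hypothetical stand-in for the package's QQLSyntaxError."""


@dataclass
class TurboOptions:
    turbo_bits: Optional[float] = None
    always_ram: bool = False


def parse_turbo_options(text: str) -> TurboOptions:
    """Parse the optional '[BITS n] [ALWAYS RAM]' tail after QUANTIZE TURBO."""
    tokens = text.split()
    opts = TurboOptions()
    i = 0
    if i < len(tokens) and tokens[i].upper() == "BITS":
        if i + 1 >= len(tokens):
            raise QQLSyntaxError("BITS requires a value")
        try:
            value = float(tokens[i + 1])
        except ValueError:
            raise QQLSyntaxError(f"Invalid BITS value: {tokens[i + 1]}") from None
        if value not in ALLOWED_BITS:
            raise QQLSyntaxError(f"BITS must be one of 1, 1.5, 2, 4; got {tokens[i + 1]}")
        opts.turbo_bits = value
        i += 2
    if [t.upper() for t in tokens[i:i + 2]] == ["ALWAYS", "RAM"]:
        opts.always_ram = True
        i += 2
    if i != len(tokens):
        raise QQLSyntaxError(f"Unexpected token: {tokens[i]}")
    return opts
```

Validating the depth at parse time keeps the error close to the user's statement; the executor-side check then only has to guard against nodes constructed programmatically.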