RubyGems - rails-paradedb - Versions diffs - 0.1.0 → 0.2.0 - Mend

rails-paradedb 0.1.0 → 0.2.0

Files changed (19)

checksums.yaml +4 -4
data/CHANGELOG.md +47 -6
data/README.md +130 -196
data/lib/parade_db/aggregations.rb +112 -1
data/lib/parade_db/arel/README.md +28 -21
data/lib/parade_db/arel/builder.rb +265 -25
data/lib/parade_db/arel/nodes.rb +70 -8
data/lib/parade_db/arel/predications.rb +93 -30
data/lib/parade_db/arel/visitor.rb +32 -1
data/lib/parade_db/diagnostics.rb +78 -0
data/lib/parade_db/migration_helpers.rb +10 -12
data/lib/parade_db/model.rb +51 -15
data/lib/parade_db/proximity.rb +19 -0
data/lib/parade_db/query.rb +14 -0
data/lib/parade_db/search_methods.rb +370 -49
data/lib/parade_db/tokenizer_sql.rb +21 -0
data/lib/parade_db/version.rb +1 -1
data/lib/parade_db.rb +23 -0
metadata +35 -13

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: aca8f9d891571b983de00268880360903e8e391e95e8272e538f67d033092c57
-  data.tar.gz: 6194f019217d69935cbe6a3e1408514f45419cfe94e359240de362b0ecb01639
+  metadata.gz: be1908cb2b7b8e9062ac3a567ac2cb27ee4e90a62e5137ff09aa00345e805576
+  data.tar.gz: 0ee0f049473df3ab660b86a84f494885fc95f1021c7ab258c57d9278083f993e
 SHA512:
-  metadata.gz: 2e9577cbb0203e5b053edeffa1cba33765b8b74c1f8245f0988022d5069a51f667bc9d19bedfeb52f26187c09b3ae32ab43e0f2bf7ad07eba4513b05bf4f601f
-  data.tar.gz: 155fc1086655a107ee6d28704af95b02ed8fc181f9be8d2ec22df34b8e5d47ef944ab89434bdf2e4a7323aec7411787dd09cd01050f6f388b8bf8c6138698975
+  metadata.gz: 9038e6a1fa469c4e0de2d0ba10949074d5c995943d655216626590bca0f912a9e9381f5054ae5f647df7d699cef56586b1491e837d390908038c3a7f2142315b
+  data.tar.gz: 8fbbec4ccec76d6c1591e773dc72e06975b5b176ec7cc91ffcfe09200ccad9b892ecf2ab8c616e95190b180eed31d03df54e62a7b5d90d30b43df070b7eee48c
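
The digests above cover the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking one member against its recorded SHA256 value using Ruby's standard `digest` library. It assumes the `.gem` file has already been unpacked (for example with `tar -xf`) into the current directory. The helper name `digest_matches?` is hypothetical and is not part of RubyGems or rails-paradedb.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems or rails-paradedb):
# returns true when the file's SHA256 hex digest equals the expected value.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Expected value copied from the 0.2.0 checksums.yaml shown above.
expected_data_sha256 = "0ee0f049473df3ab660b86a84f494885fc95f1021c7ab258c57d9278083f993e"

# Only runs the check when the unpacked member is present in the working directory.
if File.exist?("data.tar.gz")
  status = digest_matches?("data.tar.gz", expected_data_sha256) ? "OK" : "MISMATCH"
  puts "data.tar.gz: #{status}"
end
```

The same comparison works for the SHA512 entries by swapping in `Digest::SHA512`.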

data/CHANGELOG.md CHANGED

@@ -1,13 +1,53 @@
 # Changelog
-<!-- markdownlint-disable MD024 -->
-All notable changes to this project will be documented in this file.
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [0.2.0] - 2026-03-13
+### Added
+- Rails 7.2 support and CI coverage
+- New search/query APIs: `regex_phrase`, `phrase_prefix`, `parse`,
+  grouped `aggregate_by`, and `ParadeDB::Query.regex`
+- Expanded snippet support with `with_snippets` and
+  `with_snippet_positions`
+- ParadeDB diagnostics helpers:
+  `paradedb_indexes`, `paradedb_index_segments`,
+  `paradedb_verify_index`, and `paradedb_verify_all_indexes`
+- Additional aggregation helpers:
+  `percentiles`, `histogram`, `date_histogram`, `top_hits`, and
+  `filtered`
+- Support for passing regexes into proximity queries using
+  `ParadeDB.regex_term`
+### Changed
+- Fuzzy search controls are now flattened across the relation and Arel
+  DSLs with direct `distance`, `prefix`, and
+  `transposition_cost_one` options
+- `matching_all` and `matching_any` now accept explicit `tokenizer:`
+  overrides
+- Runtime index validation now includes index-class discovery, drift
+  checks, indexed-field validation, and model helpers for
+  `paradedb_index_classes`, `paradedb_indexed_fields`,
+  `paradedb_key_field`, and `paradedb_index_name`
+- Facet and aggregation APIs now support `exact:` controls for exact
+  versus windowed execution
+- README, examples, and Arel documentation were expanded to cover the
+  newer query, snippet, aggregation, and diagnostics APIs
+### Fixed
+- Search/runtime tokenizer handling now renders tokenizer SQL safely and
+  validates unsupported tokenizer and facet combinations earlier
+### Removed
+- **BREAKING**: `near_regex` has been removed in favor of calling
+  `near` with a regex argument using `ParadeDB.regex_term`
 ## [0.1.0] - 2026-02-07
 ### Added
@@ -50,5 +90,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Schema dump/load round-trip for tokenizer configuration and index options
   (including `target_segment_count`)
-[Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.1.0...HEAD
+[Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.2.0...HEAD
+[0.2.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.2.0
 [0.1.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.1.0

data/README.md CHANGED

@@ -3,280 +3,214 @@
 [![Gem Version](https://img.shields.io/gem/v/rails-paradedb)](https://rubygems.org/gems/rails-paradedb)
 [![CI](https://github.com/paradedb/rails-paradedb/actions/workflows/ci.yml/badge.svg)](https://github.com/paradedb/rails-paradedb/actions/workflows/ci.yml)
 [![License](https://img.shields.io/github/license/paradedb/rails-paradedb?color=blue)](https://github.com/paradedb/rails-paradedb?tab=MIT-1-ov-file#readme)
-[![Slack URL](https://img.shields.io/badge/Join%20Slack-purple?logo=slack&link=https%3A%2F%2Fjoin.slack.com%2Ft%2Fparadedbcommunity%2Fshared_invite%2Fzt-32abtyjg4-yoYoi~RPh9MSW8tDbl0BQw)](https://join.slack.com/t/paradedbcommunity/shared_invite/zt-32abtyjg4-yoYoi~RPh9MSW8tDbl0BQw)
-[![X URL](https://img.shields.io/twitter/url?url=https%3A%2F%2Ftwitter.com%2Fparadedb&label=Follow%20%40paradedb)](https://x.com/paradedb)
-[ParadeDB](https://paradedb.com) — simple, Elastic-quality search for Postgres — **BM25 full-text** integration for ActiveRecord.
+ActiveRecord integration for [ParadeDB](https://paradedb.com): BM25 full-text search, scoring, snippets, facets, and aggregations in PostgreSQL.
-For complete ParadeDB documentation, see [docs.paradedb.com](https://docs.paradedb.com/).
+ParadeDB docs: <https://docs.paradedb.com>
-## Requirements & Compatibility
+## Requirements
-| Component  | Version                          |
-|------------|----------------------------------|
-| Ruby       | 3.2+                             |
-| Rails      | 8.1+                             |
-| ParadeDB   | 0.21.0+                          |
-| PostgreSQL | 17+    (with ParadeDB extension) |
-**Note**: This gem requires ActiveRecord with PostgreSQL. The DSL and Arel layer delegate SQL value quoting to `ActiveRecord::Base.connection.quote` for type safety and proper escaping.
+- Ruby 3.2+
+- Rails 7.2+
+- PostgreSQL 17+ with `pg_search` (ParadeDB)
 ## Installation
-Add to your Gemfile:
 ```ruby
 gem "rails-paradedb"
 ```
-Then run:
 ```bash
 bundle install
 ```
 ## Quick Start
-Enable ParadeDB on a model:
 ```ruby
-class Product < ApplicationRecord
+class MockItem < ActiveRecord::Base
   include ParadeDB::Model
+  self.table_name = "mock_items"
+  self.primary_key = "id"
+  self.has_paradedb_index = true
 end
 ```
-Search with a simple query:
 ```ruby
-Product.search(:description).matching_all("shoes")
-```
-Check out some examples:
-- [Quick Start](examples/quickstart/quickstart.rb)
-- [Faceted Search](examples/faceted_search/faceted_search.rb)
-- [Autocomplete](examples/autocomplete/autocomplete.rb)
-- [More Like This](examples/more_like_this/more_like_this.rb)
-- [RAG](examples/rag/rag.rb)
-## BM25 Index
-Generate an index class and migration:
-```bash
-rails g parade_db:index Product description category rating
+MockItem.search(:description).matching_all("running shoes")
+MockItem.search(:description).matching_any("wireless", "bluetooth")
+MockItem.search(:description).term("electronics")
 ```
-Or define one manually:
+## Index Definition
 ```ruby
-class ProductIndex < ParadeDB::Index
-  self.table_name = :products
+class MockItemIndex < ParadeDB::Index
+  self.table_name = :mock_items
   self.key_field = :id
-  self.index_options = { target_segment_count: 17 }
+  self.index_name = :search_idx
   self.fields = {
-    id: {},
-    description: {
-      tokenizers: [
-        { tokenizer: :literal },
-        { tokenizer: :simple, alias: "description_simple", filters: [:lowercase] }
-      ]
-    },
-    category: { tokenizer: :literal, alias: "category" },
-    "metadata->>'color'": { tokenizer: :literal, alias: "metadata_color" },
-    metadata: { fast: true, expand_dots: false }
+    id: nil,
+    description: nil,
+    category: nil,
+    rating: nil,
+    in_stock: nil,
+    created_at: nil,
+    metadata: nil,
+    weight_range: nil
   }
 end
 ```
-Field config supports:
-- `tokenizer` for a single tokenizer entry.
-- `tokenizers` for multiple tokenizer entries on the same source field.
-- `args`, `named_args`, `filters`, `stemmer`, `alias` inside tokenizer entries.
-- field options such as `fast`, `record`, `normalizer`, `expand_dots`.
+For text or JSON fields you plan to use in Top K queries, facets, grouped
+aggregations, or `top_hits` docvalue fields, use `:literal` or
+`:literal_normalized`.
-Create/remove it in a migration:
+Create in migration:
 ```ruby
-class AddProductBm25Index < ActiveRecord::Migration[8.1]
+class AddMockItemBm25Index < ActiveRecord::Migration[8.1]
   def up
-    create_paradedb_index(ProductIndex, if_not_exists: true)
+    create_paradedb_index(MockItemIndex, if_not_exists: true)
   end
   def down
-    remove_bm25_index :products, name: :products_bm25_idx, if_exists: true
+    remove_bm25_index :mock_items, name: :search_idx, if_exists: true
   end
 end
 ```
-Available migration helpers:
-- `create_paradedb_index(index_class_or_name, if_not_exists: false)`
-- `replace_paradedb_index(index_class_or_name)`
-- `add_bm25_index(table, fields:, key_field:, name: nil, index_options: nil, if_not_exists: false)`
-- `remove_bm25_index(table, name: nil, if_exists: false)`
-- `reindex_bm25(table, name: nil, concurrently: false)`
-### Index Validation Mode
-Runtime index drift validation is controlled by `ParadeDB.index_validation_mode`.
-Default is `:off` (no runtime drift checks).
-```ruby
-ParadeDB.index_validation_mode = :warn  # log drift warnings
-ParadeDB.index_validation_mode = :raise # raise ParadeDB::IndexDriftError on drift
-ParadeDB.index_validation_mode = :off   # disable drift checks (default)
-```
-## Query Types
-For advanced options, see [ParadeDB Query Builder Documentation](https://docs.paradedb.com/documentation/query-builder/overview) and the runnable scripts in [`examples/`](examples).
+## Query API
 ```ruby
 # Full-text
-Product.search(:description).matching_all("running shoes")
-Product.search(:description).matching_any("wireless", "bluetooth")
-Product.search(:description).phrase("running shoes", slop: 2)
-Product.search(:description).fuzzy("runing", distance: 2, prefix: true, boost: 1.5)
-Product.search(:description).regex("run.*")
-Product.search(:description).parse("running AND shoes", lenient: true)
-# Exact token matching
-Product.search(:category).term("electronics", boost: 2.0)
-Product.search(:category).term_set("electronics", "audio")
-# Other predicates
-Product.search(:description).excluding("cheap", "budget")
-Product.search(:description).near("running", "shoes", distance: 3)
-Product.search(:description).phrase_prefix("run", "sh")
-Product.search(:id).match_all
-Product.search(:id).exists
-Product.search(:rating).range(gte: 3, lt: 5)
-# Similarity
-Product.more_like_this(42, fields: [:description])
+MockItem.search(:description).matching_all("running shoes")
+MockItem.search(:description).matching_any("wireless bluetooth")
+# Query-time tokenizer override
+MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespace")
+MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespace('lowercase=false')")
+# Fuzzy options on match/term
+MockItem.search(:description).matching_any("runing shose", distance: 1)
+MockItem.search(:description).matching_all("runing", distance: 1, prefix: true)
+MockItem.search(:description).term("shose", distance: 1, transposition_cost_one: true)
+# Other query types
+MockItem.search(:description).phrase("running shoes", slop: 2)
+MockItem.search(:description).phrase("running shoes", tokenizer: "whitespace")
+MockItem.search(:description).phrase(%w[running shoes])
+MockItem.search(:description).regex("run.*")
+MockItem.search(:description).near("running", anchor: "shoes", distance: 3)
+MockItem.search(:description).near("running", anchor: "shoes", distance: 3, ordered: true)
+MockItem.search(:description).near(ParadeDB.regex_term("run.*"), anchor: "shoes", distance: 3)
+MockItem.search(:description).near("running", "trail", anchor: "shoes", distance: 3)
+MockItem.search(:description).near(ParadeDB.regex_term("run.*"), "trail", anchor: "shoes", distance: 3)
+MockItem.search(:description).regex_phrase("run.*", "shoes")
+MockItem.search(:description).phrase_prefix("run", "sh")
+MockItem.search(:description).phrase_prefix("run", "sh", max_expansion: 100)
+MockItem.search(:description).parse("running AND shoes", lenient: true)
+MockItem.search(:description).parse("running shoes", conjunction_mode: true)
+MockItem.search(:id).match_all
+MockItem.search(:id).exists
+MockItem.search(:rating).range(gte: 3, lt: 5)
+MockItem.search(:weight_range).range_term("(10, 12]", relation: "Intersects")
+MockItem.more_like_this(42, fields: [:description])
 ```
-## Annotations
-See [BM25 Scoring](https://docs.paradedb.com/documentation/sorting/score) and [Highlighting](https://docs.paradedb.com/documentation/full-text/highlight) for full function details.
+## Scoring and Highlighting
 ```ruby
-Product.search(:description).matching_all("shoes").with_score
-Product.search(:description).matching_all("shoes").with_snippet(:description, start_tag: "<b>", end_tag: "</b>", max_chars: 80)
-Product.search(:description).matching_all("running").with_snippets(:description, max_chars: 15, limit: 2, offset: 0, sort_by: :position)
-Product.search(:description).matching_all("running").with_snippet_positions(:description)
+results = MockItem.search(:description)
+                 .matching_all("shoes")
+                 .with_score
+                 .order(search_score: :desc)
+MockItem.search(:description)
+       .matching_all("shoes")
+       .with_snippet(:description, start_tag: "<b>", end_tag: "</b>", max_chars: 80)
+MockItem.search(:description)
+       .matching_all("running")
+       .with_snippets(:description, max_chars: 15, limit: 2, offset: 0, sort_by: :position)
+MockItem.search(:description)
+       .matching_all("running")
+       .with_snippet_positions(:description)
 ```
-## Faceted Search
-For supported aggregate functions and JSON shapes, see [ParadeDB Aggregations Documentation](https://docs.paradedb.com/documentation/aggregates/overview).
-`with_facets(...)` requires:
-- an existing ParadeDB predicate
-- `.order(...)`
-- `.limit(...)`
+## Facets and Aggregations
 ```ruby
-# Rows + facets
-relation = Product.search(:description).matching_all("shoes")
+# Rows + facets (requires order + limit)
+relation = MockItem.search(:description)
+                  .matching_all("shoes")
                   .with_facets(:category, size: 10)
                   .order(:id)
                   .limit(10)
 rows = relation.to_a
 facets = relation.facets
-# Facets only
-facets_only = Product.search(:description).matching_all("shoes")
-                     .facets(:category)
-# Named aggregation helpers
-aggs = Product.search(:description).matching_all("shoes")
-              .facets_agg(
-                docs: ParadeDB::Aggregations.value_count(:id),
-                avg_rating: ParadeDB::Aggregations.avg(:rating)
-              )
-```
+# Non-exact window facets
+relation = MockItem.search(:description)
+                  .matching_all("shoes")
+                  .with_facets(:category, size: 10, exact: false)
+                  .order(:id)
+                  .limit(10)
-## ActiveRecord Integration
+# Facets-only aggregate
+MockItem.search(:description).matching_all("shoes").facets(:category)
-ParadeDB scopes compose with regular ActiveRecord chaining:
+# Named aggregations
+MockItem.search(:description).matching_all("shoes").facets_agg(
+  docs: ParadeDB::Aggregations.value_count(:id),
+  avg_rating: ParadeDB::Aggregations.avg(:rating)
+)
-```ruby
-Product.search(:description).matching_all("running")
-       .search(:category).term("footwear")
-       .where(in_stock: true)
-       .order(:id)
-       .limit(10)
+# Non-exact window named aggregations
+MockItem.search(:description).matching_all("shoes").with_agg(
+  exact: false,
+  docs: ParadeDB::Aggregations.value_count(:id)
+).order(:id).limit(10)
 ```
-### Method Name Conflicts
+## Diagnostics Helpers
-This gem defines a model class method named `.search`.
-If your application already defines `.search`, rails-paradedb will **not** override it.
-Use `.paradedb_search` instead:
+Ruby helpers:
 ```ruby
-Product.paradedb_search(:description).matching_all("shoes")
+ParadeDB.paradedb_indexes
+ParadeDB.paradedb_index_segments("search_idx")
+ParadeDB.paradedb_verify_index("search_idx", sample_rate: 0.1)
+ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_idx")
 ```
-## Arel Layer
-See the dedicated Arel guide: [`lib/parade_db/arel/README.md`](lib/parade_db/arel/README.md).
-## Security
-### SQL Injection Protection
-rails-paradedb uses **ActiveRecord's quoting** for all search terms:
-**Quoting Strategy:**
-- All user input is quoted via `ActiveRecord::Base.connection.quote`
-- Search terms use Arel's `Nodes.build_quoted()` for type-safe SQL generation
-- This prevents SQL injection while maintaining compatibility with ParadeDB's full-text operators
-**Implementation Details:**
-All values flow through ActiveRecord's connection adapter quoting, which handles:
+Rake tasks:
-- String escaping (`'` → `''`)
-- Type coercion (booleans, numbers)
-- NULL handling
-**Safety Guarantee:**
-```ruby
-# Even malicious input is safely escaped
-user_query = "'; DROP TABLE products; --"
-Product.search(:description).matching_all(user_query)
-# The query is escaped and treated as a literal search term
+```bash
+rake paradedb:diagnostics:indexes
+rake "paradedb:diagnostics:index_segments[search_idx]"
+rake "paradedb:diagnostics:verify_index[search_idx]" SAMPLE_RATE=0.1
+rake paradedb:diagnostics:verify_all_indexes INDEX_PATTERN=search_idx
 ```
-## Documentation
-- **ParadeDB Official Docs**: <https://docs.paradedb.com>
-- **ParadeDB Website**: <https://paradedb.com>
-## Contributing
-Contribution and local development workflow live in [`CONTRIBUTING.md`](CONTRIBUTING.md).
+Note: availability depends on your installed `pg_search` version.
-## Support
+## Examples
-If you're missing a feature or have found a bug, please open a
-[GitHub Issue](https://github.com/paradedb/rails-paradedb/issues/new/choose).
-To get community support, you can:
+- [Quick Start](examples/quickstart/quickstart.rb)
+- [Faceted Search](examples/faceted_search/faceted_search.rb)
+- [Autocomplete](examples/autocomplete/autocomplete.rb)
+- [More Like This](examples/more_like_this/more_like_this.rb)
+- [Hybrid RRF](examples/hybrid_rrf/hybrid_rrf.rb)
+- [RAG](examples/rag/rag.rb)
-- Post a question in the [ParadeDB Slack Community](https://join.slack.com/t/paradedbcommunity/shared_invite/zt-32abtyjg4-yoYoi~RPh9MSW8tDbl0BQw)
-- Ask for help on our [GitHub Discussions](https://github.com/paradedb/paradedb/discussions)
+## Contributing
-If you need commercial support, please [contact the ParadeDB team](mailto:sales@paradedb.com).
+See [CONTRIBUTING.md](CONTRIBUTING.md).
 ## License
-rails-paradedb is licensed under the [MIT License](LICENSE).
+MIT

data/lib/parade_db/aggregations.rb CHANGED

@@ -3,6 +3,19 @@
 module ParadeDB
   # Typed helpers for building agg JSON payloads passed to pdb.agg(...).
   module Aggregations
+    FilteredSpec = Struct.new(:spec, :agg_filter, keyword_init: true) do
+      # Backward-compatible reader for code that accessed `filtered_spec.filter`.
+      alias filter agg_filter
+    end
+    FieldTermFilter = Struct.new(
+      :field,
+      :term,
+      :distance,
+      :prefix,
+      :transposition_cost_one,
+      keyword_init: true
+    )
     TERMS_ORDER = {
       count_desc: { "_count" => "desc" },
       count_asc: { "_count" => "asc" },
@@ -18,7 +31,7 @@ module ParadeDB
       specs.each_with_object({}) do |(alias_name, spec), payload|
         alias_key = normalize_alias(alias_name)
-        payload[alias_key] = normalize_spec(spec)
+        payload[alias_key] = normalize_named_spec(spec)
       end
     end
@@ -132,11 +145,43 @@ module ParadeDB
       }
     end
+    def top_hits(size: nil, from: nil, sort: nil, docvalue_fields: nil)
+      payload = {}
+      payload["size"] = normalize_non_negative_integer(size, "size") unless size.nil?
+      payload["from"] = normalize_non_negative_integer(from, "from") unless from.nil?
+      payload["sort"] = normalize_top_hits_sort(sort) unless sort.nil?
+      payload["docvalue_fields"] = normalize_docvalue_fields(docvalue_fields) unless docvalue_fields.nil?
+      { "top_hits" => payload }
+    end
+    def filtered(spec, filter: nil, field: nil, term: nil, distance: nil, prefix: nil, transposition_cost_one: nil)
+      normalized_spec = normalize_spec(spec)
+      normalized_filter = normalize_filter(
+        filter: filter,
+        field: field,
+        term: term,
+        distance: distance,
+        prefix: prefix,
+        transposition_cost_one: transposition_cost_one
+      )
+      FilteredSpec.new(spec: normalized_spec, agg_filter: normalized_filter)
+    end
     def metric(name, field)
       { name => { "field" => normalize_field(field) } }
     end
     private_class_method :metric
+    def normalize_named_spec(spec)
+      case spec
+      when FilteredSpec
+        FilteredSpec.new(spec: normalize_spec(spec.spec), agg_filter: spec.agg_filter)
+      else
+        normalize_spec(spec)
+      end
+    end
+    private_class_method :normalize_named_spec
     def normalize_alias(alias_name)
       value =
         case alias_name
@@ -166,6 +211,32 @@ module ParadeDB
     end
     private_class_method :normalize_spec
+    def normalize_filter(filter:, field:, term:, distance:, prefix:, transposition_cost_one:)
+      if filter
+        if !field.nil? || !term.nil?
+          raise ArgumentError, "filtered aggregation accepts either filter: or field/term arguments, not both"
+        end
+        return filter
+      end
+      if field.nil? || term.nil?
+        raise ArgumentError, "filtered aggregation requires filter: or both field: and term:"
+      end
+      normalized_distance = distance.nil? ? nil : normalize_non_negative_integer(distance, "distance")
+      normalized_prefix = normalize_boolean_option(prefix, "prefix")
+      normalized_transposition = normalize_boolean_option(transposition_cost_one, "transposition_cost_one")
+      FieldTermFilter.new(
+        field: normalize_field(field),
+        term: term,
+        distance: normalized_distance,
+        prefix: normalized_prefix,
+        transposition_cost_one: normalized_transposition
+      )
+    end
+    private_class_method :normalize_filter
     def normalize_field(field)
       case field
       when Symbol
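
The hunk above lets a filtered aggregation take either a prebuilt `filter:` or a `field:`/`term:` pair, but not both. Below is a standalone sketch of just that either/or rule, written so it runs without the gem. The name `pick_filter` and the returned Hash shape are hypothetical. The real `normalize_filter` also validates `distance`, `prefix`, and `transposition_cost_one`, and returns a `FieldTermFilter` struct.

```ruby
# Hypothetical stand-in for the argument check in normalize_filter.
def pick_filter(filter: nil, field: nil, term: nil)
  if filter
    # A prebuilt filter cannot be combined with field/term arguments.
    unless field.nil? && term.nil?
      raise ArgumentError, "accepts either filter: or field/term arguments, not both"
    end
    return filter
  end

  # Without a prebuilt filter, both halves of the pair are required.
  if field.nil? || term.nil?
    raise ArgumentError, "requires filter: or both field: and term:"
  end

  # Assumption: the gem normalizes the field name; here it is only stringified.
  { field: field.to_s, term: term }
end

pick_filter(field: :category, term: "shoes") # accepted: field/term pair

begin
  pick_filter(term: "shoes")                 # rejected: field is missing
rescue ArgumentError => e
  warn "rejected: #{e.message}"
end
```

Raising on ambiguous input, rather than letting one argument silently win, keeps a mistyped call from producing an aggregation with a different filter than intended.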
@@ -215,6 +286,46 @@ module ParadeDB
     end
     private_class_method :normalize_bounds
+    def normalize_top_hits_sort(sort)
+      entries = Array(sort)
+      raise ArgumentError, "top_hits sort must include at least one field" if entries.empty?
+      entries.map do |entry|
+        raise ArgumentError, "top_hits sort entries must be Hash values" unless entry.is_a?(Hash)
+        raise ArgumentError, "top_hits sort entries must include exactly one field" unless entry.size == 1
+        field, direction = entry.first
+        {
+          normalize_field(field) => normalize_sort_direction(direction)
+        }
+      end
+    end
+    private_class_method :normalize_top_hits_sort
+    def normalize_docvalue_fields(fields)
+      values = Array(fields)
+      raise ArgumentError, "top_hits docvalue_fields must include at least one field" if values.empty?
+      values.map { |field| normalize_field(field) }
+    end
+    private_class_method :normalize_docvalue_fields
+    def normalize_sort_direction(direction)
+      value = direction.to_s
+      return value if %w[asc desc].include?(value)
+      raise ArgumentError, "sort direction must be 'asc' or 'desc'"
+    end
+    private_class_method :normalize_sort_direction
+    def normalize_boolean_option(value, name)
+      return nil if value.nil?
+      return value if value == true || value == false
+      raise ArgumentError, "#{name} must be true, false, or nil"
+    end
+    private_class_method :normalize_boolean_option
     def deep_stringify(value)
       case value
       when Hash