pgvector 0.2.2 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +4 -4
data/CHANGELOG.md +11 -0
data/LICENSE.txt +1 -1
data/README.md +28 -3
data/lib/pgvector/bit.rb +28 -0
data/lib/pgvector/half_vector.rb +19 -0
data/lib/pgvector/pg.rb +41 -4
data/lib/pgvector/sparse_vector.rb +87 -0
data/lib/pgvector/vector.rb +25 -0
data/lib/pgvector/version.rb +1 -1
data/lib/pgvector.rb +16 -2
data/lib/sequel/extensions/pgvector.rb +5 -0
data/lib/sequel/plugins/pgvector.rb +6 -0
metadata +9 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7485ea4be0d5be0177a972db911c696daf3438a661ddac61b08f4e8b2da3ac51
-  data.tar.gz: 2532ef79f5db88aecb681d9455e38f3e5fc1d30bde015d0a1e9daaa9fe82635e
+  metadata.gz: 3d3ef6a53417383ff7f8a2a514df78513dfe9029e524f0f624df8828fb99dba0
+  data.tar.gz: f326a21d3942079cdadc22d9a61367e7a6651ae37c5eb07a3955182f9f569ad0
 SHA512:
-  metadata.gz: be40e4c3e16dd904a200115794a8ffaa850b40c5055330bd873ec7a707164a53b29d22040defbb4dd8a9cff597d4e1ad5c659d37a580ccc201ce30a9eb17fef9
-  data.tar.gz: f5d36289b043d987920911ab08d85d7d1066039f84dcc2a24436701c06b246adcb8fd3c32e3f76f3e1604001403574c14818f91a64357ce4ebd675458e57184c
+  metadata.gz: fa4d6519685d179e3d5712cbcf1e6989ca4f98e5b0902f5061eca2be55451162f199fcc75f19841dbd8ada8c18de69ac6fa39cad545d4b5a6c542e9cdd0109ea
+  data.tar.gz: fd71245492b2ff8a06a0af60dbd12e57e44025f2662d4ea77eb7176aaaa4c4cb3b39c1e159edce92944da5b54fddbb19d1b504acd76a12c4efdb9a0fb6e797da

data/CHANGELOG.md CHANGED

@@ -1,3 +1,14 @@
+## 0.3.1 (2024-07-10)
+- Added support for `bit` type to pg
+- Added extension for Sequel
+## 0.3.0 (2024-06-25)
+- Added support for `halfvec` and `sparsevec` types
+- Added `taxicab`, `hamming`, and `jaccard` distances for Sequel
+- Dropped support for Ruby < 3.1
 ## 0.2.2 (2023-10-03)
 - Added `nearest_neighbors` method to datasets with Sequel

data/LICENSE.txt CHANGED

@@ -1,6 +1,6 @@
 The MIT License (MIT)
-Copyright (c) 2022-2023 Andrew Kane
+Copyright (c) 2022-2024 Andrew Kane
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

data/README.md CHANGED

@@ -6,7 +6,7 @@ Supports [pg](https://github.com/ged/ruby-pg) and [Sequel](https://github.com/je
 For Rails, check out [Neighbor](https://github.com/ankane/neighbor)
-[![Build Status](https://github.com/pgvector/pgvector-ruby/workflows/build/badge.svg?branch=master)](https://github.com/pgvector/pgvector-ruby/actions)
+[![Build Status](https://github.com/pgvector/pgvector-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/pgvector/pgvector-ruby/actions)
 ## Installation
@@ -26,6 +26,7 @@ Or check out some examples:
 - [Embeddings](examples/openai_embeddings.rb) with OpenAI
 - [User-based recommendations](examples/disco_user_recs.rb) with Disco
 - [Item-based recommendations](examples/disco_item_recs.rb) with Disco
+- [Bulk loading](examples/bulk_loading.rb) with `COPY`
 ## pg
@@ -35,7 +36,7 @@ Enable the extension
 conn.exec("CREATE EXTENSION IF NOT EXISTS vector")
 ```
-Register the vector type with your connection
+Optionally enable type casting for results
 ```ruby
 registry = PG::BasicTypeRegistry.new.define_default_types
@@ -43,6 +44,12 @@ Pgvector::PG.register_vector(registry)
 conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn, registry: registry)
 ```
+Create a table
+```ruby
+conn.exec("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")
+```
 Insert a vector
 ```ruby
@@ -56,6 +63,16 @@ Get the nearest neighbors to a vector
 conn.exec_params("SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5", [embedding]).to_a
 ```
+Add an approximate index
+```ruby
+conn.exec("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
+# or
+conn.exec("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")
+```
+Use `vector_ip_ops` for inner product and `vector_cosine_ops` for cosine distance
 ## Sequel
 Enable the extension
@@ -93,7 +110,7 @@ Get the nearest neighbors to a record
 item.nearest_neighbors(:embedding, distance: "euclidean").limit(5)
 ```
-Also supports `inner_product` and `cosine` distance
+Also supports `inner_product`, `cosine`, `taxicab`, `hamming`, and `jaccard` distance
 Get the nearest neighbors to a vector
@@ -101,6 +118,14 @@ Get the nearest neighbors to a vector
 Item.nearest_neighbors(:embedding, [1, 1, 1], distance: "euclidean").limit(5)
 ```
+Add an approximate index
+```ruby
+DB.add_index :items, :embedding, type: "hnsw", opclass: "vector_l2_ops"
+```
+Use `vector_ip_ops` for inner product and `vector_cosine_ops` for cosine distance
 ## History
 View the [changelog](https://github.com/pgvector/pgvector-ruby/blob/master/CHANGELOG.md)

data/lib/pgvector/bit.rb ADDED

@@ -0,0 +1,28 @@
+module Pgvector
+  class Bit
+    def initialize(data)
+      if data.is_a?(Array)
+        @data = data.map { |v| v ? "1" : "0" }.join
+      else
+        @data = data.to_str
+      end
+    end
+    def self.from_text(string)
+      Bit.new(string)
+    end
+    def self.from_binary(string)
+      length = string[..3].unpack1("l>")
+      Bit.new(string[4..].unpack("B*").join[...length])
+    end
+    def to_s
+      @data
+    end
+    def to_a
+      @data.each_char.map { |v| v != "0" }
+    end
+  end
+end
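
The binary layout that `Bit.from_binary` reads can be illustrated with a standalone sketch. It assumes, based only on the decoding code above, that the payload is a 4-byte big-endian signed length followed by the bits packed most-significant-bit first. The sample payload is built by hand and the gem is not loaded.

```ruby
# Standalone sketch; does not require the pgvector gem.
# Build a payload shaped like the one from_binary expects:
# a 4-byte big-endian length, then the bits packed MSB-first
# (the last byte is zero-padded).
bits = "10110"
payload = [bits.length].pack("l>") + [bits].pack("B*")

# Decode it with the same steps as the diff above.
length = payload[..3].unpack1("l>")
decoded = payload[4..].unpack("B*").join[...length]

puts decoded
```

The trailing slice matters: the packed data is padded to a whole byte, so without `[...length]` the result would include the padding zeros.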

data/lib/pgvector/half_vector.rb ADDED

@@ -0,0 +1,19 @@
+module Pgvector
+  class HalfVector
+    def initialize(data)
+      @data = data.to_a.map(&:to_f)
+    end
+    def self.from_text(string)
+      new(string[1..-2].split(",").map(&:to_f))
+    end
+    def to_s
+      "[#{@data.to_a.map(&:to_f).join(",")}]"
+    end
+    def to_a
+      @data
+    end
+  end
+end

data/lib/pgvector/pg.rb CHANGED

@@ -5,14 +5,33 @@ module Pgvector
     def self.register_vector(registry)
       registry.register_type(0, "vector", nil, TextDecoder::Vector)
       registry.register_type(1, "vector", nil, BinaryDecoder::Vector)
+      # no binary decoder for halfvec since unpack does not have directive for half-precision
+      registry.register_type(0, "halfvec", nil, TextDecoder::Halfvec)
+      registry.register_type(0, "bit", nil, TextDecoder::Bit)
+      registry.register_type(1, "bit", nil, BinaryDecoder::Bit)
+      registry.register_type(0, "sparsevec", nil, TextDecoder::Sparsevec)
+      registry.register_type(1, "sparsevec", nil, BinaryDecoder::Sparsevec)
     end
     module BinaryDecoder
       class Vector < ::PG::SimpleDecoder
         def decode(string, tuple = nil, field = nil)
-          dim, unused = string[0, 4].unpack("nn")
-          raise "expected unused to be 0" if unused != 0
-          string[4..-1].unpack("g#{dim}")
+          ::Pgvector::Vector.from_binary(string).to_a
+        end
+      end
+      class Bit < ::PG::SimpleDecoder
+        def decode(string, tuple = nil, field = nil)
+          ::Pgvector::Bit.from_binary(string).to_s
+        end
+      end
+      class Sparsevec < ::PG::SimpleDecoder
+        def decode(string, tuple = nil, field = nil)
+          SparseVector.from_binary(string)
         end
       end
     end
@@ -20,7 +39,25 @@ module Pgvector
     module TextDecoder
       class Vector < ::PG::SimpleDecoder
         def decode(string, tuple = nil, field = nil)
-          Pgvector.decode(string)
+          ::Pgvector::Vector.from_text(string).to_a
+        end
+      end
+      class Halfvec < ::PG::SimpleDecoder
+        def decode(string, tuple = nil, field = nil)
+          HalfVector.from_text(string).to_a
+        end
+      end
+      class Bit < ::PG::SimpleDecoder
+        def decode(string, tuple = nil, field = nil)
+          ::Pgvector::Bit.from_text(string).to_s
+        end
+      end
+      class Sparsevec < ::PG::SimpleDecoder
+        def decode(string, tuple = nil, field = nil)
+          SparseVector.from_text(string)
         end
       end
     end

data/lib/pgvector/sparse_vector.rb ADDED

@@ -0,0 +1,87 @@
+module Pgvector
+  class SparseVector
+    attr_reader :dimensions, :indices, :values
+    NO_DEFAULT = Object.new
+    def initialize(value, dimensions = NO_DEFAULT)
+      if value.is_a?(Hash)
+        if dimensions == NO_DEFAULT
+          raise ArgumentError, "missing dimensions"
+        end
+        from_hash(value, dimensions)
+      else
+        unless dimensions == NO_DEFAULT
+          raise ArgumentError, "extra argument"
+        end
+        from_array(value)
+      end
+    end
+    def to_s
+      "{#{@indices.zip(@values).map { |i, v| "#{i.to_i + 1}:#{v.to_f}" }.join(",")}}/#{@dimensions.to_i}"
+    end
+    def to_a
+      arr = Array.new(dimensions, 0.0)
+      @indices.zip(@values) do |i, v|
+        arr[i] = v
+      end
+      arr
+    end
+    private
+    def from_hash(data, dimensions)
+      elements = data.select { |_, v| v != 0 }.sort
+      @dimensions = dimensions.to_i
+      @indices = elements.map { |v| v[0].to_i }
+      @values = elements.map { |v| v[1].to_f }
+    end
+    def from_array(arr)
+      arr = arr.to_a
+      @dimensions = arr.size
+      @indices = []
+      @values = []
+      arr.each_with_index do |v, i|
+        if v != 0
+          @indices << i
+          @values << v.to_f
+        end
+      end
+    end
+    class << self
+      def from_text(string)
+        elements, dimensions = string.split("/", 2)
+        indices = []
+        values = []
+        elements[1..-2].split(",").each do |e|
+          index, value = e.split(":", 2)
+          indices << index.to_i - 1
+          values << value.to_f
+        end
+        from_parts(dimensions.to_i, indices, values)
+      end
+      def from_binary(string)
+        dim, nnz, unused = string[0, 12].unpack("l>l>l>")
+        raise "expected unused to be 0" if unused != 0
+        indices = string[12, nnz * 4].unpack("l>#{nnz}")
+        values = string[(12 + nnz * 4)..-1].unpack("g#{nnz}")
+        from_parts(dim, indices, values)
+      end
+      private
+      def from_parts(dimensions, indices, values)
+        vec = allocate
+        vec.instance_variable_set(:@dimensions, dimensions)
+        vec.instance_variable_set(:@indices, indices)
+        vec.instance_variable_set(:@values, values)
+        vec
+      end
+    end
+  end
+end
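
The `sparsevec` text format handled by `to_s` and `from_text` (1-based `index:value` pairs in braces, then `/dimensions`) can be seen in a small round trip. The helpers `sparse_to_text` and `sparse_from_text` below are hypothetical names for this sketch, not part of the gem. They mirror the logic in the diff but work on plain arrays.

```ruby
# Standalone sketch; hypothetical helpers, not the gem's API.

# Serialize a dense array: keep non-zero entries, shift indices to 1-based.
def sparse_to_text(arr)
  pairs = arr.each_with_index.filter_map do |v, i|
    "#{i + 1}:#{v.to_f}" unless v == 0
  end
  "{#{pairs.join(",")}}/#{arr.size}"
end

# Parse the text form back into a dense array of floats.
def sparse_from_text(string)
  elements, dimensions = string.split("/", 2)
  arr = Array.new(dimensions.to_i, 0.0)
  elements[1..-2].split(",").each do |e|
    index, value = e.split(":", 2)
    arr[index.to_i - 1] = value.to_f # back to 0-based
  end
  arr
end

text = sparse_to_text([1, 0, 2, 0, 0])
dense = sparse_from_text(text)
puts text
```

Note the off-by-one handling: indices are 0-based in Ruby and 1-based in the text form, which is why the diff adds 1 in `to_s` and subtracts 1 in `from_text`.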

data/lib/pgvector/vector.rb ADDED

@@ -0,0 +1,25 @@
+module Pgvector
+  class Vector
+    def initialize(data)
+      @data = data.to_a.map(&:to_f)
+    end
+    def self.from_text(string)
+      Vector.new(string[1..-2].split(",").map(&:to_f))
+    end
+    def self.from_binary(string)
+      dim, unused = string[0, 4].unpack("nn")
+      raise "expected unused to be 0" if unused != 0
+      Vector.new(string[4..-1].unpack("g#{dim}"))
+    end
+    def to_s
+      "[#{@data.to_a.map(&:to_f).join(",")}]"
+    end
+    def to_a
+      @data
+    end
+  end
+end
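
As a rough illustration of what `Vector.from_binary` parses, the sketch below builds a payload by hand and decodes it with the same `unpack` directives. It assumes the layout implied by the code above: two unsigned 16-bit big-endian integers (dimension count and an unused field that must be 0), followed by big-endian single-precision floats. The sample values are chosen to be exactly representable as 32-bit floats.

```ruby
# Standalone sketch; does not require the pgvector gem.
values = [1.0, 2.5, -3.0]

# Header: dimensions and an unused field (both "n" = 16-bit big-endian),
# then each value as a big-endian 32-bit float ("g").
payload = [values.size, 0].pack("nn") + values.pack("g*")

# Decode with the same steps as the diff above.
dim, unused = payload[0, 4].unpack("nn")
raise "expected unused to be 0" if unused != 0
decoded = payload[4..-1].unpack("g#{dim}")

p decoded
```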

data/lib/pgvector/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Pgvector
-  VERSION = "0.2.2"
+  VERSION = "0.3.1"
 end

data/lib/pgvector.rb CHANGED

@@ -1,14 +1,28 @@
 # modules
+require_relative "pgvector/bit"
+require_relative "pgvector/half_vector"
+require_relative "pgvector/sparse_vector"
+require_relative "pgvector/vector"
 require_relative "pgvector/version"
 module Pgvector
   autoload :PG, "pgvector/pg"
   def self.encode(data)
-    "[#{data.to_a.map(&:to_f).join(",")}]"
+    if data.is_a?(Vector) || data.is_a?(HalfVector) || data.is_a?(SparseVector)
+      data.to_s
+    else
+      Vector.new(data).to_s
+    end
   end
   def self.decode(string)
-    string[1..-2].split(",").map(&:to_f)
+    if string[0] == "["
+      Vector.from_text(string).to_a
+    elsif string[0] == "{"
+      SparseVector.from_text(string)
+    else
+      string
+    end
   end
 end

data/lib/sequel/extensions/pgvector.rb ADDED

@@ -0,0 +1,5 @@
+require_relative "../plugins/pgvector"
+module Sequel
+  Dataset.register_extension(:pgvector, Plugins::Pgvector::DatasetMethods)
+end

data/lib/sequel/plugins/pgvector.rb CHANGED

@@ -22,6 +22,12 @@ module Sequel
               "<=>"
             when "euclidean"
               "<->"
+            when "taxicab"
+              "<+>"
+            when "hamming"
+              "<~>"
+            when "jaccard"
+              "<%>"
             end
           raise ArgumentError, "Invalid distance: #{distance}" unless operator
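
The name-to-operator mapping that this hunk extends can be summarized in a small lookup. This is a sketch only: `distance_operator` is a hypothetical helper, the gem itself uses the `case` statement shown above, and the `inner_product` and `cosine` entries are assumed from pgvector's documented operators because their `when` lines fall outside the hunk.

```ruby
# Standalone sketch; hypothetical helper, not the plugin's code.
def distance_operator(distance)
  operators = {
    "inner_product" => "<#>", # assumed; not visible in this hunk
    "cosine" => "<=>",        # label assumed; operator is in the hunk
    "euclidean" => "<->",
    "taxicab" => "<+>",       # added in 0.3.0
    "hamming" => "<~>",       # added in 0.3.0
    "jaccard" => "<%>"        # added in 0.3.0
  }
  operators.fetch(distance.to_s) do
    raise ArgumentError, "Invalid distance: #{distance}"
  end
end

# Example: the kind of ORDER BY fragment such an operator ends up in.
order_clause = "embedding #{distance_operator(:taxicab)} '[1,1,1]'"
puts order_clause
```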

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pgvector
 version: !ruby/object:Gem::Version
-  version: 0.2.2
+  version: 0.3.1
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-10-04 00:00:00.000000000 Z
+date: 2024-07-11 00:00:00.000000000 Z
 dependencies: []
 description:
 email: andrew@ankane.org
@@ -20,8 +20,13 @@ files:
 - LICENSE.txt
 - README.md
 - lib/pgvector.rb
+- lib/pgvector/bit.rb
+- lib/pgvector/half_vector.rb
 - lib/pgvector/pg.rb
+- lib/pgvector/sparse_vector.rb
+- lib/pgvector/vector.rb
 - lib/pgvector/version.rb
+- lib/sequel/extensions/pgvector.rb
 - lib/sequel/plugins/pgvector.rb
 homepage: https://github.com/pgvector/pgvector-ruby
 licenses:
@@ -35,14 +40,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '3'
+      version: '3.1'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.4.10
+rubygems_version: 3.5.11
 signing_key:
 specification_version: 4
 summary: pgvector support for Ruby