torch-rb 0.8.1 → 0.8.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +21 -40
- data/codegen/generate_functions.rb +2 -1
- data/lib/torch/nn/convnd.rb +2 -0
- data/lib/torch/nn/functional_attention.rb +241 -0
- data/lib/torch/nn/module.rb +2 -0
- data/lib/torch/nn/module_list.rb +49 -0
- data/lib/torch/nn/multihead_attention.rb +123 -0
- data/lib/torch/nn/transformer.rb +92 -0
- data/lib/torch/nn/transformer_decoder.rb +25 -0
- data/lib/torch/nn/transformer_decoder_layer.rb +43 -0
- data/lib/torch/nn/transformer_encoder.rb +25 -0
- data/lib/torch/nn/transformer_encoder_layer.rb +36 -0
- data/lib/torch/nn/utils.rb +16 -0
- data/lib/torch/tensor.rb +2 -0
- data/lib/torch/utils/data/data_loader.rb +2 -0
- data/lib/torch/version.rb +1 -1
- data/lib/torch.rb +6 -0
- metadata +11 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '05811faa93ab089485bfa213362bfed0462227e6964e726bf1c3f9fc0cdba0c3'
+  data.tar.gz: 8d25304063db51850e535e71c2b4fe643e038a18d046b8f27ee50269e5e9695d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8cea906b03b37ec848be7b1c7cfa6bfb0fde4ef7ed384818bcc85826d04611621835bbeecad0fd31c94b497bb527a21c107901a9d991692b6fffa6bb24c23c38
+  data.tar.gz: 2ded65d614d274afe61e061898268172e6dec85dc28e3094481c2650713a129c0574b7ba17dbafccd8b8dc7ee064602fd261ce5034cec4a184fa32f2965eb476
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -28,15 +28,19 @@ It can take a few minutes to compile the extension.
 
 ## Getting Started
 
-
+A good place to start is [Deep Learning with Torch.rb: A 60 Minute Blitz](tutorials/blitz/README.md).
 
-
+## Tutorials
 
-
-
-
+- [Transfer learning](tutorials/transfer_learning/README.md)
+- [Sequence models](tutorials/nlp/sequence_models.md)
+- [Word embeddings](tutorials/nlp/word_embeddings.md)
 
-
+## Examples
+
+- [Image classification with MNIST](examples/mnist) ([日本語版](https://qiita.com/kojix2/items/c19c36dc1bf73ea93409))
+- [Collaborative filtering with MovieLens](examples/movielens)
+- [Generative adversarial networks](examples/gan)
 
 ## API
 
@@ -48,7 +52,7 @@ This library follows the [PyTorch API](https://pytorch.org/docs/stable/torch.htm
 
 You can follow PyTorch tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
 
-##
+## Overview
 
 Some examples below are from [Deep Learning with PyTorch: A 60 Minutes Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
 
@@ -214,7 +218,7 @@ Define a neural network
 ```ruby
 class MyNet < Torch::NN::Module
   def initialize
-    super
+    super()
     @conv1 = Torch::NN::Conv2d.new(1, 6, 3)
     @conv2 = Torch::NN::Conv2d.new(6, 16, 3)
     @fc1 = Torch::NN::Linear.new(16 * 6 * 6, 120)
@@ -225,20 +229,10 @@ class MyNet < Torch::NN::Module
   def forward(x)
     x = Torch::NN::F.max_pool2d(Torch::NN::F.relu(@conv1.call(x)), [2, 2])
     x = Torch::NN::F.max_pool2d(Torch::NN::F.relu(@conv2.call(x)), 2)
-    x =
+    x = Torch.flatten(x, 1)
     x = Torch::NN::F.relu(@fc1.call(x))
     x = Torch::NN::F.relu(@fc2.call(x))
-
-    x
-  end
-
-  def num_flat_features(x)
-    size = x.size[1..-1]
-    num_features = 1
-    size.each do |s|
-      num_features *= s
-    end
-    num_features
+    @fc3.call(x)
   end
 end
 ```
@@ -402,19 +396,9 @@ Here’s a list of functions to create tensors (descriptions from the [C++ docs]
 Torch.zeros(3) # tensor([0, 0, 0])
 ```
 
-## Examples
-
-Here are a few full examples:
-
-- [Image classification with MNIST](examples/mnist) ([日本語版](https://qiita.com/kojix2/items/c19c36dc1bf73ea93409))
-- [Collaborative filtering with MovieLens](examples/movielens)
-- [Sequence models and word embeddings](examples/nlp)
-- [Generative adversarial networks](examples/gan)
-- [Transfer learning](examples/transfer-learning)
-
 ## LibTorch Installation
 
-[Download LibTorch](https://pytorch.org/)
+[Download LibTorch](https://pytorch.org/) (for Linux, use the `cxx11 ABI` version). Then run:
 
 ```sh
 bundle config build.torch-rb --with-torch-dir=/path/to/libtorch
@@ -444,9 +428,7 @@ Then install the gem (no need for `bundle config`).
 
 ## Performance
 
-
-
-Deep learning is significantly faster on a GPU. Install [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn) and reinstall the gem.
+Deep learning is significantly faster on a GPU. With Linux, install [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn) and reinstall the gem.
 
 Check if CUDA is available
 
@@ -460,15 +442,14 @@ Move a neural network to a GPU
 net.cuda
 ```
 
-
+If you don’t have a GPU that supports CUDA, we recommend using a cloud service. [Paperspace](https://www.paperspace.com/) has a great free plan. We’ve put together a [Docker image](https://github.com/ankane/ml-stack) to make it easy to get started. On Paperspace, create a notebook with a custom container. Under advanced options, set the container name to:
 
-
-
-```sh
-brew upgrade ruby-build
-rbenv install [version]
+```text
+ankane/ml-stack:torch-gpu
 ```
 
+And leave the other fields in that section blank. Once the notebook is running, you can run the [MNIST example](https://github.com/ankane/ml-stack/blob/master/torch-gpu/MNIST.ipynb).
+
 ## History
 
 View the [changelog](https://github.com/ankane/torch.rb/blob/master/CHANGELOG.md)
data/codegen/generate_functions.rb
CHANGED
@@ -23,7 +23,7 @@ end
 
 def skip_functions(functions)
   functions.reject do |f|
-    f.base_name.start_with?("_") ||
+    (f.base_name.start_with?("_") && f.base_name != "__lshift__" && f.base_name != "__rshift__") ||
       f.base_name.include?("_backward") ||
       f.base_name.include?("_forward") ||
       f.base_name == "to" ||
@@ -133,6 +133,7 @@ def generate_attach_def(name, type, def_method)
   ruby_name = ruby_name.sub(/\Afft_/, "") if type == "fft"
   ruby_name = ruby_name.sub(/\Alinalg_/, "") if type == "linalg"
   ruby_name = ruby_name.sub(/\Aspecial_/, "") if type == "special"
+  ruby_name = name if name.start_with?("__")
 
   # cast for Ruby < 2.7 https://github.com/thisMagpie/fftw/issues/22#issuecomment-49508900
   cast = RUBY_VERSION.to_f > 2.7 ? "" : "(VALUE (*)(...)) "
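With this change the native `__lshift__` and `__rshift__` functions are no longer skipped by codegen, and `generate_attach_def` keeps their double-underscore names. An illustrative sketch of what that should expose (not part of the diff; behavior inferred from the codegen change above, so treat it as an assumption):

```ruby
require "torch"

t = Torch.tensor([1, 2, 3])

# The generated methods keep their dunder names per generate_attach_def.
t.__lshift__(1) # expected: tensor([2, 4, 6])
t.__rshift__(1) # expected: tensor([0, 1, 1])
```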
data/lib/torch/nn/convnd.rb
CHANGED
@@ -1,6 +1,8 @@
 module Torch
   module NN
     class ConvNd < Module
+      attr_reader :in_channels, :out_channels, :kernel_size, :stride, :padding, :dilation, :transposed, :output_paddding, :groups, :padding_mode
+
       def initialize(in_channels, out_channels, kernel_size, stride, padding, dilation, transposed, output_padding, groups, bias, padding_mode)
         super()
         raise ArgumentError, "in_channels must be divisible by groups" if in_channels % groups != 0
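The new readers are inherited by the concrete convolution classes, so layer hyperparameters become inspectable after construction. A small sketch (not part of the diff), assuming `Conv2d`, which subclasses `ConvNd`:

```ruby
require "torch"

conv = Torch::NN::Conv2d.new(1, 6, 3)

conv.in_channels  # => 1
conv.out_channels # => 6
conv.kernel_size  # => [3, 3]  (scalar sizes are expanded to pairs)
conv.stride       # => [1, 1]
```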
data/lib/torch/nn/functional_attention.rb
ADDED
@@ -0,0 +1,241 @@
+module Torch
+  module NN
+    class Functional
+      class << self
+        def in_projection_packed(q, k, v, w, b: nil)
+          e = q.size(-1)
+
+          if k.eql? v
+            if q.eql? k
+              # self-attention
+              return linear(q, w, b).chunk(3, dim: -1)
+            else
+              # encoder-decoder attention
+              w_q, w_kv = w.split_with_sizes([e, e * 2])
+              if b.nil?
+                b_q = b_kv = nil
+              else
+                b_q, b_kv = b.split_with_sizes([e, e * 2])
+              end
+
+              return [linear(q, w_q, b_q), *linear(k, w_kv, b_kv).chunk(2, dim: -1)]
+            end
+          else
+            w_q, w_k, w_v = w.chunk(3)
+            if b.nil?
+              b_q = b_k = b_v = nil
+            else
+              b_q, b_k, b_v = b.chunk(3)
+            end
+
+            return [linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)]
+          end
+        end
+
+        def in_projection(
+          q, k, v,
+          w_q, w_k, w_v,
+          b_q: nil, b_k: nil, b_v: nil
+        )
+
+          e_q, e_k, e_v = q.size(-1), k.size(-1), v.size(-1)
+
+          raise ArgumentError, "Expecting query weights shape of #{[e_q, e_q]}, but got #{w_q.shape}" unless w_q.shape == [e_q, e_q]
+          raise ArgumentError, "Expecting key weights shape of #{[e_k, e_k]}, but got #{w_k.shape}" unless w_k.shape == [e_k, e_k]
+          raise ArgumentError, "Expecting value weights shape of #{[e_v, e_v]}, but got #{w_v.shape}" unless w_v.shape == [e_v, e_v]
+
+          raise ArgumentError, "Expecting query bias shape of #{[e_q]}, but got #{b_q.shape}" if b_q && b_q.shape != [e_q]
+          raise ArgumentError, "Expecting key bias shape of #{[e_k]}, but got #{b_k.shape}" if b_k && b_k.shape != [e_k]
+          raise ArgumentError, "Expecting value bias shape of #{[e_v]}, but got #{b_v.shape}" if b_v && b_v.shape != [e_v]
+
+          [linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)]
+        end
+
+        def scaled_dot_product_attention(
+          q, k, v,
+          attn_mask: nil, dropout_p: 0.0
+        )
+
+          b, nt, e = q.shape
+
+          q = q / Math.sqrt(e)
+
+          attn = Torch.bmm(q, k.transpose(-2, -1))
+          attn += attn_mask if attn_mask
+          attn = softmax(attn, dim: -1)
+          attn = dropout(attn, p: dropout_p) if dropout_p > 0
+
+          output = Torch.bmm(attn, v)
+
+          [output, attn]
+        end
+
+        def multi_head_attention_forward(
+          query, key, value,
+          embed_dim_to_check, num_heads,
+          in_proj_weight, in_proj_bias,
+          bias_k, bias_v,
+          add_zero_attn,
+          dropout_p,
+          out_proj_weight, out_proj_bias,
+          training: true,
+          key_padding_mask: nil,
+          need_weights: true,
+          attn_mask: nil,
+          use_separate_proj_weight: false,
+          q_proj_weight: nil, k_proj_weight: nil, v_proj_weight: nil,
+          static_k: nil, static_v: nil
+        )
+
+          tgt_len, bsz, embed_dim = query.shape
+          src_len = key.shape.first
+
+          raise ArgumentError, "Was expecting embedding dimension of #{embed_dim_to_check}, but got #{embed_dim}" unless embed_dim == embed_dim_to_check
+
+          head_dim = if embed_dim.is_a?(Torch::Tensor)
+            embed_dim.div(num_heads, rounding_mode: 'trunc')
+          else
+            head_dim = embed_dim.div num_heads
+          end
+
+          if use_separate_proj_weight
+            raise ArgumentError, "Key's sequence and batch dims #{key.shape[0...2]} do not match value's #{value.shape[0...2]}" unless key.shape[0...2] == value.shape[0...2]
+          else
+            raise ArgumentError, "Key shape #{key.shape} does not match value shape #{value.shape}" unless key.shape == value.shape
+          end
+
+          # compute in-projection
+          q, k, v =
+            if use_separate_proj_weight
+              raise ArgumentError, "use_separate_proj_weight is true but q_proj_weight is nil" unless q_proj_weight
+              raise ArgumentError, "use_separate_proj_weight is true but k_proj_weight is nil" unless k_proj_weight
+              raise ArgumentError, "use_separate_proj_weight is true but v_proj_weight is nil" unless v_proj_weight
+
+              if in_proj_bias
+                b_q, b_k, b_v = in_proj_bias.chunk(3)
+              else
+                b_q = b_k = b_v = nil
+              end
+
+              in_projection(query, key, value, q_proj_weight, k_proj_weight, v_proj_weight, b_q: b_q, b_k: b_k, b_v: b_v)
+            else
+              in_projection_packed(query, key, value, in_proj_weight, b: in_proj_bias)
+            end
+
+          # prep attention mask
+          if attn_mask
+            if attn_mask.dtype == :uint8
+              puts "[WARN] Byte tensor for attn_mask in Multihead Attention is deprecated. Use bool tensor instead."
+              attn_mask = attn_mask.bool
+            else
+              raise ArgumentError, "Only float, byte, and bool types are supported for attn_mask, not #{attn_mask.dtype}" unless attn_mask.floating_point? || attn_mask.dtype == :bool
+            end
+
+            if attn_mask.dim == 2
+              correct_2d_size = [tgt_len, src_len]
+              raise ArgumentError, "The shape of the 2D attn_mask is #{attn_mask.shape}, but should be #{correct_2d_size}." unless attn_mask.shape == correct_2d_size
+
+              attn_mask = attn_mask.unsqueeze(0)
+            elsif attn_mask.dim == 3
+              correct_3d_size = [bsz * num_heads, tgt_len, src_len]
+              raise ArgumentError, "The shape of the 3D attn_mask is #{attn_mask.shape}, but should be #{correct_3d_size}." unless attn_mask.shape == correct_3d_size
+            else
+              raise ArgumentError, "attn_mask's dimension #{attn_mask.dim} is not supported"
+            end
+          end
+
+          # prep key padding mask
+          if key_padding_mask && key_padding_mask.dtype == :uint8
+            puts "[WARN] Byte tensor for key_padding_mask in Multihead Attention is deprecated. Use bool tensor instead."
+            key_padding_mask = key_padding_mask.bool
+          end
+
+          # add bias along batch dimension (currently second)
+          if bias_k && bias_v
+            raise ArgumentError, "bias cannot be added to static key." if static_k
+            raise ArgumentError, "bias cannot be added to static value." if static_v
+
+            k = Torch.cat([k, bias_k.repeat(1, bsz, 1)])
+            v = Torch.cat([v, bias_v.repeat(1, bsz, 1)])
+
+            attn_mask = pad(attn_mask, [0, 1]) if attn_mask
+            key_padding_mask = pad(key_padding_mask, [0, 1]) if key_padding_mask
+          else
+            raise ArgumentError unless bias_k.nil?
+            raise ArgumentError unless bias_v.nil?
+          end
+
+          # reshape q, k, v for multihead attention and make em batch first
+          q = q.contiguous.view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
+
+          if static_k.nil?
+            k = k.contiguous.view(-1, bsz * num_heads, head_dim).transpose(0, 1)
+          else
+            raise ArgumentError, "Expecting static_k.size(0) of #{bsz * num_heads}, but got #{static_k.size(0)}" unless static_k.size(0) == bsz * num_heads
+            raise ArgumentError, "Expecting static_k.size(2) of #{head_dim}, but got #{static_k.size(2)}" unless static_k.size(2) == head_dim
+
+            k = static_k
+          end
+
+          if static_v.nil?
+            v = v.contiguous.view(-1, bsz * num_heads, head_dim).transpose(0, 1)
+          else
+            raise ArgumentError, "Expecting static_v.size(0) of #{bsz * num_heads}, but got #{static_v.size(0)}" unless static_v.size(0) == bsz * num_heads
+            raise ArgumentError, "Expecting static_v.size(2) of #{head_dim}, but got #{static_v.size(2)}" unless static_v.size(2) == head_dim
+
+            v = static_v
+          end
+
+          # add zero attention along batch dimension (now first)
+          if add_zero_attn
+            zero_attn_shape = [bsz * num_heads, 1, head_dim]
+            k = Torch.cat([k, Torch.zeros(zero_attn_shape, dtype: k.dtype, device: k.device)], dim: 1)
+            v = Torch.cat([v, Torch.zeros(zero_attn_shape, dtype: v.dtype, device: v.device)], dim: 1)
+
+            attn_mask = pad(attn_mask, [0, 1]) if attn_mask
+            key_padding_mask = pad(key_padding_mask, [0, 1]) if key_padding_mask
+          end
+
+          # update source sequence length after adjustments
+          src_len = k.size(1)
+
+          # merge key padding and attention masks
+          if key_padding_mask
+            raise ArgumentError, "Expecting key_padding_mask shape of #{[bsz, src_len]}, but got #{key_padding_mask.shape}" unless key_padding_mask.shape == [bsz, src_len]
+
+            key_padding_mask = key_padding_mask.view(bsz, 1, 1, src_len).expand(-1, num_heads, -1, -1).reshape(bsz * num_heads, 1, src_len)
+
+            attn_mask = if attn_mask.nil?
+              key_padding_mask
+            elsif attn_mask.dtype == :bool
+              attn_mask.logical_or(key_padding_mask)
+            else
+              attn_mask.masked_fill(key_padding_mask, -Float::INFINITY)
+            end
+          end
+
+          # convert mask to float
+          if attn_mask && attn_mask.dtype == :bool
+            new_attn_mask = Torch.zeros_like(attn_mask, dtype: :float32)
+            attn_mask = new_attn_mask.masked_fill(attn_mask, -Float::INFINITY)
+          end
+
+          dropout_p = 0.0 unless training
+
+          # (deep breath) calculate attention and out projection
+          attn_output, attn_output_weights = scaled_dot_product_attention(q, k, v, attn_mask: attn_mask, dropout_p: dropout_p)
+          attn_output = attn_output.transpose(0, 1).contiguous.view(tgt_len, bsz, embed_dim)
+          attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
+
+          if need_weights
+            # average attention weights over heads
+            attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
+            [attn_output, attn_output_weights.sum(dim: 1) / num_heads]
+          else
+            [attn_output, nil]
+          end
+        end
+      end
+    end
+  end
+end
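These helpers are defined as singleton methods on `Torch::NN::Functional` (aliased as `Torch::NN::F`). A minimal sketch of `scaled_dot_product_attention` with illustrative shapes, using the batched `(batch, seq, embed)` layout the method expects (not part of the diff):

```ruby
require "torch"

q = Torch.randn(2, 5, 8) # (batch, target length, embed dim)
k = Torch.randn(2, 7, 8) # (batch, source length, embed dim)
v = Torch.randn(2, 7, 8)

output, attn = Torch::NN::F.scaled_dot_product_attention(q, k, v)
output.shape # => [2, 5, 8]
attn.shape   # => [2, 5, 7]  (softmax attention weights)
```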
data/lib/torch/nn/module.rb
CHANGED
data/lib/torch/nn/module_list.rb
ADDED
@@ -0,0 +1,49 @@
+module Torch
+  module NN
+    class ModuleList < Module
+      include Enumerable
+
+      def initialize(mods = nil)
+        super()
+
+        self.concat(mods) if mods
+      end
+
+      def length
+        @modules.length
+      end
+      alias_method :count, :length
+      alias_method :size, :length
+
+      def concat(mods)
+        raise ArgumentError, "Modules should respond to #each" unless mods.respond_to?(:each)
+
+        mods.each { |m| append m }
+
+        self
+      end
+
+      def each(&block)
+        if block_given?
+          @modules.values.each(&block)
+        else
+          to_enum(:each)
+        end
+      end
+
+      def append(mod)
+        raise ArgumentError, "Provided element is not a module" unless mod.is_a?(Module)
+        add_module(length.to_s, mod)
+        self
+      end
+
+      def [](idx)
+        if idx.is_a?(Range)
+          self.class.new(@modules.values[idx])
+        else
+          @modules[idx.to_s]
+        end
+      end
+    end
+  end
+end
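`ModuleList` registers each element via `add_module`, so child parameters show up in `parameters` and `state_dict`, while `Enumerable` gives the usual iteration helpers. A brief usage sketch (not part of the diff):

```ruby
require "torch"

layers = Torch::NN::ModuleList.new([
  Torch::NN::Linear.new(10, 10),
  Torch::NN::Linear.new(10, 2)
])

layers.length       # => 2
layers[0]           # => the first Linear module
layers.map(&:class) # => [Torch::NN::Linear, Torch::NN::Linear]
```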
data/lib/torch/nn/multihead_attention.rb
ADDED
@@ -0,0 +1,123 @@
+module Torch
+  module NN
+    class MultiheadAttention < Module
+      def initialize(
+        embed_dim, num_heads,
+        dropout: 0.0, bias: true, add_bias_kv: false, add_zero_attn: false,
+        kdim: nil, vdim: nil, batch_first: false, device: nil, dtype: nil
+      )
+
+        super()
+
+        @embed_dim = embed_dim
+        @kdim = kdim || @embed_dim
+        @vdim = vdim || @embed_dim
+
+        @qkv_same_embed_dim = @kdim == @embed_dim && @vdim == @embed_dim
+
+        @num_heads = num_heads
+        @dropout = dropout
+        @batch_first = batch_first
+
+        @head_dim = @embed_dim.div @num_heads
+
+        raise ArgumentError, "embed_dim must be divisible by num_heads" unless @head_dim * @num_heads == @embed_dim
+
+        if @qkv_same_embed_dim
+          @in_proj_weight = Parameter.new(Torch.empty([3 * @embed_dim, @embed_dim]))
+          %w(q k v).each { |x| register_parameter("#{x}_proj_weight", nil) }
+        else
+          @q_proj_weight = Parameter.new(Torch.empty([@embed_dim, @embed_dim]))
+          @k_proj_weight = Parameter.new(Torch.empty([@embed_dim, @kdim]))
+          @v_proj_weight = Parameter.new(Torch.empty([@embed_dim, @vdim]))
+
+          register_parameter('in_proj_weight', nil)
+        end
+
+        if bias
+          @in_proj_bias = Parameter.new(Torch.empty(3 * @embed_dim))
+        else
+          register_parameter('in_proj_bias', nil)
+        end
+
+        @out_proj = Linear.new(@embed_dim, @embed_dim, bias: bias)
+
+        if add_bias_kv
+          @bias_k = Parameter.new(Torch.empty([1, 1, @embed_dim]))
+          @bias_v = Parameter.new(Torch.empty([1, 1, @embed_dim]))
+        else
+          @bias_k = @bias_v = nil
+        end
+
+        @add_zero_attn = add_zero_attn
+
+        reset_parameters
+      end
+
+      def batch_first?
+        !!@batch_first
+      end
+
+      def reset_parameters
+        if @qkv_same_embed_dim
+          Init.xavier_uniform!(@in_proj_weight)
+        else
+          Init.xavier_uniform!(@q_proj_weight)
+          Init.xavier_uniform!(@k_proj_weight)
+          Init.xavier_uniform!(@v_proj_weight)
+        end
+
+        if @in_proj_bias
+          Init.constant!(@in_proj_bias, 0.0)
+          Init.constant!(@out_proj.bias, 0.0)
+        end
+
+        Init.xavier_uniform!(@bias_k) if @bias_k
+        Init.xavier_uniform!(@bias_v) if @bias_v
+      end
+
+      def forward(
+        query, key, value,
+        key_padding_mask: nil, need_weights: true, attn_mask: nil
+      )
+
+        if batch_first?
+          query, key, value = [query, key, value].map { |t| t.transpose(1, 0) }
+        end
+
+        attn_output, attn_output_weights =
+          if @qkv_same_embed_dim
+            F.multi_head_attention_forward(
+              query, key, value,
+              @embed_dim, @num_heads,
+              @in_proj_weight, @in_proj_bias,
+              @bias_k, @bias_v, @add_zero_attn,
+              @dropout, @out_proj.weight, @out_proj.bias,
+              training: @training,
+              key_padding_mask: key_padding_mask,
+              need_weights: need_weights,
+              attn_mask: attn_mask
+            )
+          else
+            F.multi_head_attention_forward(
+              query, key, value,
+              @embed_dim, @num_heads,
+              @in_proj_weight, @in_proj_bias,
+              @bias_k, @bias_v, @add_zero_attn,
+              @dropout, @out_proj.weight, @out_proj.bias,
+              training: @training,
+              key_padding_mask: key_padding_mask,
+              need_weights: need_weights,
+              attn_mask: attn_mask,
+              use_separate_proj_weight: true,
+              q_proj_weight: @q_proj_weight, k_proj_weight: @k_proj_weight, v_proj_weight: @v_proj_weight
+            )
+          end
+
+        attn_output = attn_output.transpose(1, 0) if batch_first?
+
+        [attn_output, attn_output_weights]
+      end
+    end
+  end
+end
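`MultiheadAttention` follows the PyTorch module: inputs default to the `(seq, batch, embed)` layout unless `batch_first: true`, and `forward` returns the attention output together with head-averaged attention weights. A sketch with illustrative sizes (not part of the diff):

```ruby
require "torch"

mha = Torch::NN::MultiheadAttention.new(16, 4) # embed_dim 16, 4 heads

query = Torch.randn(5, 2, 16) # (target seq, batch, embed dim)
key   = Torch.randn(7, 2, 16) # (source seq, batch, embed dim)
value = Torch.randn(7, 2, 16)

attn_output, attn_weights = mha.call(query, key, value)
attn_output.shape  # => [5, 2, 16]
attn_weights.shape # => [2, 5, 7]
```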
data/lib/torch/nn/transformer.rb
ADDED
@@ -0,0 +1,92 @@
+require_relative 'transformer_encoder_layer'
+require_relative 'transformer_encoder'
+require_relative 'transformer_decoder_layer'
+require_relative 'transformer_decoder'
+
+module Torch
+  module NN
+    class Transformer < Module
+      def initialize(
+        d_model: 512, nhead: 8,
+        num_encoder_layers: 6, num_decoder_layers: 6,
+        dim_feedforward: 2048, dropout: 0.1, activation: :relu,
+        custom_encoder: nil, custom_decoder: nil,
+        layer_norm_eps: 1e-5, batch_first: false
+      )
+
+        super()
+
+        @encoder =
+          if custom_encoder
+            custom_encoder
+          else
+            encoder_layer = TransformerEncoderLayer.new(
+              d_model, nhead,
+              dim_feedforward: dim_feedforward, dropout: dropout, activation: activation,
+              layer_norm_eps: layer_norm_eps, batch_first: batch_first
+            )
+            encoder_norm = LayerNorm.new(d_model, eps: layer_norm_eps)
+            TransformerEncoder.new(encoder_layer, num_encoder_layers, norm: encoder_norm)
+          end
+
+        @decoder =
+          if custom_decoder
+            custom_decoder
+          else
+            decoder_layer = TransformerDecoderLayer.new(
+              d_model, nhead,
+              dim_feedforward: dim_feedforward, dropout: dropout, activation: activation,
+              layer_norm_eps: layer_norm_eps, batch_first: batch_first
+            )
+            decoder_norm = LayerNorm.new(d_model, eps: layer_norm_eps)
+            TransformerDecoder.new(decoder_layer, num_decoder_layers, norm: decoder_norm)
+          end
+
+        reset_parameters
+
+        @d_model = d_model
+        @nhead = nhead
+        @batch_first = batch_first
+      end
+
+      attr_reader :d_model, :nhead, :encoder, :decoder
+
+      def batch_first?
+        !!@batch_first
+      end
+
+      def reset_parameters
+        parameters.each { |p| Init.xavier_uniform!(p) if p.dim > 1 }
+      end
+
+      def forward(
+        src, tgt,
+        src_mask: nil, tgt_mask: nil, memory_mask: nil,
+        src_key_padding_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil
+      )
+
+        if (!batch_first? && src.size(1) != tgt.size(1)) ||
+           (batch_first? && src.size(0) != tgt.size(0))
+
+          raise ArgumentError, "The batch number of src and tgt must be equal"
+        end
+
+        if src.size(2) != d_model || tgt.size(2) != d_model
+          raise ArgumentError, "The feature number of src and tgt must be equal to d_model"
+        end
+
+        memory = @encoder.(src, mask: src_mask, src_key_padding_mask: src_key_padding_mask)
+        @decoder.(
+          tgt, memory,
+          tgt_mask: tgt_mask, memory_mask: memory_mask,
+          tgt_key_padding_mask: tgt_key_padding_mask, memory_key_padding_mask: memory_key_padding_mask
+        )
+      end
+
+      def generate_square_subsequent_mask(sz)
+        mask = Torch.triu(Torch.ones([sz, sz])).eq(1).transpose(0, 1)
+        mask.float.masked_fill!(mask.eq(0), -Float::INFINITY).masked_fill!(mask.eq(1), 0.0)
+      end
+    end
+  end
+end
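`Transformer` wires the encoder and decoder stacks together with PyTorch-style defaults (`d_model: 512`, `nhead: 8`, six layers each). A smaller configuration as an illustrative sketch, using the default `(seq, batch, feature)` layout (not part of the diff):

```ruby
require "torch"

model = Torch::NN::Transformer.new(
  d_model: 32, nhead: 4,
  num_encoder_layers: 2, num_decoder_layers: 2,
  dim_feedforward: 64
)

src = Torch.randn(10, 3, 32) # (source seq, batch, d_model)
tgt = Torch.randn(8, 3, 32)  # (target seq, batch, d_model)

tgt_mask = model.generate_square_subsequent_mask(8) # causal mask for the decoder
out = model.call(src, tgt, tgt_mask: tgt_mask)
out.shape # => [8, 3, 32]
```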
data/lib/torch/nn/transformer_decoder.rb
ADDED
@@ -0,0 +1,25 @@
+module Torch
+  module NN
+    class TransformerDecoder < Module
+      def initialize(decoder_layer, num_layers, norm: nil)
+        super()
+
+        @layers = _clones(decoder_layer, num_layers)
+        @num_layers = num_layers
+        @norm = norm
+      end
+
+      def forward(tgt, memory, tgt_mask: nil, memory_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil)
+        output = tgt
+
+        @layers.each do |mod|
+          output = mod.call(output, memory, tgt_mask: tgt_mask, memory_mask: memory_mask, tgt_key_padding_mask: tgt_key_padding_mask, memory_key_padding_mask: memory_key_padding_mask)
+        end
+
+        output = @norm.call(output) if @norm
+
+        output
+      end
+    end
+  end
+end
data/lib/torch/nn/transformer_decoder_layer.rb
ADDED
@@ -0,0 +1,43 @@
+module Torch
+  module NN
+    class TransformerDecoderLayer < Module
+      def initialize(
+        d_model, n_head,
+        dim_feedforward: 2048, dropout: 0.1, activation: :relu,
+        layer_norm_eps: 1e-5, batch_first: false
+      )
+
+        super()
+
+        @self_attn = MultiheadAttention.new(d_model, n_head, dropout: dropout, batch_first: batch_first)
+        @multihead_attn = MultiheadAttention.new(d_model, n_head, dropout: dropout, batch_first: batch_first)
+
+        @linear1 = Linear.new(d_model, dim_feedforward)
+        @dropout = Dropout.new(p: dropout)
+        @linear2 = Linear.new(dim_feedforward, d_model)
+
+        @norm1 = LayerNorm.new(d_model, eps: layer_norm_eps)
+        @norm2 = LayerNorm.new(d_model, eps: layer_norm_eps)
+        @norm3 = LayerNorm.new(d_model, eps: layer_norm_eps)
+
+        @dropout1 = Dropout.new(p: dropout)
+        @dropout2 = Dropout.new(p: dropout)
+        @dropout3 = Dropout.new(p: dropout)
+
+        @activation = _activation_fn(activation)
+      end
+
+      def forward(tgt, memory, tgt_mask: nil, memory_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil)
+        tgt2 = @self_attn.(tgt, tgt, tgt, attn_mask: tgt_mask, key_padding_mask: tgt_key_padding_mask).first
+        tgt += @dropout1.(tgt2)
+        tgt = @norm1.(tgt)
+        tgt2 = @multihead_attn.(tgt, memory, memory, attn_mask: memory_mask, key_padding_mask: memory_key_padding_mask).first
+        tgt += @dropout2.(tgt2)
+        tgt = @norm2.(tgt)
+        tgt2 = @linear2.(@dropout.(@activation.(@linear1.(tgt))))
+        tgt += @dropout3.(tgt)
+        @norm3.(tgt)
+      end
+    end
+  end
+end
data/lib/torch/nn/transformer_encoder.rb
ADDED
@@ -0,0 +1,25 @@
+module Torch
+  module NN
+    class TransformerEncoder < Module
+      def initialize(encoder_layer, num_layers, norm: nil)
+        super()
+
+        @layers = _clones(encoder_layer, num_layers)
+        @num_layers = num_layers
+        @norm = norm
+      end
+
+      def forward(src, mask: nil, src_key_padding_mask: nil)
+        output = src
+
+        @layers.each do |mod|
+          output = mod.call(output, src_mask: mask, src_key_padding_mask: src_key_padding_mask)
+        end
+
+        output = @norm.call(output) if @norm
+
+        output
+      end
+    end
+  end
+end
data/lib/torch/nn/transformer_encoder_layer.rb
ADDED
@@ -0,0 +1,36 @@
+module Torch
+  module NN
+    class TransformerEncoderLayer < Module
+      def initialize(
+        d_model, n_head,
+        dim_feedforward: 2048, dropout: 0.1, activation: :relu,
+        layer_norm_eps: 1e-5, batch_first: false
+      )
+
+        super()
+
+        @self_attn = MultiheadAttention.new(d_model, n_head, dropout: dropout, batch_first: batch_first)
+        @linear1 = Linear.new(d_model, dim_feedforward)
+        @dropout = Dropout.new(p: dropout)
+        @linear2 = Linear.new(dim_feedforward, d_model)
+
+        @norm1 = LayerNorm.new(d_model, eps: layer_norm_eps)
+        @norm2 = LayerNorm.new(d_model, eps: layer_norm_eps)
+
+        @dropout1 = Dropout.new(p: dropout)
+        @dropout2 = Dropout.new(p: dropout)
+
+        @activation = _activation_fn(activation)
+      end
+
+      def forward(src, src_mask: nil, src_key_padding_mask: nil)
+        src2 = @self_attn.(src, src, src, attn_mask: src_mask, key_padding_mask: src_key_padding_mask).first
+        src += @dropout1.(src2)
+        src = @norm1.(src)
+        src2 = @linear2.(@dropout.(@activation.(@linear1.(src))))
+        src += @dropout2.(src2)
+        @norm2.(src)
+      end
+    end
+  end
+end
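The encoder and decoder stacks are usable on their own; `TransformerEncoder` deep-copies the supplied layer `num_layers` times via the new `_clones` helper. A short sketch (not part of the diff):

```ruby
require "torch"

layer   = Torch::NN::TransformerEncoderLayer.new(32, 4, dim_feedforward: 64)
encoder = Torch::NN::TransformerEncoder.new(layer, 2)

src = Torch.randn(10, 3, 32) # (seq, batch, d_model)
encoder.call(src).shape # => [10, 3, 32]
```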
data/lib/torch/nn/utils.rb
CHANGED
@@ -20,6 +20,22 @@ module Torch
       def _ntuple(n, value)
         value.is_a?(Array) ? value : [value] * n
       end
+
+      def _clones(mod, n)
+        state = mod.state_dict
+        layers = n.times.map do |i|
+          mod.clone.tap { |l| l.load_state_dict(state) }
+        end
+        ModuleList.new(layers)
+      end
+
+      def _activation_fn(activation)
+        case activation.to_sym
+        when :relu then F.method(:relu)
+        when :gelu then F.method(:gelu)
+        else raise ArgumentError, "Activation should be relu/gelu, not `#{activation}`"
+        end
+      end
     end
   end
 end
data/lib/torch/tensor.rb
CHANGED
data/lib/torch/version.rb
CHANGED
data/lib/torch.rb
CHANGED
@@ -39,6 +39,7 @@ require "torch/nn/utils"
 
 # nn containers
 require "torch/nn/module"
+require "torch/nn/module_list"
 require "torch/nn/sequential"
 
 # nn convolution layers
@@ -143,6 +144,10 @@ require "torch/nn/softmin"
 require "torch/nn/embedding"
 require "torch/nn/embedding_bag"
 
+# attention is all you need
+require "torch/nn/multihead_attention"
+require "torch/nn/transformer"
+
 # nn distance functions
 require "torch/nn/cosine_similarity"
 require "torch/nn/pairwise_distance"
@@ -174,6 +179,7 @@ require "torch/nn/upsample"
 
 # nn other
 require "torch/nn/functional"
+require "torch/nn/functional_attention"
 require "torch/nn/init"
 
 # utils
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: torch-rb
 version: !ruby/object:Gem::Version
-  version: 0.8.1
+  version: 0.8.2
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-
+date: 2021-10-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rice
@@ -106,6 +106,7 @@ files:
 - lib/torch/nn/feature_alpha_dropout.rb
 - lib/torch/nn/fold.rb
 - lib/torch/nn/functional.rb
+- lib/torch/nn/functional_attention.rb
 - lib/torch/nn/group_norm.rb
 - lib/torch/nn/gru.rb
 - lib/torch/nn/hardshrink.rb
@@ -139,10 +140,12 @@ files:
 - lib/torch/nn/max_unpool3d.rb
 - lib/torch/nn/max_unpoolnd.rb
 - lib/torch/nn/module.rb
+- lib/torch/nn/module_list.rb
 - lib/torch/nn/mse_loss.rb
 - lib/torch/nn/multi_label_margin_loss.rb
 - lib/torch/nn/multi_label_soft_margin_loss.rb
 - lib/torch/nn/multi_margin_loss.rb
+- lib/torch/nn/multihead_attention.rb
 - lib/torch/nn/nll_loss.rb
 - lib/torch/nn/pairwise_distance.rb
 - lib/torch/nn/parameter.rb
@@ -170,6 +173,11 @@ files:
 - lib/torch/nn/softsign.rb
 - lib/torch/nn/tanh.rb
 - lib/torch/nn/tanhshrink.rb
+- lib/torch/nn/transformer.rb
+- lib/torch/nn/transformer_decoder.rb
+- lib/torch/nn/transformer_decoder_layer.rb
+- lib/torch/nn/transformer_encoder.rb
+- lib/torch/nn/transformer_encoder_layer.rb
 - lib/torch/nn/triplet_margin_loss.rb
 - lib/torch/nn/unfold.rb
 - lib/torch/nn/upsample.rb
@@ -219,7 +227,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.2.
+rubygems_version: 3.2.22
 signing_key:
 specification_version: 4
 summary: Deep learning for Ruby, powered by LibTorch