torch-rb 0.2.4 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 67c5a0cf556399dc32d73e8793e3aa794c181150f0f42dfa810c4b98a5acf6f2
- data.tar.gz: 0a23f6a42595fb9d599962e88438b964180583ead5b9cce934cc447951b4a389
+ metadata.gz: 06e94b492acbbdb71f9e6a11081fb043a03ae0d5c704cc79faa31dd96bde70ef
+ data.tar.gz: 4f38fa52d30ef9bf121204423b4d675f21dbef806b6f137152f2cf9399ddf4bb
  SHA512:
- metadata.gz: c0f8e9e3395d196d7ea6fa4b40d128284d768033e02f4ed7d2dc9adc985015fd0a80d601601dd97438b803b6a3bd7b81f5dbda353bb5dee4247503a24cd755d7
- data.tar.gz: c32a22ebbe1b4dfd77324f62a72d6a128639aac0a99d4c5255b16c606e6f961ae2c8b0dbab5012a9b21faa7409511b79a50676bc8314f181c85f90433433fa8b
+ metadata.gz: 2fb2613ca629a70f55009b697b15830d59c0d8fc06c1c5102917b4870cb783427fb56ecc08889c09e15c342381385f258b2a33102dc5adddf2d463d41674994d
+ data.tar.gz: f26a6ba91caa57a92b8b047217a35c39d1e9c4c361df77e2182053b4ab490f20792fc88dba169dae87d4a3d4ee4d69e2c779efb1fa6150b4d3f0d93e3762aec9
@@ -1,3 +1,30 @@
+ ## 0.3.1 (2020-08-17)
+
+ - Added `create_graph` and `retain_graph` options to `backward` method
+ - Fixed error when `set` not required
+
+ ## 0.3.0 (2020-07-29)
+
+ - Updated LibTorch to 1.6.0
+ - Removed `state_dict` method from optimizers until `load_state_dict` is implemented
+
+ ## 0.2.7 (2020-06-29)
+
+ - Made tensors enumerable
+ - Improved performance of `inspect` method
+
+ ## 0.2.6 (2020-06-29)
+
+ - Added support for indexing with tensors
+ - Added `contiguous` methods
+ - Fixed named parameters for nested parameters
+
+ ## 0.2.5 (2020-06-07)
+
+ - Added `download_url_to_file` and `load_state_dict_from_url` to `Torch::Hub`
+ - Improved error messages
+ - Fixed tensor slicing
+
  ## 0.2.4 (2020-04-29)

  - Added `to_i` and `to_f` to tensors
@@ -26,7 +53,7 @@
  ## 0.2.0 (2020-04-22)

  - No longer experimental
- - Updated libtorch to 1.5.0
+ - Updated LibTorch to 1.5.0
  - Added support for GPUs and OpenMP
  - Added adaptive pooling layers
  - Tensor `dtype` is now based on Numo type for `Torch.tensor`
@@ -35,7 +62,7 @@

  ## 0.1.8 (2020-01-17)

- - Updated libtorch to 1.4.0
+ - Updated LibTorch to 1.4.0

  ## 0.1.7 (2020-01-10)

data/README.md CHANGED
@@ -2,6 +2,8 @@
  :fire: Deep learning for Ruby, powered by [LibTorch](https://pytorch.org)

+ For computer vision tasks, also check out [TorchVision](https://github.com/ankane/torchvision)
+
  [![Build Status](https://travis-ci.org/ankane/torch.rb.svg?branch=master)](https://travis-ci.org/ankane/torch.rb)

  ## Installation
@@ -22,12 +24,26 @@ It can take a few minutes to compile the extension.

  ## Getting Started

+ Deep learning is significantly faster with a GPU. If you don’t have an NVIDIA GPU, we recommend using a cloud service. [Paperspace](https://www.paperspace.com/) has a great free plan.
+
+ We’ve put together a [Docker image](https://github.com/ankane/ml-stack) to make it easy to get started. On Paperspace, create a notebook with a custom container. Set the container name to:
+
+ ```text
+ ankane/ml-stack:torch-gpu
+ ```
+
+ And leave the other fields in that section blank. Once the notebook is running, you can run the [MNIST example](https://github.com/ankane/ml-stack/blob/master/torch-gpu/MNIST.ipynb).
+
+ ## API
+
  This library follows the [PyTorch API](https://pytorch.org/docs/stable/torch.html). There are a few changes to make it more Ruby-like:

  - Methods that perform in-place modifications end with `!` instead of `_` (`add!` instead of `add_`)
  - Methods that return booleans use `?` instead of `is_` (`tensor?` instead of `is_tensor`)
  - Numo is used instead of NumPy (`x.numo` instead of `x.numpy()`)

+ You can follow PyTorch tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
+
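A quick sketch of those conventions in use (values are illustrative; `tensor?`, `add!`, and `numo` come from the list above):

```ruby
require "torch"

x = Torch.tensor([1.0, 2.0, 3.0])
Torch.tensor?(x) # => true  (? instead of is_)
x.add!(1)        # in-place (! instead of _)
x.numo           # Numo array instead of a NumPy array
```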
  ## Tutorial

  Some examples below are from [Deep Learning with PyTorch: A 60 Minutes Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
@@ -192,7 +208,7 @@ end
  Define a neural network

  ```ruby
- class Net < Torch::NN::Module
+ class MyNet < Torch::NN::Module
  def initialize
  super
  @conv1 = Torch::NN::Conv2d.new(1, 6, 3)
@@ -226,7 +242,7 @@ end
  Create an instance of it

  ```ruby
- net = Net.new
+ net = MyNet.new
  input = Torch.randn(1, 1, 32, 32)
  net.call(input)
  ```
@@ -294,7 +310,7 @@ Torch.save(net.state_dict, "net.pth")
  Load a model

  ```ruby
- net = Net.new
+ net = MyNet.new
  net.load_state_dict(Torch.load("net.pth"))
  net.eval
  ```
@@ -395,7 +411,8 @@ Here’s the list of compatible versions.

  Torch.rb | LibTorch
  --- | ---
- 0.2.0 | 1.5.0
+ 0.3.0-0.3.1 | 1.6.0
+ 0.2.0-0.2.7 | 1.5.0-1.5.1
  0.1.8 | 1.4.0
  0.1.0-0.1.7 | 1.3.1

@@ -413,9 +430,7 @@ Then install the gem (no need for `bundle config`).

  ### Linux

- Deep learning is significantly faster on GPUs.
-
- Install [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn) and reinstall the gem.
+ Deep learning is significantly faster on a GPU. Install [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn) and reinstall the gem.

  Check if CUDA is available

@@ -23,7 +23,7 @@ class Parameter: public torch::autograd::Variable {
  Parameter(Tensor&& t) : torch::autograd::Variable(t) { }
  };

- void handle_error(c10::Error const & ex)
+ void handle_error(torch::Error const & ex)
  {
  throw Exception(rb_eRuntimeError, ex.what_without_backtrace());
  }
@@ -32,29 +32,35 @@ extern "C"
  void Init_ext()
  {
  Module rb_mTorch = define_module("Torch");
+ rb_mTorch.add_handler<torch::Error>(handle_error);
  add_torch_functions(rb_mTorch);

  Class rb_cTensor = define_class_under<torch::Tensor>(rb_mTorch, "Tensor");
+ rb_cTensor.add_handler<torch::Error>(handle_error);
  add_tensor_functions(rb_cTensor);

  Module rb_mNN = define_module_under(rb_mTorch, "NN");
+ rb_mNN.add_handler<torch::Error>(handle_error);
  add_nn_functions(rb_mNN);

  Module rb_mRandom = define_module_under(rb_mTorch, "Random")
+ .add_handler<torch::Error>(handle_error)
  .define_singleton_method(
  "initial_seed",
  *[]() {
- return at::detail::getDefaultCPUGenerator()->current_seed();
+ return at::detail::getDefaultCPUGenerator().current_seed();
  })
  .define_singleton_method(
  "seed",
  *[]() {
  // TODO set for CUDA when available
- return at::detail::getDefaultCPUGenerator()->seed();
+ auto generator = at::detail::getDefaultCPUGenerator();
+ return generator.seed();
  });

  // https://pytorch.org/cppdocs/api/structc10_1_1_i_value.html
  Class rb_cIValue = define_class_under<torch::IValue>(rb_mTorch, "IValue")
+ .add_handler<torch::Error>(handle_error)
  .define_constructor(Constructor<torch::IValue>())
  .define_method("bool?", &torch::IValue::isBool)
  .define_method("bool_list?", &torch::IValue::isBoolList)
@@ -317,7 +323,6 @@ void Init_ext()
  });

  rb_cTensor
- .add_handler<c10::Error>(handle_error)
  .define_method("cuda?", &torch::Tensor::is_cuda)
  .define_method("sparse?", &torch::Tensor::is_sparse)
  .define_method("quantized?", &torch::Tensor::is_quantized)
@@ -325,6 +330,11 @@ void Init_ext()
  .define_method("numel", &torch::Tensor::numel)
  .define_method("element_size", &torch::Tensor::element_size)
  .define_method("requires_grad", &torch::Tensor::requires_grad)
+ .define_method(
+ "contiguous?",
+ *[](Tensor& self) {
+ return self.is_contiguous();
+ })
  .define_method(
  "addcmul!",
  *[](Tensor& self, Scalar value, const Tensor & tensor1, const Tensor & tensor2) {
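At the Ruby level, the new binding above shows up as `Tensor#contiguous?` (a hedged sketch; the 0.2.6 changelog also lists a `contiguous` method):

```ruby
x = Torch.rand(2, 3).transpose(0, 1) # a transposed view is typically non-contiguous
x.contiguous?            # => false
x.contiguous.contiguous? # => true
```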
@@ -342,8 +352,8 @@ void Init_ext()
  })
  .define_method(
  "_backward",
- *[](Tensor& self, Object gradient) {
- return gradient.is_nil() ? self.backward() : self.backward(from_ruby<torch::Tensor>(gradient));
+ *[](Tensor& self, OptionalTensor gradient, bool create_graph, bool retain_graph) {
+ return self.backward(gradient, create_graph, retain_graph);
  })
  .define_method(
  "grad",
@@ -374,6 +384,21 @@ void Init_ext()
  s << self.device();
  return s.str();
  })
+ .define_method(
+ "_data_str",
+ *[](Tensor& self) {
+ Tensor tensor = self;
+
+ // move to CPU to get data
+ if (tensor.device().type() != torch::kCPU) {
+ torch::Device device("cpu");
+ tensor = tensor.to(device);
+ }
+
+ auto data_ptr = (const char *) tensor.data_ptr();
+ return std::string(data_ptr, tensor.numel() * tensor.element_size());
+ })
+ // TODO figure out a better way to do this
  .define_method(
  "_flat_data",
  *[](Tensor& self) {
@@ -388,46 +413,40 @@ void Init_ext()
  Array a;
  auto dtype = tensor.dtype();

+ Tensor view = tensor.reshape({tensor.numel()});
+
  // TODO DRY if someone knows C++
  if (dtype == torch::kByte) {
- uint8_t* data = tensor.data_ptr<uint8_t>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<uint8_t>());
  }
  } else if (dtype == torch::kChar) {
- int8_t* data = tensor.data_ptr<int8_t>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(to_ruby<int>(data[i]));
+ a.push(to_ruby<int>(view[i].item().to<int8_t>()));
  }
  } else if (dtype == torch::kShort) {
- int16_t* data = tensor.data_ptr<int16_t>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<int16_t>());
  }
  } else if (dtype == torch::kInt) {
- int32_t* data = tensor.data_ptr<int32_t>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<int32_t>());
  }
  } else if (dtype == torch::kLong) {
- int64_t* data = tensor.data_ptr<int64_t>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<int64_t>());
  }
  } else if (dtype == torch::kFloat) {
- float* data = tensor.data_ptr<float>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<float>());
  }
  } else if (dtype == torch::kDouble) {
- double* data = tensor.data_ptr<double>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i]);
+ a.push(view[i].item().to<double>());
  }
  } else if (dtype == torch::kBool) {
- bool* data = tensor.data_ptr<bool>();
  for (int i = 0; i < tensor.numel(); i++) {
- a.push(data[i] ? True : False);
+ a.push(view[i].item().to<bool>() ? True : False);
  }
  } else {
  throw std::runtime_error("Unsupported type");
@@ -442,14 +461,14 @@ void Init_ext()
  .define_singleton_method(
  "_make_subclass",
  *[](Tensor& rd, bool requires_grad) {
- auto data = torch::autograd::as_variable_ref(rd).detach();
+ auto data = rd.detach();
  data.unsafeGetTensorImpl()->set_allow_tensor_metadata_change(true);
  auto var = data.set_requires_grad(requires_grad);
  return Parameter(std::move(var));
  });

  Class rb_cTensorOptions = define_class_under<torch::TensorOptions>(rb_mTorch, "TensorOptions")
- .add_handler<c10::Error>(handle_error)
+ .add_handler<torch::Error>(handle_error)
  .define_constructor(Constructor<torch::TensorOptions>())
  .define_method(
  "dtype",
@@ -555,6 +574,7 @@ void Init_ext()
  });

  Class rb_cParameter = define_class_under<Parameter, torch::Tensor>(rb_mNN, "Parameter")
+ .add_handler<torch::Error>(handle_error)
  .define_method(
  "grad",
  *[](Parameter& self) {
@@ -564,6 +584,7 @@ void Init_ext()

  Class rb_cDevice = define_class_under<torch::Device>(rb_mTorch, "Device")
  .define_constructor(Constructor<torch::Device, std::string>())
+ .add_handler<torch::Error>(handle_error)
  .define_method("index", &torch::Device::index)
  .define_method("index?", &torch::Device::has_index)
  .define_method(
@@ -575,6 +596,7 @@ void Init_ext()
  });

  Module rb_mCUDA = define_module_under(rb_mTorch, "CUDA")
+ .add_handler<torch::Error>(handle_error)
  .define_singleton_method("available?", &torch::cuda::is_available)
  .define_singleton_method("device_count", &torch::cuda::device_count);
  }
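From Ruby, the two singleton methods registered above are callable directly:

```ruby
Torch::CUDA.available?   # => true when CUDA (and cuDNN) are set up
Torch::CUDA.device_count # => number of visible GPUs
```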
@@ -7,17 +7,16 @@ $CXXFLAGS += " -std=c++14"
  # change to 0 for Linux pre-cxx11 ABI version
  $CXXFLAGS += " -D_GLIBCXX_USE_CXX11_ABI=1"

- # TODO check compiler name
- clang = RbConfig::CONFIG["host_os"] =~ /darwin/i
+ apple_clang = RbConfig::CONFIG["CC_VERSION_MESSAGE"] =~ /apple clang/i

  # check omp first
  if have_library("omp") || have_library("gomp")
  $CXXFLAGS += " -DAT_PARALLEL_OPENMP=1"
- $CXXFLAGS += " -Xclang" if clang
+ $CXXFLAGS += " -Xclang" if apple_clang
  $CXXFLAGS += " -fopenmp"
  end

- if clang
+ if apple_clang
  # silence ruby/intern.h warning
  $CXXFLAGS += " -Wno-deprecated-register"

@@ -1,6 +1,12 @@
  # ext
  require "torch/ext"

+ # stdlib
+ require "fileutils"
+ require "net/http"
+ require "set"
+ require "tmpdir"
+
  # native functions
  require "torch/native/generator"
  require "torch/native/parser"
@@ -465,11 +471,7 @@ module Torch
  when nil
  IValue.new
  when Array
- if obj.all? { |v| v.is_a?(Tensor) }
- IValue.from_list(obj.map { |v| IValue.from_tensor(v) })
- else
- raise Error, "Unknown list type"
- end
+ IValue.from_list(obj.map { |v| to_ivalue(v) })
  else
  raise Error, "Unknown type: #{obj.class.name}"
  end
@@ -5,12 +5,56 @@ module Torch
  raise NotImplementedYet
  end

- def download_url_to_file(url)
- raise NotImplementedYet
+ def download_url_to_file(url, dst)
+ uri = URI(url)
+ tmp = "#{Dir.tmpdir}/#{Time.now.to_f}" # TODO better name
+ location = nil
+
+ Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
+ request = Net::HTTP::Get.new(uri)
+
+ puts "Downloading #{url}..."
+ File.open(tmp, "wb") do |f|
+ http.request(request) do |response|
+ case response
+ when Net::HTTPRedirection
+ location = response["location"]
+ when Net::HTTPSuccess
+ response.read_body do |chunk|
+ f.write(chunk)
+ end
+ else
+ raise Error, "Bad response"
+ end
+ end
+ end
+ end
+
+ if location
+ download_url_to_file(location, dst)
+ else
+ FileUtils.mv(tmp, dst)
+ nil
+ end
  end

- def load_state_dict_from_url(url)
- raise NotImplementedYet
+ def load_state_dict_from_url(url, model_dir: nil)
+ unless model_dir
+ torch_home = ENV["TORCH_HOME"] || "#{ENV["XDG_CACHE_HOME"] || "#{ENV["HOME"]}/.cache"}/torch"
+ model_dir = File.join(torch_home, "checkpoints")
+ end
+
+ FileUtils.mkdir_p(model_dir)
+
+ parts = URI(url)
+ filename = File.basename(parts.path)
+ cached_file = File.join(model_dir, filename)
+ unless File.exist?(cached_file)
+ # TODO support hash_prefix
+ download_url_to_file(url, cached_file)
+ end
+
+ Torch.load(cached_file)
  end
  end
  end
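A hedged usage sketch of the two `Torch::Hub` methods implemented above (the URL is a placeholder):

```ruby
# caches under TORCH_HOME (default ~/.cache/torch/checkpoints), then loads with Torch.load
state_dict = Torch::Hub.load_state_dict_from_url("https://example.com/weights.pth")

# or download to an explicit destination
Torch::Hub.download_url_to_file("https://example.com/weights.pth", "weights.pth")
```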
@@ -1,89 +1,264 @@
+ # mirrors _tensor_str.py
  module Torch
  module Inspector
- # TODO make more performance, especially when summarizing
- # how? only read data that will be displayed
- def inspect
- data =
- if numel == 0
- "[]"
- elsif dim == 0
- item
+ PRINT_OPTS = {
+ precision: 4,
+ threshold: 1000,
+ edgeitems: 3,
+ linewidth: 80,
+ sci_mode: nil
+ }
+
+ class Formatter
+ def initialize(tensor)
+ @floating_dtype = tensor.floating_point?
+ @complex_dtype = tensor.complex?
+ @int_mode = true
+ @sci_mode = false
+ @max_width = 1
+
+ tensor_view = Torch.no_grad { tensor.reshape(-1) }
+
+ if !@floating_dtype
+ tensor_view.each do |value|
+ value_str = value.item.to_s
+ @max_width = [@max_width, value_str.length].max
+ end
  else
- summarize = numel > 1000
+ nonzero_finite_vals = Torch.masked_select(tensor_view, Torch.isfinite(tensor_view) & tensor_view.ne(0))
+
+ # no valid number, do nothing
+ return if nonzero_finite_vals.numel == 0
+
+ # Convert to double for easy calculation. HalfTensor overflows with 1e8, and there's no div() on CPU.
+ nonzero_finite_abs = nonzero_finite_vals.abs.double
+ nonzero_finite_min = nonzero_finite_abs.min.double
+ nonzero_finite_max = nonzero_finite_abs.max.double
+
+ nonzero_finite_vals.each do |value|
+ if value.item != value.item.ceil
+ @int_mode = false
+ break
+ end
+ end

- if dtype == :bool
- fmt = "%s"
+ if @int_mode
+ # in int_mode for floats, all numbers are integers, and we append a decimal to nonfinites
+ # to indicate that the tensor is of floating type. add 1 to the len to account for this.
+ if nonzero_finite_max / nonzero_finite_min > 1000.0 || nonzero_finite_max > 1.0e8
+ @sci_mode = true
+ nonzero_finite_vals.each do |value|
+ value_str = "%.#{PRINT_OPTS[:precision]}e" % value.item
+ @max_width = [@max_width, value_str.length].max
+ end
+ else
+ nonzero_finite_vals.each do |value|
+ value_str = "%.0f" % value.item
+ @max_width = [@max_width, value_str.length + 1].max
+ end
+ end
  else
- values = to_a.flatten
- abs = values.select { |v| v != 0 }.map(&:abs)
- max = abs.max || 1
- min = abs.min || 1
-
- total = 0
- if values.any? { |v| v < 0 }
- total += 1
+ # Check if scientific representation should be used.
+ if nonzero_finite_max / nonzero_finite_min > 1000.0 || nonzero_finite_max > 1.0e8 || nonzero_finite_min < 1.0e-4
+ @sci_mode = true
+ nonzero_finite_vals.each do |value|
+ value_str = "%.#{PRINT_OPTS[:precision]}e" % value.item
+ @max_width = [@max_width, value_str.length].max
+ end
+ else
+ nonzero_finite_vals.each do |value|
+ value_str = "%.#{PRINT_OPTS[:precision]}f" % value.item
+ @max_width = [@max_width, value_str.length].max
+ end
  end
+ end
+ end

- if floating_point?
- sci = max > 1e8 || max < 1e-4
+ @sci_mode = PRINT_OPTS[:sci_mode] unless PRINT_OPTS[:sci_mode].nil?
+ end

- all_int = values.all? { |v| v.finite? && v == v.to_i }
- decimal = all_int ? 1 : 4
+ def width
+ @max_width
+ end

- total += sci ? 10 : decimal + 1 + max.to_i.to_s.size
+ def format(value)
+ value = value.item

- if sci
- fmt = "%#{total}.4e"
- else
- fmt = "%#{total}.#{decimal}f"
- end
- else
- total += max.to_s.size
- fmt = "%#{total}d"
+ if @floating_dtype
+ if @sci_mode
+ ret = "%#{@max_width}.#{PRINT_OPTS[:precision]}e" % value
+ elsif @int_mode
+ ret = String.new("%.0f" % value)
+ unless value.infinite? || value.nan?
+ ret += "."
  end
+ else
+ ret = "%.#{PRINT_OPTS[:precision]}f" % value
  end
+ elsif @complex_dtype
+ p = PRINT_OPTS[:precision]
+ raise NotImplementedYet
+ else
+ ret = value.to_s
+ end
+ # Ruby throws error when negative, Python doesn't
+ " " * [@max_width - ret.size, 0].max + ret
+ end
+ end
+
+ def inspect
+ Torch.no_grad do
+ str_intern(self)
+ end
+ rescue => e
+ # prevent stack error
+ puts e.backtrace.join("\n")
+ "Error inspecting tensor: #{e.inspect}"
+ end
+
+ private
+
+ # TODO update
+ def str_intern(slf)
+ prefix = "tensor("
+ indent = prefix.length
+ suffixes = []
+
+ has_default_dtype = [:float32, :int64, :bool].include?(slf.dtype)
+
+ if slf.numel == 0 && !slf.sparse?
+ # Explicitly print the shape if it is not (0,), to match NumPy behavior
+ if slf.dim != 1
+ suffixes << "size: #{shape.inspect}"
+ end

- inspect_level(to_a, fmt, dim - 1, 0, summarize)
+ # In an empty tensor, there are no elements to infer if the dtype
+ # should be int64, so it must be shown explicitly.
+ if slf.dtype != :int64
+ suffixes << "dtype: #{slf.dtype.inspect}"
  end
+ tensor_str = "[]"
+ else
+ if !has_default_dtype
+ suffixes << "dtype: #{slf.dtype.inspect}"
+ end
+
+ if slf.layout != :strided
+ tensor_str = tensor_str(slf.to_dense, indent)
+ else
+ tensor_str = tensor_str(slf, indent)
+ end
+ end

- attributes = []
- if requires_grad
- attributes << "requires_grad: true"
+ if slf.layout != :strided
+ suffixes << "layout: #{slf.layout.inspect}"
  end
- if ![:float32, :int64, :bool].include?(dtype)
- attributes << "dtype: #{dtype.inspect}"
+
+ # TODO show grad_fn
+ if slf.requires_grad?
+ suffixes << "requires_grad: true"
  end

- "tensor(#{data}#{attributes.map { |a| ", #{a}" }.join("")})"
+ add_suffixes(prefix + tensor_str, suffixes, indent, slf.sparse?)
  end

- private
+ def add_suffixes(tensor_str, suffixes, indent, force_newline)
+ tensor_strs = [tensor_str]
+ # rfind in Python returns -1 when not found
+ last_line_len = tensor_str.length - (tensor_str.rindex("\n") || -1) + 1
+ suffixes.each do |suffix|
+ suffix_len = suffix.length
+ if force_newline || last_line_len + suffix_len + 2 > PRINT_OPTS[:linewidth]
+ tensor_strs << ",\n" + " " * indent + suffix
+ last_line_len = indent + suffix_len
+ force_newline = false
+ else
+ tensor_strs.append(", " + suffix)
+ last_line_len += suffix_len + 2
+ end
+ end
+ tensor_strs.append(")")
+ tensor_strs.join("")
+ end

- # TODO DRY code
- def inspect_level(arr, fmt, total, level, summarize)
- if level == total
- cols =
- if summarize && arr.size > 7
- arr[0..2].map { |v| fmt % v } +
- ["..."] +
- arr[-3..-1].map { |v| fmt % v }
- else
- arr.map { |v| fmt % v }
- end
+ def tensor_str(slf, indent)
+ return "[]" if slf.numel == 0
+
+ summarize = slf.numel > PRINT_OPTS[:threshold]
+
+ if slf.dtype == :float16 || slf.dtype == :bfloat16
+ slf = slf.float
+ end
+ formatter = Formatter.new(summarize ? summarized_data(slf) : slf)
+ tensor_str_with_formatter(slf, indent, formatter, summarize)
+ end
+
+ def summarized_data(slf)
+ edgeitems = PRINT_OPTS[:edgeitems]

- "[#{cols.join(", ")}]"
+ dim = slf.dim
+ if dim == 0
+ slf
+ elsif dim == 1
+ if size(0) > 2 * edgeitems
+ Torch.cat([slf[0...edgeitems], slf[-edgeitems..-1]])
+ else
+ slf
+ end
+ elsif slf.size(0) > 2 * edgeitems
+ start = edgeitems.times.map { |i| slf[i] }
+ finish = (slf.length - edgeitems).upto(slf.length - 1).map { |i| slf[i] }
+ Torch.stack((start + finish).map { |x| summarized_data(x) })
  else
- rows =
- if summarize && arr.size > 7
- arr[0..2].map { |row| inspect_level(row, fmt, total, level + 1, summarize) } +
- ["..."] +
- arr[-3..-1].map { |row| inspect_level(row, fmt, total, level + 1, summarize) }
- else
- arr.map { |row| inspect_level(row, fmt, total, level + 1, summarize) }
- end
+ Torch.stack(slf.map { |x| summarized_data(x) })
+ end
+ end
+
+ def tensor_str_with_formatter(slf, indent, formatter, summarize)
+ edgeitems = PRINT_OPTS[:edgeitems]
+
+ dim = slf.dim

- "[#{rows.join(",#{"\n" * (total - level)}#{" " * (level + 8)}")}]"
+ return scalar_str(slf, formatter) if dim == 0
+ return vector_str(slf, indent, formatter, summarize) if dim == 1
+
+ if summarize && slf.size(0) > 2 * edgeitems
+ slices = (
+ [edgeitems.times.map { |i| tensor_str_with_formatter(slf[i], indent + 1, formatter, summarize) }] +
+ ["..."] +
+ [((slf.length - edgeitems)...slf.length).map { |i| tensor_str_with_formatter(slf[i], indent + 1, formatter, summarize) }]
+ )
+ else
+ slices = slf.size(0).times.map { |i| tensor_str_with_formatter(slf[i], indent + 1, formatter, summarize) }
  end
+
+ tensor_str = slices.join("," + "\n" * (dim - 1) + " " * (indent + 1))
+ "[" + tensor_str + "]"
+ end
+
+ def scalar_str(slf, formatter)
+ formatter.format(slf)
+ end
+
+ def vector_str(slf, indent, formatter, summarize)
+ # length includes spaces and comma between elements
+ element_length = formatter.width + 2
+ elements_per_line = [1, ((PRINT_OPTS[:linewidth] - indent) / element_length.to_f).floor.to_i].max
+ char_per_line = element_length * elements_per_line
+
+ if summarize && slf.size(0) > 2 * PRINT_OPTS[:edgeitems]
+ data = (
+ [slf[0...PRINT_OPTS[:edgeitems]].map { |val| formatter.format(val) }] +
+ [" ..."] +
+ [slf[-PRINT_OPTS[:edgeitems]..-1].map { |val| formatter.format(val) }]
+ )
+ else
+ data = slf.map { |val| formatter.format(val) }
+ end
+
+ data_lines = (0...data.length).step(elements_per_line).map { |i| data[i...(i + elements_per_line)] }
+ lines = data_lines.map { |line| line.join(", ") }
+ "[" + lines.join("," + "\n" + " " * (indent + 1)) + "]"
  end
  end
  end
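The rewritten inspector mirrors PyTorch's `_tensor_str.py`, so printed tensors now get per-dtype formatting, summarization past `threshold` elements, and line wrapping. A minimal way to exercise it (output shown approximately; values vary):

```ruby
x = Torch.rand(2, 3)
puts x.inspect
# => tensor([[0.1234, 0.5678, 0.9012],
#            [0.3456, 0.7890, 0.2345]])
```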