tensor_stream 0.8.0 → 0.8.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: f84c2b9852fcf4931c47c0130b67497a50a87b0f
-  data.tar.gz: 524e1105da4e06e3472cbcfa0e6f764ae4512d37
+SHA256:
+  metadata.gz: a3c7d0a810a79ceedc0237379b105d7a9b598ba2513ef2d59ba3cec78d7b0da0
+  data.tar.gz: 7c0d90b27e548b72a86e88e7181f3d3b131a5fa3c6800000c743dd6e47d47b3b
 SHA512:
-  metadata.gz: 420e2675ab67d4c8462534bdf8c703671656f7852d984579e22ee57f1425dd5740fcb64a1e52363bf337cd7c691d87a75c76bd868b13c8a7f06d78e0eb00aa73
-  data.tar.gz: 24fe1022741883d46cdd5af51309da33d421d72874f0cc84bf2e0ed14a62602f1830c6060bd86e42359b7962b4a57727c9a48ce13d5950d5ba02f6a9cdfd719f
+  metadata.gz: a7a8a5607883d868da3ceaa9f870ac5c6d6809d45992a9ae0e4f26d5648bda2c4a26ea3abec506abd92362dcf3b63935b3f6acbcc70b605600307313f8c69f49
+  data.tar.gz: db8917bee53f91e1017b5fdb8b9ece8b50a1096d249cee40eed8c335f597f32e5ce81056230b8ce1b9acff4df728497fa13e896212cbdaccc023dbdb7ed3591e
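The registry now records SHA256 digests (previously SHA1) alongside SHA512. A quick way to check a downloaded artifact against the values above, using only the Ruby standard library (the file path is illustrative):

```ruby
require 'digest'

# Compare against the data.tar.gz digests listed in checksums.yaml above.
puts Digest::SHA256.file('data.tar.gz').hexdigest
puts Digest::SHA512.file('data.tar.gz').hexdigest
```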
data/CHANGELOG.md CHANGED
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+
+## [0.8.1] - 2018-08-30
+- [TRAINING] Added AdamOptimizer
+
 ## [0.8.0] - 2018-08-29
 ### Added
 - [TRAINING] Added new supported optimizer, MomentumOptimizer loosely based on tensorflow's implementation (with nesterov support)
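The headline change in this release is the new Adam optimizer. A minimal sketch of how it might be used, assuming the variable/session helpers from the README's linear-regression sample; the no-argument constructor mirrors the commented-out README line added below, and the default hyperparameters are not confirmed by this diff:

```ruby
require 'tensor_stream'

tf = TensorStream

# Toy objective: drive w toward 3.0.
w = tf.variable(0.0, name: 'weight')
cost = (w - 3.0) ** 2

optimizer = TensorStream::Train::AdamOptimizer.new.minimize(cost)

sess = tf.session
sess.run(tf.global_variables_initializer)
200.times { sess.run(optimizer) }
puts sess.run(w) # should approach 3.0
```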
data/README.md CHANGED
@@ -17,7 +17,20 @@ The goal of this gem is to have a high performance machine learning and compute
 - eager execution (experimental)
 - (08-08-2018) Load pbtext files from tensorflow (Graph.parse_from_string)
 
-Since this is a pure ruby implementation for now, performance is not there yet. However it should be a good enough environment to learn about tensorflow and experiment with some models.
+## Compatibility
+
+TensorStream comes with a pure Ruby and an OpenCL implementation out of the box. The pure Ruby implementation
+is known to work with most Ruby implementations, including TruffleRuby and JRuby, as well as JIT-enabled builds of MRI (ruby-2.6.0).
+
+OpenCL is supported only on MRI. It can be enabled by requiring the OpenCL evaluator (make sure the OpenCL drivers are installed correctly on your system):
+
+```ruby
+require 'tensor_stream/evaluator/opencl/opencl_evaluator'
+```
+
+OpenCL is essentially a requirement for deep learning and image processing tasks, as the pure Ruby implementation is too slow even with the JIT speedups of the latest Ruby releases.
+
+The OpenCL kernels used by TensorStream can be found under tensor_stream/lib/evaluator/opencl/kernels. They are device agnostic and should work with any device that supports OpenCL, including Intel GPUs and CPUs as well as GPUs from NVIDIA and AMD.
 
 ## Installation
 
@@ -72,6 +85,8 @@ pred = X * W + b
 # Mean squared error
 cost = ((pred - Y) ** 2).reduce(:+) / ( 2 * n_samples)
 
+# optimizer = TensorStream::Train::MomentumOptimizer.new(0.01, 0.5, use_nesterov: true).minimize(cost)
+# optimizer = TensorStream::Train::AdamOptimizer.new.minimize(cost)
 optimizer = TensorStream::Train::GradientDescentOptimizer.new(learning_rate).minimize(cost)
 
 # Initialize the variables (i.e. assign their default value)
@@ -338,7 +353,7 @@ ruby 2.4
 $ ruby -v
 ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
 $ ruby samples/linear_regression.rb
-495 seconds 1000 epochs
+495 seconds 10000 epochs
 ```
 
 ruby 2.6.0-preview2
@@ -383,8 +398,6 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/tensor_stream. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
 
-
 ## License
 
-The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
-
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
@@ -0,0 +1,23 @@
+% c_dtype = dtype_to_c_type(dtype)
+// Adam parameter update, one work item per element
+__kernel void apply_adam_<%= dtype %>(const int M, const int N,
+                                      __global const <%= c_dtype %> *grad,
+                                      __global const <%= c_dtype %> *learning_rate,
+                                      __global const <%= c_dtype %> *beta1_power,
+                                      __global const <%= c_dtype %> *beta2_power,
+                                      __global const <%= c_dtype %> *beta1,
+                                      __global const <%= c_dtype %> *beta2,
+                                      __global const <%= c_dtype %> *epsilon,
+                                      __global <%= c_dtype %> *momentum,
+                                      __global <%= c_dtype %> *output, __global <%= c_dtype %> *v) {
+    // Get the index of the current element to be processed
+    const int globalRow = get_global_id(0); // Row ID (0..M)
+    const int globalCol = get_global_id(1); // Col ID (0..N)
+    const int index = globalRow * N + globalCol;
+
+    <%= c_dtype %> alpha = learning_rate[0] * sqrt(1.0 - beta2_power[0]) / (1.0 - beta1_power[0]);
+
+    momentum[index] += (grad[index] - momentum[index]) * (1.0 - beta1[0]);
+    v[index] += (grad[index] * grad[index] - v[index]) * (1.0 - beta2[0]);
+    output[index] -= (momentum[index] * alpha) / ( sqrt(v[index]) + epsilon[0] );
+  }
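In equation form, the kernel above performs the standard Adam update per element, with the bias correction folded into the step size (this simply restates the kernel body, it is not additional behavior):

```latex
\alpha_t = \eta \,\frac{\sqrt{1 - \beta_2^{\,t}}}{1 - \beta_1^{\,t}}, \qquad
m \leftarrow m + (1 - \beta_1)(g - m), \qquad
v \leftarrow v + (1 - \beta_2)(g^2 - v), \qquad
\theta \leftarrow \theta - \frac{\alpha_t\, m}{\sqrt{v} + \epsilon}
```

Here g is the incoming gradient, m and v are the first and second moment accumulators (the kernel's momentum and v buffers), eta is the learning rate, and theta is the variable being updated (the output buffer).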
@@ -107,9 +107,8 @@ module TensorStream
         end
       end
 
-      def complete_eval(tensor, context)
+      def enqueue_buffer_read(tensor, context)
        buffer = _run(tensor, context)
-
        if buffer.is_a?(Array)
          buffer = buffer.collect do |b|
            next b if b.buffer.size.zero?
@@ -122,7 +121,12 @@ module TensorStream
          return [] if buffer.buffer.nil?
          return buffer if buffer.buffer.size.zero?
          _opencl_queue.enqueue_read_buffer(buffer.cl_buffer, buffer.buffer, event_wait_list: build_event_wait_list([buffer]))
+         buffer
        end
+      end
+
+      def complete_eval(tensor, context)
+        buffer = enqueue_buffer_read(tensor, context)
         _opencl_queue.finish
         buffer
       end
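Splitting complete_eval this way lets a caller enqueue several non-blocking reads and drain the command queue only once. A hypothetical helper inside the evaluator illustrating the intent (not part of this diff):

```ruby
# Enqueue reads for several tensors, then block on the OpenCL queue a single time.
def complete_eval_many(tensors, context)
  buffers = tensors.map { |t| enqueue_buffer_read(t, context) }
  _opencl_queue.finish
  buffers
end
```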
@@ -339,6 +343,48 @@ module TensorStream
                               learning_rate.cl_buffer, momentum.cl_buffer, output_buffer.cl_buffer,
                               assign_acc.buffer.cl_buffer, event_wait_list: event_wait_list)
         output_buffer.op = event
+        assign_acc.buffer.op = event
+        output_buffer
+      end
+
+      # Adam optimization algorithm
+      register_op :apply_adam do |_context, tensor, inputs|
+        _target_var, _m, _v, beta1_power, beta2_power, lr_t, beta1_t, beta2_t, epsilon_t, grad = inputs
+
+        assign = tensor.inputs[0] || tensor
+        assign_m = tensor.inputs[1]
+        assign_v = tensor.inputs[2]
+
+        # mark variable buffers as dirty
+        assign.buffer.dirty = true # force buffer copy when variable is read externally
+        assign_m.buffer.dirty = true # force buffer copy when variable is read externally
+        assign_v.buffer.dirty = true # force buffer copy when variable is read externally
+
+        output_buffer = assign.buffer
+
+        m, n = output_buffer.shape
+        work_group = [m || 1, n || 1]
+        cl_m = OpenCL::Int1.new(m || 1)
+        cl_n = OpenCL::Int1.new(n || 1)
+
+        event_wait_list = build_event_wait_list(inputs)
+        method_call = :"apply_adam_#{output_buffer.data_type}"
+        event = _cl_program("apply_adam", dtype: output_buffer.data_type)
+                .send(method_call, _opencl_queue, work_group, cl_m, cl_n,
+                      grad.cl_buffer,
+                      lr_t.cl_buffer,
+                      beta1_power.cl_buffer,
+                      beta2_power.cl_buffer,
+                      beta1_t.cl_buffer,
+                      beta2_t.cl_buffer,
+                      epsilon_t.cl_buffer,
+                      assign_m.buffer.cl_buffer,
+                      assign.buffer.cl_buffer,
+                      assign_v.buffer.cl_buffer,
+                      event_wait_list: event_wait_list)
+        output_buffer.op = event
+        assign_m.buffer.op = event
+        assign_v.buffer.op = event
         output_buffer
       end
 
@@ -713,8 +759,9 @@ module TensorStream
         convert_to_opencl(arr.buffer, shape, data_type: arr.data_type, name: tensor.name)
       end
 
-      register_op :flow_group do |_context, _tensor, inputs|
-        inputs
+      register_op :flow_group do |context, _tensor, inputs|
+        _opencl_queue.finish
+        nil
       end
 
       register_op :size do |_context, tensor, inputs|
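Behavioral note: running a grouped op no longer returns the evaluated inputs; on both evaluators it now runs purely for its side effects and yields nil (the OpenCL version also drains the command queue). A hedged sketch of the visible effect, assuming global_variables_initializer is implemented as a flow_group over the variable initializers as in the README sample (this diff does not show that helper):

```ruby
require 'tensor_stream'

tf = TensorStream
v = tf.variable(1.0, name: 'v')
init = tf.global_variables_initializer

sess = tf.session
result = sess.run(init)
# 0.8.0: result held the group's evaluated inputs
# 0.8.1: result is nil; a grouped op runs purely for its side effects
```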
@@ -0,0 +1,144 @@
+module TensorStream
+  module MathOps
+    def MathOps.included(klass)
+      klass.class_eval do
+        register_op :tanh, no_eval: true do |context, _tensor, inputs|
+          call_op(:tanh, inputs[0], context, ->(t, _b) { Math.tanh(t) })
+        end
+
+        register_op :tan, no_eval: true do |context, _tensor, inputs|
+          call_op(:tan, inputs[0], context, ->(t, _b) { Math.tan(t) })
+        end
+
+        register_op :atan, no_eval: true do |context, _tensor, inputs|
+          call_op(:atan, inputs[0], context, ->(t, _b) { Math.atan(t) })
+        end
+
+        register_op :sec, no_eval: true do |context, _tensor, inputs|
+          call_op(:sec, inputs[0], context, ->(t, _b) { Math.sec(t) })
+        end
+
+        register_op :sin, no_eval: true do |context, _tensor, inputs|
+          call_op(:sin, inputs[0], context, ->(t, _b) { Math.sin(t) })
+        end
+
+        register_op :add, no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :add, a, b, context, ->(t, u) { t + u })
+        end
+
+        register_op :add_n, no_eval: true do |context, tensor, inputs|
+          if inputs.size == 1
+            complete_eval(inputs[0], context)
+          elsif inputs.size > 1
+
+            a = inputs.pop
+            until inputs.empty?
+              b = inputs.pop
+              a = call_vector_op(tensor, :add, a, b, context, ->(t, u) { t + u })
+            end
+            a
+          end
+        end
+
+        register_op :sub, no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :sub, a, b, context, ->(t, u) { t - u })
+        end
+
+        register_op %i[floor_mod mod], no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :mod, a, b, context, ->(t, u) { t % u })
+        end
+
+        register_op %i[floor_div], no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          if fp_type?(tensor.data_type)
+            call_vector_op(tensor, :div, a, b, context, ->(t, u) { (t / u).to_i.to_f })
+          else
+            call_vector_op(tensor, :div, a, b, context, ->(t, u) { t / u })
+          end
+        end
+
+        register_op :mul, no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :mul, a, b, context, ->(t, u) { t * u })
+        end
+
+        register_op :pow, no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :pow, a, b, context, ->(t, u) { t**u })
+        end
+
+        register_op :squared_difference, no_eval: true do |context, tensor, inputs|
+          a, b = inputs
+          call_vector_op(tensor, :squared_difference, a, b, context, ->(t, u) { (t - u) * (t - u) })
+        end
+
+        register_op :round, no_eval: true do |context, _tensor, inputs|
+          call_op(:round, inputs[0], context, ->(t, _b) { t.round })
+        end
+
+        register_op :abs, no_eval: true do |context, _tensor, inputs|
+          call_op(:abs, inputs[0], context, ->(t, _b) { t.abs })
+        end
+
+        register_op :asin, no_eval: true do |context, _tensor, inputs|
+          call_op(:asin, inputs[0], context, ->(t, _b) { Math.asin(t) })
+        end
+
+        register_op :acos, no_eval: true do |context, _tensor, inputs|
+          call_op(:acos, inputs[0], context, ->(t, _b) { Math.acos(t) })
+        end
+
+        register_op :cos, no_eval: true do |context, _tensor, inputs|
+          call_op(:cos, inputs[0], context, ->(t, _b) { Math.cos(t) })
+        end
+
+        register_op :log1p, no_eval: true do |context, _tensor, inputs|
+          call_op(:log1p, inputs[0], context, ->(t, _b) { Math.log(1 + t) })
+        end
+
+        register_op :log, no_eval: true do |context, _tensor, inputs|
+          call_op(:log, inputs[0], context, ->(t, _b) { t < 0 ? Float::NAN : Math.log(t) })
+        end
+
+        register_op :exp, no_eval: true do |context, _tensor, inputs|
+          call_op(:exp, inputs[0], context, ->(t, _b) { Math.exp(t) })
+        end
+
+        register_op :sigmoid, no_eval: true do |context, _tensor, inputs|
+          call_op(:sigmoid, inputs[0], context, ->(t, _b) { sigmoid(t) })
+        end
+
+        register_op :sqrt, no_eval: true do |context, _tensor, inputs|
+          call_op(:sqrt, inputs[0], context, ->(t, _b) { Math.sqrt(t) })
+        end
+
+        register_op :floor, no_eval: true do |context, _tensor, inputs|
+          call_op(:floor, inputs[0], context, ->(t, _b) { t.floor })
+        end
+
+        register_op :ceil, no_eval: true do |context, _tensor, inputs|
+          call_op(:ceil, inputs[0], context, ->(t, _b) { t.ceil })
+        end
+
+        register_op :square, no_eval: true do |context, _tensor, inputs|
+          call_op(:square, inputs[0], context, ->(t, _b) { t * t })
+        end
+
+        register_op :reciprocal, no_eval: true do |context, _tensor, inputs|
+          call_op(:reciprocal, inputs[0], context, ->(t, _b) { 1 / t })
+        end
+
+        register_op %i[neg negate], no_eval: true do |context, tensor, inputs|
+          call_vector_op(tensor, :negate, inputs[0], nil, context, ->(t, _u) { -t })
+        end
+
+        register_op :tanh_grad, no_eval: true do |context, _tensor, inputs|
+          call_op(:tanh_grad, inputs[0], context, ->(t, _b) { 1 - Math.tanh(t) * Math.tanh(t) })
+        end
+      end
+    end
+  end
+end
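The included hook plus class_eval is how these mixins inject ops into the evaluator that includes them. A hypothetical sketch of registering an extra element-wise op using the same pattern (cbrt is an invented example; register_op and call_op are assumed to behave as in the including evaluator):

```ruby
module TensorStream
  module ExtraMathOps
    def ExtraMathOps.included(klass)
      klass.class_eval do
        # Element-wise cube root, following the MathOps pattern above.
        register_op :cbrt, no_eval: true do |context, _tensor, inputs|
          call_op(:cbrt, inputs[0], context, ->(t, _b) { Math.cbrt(t) })
        end
      end
    end
  end
end
```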
@@ -0,0 +1,99 @@
+module TensorStream
+  ## Collection of machine learning related ops
+  module NNOps
+    def NNOps.included(klass)
+      klass.class_eval do
+        register_op :apply_gradient_descent do |context, tensor, inputs|
+          target_var, learning_rate, delta = inputs
+          assign = tensor.inputs[0] || tensor
+
+          assign.value = process_vector_math_op(tensor, target_var, delta, context, ->(t, u) { t - u * learning_rate })
+          assign.value
+        end
+
+        register_op :apply_momentum do |context, tensor, inputs|
+          target_var, momentum_var, learning_rate, grad, momentum = inputs
+          assign = tensor.inputs[0] || tensor
+          assign_acc = tensor.inputs[1]
+          assign_acc.value = process_vector_math_op(tensor, momentum_var, grad, context, ->(t, u) { t * momentum + u })
+          if tensor.options[:use_nesterov]
+            delta = process_vector_math_op(tensor, grad, momentum_var, context, ->(g, acc) { g * learning_rate + acc * momentum * learning_rate })
+            assign.value = process_vector_math_op(tensor, target_var, delta, context, ->(t, u) { t - u })
+          else
+            assign.value = process_vector_math_op(tensor, target_var, momentum_var, context, ->(v, acc) { v - acc * learning_rate })
+          end
+          assign.value
+        end
+
+        register_op :apply_adam do |context, tensor, inputs|
+          target_var, m, v, beta1_power, beta2_power, lr_t, beta1_t, beta2_t, epsilon_t, grad = inputs
+          alpha = lr_t * Math.sqrt(1.0 - beta2_power) / (1.0 - beta1_power)
+          assign = tensor.inputs[0]
+          assign_m = tensor.inputs[1]
+          assign_v = tensor.inputs[2]
+
+          m_delta = process_vector_math_op(tensor, grad, m, context, ->(g, m_d) { (g - m_d) * (1.0 - beta1_t) })
+          assign_m.value = process_vector_math_op(tensor, m, m_delta, context, ->(u_d, v_d) { u_d + v_d })
+          assign_v.value = process_vector_math_op(tensor, v, grad, context, ->(u_d, v_d) { u_d + (v_d**2 - u_d) * (1.0 - beta2_t) })
+          v_delta = process_vector_math_op(tensor, assign_m.value, assign_v.value, context, ->(m_d, v_d) { (m_d * alpha) / (Math.sqrt(v_d) + epsilon_t) })
+          assign.value = process_vector_math_op(tensor, target_var, v_delta, context, ->(var_d, delta_d) { var_d - delta_d })
+          assign.value
+        end
+
+        register_op %i[softmax_cross_entropy_with_logits_v2 softmax_cross_entropy_with_logits] do |_context, tensor, inputs|
+          last_dimen_list = last_axis(inputs[0])
+          input_shape = shape_eval(inputs[0])
+          rank = input_shape.size - 1
+          labels = last_axis(inputs[1])
+          func = lambda { |logits, label|
+            c = logits.max
+            transformed_logits = logits.map { |l| l - c }
+            sum = transformed_logits.map { |x| Math.exp(x) }.reduce(:+)
+            losses = transformed_logits.zip(label).map { |x, y| (Math.log(sum) - x) * y }
+            probs = transformed_logits.zip(label).map { |x, y| (Math.exp(x) / sum) - y }
+            [losses, probs]
+          }
+
+          if input_shape.size == 1
+            loss, prob = func.call(last_dimen_list, labels)
+            loss = reduce(loss, rank, false)
+            TensorStream::Evaluator::OutputGroup.new([loss, prob], [tensor.inputs[0].data_type, tensor.inputs[0].data_type])
+          else
+            losses = []
+            backprobs = []
+            arr = last_dimen_list.zip(labels).each do |list, label|
+              loss, prob = func.call(list, label)
+              losses << loss
+              backprobs << prob
+            end
+            reshaped_losses = TensorShape.reshape(losses.flatten, input_shape)
+            reshaped_backprops = TensorShape.reshape(backprobs.flatten, input_shape)
+            reshaped_losses = reduce(reshaped_losses, rank, false)
+            TensorStream::Evaluator::OutputGroup.new([reshaped_losses, reshaped_backprops], [tensor.inputs[0].data_type, tensor.inputs[0].data_type])
+          end
+        end
+
+        register_op :log_softmax do |_context, _tensor, inputs|
+          input_shape = shape_eval(inputs[0])
+          last_dimen_list = last_axis(inputs[0])
+
+          func = lambda { |logits|
+            c = logits.max
+            transformed_logits = logits.map { |l| l - c }
+            sum = transformed_logits.map { |x| Math.exp(x) }.reduce(:+)
+            transformed_logits.map { |x| x - Math.log(sum) }
+          }
+
+          if input_shape.size == 1
+            func.call(last_dimen_list)
+          else
+            arr = last_dimen_list.collect do |list|
+              func.call(list)
+            end
+            TensorShape.reshape(arr.flatten, input_shape)
+          end
+        end
+      end
+    end
+  end
+end
@@ -2,6 +2,8 @@ require 'tensor_stream/evaluator/operation_helpers/random_gaussian'
 require 'tensor_stream/evaluator/operation_helpers/array_ops_helper'
 require 'tensor_stream/evaluator/operation_helpers/math_helper'
 require 'tensor_stream/evaluator/base_evaluator'
+require 'tensor_stream/evaluator/ruby/math_ops'
+require 'tensor_stream/evaluator/ruby/nn_ops'
 
 module TensorStream
   module Evaluator
@@ -29,6 +31,8 @@ module TensorStream
       include TensorStream::OpHelper
       include TensorStream::ArrayOpsHelper
       include TensorStream::MathHelper
+      include TensorStream::MathOps
+      include TensorStream::NNOps
 
       def run(tensor, execution_context)
         return tensor.map { |t| run(t, execution_context) } if tensor.is_a?(Array) && !tensor.empty? && tensor[0].is_a?(Tensor)
@@ -243,143 +247,10 @@ module TensorStream
         Tensor.cast_dtype(input.flatten.size, tensor.options[:out_type])
       end
 
-      register_op %i[neg negate], no_eval: true do |context, tensor, inputs|
-        call_vector_op(tensor, :negate, inputs[0], nil, context, ->(t, _u) { -t })
-      end
-
-      register_op :add, no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :add, a, b, context, ->(t, u) { t + u })
-      end
-
-      register_op :add_n, no_eval: true do |context, tensor, inputs|
-        if inputs.size == 1
-          complete_eval(inputs[0], context)
-        elsif inputs.size > 1
-
-          a = inputs.pop
-          until inputs.empty?
-            b = inputs.pop
-            a = call_vector_op(tensor, :add, a, b, context, ->(t, u) { t + u })
-          end
-          a
-        end
-      end
-
-      register_op :sub, no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :sub, a, b, context, ->(t, u) { t - u })
-      end
-
-      register_op %i[floor_mod mod], no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :mod, a, b, context, ->(t, u) { t % u })
-      end
-
-      register_op %i[floor_div], no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        if fp_type?(tensor.data_type)
-          call_vector_op(tensor, :div, a, b, context, ->(t, u) { (t / u).to_i.to_f })
-        else
-          call_vector_op(tensor, :div, a, b, context, ->(t, u) { t / u })
-        end
-      end
-
-      register_op :mul, no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :mul, a, b, context, ->(t, u) { t * u })
-      end
-
-      register_op :pow, no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :pow, a, b, context, ->(t, u) { t**u })
-      end
-
-      register_op :squared_difference, no_eval: true do |context, tensor, inputs|
-        a, b = inputs
-        call_vector_op(tensor, :squared_difference, a, b, context, ->(t, u) { (t - u) * (t - u) })
-      end
-
       register_op %i[concat concat_v2] do |_context, tensor, inputs|
         concat_array(inputs, tensor.options[:axis])
       end
 
-      register_op :round, no_eval: true do |context, _tensor, inputs|
-        call_op(:round, inputs[0], context, ->(t, _b) { t.round })
-      end
-
-      register_op :abs, no_eval: true do |context, _tensor, inputs|
-        call_op(:abs, inputs[0], context, ->(t, _b) { t.abs })
-      end
-
-      register_op :tanh, no_eval: true do |context, _tensor, inputs|
-        call_op(:tanh, inputs[0], context, ->(t, _b) { Math.tanh(t) })
-      end
-
-      register_op :tan, no_eval: true do |context, _tensor, inputs|
-        call_op(:tan, inputs[0], context, ->(t, _b) { Math.tan(t) })
-      end
-
-      register_op :atan, no_eval: true do |context, _tensor, inputs|
-        call_op(:atan, inputs[0], context, ->(t, _b) { Math.atan(t) })
-      end
-
-      register_op :sec, no_eval: true do |context, _tensor, inputs|
-        call_op(:sec, inputs[0], context, ->(t, _b) { Math.sec(t) })
-      end
-
-      register_op :sin, no_eval: true do |context, _tensor, inputs|
-        call_op(:sin, inputs[0], context, ->(t, _b) { Math.sin(t) })
-      end
-
-      register_op :asin, no_eval: true do |context, _tensor, inputs|
-        call_op(:asin, inputs[0], context, ->(t, _b) { Math.asin(t) })
-      end
-
-      register_op :acos, no_eval: true do |context, _tensor, inputs|
-        call_op(:acos, inputs[0], context, ->(t, _b) { Math.acos(t) })
-      end
-
-      register_op :cos, no_eval: true do |context, _tensor, inputs|
-        call_op(:cos, inputs[0], context, ->(t, _b) { Math.cos(t) })
-      end
-
-      register_op :log1p, no_eval: true do |context, _tensor, inputs|
-        call_op(:log1p, inputs[0], context, ->(t, _b) { Math.log(1 + t) })
-      end
-
-      register_op :log, no_eval: true do |context, _tensor, inputs|
-        call_op(:log, inputs[0], context, ->(t, _b) { t < 0 ? Float::NAN : Math.log(t) })
-      end
-
-      register_op :exp, no_eval: true do |context, _tensor, inputs|
-        call_op(:exp, inputs[0], context, ->(t, _b) { Math.exp(t) })
-      end
-
-      register_op :sigmoid, no_eval: true do |context, _tensor, inputs|
-        call_op(:sigmoid, inputs[0], context, ->(t, _b) { sigmoid(t) })
-      end
-
-      register_op :sqrt, no_eval: true do |context, _tensor, inputs|
-        call_op(:sqrt, inputs[0], context, ->(t, _b) { Math.sqrt(t) })
-      end
-
-      register_op :floor, no_eval: true do |context, _tensor, inputs|
-        call_op(:floor, inputs[0], context, ->(t, _b) { t.floor })
-      end
-
-      register_op :ceil, no_eval: true do |context, _tensor, inputs|
-        call_op(:ceil, inputs[0], context, ->(t, _b) { t.ceil })
-      end
-
-      register_op :square, no_eval: true do |context, _tensor, inputs|
-        call_op(:square, inputs[0], context, ->(t, _b) { t * t })
-      end
-
-      register_op :reciprocal, no_eval: true do |context, _tensor, inputs|
-        call_op(:reciprocal, inputs[0], context, ->(t, _b) { 1 / t })
-      end
-
       register_op :stop_gradient, no_eval: true do |_context, _tensor, inputs|
         inputs[0]
       end
@@ -517,14 +388,6 @@ module TensorStream
       end
 
       register_op :sum, noop: true do |context, tensor, _inputs|
-        # axis = complete_eval(tensor.inputs[1], context)
-        # # fast path
-        # if axis.nil? && !tensor.options[:keepdims]
-        #   arr = complete_eval(tensor.inputs[0], context)
-        #   next arr unless arr.is_a?(Array)
-        #   next arr.flatten.reduce(:+)
-        # end
-
         func = lambda do |arr|
           reduced_val = arr[0]
           arr[1..arr.size].each do |v|
@@ -537,14 +400,6 @@ module TensorStream
       end
 
       register_op :prod, noop: true do |context, tensor, _inputs|
-        # axis = complete_eval(tensor.inputs[1], context)
-        # # fast path
-        # if axis.nil? && !tensor.options[:keepdims]
-        #   arr = complete_eval(tensor.inputs[0], context)
-        #   next arr unless arr.is_a?(Array)
-        #   next arr.flatten.reduce(:*)
-        # end
-
         c = fp_type?(tensor.data_type) ? 1.0 : 1
         func = lambda do |arr|
           return c if arr.nil?
@@ -577,10 +432,6 @@ module TensorStream
         r
       end
 
-      register_op :tanh_grad, no_eval: true do |context, _tensor, inputs|
-        call_op(:tanh_grad, inputs[0], context, ->(t, _b) { 1 - Math.tanh(t) * Math.tanh(t) })
-      end
-
      register_op :transpose do |_context, tensor, inputs|
         shape = shape_eval(inputs[0])
         rank = get_rank(inputs[0])
@@ -774,28 +625,6 @@ module TensorStream
         call_vector_op(tensor, :min, inputs[0], inputs[1], context, ->(t, u) { [t, u].min })
       end
 
-      register_op :apply_gradient_descent do |context, tensor, inputs|
-        target_var, learning_rate, delta = inputs
-        assign = tensor.inputs[0] || tensor
-
-        assign.value = process_vector_math_op(tensor, target_var, delta, context, ->(t, u) { t - u * learning_rate })
-        assign.value
-      end
-
-      register_op :apply_momentum do |context, tensor, inputs|
-        target_var, momentum_var, learning_rate, grad, momentum = inputs
-        assign = tensor.inputs[0] || tensor
-        assign_acc = tensor.inputs[1]
-        assign_acc.value = process_vector_math_op(tensor, momentum_var, grad, context, ->(t, u) { t * momentum + u })
-        if tensor.options[:use_nesterov]
-          delta = process_vector_math_op(tensor, grad, momentum_var, context, ->(g, acc) { g * learning_rate + acc * momentum * learning_rate })
-          assign.value = process_vector_math_op(tensor, target_var, delta, context, ->(t, u) { t - u })
-        else
-          assign.value = process_vector_math_op(tensor, target_var, momentum_var, context, ->(v, acc) { v - acc * learning_rate })
-        end
-        assign.value
-      end
-
       register_op :broadcast_gradient_args do |_context, tensor, inputs|
         rx, ry = get_broadcast_gradient_args(inputs[0], inputs[1])
         OutputGroup.new([rx, ry], tensor.inputs.map(&:data_type))
@@ -812,7 +641,8 @@ module TensorStream
       end
 
       register_op :flow_group, noop: true do |context, tensor, inputs|
-        inputs.collect { |input| global_eval(tensor, input, context) }
+        inputs.each { |input| global_eval(tensor, input, context) }
+        nil # no output
       end
 
       register_op :softmax do |_context, _tensor, inputs|
@@ -856,83 +686,6 @@ module TensorStream
         end
       end
 
-      register_op :log_softmax do |_context, _tensor, inputs|
-        input_shape = shape_eval(inputs[0])
-        last_dimen_list = last_axis(inputs[0])
-
-        func = lambda { |logits|
-          c = logits.max
-          transformed_logits = logits.map { |l| l - c }
-          sum = transformed_logits.map { |x| Math.exp(x) }.reduce(:+)
-          transformed_logits.map { |x| x - Math.log(sum) }
-        }
-
-        if input_shape.size == 1
-          func.call(last_dimen_list)
-        else
-          arr = last_dimen_list.collect do |list|
-            func.call(list)
-          end
-          TensorShape.reshape(arr.flatten, input_shape)
-        end
-      end
-
-      register_op %i[softmax_cross_entropy_with_logits_v2 softmax_cross_entropy_with_logits] do |_context, tensor, inputs|
-        last_dimen_list = last_axis(inputs[0])
-        input_shape = shape_eval(inputs[0])
-        rank = input_shape.size - 1
-        labels = last_axis(inputs[1])
-        func = lambda { |logits, label|
-          c = logits.max
-          transformed_logits = logits.map { |l| l - c }
-          sum = transformed_logits.map { |x| Math.exp(x) }.reduce(:+)
-          losses = transformed_logits.zip(label).map { |x, y| (Math.log(sum) - x) * y }
-          probs = transformed_logits.zip(label).map { |x, y| (Math.exp(x) / sum) - y }
-          [losses, probs]
-        }
-
-        if input_shape.size == 1
-          loss, prob = func.call(last_dimen_list, labels)
-          loss = reduce(loss, rank, false)
-          OutputGroup.new([loss, prob], [tensor.inputs[0].data_type, tensor.inputs[0].data_type])
-        else
-          losses = []
-          backprobs = []
-          arr = last_dimen_list.zip(labels).each do |list, label|
-            loss, prob = func.call(list, label)
-            losses << loss
-            backprobs << prob
-          end
-          reshaped_losses = TensorShape.reshape(losses.flatten, input_shape)
-          reshaped_backprops = TensorShape.reshape(backprobs.flatten, input_shape)
-          reshaped_losses = reduce(reshaped_losses, rank, false)
-          OutputGroup.new([reshaped_losses, reshaped_backprops], [tensor.inputs[0].data_type, tensor.inputs[0].data_type])
-        end
-      end
-
-      register_op :softmax_cross_entropy_with_logits_v2_grad do |_context, _tensor, inputs|
-        last_dimen_list = last_axis(inputs[0])
-        labels = last_axis(inputs[1])
-        passed_grads = last_axis(inputs[2])
-        input_shape = shape_eval(inputs[0])
-
-        func = lambda { |logits, label, grad|
-          c = logits.max
-          transformed_logits = logits.map { |l| Math.exp(l - c) }
-          e_sum = transformed_logits.reduce(:+)
-          transformed_logits.zip(label).zip(grad).map { |(x, y), g| (x / e_sum) * g - y }
-        }
-
-        if input_shape.size == 1
-          func.call(last_dimen_list, labels, passed_grads)
-        else
-          arr = last_dimen_list.zip(labels).zip(passed_grads).collect do |(list, label), passed_grad|
-            func.call(list, label, passed_grad)
-          end
-          TensorShape.reshape(arr.flatten, input_shape)
-        end
-      end
-
       register_op :check_numerics do |context, tensor, inputs|
         message = tensor.options[:message]
         f = lambda { |t, _b|