ask_chatgpt 0.3.1 → 0.4.0

Files changed (9)

checksums.yaml +4 -4
data/README.md +68 -0
data/bin/ask_chatgpt +17 -4
data/lib/ask_chatgpt/executor.rb +9 -0
data/lib/ask_chatgpt/version.rb +1 -1
data/lib/ask_chatgpt/voice.rb +176 -0
data/lib/ask_chatgpt.rb +13 -0
data/lib/generators/ask_chatgpt/templates/template.rb +5 -0
metadata +17 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 86f36d546bd015947fcac7a2bb9366530ba0ff8a9012f21f2151ae0484ecf647
-  data.tar.gz: 136c817bc54317c77325ca0145857262d48680b8b25072e3364dcd0fe526e501
+  metadata.gz: 6a5ccd11d62b51d4422562be281edcf67254e9829f551edf2f91acd970fd091b
+  data.tar.gz: 1328f4ffedb742cce8c330de2fedf7bca4632610a5e95e0215a6dabc6dc3d50d
 SHA512:
-  metadata.gz: 7b3112557fcb6e3fa11856693e26dc37df80dedafde058f6f181d303a336f75cc4ab21f9c1daf89dd61d253eea150b05d5f664408ab5299c07210f37d8f9d87c
-  data.tar.gz: 4c5a22ca8157813e15fa5e2f639eefd89967ba9bbd300d6c0929bfe18cd3b2ff4e600197d59411520da05626c77ec9ca1786e2b6f3bfd7b51a2d5403f4528098
+  metadata.gz: 869bb402cb40bbaa8d0e60a4c0c9e3247a155a18b60d805c3943baa2b0c90d1ccf8471de4359df5ca63a6e5197231cca1cbae5398716e49ed0ecd0c14b2e45fe
+  data.tar.gz: 83c04d9f695e3772ec35351b4de6e019d2f866023421045ec8f6dc2decdaded75f3feb9c70947caf1837b3942333ab69c23b4e02ca7e9efe87c77c7b248fb8af

data/README.md CHANGED

@@ -31,6 +31,10 @@ Go to Rails console and run:
       [first_name, last_name].join
     end
   }
+  #
+  # --- NEW ---
+  #
+  gpt.speak # or with alias gpt.s
 ```
 OR with CLI tool:
@@ -46,6 +50,8 @@ aGVsbG8gd29ybGQ=
 hello world
 ```
+>ask_chatgpt -s 1 # start voice input with CLI
 See some examples below. You can also create your own prompts with just a few lines of code [here](#options--configurations).
 Also you can use a CLI tool, [how to use it](#cli-tool).
@@ -113,6 +119,17 @@ And you can edit:
     # config.max_tokens       = 3000 # or nil by default
     # config.included_prompts = []
+    # enable voice input with `gpt.speak` or `gpt.s`. Note, you also need to configure `audio_device_id`
+    # config.voice_enabled = true
+    # to get audio device ID (index in the input devices)
+    # install ffmpeg, and execute from the console
+    # `ffmpeg -f avfoundation -list_devices true -i ""`
+    # config.audio_device_id = 1
+    # after "voice_max_duration" seconds it will send audio to OpenAI
+    # config.voice_max_duration = 10 # 10 seconds
     # Examples of custom prompts:
     # you can use them `gpt.extract_email("some string")`
@@ -182,8 +199,55 @@ end
 or directly in console `gpt.debug!` (and finish `gpt.debug!(:off)`)
+## Voice Input
+Demo: https://youtu.be/uBR0wnQvKao
+For now I consider this an experimental and fun feature. I look forward to seeing your feedback.
+Works with the command `gpt.speak` or `gpt.s` (alias).
+This command starts recording right away and stops after `voice_max_duration` seconds or when you press any key.
+To exit recording mode, press `Q` or `Esc`.
+Voice input relies on the `ffmpeg` tool, so you need to install it. Instructions such as these will work: https://www.hostinger.com/tutorials/how-to-install-ffmpeg.
+You also need to configure `audio_device_id`. Run `ffmpeg -f avfoundation -list_devices true -i ""`.
+It will give you a list of all devices, like this:
+```s
+ffmpeg -f avfoundation -list_devices true -i ""
+ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
+  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
+  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
+  libavutil      58.  2.100 / 58.  2.100
+  libavcodec     60.  3.100 / 60.  3.100
+  libavformat    60.  3.100 / 60.  3.100
+  libavdevice    60.  1.100 / 60.  1.100
+  libavfilter     9.  3.100 /  9.  3.100
+  libswscale      7.  1.100 /  7.  1.100
+  libswresample   4. 10.100 /  4. 10.100
+  libpostproc    57.  1.100 / 57.  1.100
+[AVFoundation indev @ 0x7f7fd1a04380] AVFoundation video devices:
+[AVFoundation indev @ 0x7f7fd1a04380] [0] FaceTime HD Camera
+[AVFoundation indev @ 0x7f7fd1a04380] [1] USB Camera VID:1133 PID:2085
+[AVFoundation indev @ 0x7f7fd1a04380] [2] Capture screen 0
+[AVFoundation indev @ 0x7f7fd1a04380] [3] Capture screen 1
+[AVFoundation indev @ 0x7f7fd1a04380] AVFoundation audio devices:
+[AVFoundation indev @ 0x7f7fd1a04380] [0] Microsoft Teams Audio
+[AVFoundation indev @ 0x7f7fd1a04380] [1] Built-in Microphone
+[AVFoundation indev @ 0x7f7fd1a04380] [2] Unknown USB Audio Device
+: Input/output error
+```
+In my case I used "1", because it's `Built-in Microphone`.
 ## CLI Tool
+You can ask questions from the CLI or even start voice input.
 Example 1:
 ![AskChatGPT](docs/unzip.gif)
@@ -232,6 +296,10 @@ end
 - print tokens usage? `.with_usage`
 - support org_id? in the configs
 - use `gpt` in the code of the main app (e.g. model/controller)
+- when voice is used add support for payloads, e.g. `gpt.with_payload(json).speak` (and it will send payload with my question)
+- refactor voice input code :) as first version it's fine
+- can we discover audio device ID?
+- use tempfile for audio, instead of output.wav
 ## Contributing
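
The README's open question about discovering the audio device ID could be approached by parsing the device listing shown in the Voice Input section. The following is a minimal sketch, assuming the listing text has already been captured (ffmpeg writes it to stderr). `parse_avfoundation_audio_devices` and `SAMPLE_LISTING` are hypothetical names used for illustration and are not part of the gem.

```ruby
# Hypothetical helper (not part of ask_chatgpt): extract the audio devices from
# the text printed by `ffmpeg -f avfoundation -list_devices true -i ""`.
def parse_avfoundation_audio_devices(listing)
  devices = {}
  in_audio_section = false

  listing.each_line do |line|
    if line.include?("AVFoundation audio devices:")
      in_audio_section = true
    elsif line.include?("AVFoundation video devices:")
      in_audio_section = false
    elsif in_audio_section && (match = line.match(/\]\s+\[(\d+)\]\s+(.+?)\s*$/))
      # match[1] is the device index, match[2] is the device name
      devices[match[1].to_i] = match[2]
    end
  end

  devices
end

# Sample taken from the listing in the README diff (shortened).
SAMPLE_LISTING = <<~TEXT
  [AVFoundation indev @ 0x7f7fd1a04380] AVFoundation video devices:
  [AVFoundation indev @ 0x7f7fd1a04380] [0] FaceTime HD Camera
  [AVFoundation indev @ 0x7f7fd1a04380] AVFoundation audio devices:
  [AVFoundation indev @ 0x7f7fd1a04380] [0] Microsoft Teams Audio
  [AVFoundation indev @ 0x7f7fd1a04380] [1] Built-in Microphone
  [AVFoundation indev @ 0x7f7fd1a04380] [2] Unknown USB Audio Device
TEXT

parse_avfoundation_audio_devices(SAMPLE_LISTING).each do |index, name|
  puts "#{index}: #{name}"
end
```

The index printed next to the desired microphone is the value that would go into `config.audio_device_id`.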

data/bin/ask_chatgpt CHANGED

@@ -24,6 +24,7 @@ parser = OptionParser.new do |opts|
       ask_chatgpt -f app/models/user.rb -q "find a bug in this Rails model"
       ask_chatgpt -f app/models/user.rb -q "create RSpec spec for this model"
       ask_chatgpt -f test/dummy/Gemfile -q "sort Ruby gems alphabetically"
+      ask_chatgpt -s 1
     Version: #{AskChatGPT::VERSION}
@@ -33,6 +34,10 @@ parser = OptionParser.new do |opts|
     options[:prompt] = prompt
   end
+  opts.on("-s", "--speak AudioDeviceID", String, "Specify audio device ID") do |audio_device_id|
+    options[:audio_device_id] = audio_device_id
+  end
   opts.on("-f", "--file FILE", String, "Specify file with prompt") do |file|
     options[:file_path] = file
   end
@@ -53,14 +58,22 @@ AskChatGPT.debug = !!options[:debug]
 options[:prompt] = ARGV.join(" ") if options[:prompt].blank?
-if options[:prompt].blank?
+if options[:prompt].blank? && options[:audio_device_id].blank?
   puts parser
   exit
 end
 include AskChatGPT::Console
-instance = gpt.ask(options[:prompt])
-instance = instance.payload(File.read(options[:file_path])) if options[:file_path].present?
+if options[:audio_device_id].present?
+  AskChatGPT.voice_enabled = true
+  AskChatGPT.voice_max_duration = 20
+  AskChatGPT.audio_device_id = options[:audio_device_id]
-puts instance.inspect
+  instance = gpt.speak
+else
+  instance = gpt.ask(options[:prompt])
+  instance = instance.payload(File.read(options[:file_path])) if options[:file_path].present?
+  puts instance.inspect
+end
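
The mode selection in this script (voice input when `-s` is given, otherwise a text prompt, otherwise the help text) can be sketched in isolation. This is a simplified illustration rather than the gem's code: `cli_mode` is a hypothetical helper, and `--question` is an assumed long option name, since the diff only shows the short `-q` flag in the usage examples.

```ruby
require "optparse"

# Hypothetical helper: decide which mode the CLI would run in for given arguments.
def cli_mode(argv)
  options = {}
  parser = OptionParser.new do |opts|
    # "--question" is an assumed long name, used here for illustration only
    opts.on("-q", "--question PROMPT", String) { |prompt| options[:prompt] = prompt }
    opts.on("-s", "--speak AudioDeviceID", String) { |id| options[:audio_device_id] = id }
  end

  # parse (without the bang) leaves argv untouched and returns the leftover words
  remaining = parser.parse(argv)
  prompt = options[:prompt].to_s.empty? ? remaining.join(" ") : options[:prompt]
  device = options[:audio_device_id].to_s

  if !device.empty?
    [:voice, device]   # voice input takes priority, as in the script above
  elsif !prompt.empty?
    [:ask, prompt]
  else
    [:help, nil]       # nothing to do: the real script prints the parser and exits
  end
end

p cli_mode(["-s", "1"])
p cli_mode(["-q", "hello"])
p cli_mode(["how", "to", "unzip"])
p cli_mode([])
```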

data/lib/ask_chatgpt/executor.rb CHANGED

@@ -2,6 +2,7 @@ require_relative "sugar"
 require_relative "prompts/base"
 require_relative "prompts/improve"
 require_relative "default_behavior"
+require_relative "voice"
 Dir[File.join(__dir__, "prompts", "*.rb")].each do |file|
   require file
@@ -23,6 +24,14 @@ module AskChatgpt
       AskChatgpt::Executor.new(client)
     end
+    def speak
+      return puts("Voice input is not enabled (docs: https://github.com/railsjazz/ask_chatgpt)") unless AskChatGPT.voice_enabled
+      return puts("Audio device ID is not configured (docs: https://github.com/railsjazz/ask_chatgpt)") unless AskChatGPT.audio_device_id
+      AskChatgpt::VoiceFlow::Voice.new.run
+    end
+    alias_method :s, :speak
     def initialize(client)
       @scope   = AskChatGPT.included_prompts.dup
       @client  = client

data/lib/ask_chatgpt/version.rb CHANGED

@@ -1,3 +1,3 @@
 module AskChatgpt
-  VERSION = "0.3.1"
+  VERSION = "0.4.0"
 end

data/lib/ask_chatgpt/voice.rb ADDED

@@ -0,0 +1,176 @@
+module StringExt
+  refine String do
+    def black;          "\e[30m#{self}\e[0m" end
+    def red;            "\e[31m#{self}\e[0m" end
+    def green;          "\e[32m#{self}\e[0m" end
+    def brown;          "\e[33m#{self}\e[0m" end
+    def blue;           "\e[34m#{self}\e[0m" end
+    def magenta;        "\e[35m#{self}\e[0m" end
+    def cyan;           "\e[36m#{self}\e[0m" end
+    def gray;           "\e[37m#{self}\e[0m" end
+  end
+end
+using StringExt
+module AskChatgpt
+  module VoiceFlow
+    require 'io/console'
+    require 'fileutils'
+    require 'timeout'
+    require 'open3'
+    class AudioRecorder
+      OUTPUT_FILE = "output.wav"
+      def initialize(duration)
+        @duration = duration
+      end
+      # ffmpeg -f avfoundation -list_devices true -i ""
+      def audio_device_id
+        AskChatGPT.audio_device_id
+      end
+      def start
+        delete_audio_file
+        ffmpeg_command = build_ffmpeg_command
+        @stdin, @stdout_and_stderr, @wait_thread = Open3.popen2e(ffmpeg_command)
+      end
+      def stop
+        @stdin.puts 'q'
+        @stdin.close
+        @stdout_and_stderr.close
+        sleep(0.2)
+      rescue Errno::EPIPE, IOError
+      end
+      def delete_audio_file
+        FileUtils.rm(OUTPUT_FILE) if File.exist?(OUTPUT_FILE)
+      end
+      private
+      def build_ffmpeg_command
+        case RUBY_PLATFORM
+        when /darwin/
+          input_device = "-f avfoundation -i \":#{audio_device_id}\""
+        when /linux/
+          input_device = '-f alsa -i default'
+        when /mingw|mswin/
+          input_device = '-f dshow -i audio="Microphone (Realtek High Definition Audio)"'
+        else
+          raise "Unsupported platform: #{RUBY_PLATFORM}"
+        end
+        "ffmpeg -loglevel quiet #{input_device} -t #{@duration} #{OUTPUT_FILE}"
+      end
+    end
+    class Voice
+      def initialize
+        @messages = []
+        @wanna_quit = false
+        @duration = (AskChatGPT.voice_max_duration.presence || 10).to_i
+        @ffmpeg_wait_duration = 0.5
+        @executing = true
+        @spinner = nil
+      end
+      def run
+        while @executing
+          # Start the parallel process
+          audio_recorder = AudioRecorder.new(@duration)
+          audio_recorder.start
+          begin
+            Timeout.timeout(@duration + @ffmpeg_wait_duration) do
+              @spinner = TTY::Spinner.new("[Recording]".red + " / Press any key to stop recording or \"Esc\" / \"q\" to quit ... ".blue + ":spinner".red, format: :spin)
+              @spinner.auto_spin
+              sleep(@ffmpeg_wait_duration) # give some time for ffmpeg to start
+              # Listen for user input in the main process
+              begin
+                char = $stdin.getch
+                if char.ord == 27 || char.upcase == "Q"
+                  audio_recorder.stop
+                  @executing = false
+                  @spinner.stop
+                  puts "Bye...".brown
+                end
+                break
+              end while char.nil?
+            end
+          rescue Timeout::Error
+          ensure
+            audio_recorder.stop
+            @spinner.stop
+            break unless @executing
+          end
+          if !File.exist?("output.wav")
+            puts "No audio file found, please try again.".brown
+            sleep(0.5)
+            next
+          end
+          @spinner = TTY::Spinner.new("Thinking :spinner".cyan, format: :dots)
+          @spinner.auto_spin
+          response = client.transcribe(parameters: { model: "whisper-1", file: File.open("output.wav", "rb") })
+          @spinner.stop
+          if response["error"]
+            puts response["error"].inspect.brown
+            @executing = false
+            break
+          end
+          user_input = response["text"].to_s
+          puts "USER> ".green + user_input
+          @messages << { role: "user", content: user_input }
+          print "ASSISTANT> ".magenta
+          stop_stream = false
+          reply = []
+          keypresser = Thread.new do
+            loop { stop_stream = true if $stdin.getch }
+          end
+          begin
+            client.chat(
+              parameters: {
+                model: "gpt-3.5-turbo",
+                messages: @messages,
+                temperature: 0.7,
+                stream: proc do |chunk, _bytesize|
+                  break if stop_stream
+                  message = chunk.dig("choices", 0, "delta", "content")
+                  next if message.to_s.empty?
+                  message = message.gsub("\n", "\r\n")
+                  print message
+                  reply += [message]
+                end
+              })
+          rescue LocalJumpError
+            puts
+          ensure
+            Thread.kill(keypresser)
+            stop_stream = false
+            @messages << { role: "assistant", content: reply.join }
+            puts
+          end
+          audio_recorder.delete_audio_file
+        end
+      end
+      def client
+        @client ||= OpenAI::Client.new(access_token: AskChatGPT.access_token)
+      end
+    end
+  end
+end
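
The README TODO mentions using a tempfile for the audio instead of `output.wav`. One possible shape for that is sketched below under stated assumptions: `ffmpeg_record_args` is a hypothetical helper, the command is built as an argument array rather than a shell string, and the Windows branch is left out because the diff hard-codes a device name there. The `-y` flag asks ffmpeg to overwrite the output file, which matters here because the temporary file already exists when recording starts.

```ruby
require "tempfile"

# Hypothetical helper: build the ffmpeg recording command as an argument array.
def ffmpeg_record_args(platform:, audio_device_id:, duration:, output_path:)
  input =
    case platform
    when /darwin/ then ["-f", "avfoundation", "-i", ":#{audio_device_id}"]
    when /linux/  then ["-f", "alsa", "-i", "default"]
    else raise ArgumentError, "Unsupported platform: #{platform}"
    end

  ["ffmpeg", "-loglevel", "quiet", *input, "-t", duration.to_s, "-y", output_path]
end

# Tempfile.create with a block removes the file when the block finishes.
Tempfile.create(["ask_chatgpt_voice", ".wav"]) do |file|
  args = ffmpeg_record_args(
    platform: "x86_64-darwin22", # stand-in for RUBY_PLATFORM in this sketch
    audio_device_id: 1,
    duration: 10,
    output_path: file.path
  )
  # The array could be handed to Open3.popen2e(*args); here it is only printed.
  puts args.inspect
end
```

Passing an array avoids shell quoting around the device ID and the file path, at the cost of diverging from the single command string used in this version.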

data/lib/ask_chatgpt.rb CHANGED

@@ -49,6 +49,19 @@ module AskChatgpt
   mattr_accessor :included_prompts
   @@included_prompts = [AskChatGPT::Prompts::App.new]
+  # enable voice input, requires ffmpeg to be installed and also you need to configure audio_device_id
+  mattr_accessor :voice_enabled
+  @@voice_enabled = false
+  # to get audio device ID (index in the input devices)
+  # ffmpeg -f avfoundation -list_devices true -i ""
+  mattr_accessor :audio_device_id
+  @@audio_device_id = nil
+  # max duration of audio to record
+  mattr_accessor :voice_max_duration
+  @@voice_max_duration = 10 # 10 seconds
   def self.setup
     yield(self)
   end
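
The settings in this file are module-level accessors that a `setup` block fills in. `mattr_accessor` comes from ActiveSupport and keeps each value in a class variable (`@@name`). The sketch below reproduces that pattern in plain Ruby so it runs without Rails. `VoiceConfigDemo` is a hypothetical module, and the hand-written accessors stand in for what `mattr_accessor` would generate.

```ruby
# Hypothetical stand-in for the gem's configuration module (plain Ruby, no Rails).
module VoiceConfigDemo
  # Defaults mirror the ones in the diff above.
  @@voice_enabled      = false
  @@audio_device_id    = nil
  @@voice_max_duration = 10 # seconds

  # Hand-written equivalents of the accessors mattr_accessor would define.
  def self.voice_enabled;              @@voice_enabled;              end
  def self.voice_enabled=(value);      @@voice_enabled = value;      end
  def self.audio_device_id;            @@audio_device_id;            end
  def self.audio_device_id=(value);    @@audio_device_id = value;    end
  def self.voice_max_duration;         @@voice_max_duration;         end
  def self.voice_max_duration=(value); @@voice_max_duration = value; end

  # Yields the module itself so callers can assign settings in a block.
  def self.setup
    yield(self)
  end
end

VoiceConfigDemo.setup do |config|
  config.voice_enabled      = true
  config.audio_device_id    = 1
  config.voice_max_duration = 15
end

puts VoiceConfigDemo.voice_enabled
puts VoiceConfigDemo.audio_device_id
puts VoiceConfigDemo.voice_max_duration
```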

data/lib/generators/ask_chatgpt/templates/template.rb CHANGED

@@ -10,6 +10,11 @@ AskChatGPT.setup do |config|
   # config.temperature      = 0.1
   # config.included_prompts = []
+  # enable voice input, requires ffmpeg to be installed and also you need to configure audio_device_id
+  # config.voice_enabled      = true
+  # config.audio_device_id    = 1
+  # config.voice_max_duration = 10 # 10 seconds
   # Examples of custom prompts:
   # you can use them `gpt.ask(:extract_email, "some string")`

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ask_chatgpt
 version: !ruby/object:Gem::Version
-  version: 0.3.1
+  version: 0.4.0
 platform: ruby
 authors:
 - Igor Kasyanchuk
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-05-01 00:00:00.000000000 Z
+date: 2023-05-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -95,6 +95,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.4.3
+- !ruby/object:Gem::Dependency
+  name: io-console
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: wrapped_print
   requirement: !ruby/object:Gem::Requirement
@@ -156,6 +170,7 @@ files:
 - lib/ask_chatgpt/railtie.rb
 - lib/ask_chatgpt/sugar.rb
 - lib/ask_chatgpt/version.rb
+- lib/ask_chatgpt/voice.rb
 - lib/generators/ask_chatgpt/USAGE
 - lib/generators/ask_chatgpt/ask_chatgpt_generator.rb
 - lib/generators/ask_chatgpt/templates/template.rb