aia 0.5.15 → 0.5.16

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2d027907d70cb497a761f25ad65c2e9429f312cc9694466bf9a8db612a1ead0a
- data.tar.gz: b76efb7bd589a685e9380d969db85f3df88c36fe888fd03e44819d26e339a353
+ metadata.gz: 0e8b6a3c91dad9236a014bbe130c6f359ba3121a8e731398304f6d6305158138
+ data.tar.gz: ce833e093f76d57388296361371484e1ce75c48442f4199895135f24ab3a8289
  SHA512:
- metadata.gz: b83635891018c810bf7c794a66bb7c0842e28d9152ceb444f5105468928b419575ce7f3f88e70528ef8dc87b46ab2246525e43726f9480a2f2ce8be00a850270
- data.tar.gz: 979137d859737b3dec4f264d1e6c6611131b5fbbcb4682ec1dd5843a69c147754b0937414f9c891cc1e8b0a09d95129af83ba1432bc9b01e8465ff7562245ca9
+ metadata.gz: bfd04950aeb63e7d35f1063d8264fcdbd5e66d09ddf0ca9776caacfe978c4340a00b3e26ccadd728f0b83d9278eedc391b72c942e357a95acd6e39f6aab93f4d
+ data.tar.gz: 7e4f2a9698906c61d84eb4d19f9d059867579e922da77e1a7ddb55fe1dc50a6784b1f7e831ea6c3daf68a3e5078c21ec61514dafee29b135bc7ebf9c55a585c8
data/.semver CHANGED
@@ -1,6 +1,6 @@
  ---
  :major: 0
  :minor: 5
- :patch: 15
+ :patch: 16
  :special: ''
  :metadata: ''
data/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
  ## [Unreleased]

+ ## [0.5.16] 2024-04-02
+ - Fixed prompt pipelines
+ - Added //next and //pipeline directives as shortcuts to //config [next,pipeline]
+ - Added new backend "client" as an internal OpenAI client
+ - Added --sm, --speech_model default: tts-1
+ - Added --tm, --transcription_model default: whisper-1
+ - Added --voice default: alloy (if "siri" on a Mac, uses the CLI tool "say")
+ - Added --image_size and --image_quality (--is --iq)
+
+
  ## [0.5.15] 2024-03-30
  - Added the ability to accept piped-in text to be appended to the end of the prompt text: curl $URL | aia ad_hoc
  - Fixed bugs with entering directives as follow-up prompts during a chat session
data/README.md CHANGED
@@ -6,15 +6,16 @@ It leverages the `prompt_manager` gem to manage prompts for the `mods` and `sgpt

  **Most Recent Change**: Refer to the [Changelog](CHANGELOG.md)

+
+ > v0.5.16
+ > - Fixed bugs with the prompt pipeline
+ > - Added new backend "client", an internal `aia` client for the OpenAI API that supports both text-to-speech and speech-to-text
+ > - Added --image_size and --image_quality to support image generation with the dall-e-2 and dall-e-3 models using the new internal `aia` OpenAI client.
+ >
  > v0.5.15
  > - Support piped content by appending to end of prompt text
  > - Fixed bugs with directives entered as follow-up while in chat mode
  >
- > v0.5.14
- > - Directly access OpenAI to do text to speech when using the `--speak` option
- > - Added --voice to specify which voice to use
- > - Added --speech_model to specify which TTS model to use
- >


  <!-- Tocer[start]: Auto-generated, don't remove. -->
@@ -43,6 +44,7 @@ It leverages the `prompt_manager` gem to manage prompts for the `mods` and `sgpt
  - [--next](#--next)
  - [--pipeline](#--pipeline)
  - [Best Practices ??](#best-practices-)
+ - [Example pipeline](#example-pipeline)
  - [All About ROLES](#all-about-roles)
  - [The --roles_dir (AIA_ROLES_DIR)](#the---roles_dir-aia_roles_dir)
  - [The --role Option](#the---role-option)
@@ -344,6 +346,8 @@ three.txt contains //config next four
  ```
  BUT if you have more than two prompts in your sequence then consider using the --pipeline option.

+ **The directive //next is short for //config next**
+
  ### --pipeline

  `aia one --pipeline two,three,four`
@@ -352,6 +356,8 @@ or inside of the `one.txt` prompt file use this directive:

  `//config pipeline two,three,four`

+ **The directive //pipeline is short for //config pipeline**
+
  ### Best Practices ??

  Since the response of one prompt is fed into the next prompt in the sequence, instead of having all prompts write their responses to the same out file, use these directives inside the associated prompt files:
@@ -366,6 +372,47 @@ Since the response of one prompt is fed into the next prompt within the sequence

  This way you can see the response that was generated for each prompt in the sequence.

+ ### Example pipeline
+
+ TODO: the audio-to-text is still under development.
+
+ Suppose you have an audio file of a meeting. You want to get a transcription of what was said in that meeting. Sometimes raw transcriptions hide the real value of the recording, so you have crafted a prompt that takes the raw transcription and produces a technical summary with a list of action items.
+
+ Create two prompts named `transcribe.txt` and `tech_summary.txt`
+
+ ```
+ # transcribe.txt
+ # Desc: takes one audio file
+ # note that there is no "prompt" text, only the directives
+
+ //config backend client
+ //config model whisper-1
+ //next tech_summary
+ ```
+ and
+
+ ```
+ # tech_summary.txt
+
+ //config model gpt-4-turbo
+ //config out_file meeting_summary.md
+
+ Review the raw transcript of a technical meeting,
+ summarize the discussion and
+ note any action items that were generated.
+
+ Format your response in markdown.
+ ```
+
+ Now you can do this:
+
+ ```
+ aia transcribe my_tech_meeting.m4a
+ ```
+
+ Your summary of the meeting is in the file `meeting_summary.md`
+
+
  ## All About ROLES

  ### The --roles_dir (AIA_ROLES_DIR)
data/lib/aia/cli.rb CHANGED
@@ -155,9 +155,11 @@ class AIA::Cli
  extra: [''], #
  #
  model: ["gpt-4-1106-preview", "--llm --model"],
- speech_model: ["tts-1", "--sm --spech_model"],
+ speech_model: ["tts-1", "--sm --speech_model"],
  voice: ["alloy", "--voice"],
  #
+ transcription_model: ["whisper-1", "--tm --transcription_model"],
+ #
  dump_file: [nil, "--dump"],
  completion: [nil, "--completion"],
  #
@@ -186,6 +188,11 @@ class AIA::Cli
  log_file: ["~/.prompts/_prompts.log", "-l --log_file --no-log_file"],
  #
  backend: ['mods', "-b --be --backend --no-backend"],
+ #
+ # text2image related ...
+ #
+ image_size: ['', '--is --image_size'],
+ image_quality: ['', '--iq --image_quality'],
  }

  AIA.config = AIA::Config.new(@options.transform_values { |values| values.first })
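Each entry in this options table maps a config key to `[default, "cli flags"]`, and the `transform_values` call above keeps only the defaults. A minimal standalone sketch of that idea (a plain Hash, not the gem's actual `AIA::Config` class):

```ruby
# Each option maps a key to [default_value, "cli flags"].
options = {
  speech_model:  ["tts-1", "--sm --speech_model"],
  voice:         ["alloy", "--voice"],
  image_quality: ["",      "--iq --image_quality"],
}

# Defaults are the first element of each pair, as in the line above.
defaults = options.transform_values { |values| values.first }
p defaults # => {:speech_model=>"tts-1", :voice=>"alloy", :image_quality=>""}
```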
@@ -70,6 +70,8 @@ class AIA::Directives
  Pathname.new(value) :
  Pathname.pwd + value
  end
+ elsif %w[next pipeline].include? item.downcase
+ pipeline(value)
  else
  AIA.config[item] = value
  end
@@ -79,6 +81,33 @@ class AIA::Directives
  end


+ # TODO: we need a way to submit CLI arguments into
+ # the next prompt(s) from the main prompt.
+ # currently the config for subsequent prompts
+ # is expected to be set within those prompts.
+ # Maybe something like:
+ # //next prompt_id CLI args
+ # This would mean that the pipeline would be:
+ # //pipeline id1 cli args, id2 cli args, id3 cli args
+ #
+
+ # TODO: Change the AIA.config.pipeline Array to be an Array of arrays
+ # where each entry is:
+ # [prompt_id, cli_args]
+ # This means that:
+ # entry = AIA.config.pipeline.shift
+ # entry.is_a?(String) ? 'old format' : 'new format'
+ #
+
+ # //next id
+ # //pipeline id1,id2, id3 , id4
+ def pipeline(what)
+ return if what.empty?
+ AIA.config.pipeline << what.split(',').map(&:strip)
+ AIA.config.pipeline.flatten!
+ end
+ alias_method :next, :pipeline
+
  # when path_to_file is relative it will be
  # relative to the PWD.
  #
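A standalone sketch of what the new `pipeline` method accumulates from `//next` and `//pipeline` directive arguments (a plain Array stands in for `AIA.config.pipeline`; illustrative only, not the gem's code):

```ruby
# 'pipeline' stands in for AIA.config.pipeline, which starts out empty.
pipeline = []

# Mirrors the directive handler: split on commas, strip whitespace, append.
add = ->(what) do
  return if what.empty?
  pipeline.concat(what.split(',').map(&:strip))
end

add.call('two')                # //next two
add.call('three, four , five') # //pipeline three, four , five
p pipeline                     # => ["two", "three", "four", "five"]
```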
data/lib/aia/main.rb CHANGED
@@ -33,6 +33,8 @@ class AIA::Main
  @directive_output = ""
  AIA::Tools.load_tools

+ AIA.client = AIA::Client.new
+
  AIA::Cli.new(args)

  if AIA.config.debug?
@@ -115,6 +117,8 @@ class AIA::Main

  result = get_and_display_result(the_prompt)

+ AIA.speak(result) if AIA.config.speak?
+
  logger.prompt_result(@prompt, result)

  if AIA.config.chat?
@@ -125,14 +129,34 @@ class AIA::Main

  return if AIA.config.next.empty? && AIA.config.pipeline.empty?

- # Reset some config items to defaults
+ keep_going(result) unless AIA.config.pipeline.empty?
+ end
+
+
+ # The AIA.config.pipeline is NOT empty, so feed this result
+ # into the next prompt within the pipeline.
+ #
+ def keep_going(result)
+ temp_file = Tempfile.new('aia_pipeline')
+ temp_file.write(result)
+ temp_file.close
+
  AIA.config.directives = []
- AIA.config.next = AIA.config.pipeline.shift
- AIA.config.arguments = [AIA.config.next, AIA.config.out_file.to_s]
+ AIA.config.model = ""
+ AIA.config.arguments = [
+ AIA.config.pipeline.shift,
+ temp_file.path,
+ # TODO: additional arguments from the pipeline
+ ]
  AIA.config.next = ""

+ AIA.config.files = [temp_file.path]
+
  @prompt = AIA::Prompt.new.prompt
- call # Recurse!
+ call # Recurse! until the AIA.config.pipeline is empty
+ puts
+ ensure
+ temp_file.unlink
  end


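A standalone sketch of the hand-off pattern `keep_going` implements: each result is written to a temp file whose path becomes the input of the next prompt in the pipeline (`run_stage` is a hypothetical stand-in; the real code recurses through `call`):

```ruby
require 'tempfile'

# Hypothetical stand-in for running one prompt against a backend.
def run_stage(prompt_id, input_path)
  "response from #{prompt_id} (#{File.read(input_path).bytesize} bytes in)"
end

result = 'initial response'
%w[two three four].each do |prompt_id|
  temp_file = Tempfile.new('aia_pipeline') # previous result goes to disk...
  begin
    temp_file.write(result)
    temp_file.close
    result = run_stage(prompt_id, temp_file.path) # ...and its path feeds the next stage
  ensure
    temp_file.unlink
  end
end
puts result
```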
@@ -0,0 +1,197 @@
+ # lib/aia/tools/client.rb
+
+ require_relative 'backend_common'
+
+ OpenAI.configure do |config|
+ config.access_token = ENV.fetch("OPENAI_ACCESS_TOKEN")
+ end
+
+ class AIA::Client < AIA::Tools
+ include AIA::BackendCommon
+
+ meta(
+ name: 'client',
+ role: :backend,
+ desc: 'Ruby implementation of the OpenAI API',
+ url: 'https://github.com/alexrudall/ruby-openai',
+ install: 'gem install ruby-openai',
+ )
+
+ attr_reader :client, :raw_response
+
+ DEFAULT_PARAMETERS = ''
+ DIRECTIVES = []
+
+ def initialize(text: "", files: [])
+ super
+
+ @client = OpenAI::Client.new
+ end
+
+ def build_command
+ # No-Op
+ end
+
+
+ def run
+ handle_model(AIA.config.model)
+ rescue => e
+ puts "Error handling model #{AIA.config.model}: #{e.message}"
+ end
+
+ def speak(what = @text)
+ print "Speaking ... " if AIA.verbose?
+ text2audio(what)
+ puts "Done." if AIA.verbose?
+ end
+
+
+ ###########################################################
+ private
+
+ # Handling different models more abstractly
+ def handle_model(model_name)
+ case model_name
+ when /vision/
+ image2text
+
+ when /^gpt.*$/, /^babbage.*$/, /^davinci.*$/
+ text2text
+
+ when /^dall-e.*$/
+ text2image
+
+ when /^tts.*$/
+ text2audio
+
+ when /^whisper.*$/
+ audio2text
+
+ else
+ raise "Unsupported model: #{model_name}"
+ end
+ end
+
+
+ def image2text
+ # TODO: Implement
+ end
+
+
+ def text2text
+ @raw_response = client.chat(
+ parameters: {
+ model: AIA.config.model, # Required.
+ messages: [{ role: "user", content: text }], # Required.
+ temperature: AIA.config.temp,
+ }
+ )
+
+ response = raw_response.dig('choices', 0, 'message', 'content')
+
+ response
+ end
+
+
+ def text2image
+ parameters = {
+ model: AIA.config.model,
+ prompt: text
+ }
+
+ parameters[:size] = AIA.config.image_size unless AIA.config.image_size.empty?
+ parameters[:quality] = AIA.config.image_quality unless AIA.config.image_quality.empty?
+
+ raw_response = client.images.generate(parameters:)
+
+ response = raw_response.dig("data", 0, "url")
+
+ response
+ end
+
+
+ def text2audio(what = @text, save: false, play: true)
+ raise "OpenAI's text to speech capability is not available" unless client
+
+ player = select_audio_player
+
+ response = client.audio.speech(
+ parameters: {
+ model: AIA.config.speech_model,
+ input: what,
+ voice: AIA.config.voice
+ }
+ )
+
+ handle_audio_response(response, player, save, play)
+ end
+
+
+ def audio2text(path_to_audio_file = @files.first)
+ response = client.audio.transcribe(
+ parameters: {
+ model: AIA.config.model,
+ file: File.open(path_to_audio_file, "rb")
+ }
+ )
+
+ response["text"]
+ rescue => e
+ "An error occurred: #{e.message}"
+ end
+
+
+ # Helper methods
+ def select_audio_player
+ case OS.host_os
+ when /mac|darwin/
+ 'afplay'
+ when /linux/
+ 'mpg123'
+ when /mswin|mingw|cygwin/
+ 'cmdmp3'
+ else
+ raise "No MP3 player available"
+ end
+ end
+
+
+ def handle_audio_response(response, player, save, play)
+ Tempfile.create(['speech', '.mp3']) do |f|
+ f.binmode
+ f.write(response)
+ f.close
+ `cp #{f.path} #{Pathname.pwd + "speech.mp3"}` if save
+ `#{player} #{f.path}` if play
+ end
+ end
+
+
+ ###########################################################
+ public
+
+ class << self
+
+ def list_models
+ new.client.models.list
+ end
+
+
+ def speak(what)
+ save_model = AIA.config.model
+ AIA.config.model = AIA.config.speech_model
+
+ new(text: what).speak
+
+ AIA.config.model = save_model
+ end
+
+ end
+
+ end
+
+
+ __END__
+
+
+ ##########################################################
data/lib/aia.rb CHANGED
@@ -49,12 +49,6 @@ module AIA
  attr_accessor :client

  def run(args=ARGV)
- begin
- @client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
- rescue OpenAI::ConfigurationError
- @client = nil
- end
-
  args = args.split(' ') if args.is_a?(String)

  # TODO: Currently this is a one and done architecture.
@@ -72,43 +66,13 @@ module AIA
  if OS.osx? && 'siri' == config.voice.downcase
  system "say #{Shellwords.escape(what)}"
  else
- use_openai_tts(what)
+ Client.speak(what)
  end
  end


- def use_openai_tts(what)
- if client.nil?
- puts "\nWARNING: OpenAI's text to speech capability is not available at this time."
- return
- end
-
- player = if OS.osx?
- 'afplay'
- elsif OS.linux?
- 'mpg123'
- elsif OS.windows?
- 'cmdmp3'
- else
- puts "\nWARNING: There is no MP3 player available"
- return
- end
-
- response = client.audio.speech(
- parameters: {
- model: config.speech_model,
- input: what,
- voice: config.voice
- }
- )
-
- Tempfile.create(['speech', '.mp3']) do |f|
- f.binmode
- f.write(response)
- f.close
- `#{player} #{f.path}`
- end
- end
+ def verbose? = AIA.config.verbose?
+ def debug? = AIA.config.debug?
  end
  end

data/man/aia.1 CHANGED
@@ -1,6 +1,6 @@
  .\" Generated by kramdown-man 1.0.1
  .\" https://github.com/postmodern/kramdown-man#readme
- .TH aia 1 "v0.5.14" AIA "User Manuals"
+ .TH aia 1 "v0.5.16" AIA "User Manuals"
  .SH NAME
  .PP
  aia \- command\-line interface for an AI assistant
@@ -39,6 +39,12 @@ This option tells \fBaia\fR to replace references to system environment variable
  \fB\-\-erb\fR
  If dynamic prompt content using \[Do](\.\.\.) wasn\[cq]t enough here is ERB\. Embedded Ruby\. <%\[eq] ruby code %> within a prompt will have its ruby code executed and the results of that execution will be inserted into the prompt\. I\[cq]m sure we will find a way to truly misuse this capability\. Remember, some say that the simple prompt is the best prompt\.
  .TP
+ \fB\-\-iq\fR, \fB\-\-image\[ru]quality\fR \fIVALUE\fP
+ (Used with backend \[oq]client\[cq] only) See the OpenAI docs for valid values (depends on model) \- default: \[oq]\[cq]
+ .TP
+ \fB\-\-is\fR, \fB\-\-image\[ru]size\fR \fIVALUE\fP
+ (Used with backend \[oq]client\[cq] only) See the OpenAI docs for valid values (depends on model) \- default: \[oq]\[cq]
+ .TP
  \fB\-\-model\fR \fINAME\fP
  Name of the LLM model to use \- default is gpt\-4\-1106\-preview
  .TP
@@ -48,9 +54,18 @@ Render markdown to the terminal using the external tool \[lq]glow\[rq] \- defaul
  \fB\-\-speak\fR
  Simple implementation\. Uses the \[lq]say\[rq] command to speak the response\. Fun with \-\-chat
  .TP
+ \fB\-\-sm\fR, \fB\-\-speech\[ru]model\fR \fIMODEL NAME\fP
+ Which OpenAI LLM to use for text\-to\-speech (TTS) \- default: tts\-1
+ .TP
+ \fB\-\-voice\fR \fIVOICE NAME\fP
+ Which voice to use when speaking text\. If it\[cq]s \[lq]siri\[rq] and the platform is a Mac, then the CLI utility \[lq]say\[rq] is used\. Any other name will be used with OpenAI \- default: alloy
+ .TP
  \fB\-\-terse\fR
  Add a clause to the prompt text that instructs the backend to be terse in its response\.
  .TP
+ \fB\-\-tm\fR, \fB\-\-transcription\[ru]model\fR \fIMODEL NAME\fP
+ Which OpenAI LLM to use for audio\-to\-text \- default: whisper\-1
+ .TP
  \fB\-\-version\fR
  Print Version \- default is false
  .TP
@@ -175,6 +190,28 @@ or just
  \fB\[sl]\[sl]config next three\fR
  .PP
  if you want to specify them one at a time\.
+ .PP
+ You can also use the shortcuts \fB\[sl]\[sl]next\fR and \fB\[sl]\[sl]pipeline\fR
+ .PP
+ .PP
+ .RS 4
+ .EX
+ \[sl]\[sl]next two
+ \[sl]\[sl]next three
+ \[sl]\[sl]next four
+ \[sl]\[sl]next five
+ .EE
+ .RE
+ .PP
+ Is the same thing as
+ .PP
+ .PP
+ .RS 4
+ .EX
+ \[sl]\[sl]pipeline two,three,four
+ \[sl]\[sl]next five
+ .EE
+ .RE
  .SH SEE ALSO
  .RS
  .IP \(bu 2
@@ -221,6 +258,13 @@ glow
  .UE
  Render markdown on the CLI
  .RE
+ .SH Image Generation
+ .PP
+ The \-\-backend \[lq]client\[rq] is the only backend that supports image generation, using the \fBdall\-e\-2\fR and \fBdall\-e\-3\fR models through OpenAI\. The result of your prompt will be a URL that points to the OpenAI storage space where your image is placed\.
+ .PP
+ Use \-\-image\[ru]size and \-\-image\[ru]quality to specify the desired size and quality of the generated image\. The valid values are available at the OpenAI website\.
+ .PP
+ https:\[sl]\[sl]platform\.openai\.com\[sl]docs\[sl]guides\[sl]images\[sl]usage?context\[eq]node
  .SH AUTHOR
  .PP
  Dewayne VanHoozer
data/man/aia.1.md CHANGED
@@ -1,4 +1,4 @@
- # aia 1 "v0.5.14" AIA "User Manuals"
+ # aia 1 "v0.5.16" AIA "User Manuals"

  ## NAME

@@ -43,6 +43,12 @@ The aia command-line tool is an interface for interacting with an AI model backe
  `--erb`
  : If dynamic prompt content using $(...) wasn't enough here is ERB. Embedded Ruby. <%= ruby code %> within a prompt will have its ruby code executed and the results of that execution will be inserted into the prompt. I'm sure we will find a way to truly misuse this capability. Remember, some say that the simple prompt is the best prompt.

+ `--iq`, `--image_quality` *VALUE*
+ : (Used with backend 'client' only) See the OpenAI docs for valid values (depends on model) - default: ''
+
+ `--is`, `--image_size` *VALUE*
+ : (Used with backend 'client' only) See the OpenAI docs for valid values (depends on model) - default: ''
+
  `--model` *NAME*
  : Name of the LLM model to use - default is gpt-4-1106-preview

@@ -52,9 +58,18 @@ The aia command-line tool is an interface for interacting with an AI model backe
  `--speak`
  : Simple implementation. Uses the "say" command to speak the response. Fun with --chat

+ `--sm`, `--speech_model` *MODEL NAME*
+ : Which OpenAI LLM to use for text-to-speech (TTS) - default: tts-1
+
+ `--voice` *VOICE NAME*
+ : Which voice to use when speaking text. If it's "siri" and the platform is a Mac, then the CLI utility "say" is used. Any other name will be used with OpenAI - default: alloy
+
  `--terse`
  : Add a clause to the prompt text that instructs the backend to be terse in its response.

+ `--tm`, `--transcription_model` *MODEL NAME*
+ : Which OpenAI LLM to use for audio-to-text - default: whisper-1
+
  `--version`
  : Print Version - default is false

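As a quick illustration of the new speech options working together (a hypothetical invocation: `my_prompt` is a placeholder prompt id, and tts-1-hd and shimmer are OpenAI-documented values rather than aia defaults):

```
aia my_prompt --speak --sm tts-1-hd --voice shimmer
```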
@@ -176,6 +191,21 @@ or just

  if you want to specify them one at a time.

+ You can also use the shortcuts `//next` and `//pipeline`
+
+ ```
+ //next two
+ //next three
+ //next four
+ //next five
+ ```
+
+ Is the same thing as
+
+ ```
+ //pipeline two,three,four
+ //next five
+ ```

  ## SEE ALSO

@@ -193,6 +223,13 @@ if you want to specify them one at a time.

  - [glow](https://github.com/charmbracelet/glow) Render markdown on the CLI

+ ## Image Generation
+
+ The --backend "client" is the only backend that supports image generation, using the `dall-e-2` and `dall-e-3` models through OpenAI. The result of your prompt will be a URL that points to the OpenAI storage space where your image is placed.
+
+ Use --image_size and --image_quality to specify the desired size and quality of the generated image. The valid values are available at the OpenAI website.
+
+ https://platform.openai.com/docs/guides/images/usage?context=node
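For example, a hypothetical image-generation run (`my_image_prompt` is a placeholder prompt id; 1024x1024 and hd are size/quality values documented by OpenAI for dall-e-3):

```
aia my_image_prompt -b client --model dall-e-3 --is 1024x1024 --iq hd
```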

  ## AUTHOR

metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: aia
  version: !ruby/object:Gem::Version
- version: 0.5.15
+ version: 0.5.16
  platform: ruby
  authors:
  - Dewayne VanHoozer
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-03-30 00:00:00.000000000 Z
+ date: 2024-04-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: hashie
@@ -278,6 +278,7 @@ files:
  - lib/aia/prompt.rb
  - lib/aia/tools.rb
  - lib/aia/tools/backend_common.rb
+ - lib/aia/tools/client.rb
  - lib/aia/tools/editor.rb
  - lib/aia/tools/fzf.rb
  - lib/aia/tools/glow.rb