llm.rb 1.0.1 → 2.0.0

This diff shows the content of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 207a44401195a654a57ebf8050d211fc2c1722420a647dedcc447031aead1451
- data.tar.gz: aa8abf7d5104d0a93033a43041057d1af96f8f45dde7f924243788cd0b14621e
+ metadata.gz: 51146c557539ccc945c508cb84f06a6cbbffc1157814a357bddc038ae46aba92
+ data.tar.gz: 7c178e4cacf30c0b9799421edaa3d1e380d83c8edb43d3d376f9c3798b6a3976
  SHA512:
- metadata.gz: 8faf81ef91911ecbcd81694232f6e314caed8dab2ca8b30a241d8e346c3a4a084f0d46bb9e5e94f191a3dc7676c1c865f7062d440498d94e2e0b30a57a5a3510
- data.tar.gz: 3ba71c5c46b5ebbec12d136f24cff08d0da11ea807b81db1319c31b1dae18cb6786ad8ed759f3333d7e2f0f1ee3c69ee7ae85cf95e6d82b3e5552f0d516a279e
+ metadata.gz: b70b4797379674007036ec14512817097292d65d640f104aa8b9c0040b4bd5734d414dfa3f2ef2d9e990639862e1cb214dabf3462d4c8d600746d758579c5db5
+ data.tar.gz: c294544c718c35617b66f3fd5e667b8b66861cab8144809662ded2d5793ef2eaa2e856c09f7166a7cc6b385a6facb3167a9295d90ae6676b75d4688edc70724a
data/README.md CHANGED
@@ -1,48 +1,13 @@
- > **⚠️ Maintenance Mode ⚠️** <br>
- > Please note that the primary author of llm.rb is pivoting away from
- > Ruby and towards [Golang](https://golang.org) for future projects.
- > Although llm.rb will be maintained for the foreseeable future it is not
- > where my primary interests are anymore. Thanks for understanding.
-
  ## About
 
  llm.rb is a zero-dependency Ruby toolkit for Large Language Models that
- includes OpenAI, Gemini, Anthropic, xAI (Grok), [zAI](https://z.ai), DeepSeek,
- Ollama, and LlamaCpp. The toolkit includes full support for chat, streaming,
- tool calling, audio, images, files, and structured outputs (JSON Schema).
+ includes OpenAI, Gemini, Anthropic, xAI (Grok), zAI, DeepSeek, Ollama,
+ and LlamaCpp. The toolkit includes full support for chat, streaming,
+ tool calling, audio, images, files, and structured outputs.
 
  ## Quick start
 
- #### Demo
-
- This cool demo writes a new [llm-shell](https://github.com/llmrb/llm-shell#readme) command
- with the help of [llm.rb](https://github.com/llmrb/llm#readme). <br> Similar-ish to
- GitHub Copilot but for the terminal.
-
- <details>
- <summary>Start demo</summary>
- <img src="https://github.com/llmrb/llm/blob/main/share/llm-shell/examples/demo.gif?raw=true" alt="llm-shell demo" />
- </details>
-
- #### Guides
-
- * [An introduction to RAG](https://0x1eef.github.io/posts/an-introduction-to-rag-with-llm.rb/) &ndash;
- a blog post that implements the RAG pattern
- * [How to estimate the age of a person in a photo](https://0x1eef.github.io/posts/age-estimation-with-llm.rb/) &ndash;
- a blog post that implements an age estimation tool
- * [How to edit an image with Gemini](https://0x1eef.github.io/posts/how-to-edit-images-with-gemini/) &ndash;
- a blog post that implements image editing with Gemini
- * [Fast sailing with persistent connections](https://0x1eef.github.io/posts/persistent-connections-with-llm.rb/) &ndash;
- a blog post that optimizes performance with a thread-safe connection pool
- * [How to build agents (with llm.rb)](https://0x1eef.github.io/posts/how-to-build-agents-with-llm.rb/) &ndash;
- a blog post that implements agentic behavior via tools
-
- #### Ecosystem
-
- * [llm-shell](https://github.com/llmrb/llm-shell) &ndash; a developer-oriented console for Large Language Model communication
- * [llm-spell](https://github.com/llmrb/llm-spell) &ndash; a utility that can correct spelling mistakes with a Large Language Model
-
- #### Show code
+ #### REPL
 
  A simple chatbot that maintains a conversation and streams
  responses in real-time:
@@ -55,18 +20,62 @@ llm = LLM.openai(key: ENV["KEY"])
  bot = LLM::Bot.new(llm, stream: $stdout)
  loop do
  print "> "
- input = $stdin.gets&.chomp || break
- bot.chat(input).flush
+ bot.chat($stdin.gets)
  print "\n"
  end
  ```
 
+ #### Build
+
+ We can send multiple messages at once by building a chain of messages:
+
+ ```ruby
+ #!/usr/bin/env ruby
+ require "llm"
+
+ llm = LLM.openai(key: ENV["KEY"])
+ bot = LLM::Bot.new(llm)
+ prompt = bot.build_prompt do
+ it.system "Your task is to answer all user queries"
+ it.user "What language should I learn next ?"
+ end
+
+ bot.chat(prompt)
+ bot.messages.each { print "[#{it.role}] ", it.content, "\n" }
+ ```
+
+ #### Images
+
+ We can generate an image on the fly and estimate how old the person
+ in the image is:
+
+ ```ruby
+ #!/usr/bin/env ruby
+ require "llm"
+
+ llm = LLM.openai(key: ENV["OPENAI_SECRET"])
+ schema = llm.schema.object(
+ age: llm.schema.integer.required.description("The age of the person in a photo"),
+ confidence: llm.schema.number.required.description("Model confidence (0.0 to 1.0)"),
+ notes: llm.schema.string.required.description("Model notes or caveats")
+ )
+
+ img = llm.images.create(prompt: "A man in his 30s")
+ bot = LLM::Bot.new(llm, schema:)
+ res = bot.chat bot.image_url(img.urls[0])
+ body = res.choices.find(&:assistant?).content!
+
+ print "age: ", body["age"], "\n"
+ print "confidence: ", body["confidence"], "\n"
+ print "notes: ", body["notes"], "\n"
+ ```
+
  ## Features
 
  #### General
  - ✅ A single unified interface for multiple providers
  - 📦 Zero dependencies outside Ruby's standard library
- - 🚀 Smart API design that minimizes the number of requests made
+ - 🚀 Simple, composable API
  - ♻️ Optional: per-provider, process-wide connection pool via net-http-persistent
 
  #### Chat, Agents
@@ -136,6 +145,7 @@ llm = LLM.openai(key: "yourapikey")
  llm = LLM.gemini(key: "yourapikey")
  llm = LLM.anthropic(key: "yourapikey")
  llm = LLM.xai(key: "yourapikey")
+ llm = LLM.zai(key: "yourapikey")
  llm = LLM.deepseek(key: "yourapikey")
 
  ##
@@ -179,18 +189,13 @@ ensure thread-safety.
 
  #### Completions
 
- > This example uses the stateless chat completions API that all
- > providers support. A similar example for OpenAI's stateful
- > responses API is available in the [docs/](https://0x1eef.github.io/x/llm.rb/file.OPENAI.html#responses)
- > directory.
-
  The following example creates an instance of
  [LLM::Bot](https://0x1eef.github.io/x/llm.rb/LLM/Bot.html)
- and enters into a conversation where messages are buffered and
- sent to the provider on-demand. The implementation is designed to
- buffer messages by waiting until an attempt to iterate over
- [LLM::Bot#messages](https://0x1eef.github.io/x/llm.rb/LLM/Bot.html#messages-instance_method)
- is made before sending a request to the LLM:
+ and enters into a conversation where each call to `bot.chat` immediately
+ sends a request to the provider, updates the conversation history, and
+ returns an [LLM::Response](https://0x1eef.github.io/x/llm.rb/LLM/Response.html).
+ The full conversation history is automatically included in
+ each subsequent request:
 
  ```ruby
  #!/usr/bin/env ruby
@@ -198,45 +203,42 @@ require "llm"
 
  llm = LLM.openai(key: ENV["KEY"])
  bot = LLM::Bot.new(llm)
- url = "https://en.wikipedia.org/wiki/Special:FilePath/Cognac_glass.jpg"
+ url = "https://upload.wikimedia.org/wikipedia/commons/c/c7/Lisc_lipy.jpg"
 
- bot.chat "Your task is to answer all user queries", role: :system
- bot.chat ["Tell me about this URL", URI(url)], role: :user
- bot.chat ["Tell me about this PDF", File.open("handbook.pdf", "rb")], role: :user
- bot.chat "Are the URL and PDF similar to each other?", role: :user
+ prompt = bot.build_prompt do
+ it.system "Your task is to answer all user queries"
+ it.user ["Tell me about this URL", bot.image_url(url)]
+ it.user ["Tell me about this PDF", bot.local_file("handbook.pdf")]
+ end
 
- # At this point, we execute a single request
- bot.messages.each { print "[#{_1.role}] ", _1.content, "\n" }
+ bot.chat(prompt)
+ bot.messages.each { print "[#{it.role}] ", it.content, "\n" }
  ```
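A minimal sketch of the immediate-send behavior described in the new prose, assuming `bot.chat` defaults to the user role and that the reply can be read from `res.choices` as in the other 2.0.0 examples:

```ruby
#!/usr/bin/env ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
bot = LLM::Bot.new(llm)

# Each call sends a request right away and appends to the history
res1 = bot.chat "What is the capital of France?"
# The second request automatically carries the earlier exchange
res2 = bot.chat "And roughly how many people live there?"
print res2.choices.find(&:assistant?).content, "\n"
```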
 
  #### Streaming
 
- > There Is More Than One Way To Do It (TIMTOWTDI) when you are
- > using llm.rb &ndash; and this is especially true when it
- > comes to streaming. See the streaming documentation in
- > [docs/](https://0x1eef.github.io/x/llm.rb/file.STREAMING.html#scopes)
- > for more details.
-
  The following example streams the messages in a conversation
  as they are generated in real-time. The `stream` option can
- be set to an IO object, or the value `true` to enable streaming
- &ndash; and at the end of the request, `bot.chat` returns the
- same response as the non-streaming version which allows you
- to process a response in the same way:
+ be set to an IO object, or the value `true` to enable streaming.
+ When streaming, the `bot.chat` method will block until the entire
+ stream is received. At the end, it returns the `LLM::Response` object
+ containing the full aggregated content:
 
  ```ruby
  #!/usr/bin/env ruby
  require "llm"
 
  llm = LLM.openai(key: ENV["KEY"])
- bot = LLM::Bot.new(llm)
- url = "https://en.wikipedia.org/wiki/Special:FilePath/Cognac_glass.jpg"
- bot.chat(stream: $stdout) do |prompt|
- prompt.system "Your task is to answer all user queries"
- prompt.user ["Tell me about this URL", URI(url)]
- prompt.user ["Tell me about this PDF", File.open("handbook.pdf", "rb")]
- prompt.user "Are the URL and PDF similar to each other?"
- end.flush
+ bot = LLM::Bot.new(llm, stream: $stdout)
+ url = "https://upload.wikimedia.org/wikipedia/commons/c/c7/Lisc_lipy.jpg"
+
+ prompt = bot.build_prompt do
+ it.system "Your task is to answer all user queries"
+ it.user ["Tell me about this URL", bot.image_url(url)]
+ it.user ["Tell me about the PDF", bot.local_file("handbook.pdf")]
+ end
+
+ bot.chat(prompt)
  ```
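For comparison, a minimal sketch that passes `stream: true` instead of an IO object, on the stated assumption that `bot.chat` then blocks and returns the aggregated `LLM::Response`:

```ruby
#!/usr/bin/env ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
# true enables streaming without echoing chunks to an IO object
bot = LLM::Bot.new(llm, stream: true)

res = bot.chat "Summarize the benefits of streaming responses"
# The response carries the full aggregated content once the stream ends
print res.choices.find(&:assistant?).content, "\n"
```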
 
  ### Schema
@@ -252,31 +254,28 @@ an LLM should emit, and the LLM will abide by the schema:
  #!/usr/bin/env ruby
  require "llm"
 
+ llm = LLM.openai(key: ENV["KEY"])
+
  ##
  # Objects
- llm = LLM.openai(key: ENV["KEY"])
  schema = llm.schema.object(probability: llm.schema.number.required)
  bot = LLM::Bot.new(llm, schema:)
  bot.chat "Does the earth orbit the sun?", role: :user
- bot.messages.find(&:assistant?).content! # => {probability: 1.0}
+ puts bot.messages.find(&:assistant?).content! # => {probability: 1.0}
 
  ##
  # Enums
  schema = llm.schema.object(fruit: llm.schema.string.enum("Apple", "Orange", "Pineapple"))
- bot = LLM::Bot.new(llm, schema:)
- bot.chat "Your favorite fruit is Pineapple", role: :system
+ bot = LLM::Bot.new(llm, schema:)
  bot.chat "What fruit is your favorite?", role: :user
- bot.messages.find(&:assistant?).content! # => {fruit: "Pineapple"}
+ puts bot.messages.find(&:assistant?).content! # => {fruit: "Pineapple"}
 
  ##
  # Arrays
  schema = llm.schema.object(answers: llm.schema.array(llm.schema.integer.required))
  bot = LLM::Bot.new(llm, schema:)
- bot.chat "Answer all of my questions", role: :system
- bot.chat "Tell me the answer to ((5 + 5) / 2)", role: :user
- bot.chat "Tell me the answer to ((5 + 5) / 2) * 2", role: :user
  bot.chat "Tell me the answer to ((5 + 5) / 2) * 2 + 1", role: :user
- bot.messages.find(&:assistant?).content! # => {answers: [5, 10, 11]}
+ puts bot.messages.find(&:assistant?).content! # => {answers: [11]}
  ```
 
  ### Tools
@@ -300,11 +299,9 @@ its surrounding scope, which can be useful in some situations.
 
  The
  [LLM::Bot#functions](https://0x1eef.github.io/x/llm.rb/LLM/Bot.html#functions-instance_method)
- method returns an array of functions that can be called after sending a message and
- it will only be populated if the LLM detects a function should be called. Each function
- corresponds to an element in the "tools" array. The array is emptied after a function call,
- and potentially repopulated on the next message:
-
+ method returns an array of functions that can be called after a `chat` interaction
+ if the LLM detects a function should be called. You would then typically call these
+ functions and send their results back to the LLM in a subsequent `chat` call:
 
  ```ruby
  #!/usr/bin/env ruby
@@ -360,7 +357,7 @@ require "llm"
  class System < LLM::Tool
  name "system"
  description "Run a shell command"
- params { |schema| schema.object(command: schema.string.required) }
+ param :command, String, "The command to execute", required: true
 
  def call(command:)
  ro, wo = IO.pipe
@@ -371,6 +368,7 @@ class System < LLM::Tool
  end
  end
 
+ llm = LLM.openai(key: ENV["KEY"])
  bot = LLM::Bot.new(llm, tools: [System])
  bot.chat "Your task is to run shell commands via a tool.", role: :system
 
@@ -385,46 +383,6 @@ bot.chat bot.functions.map(&:call) # report return value to the LLM
  # {stderr: "", stdout: "FreeBSD"}
  ```
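Pieced together from the hunks above, one full round trip in 2.0.0 might look like the sketch below; the guard around `bot.functions` is an assumption for the case where the model answers without requesting a tool:

```ruby
res = bot.chat "What is the name of your operating system?"
if bot.functions.any?
  # Execute each requested function and report the return values
  bot.chat bot.functions.map(&:call)
end
bot.messages.each { print "[#{it.role}] ", it.content, "\n" }
```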
 
- #### Server Tools
-
- The
- [LLM::Function](https://0x1eef.github.io/x/llm.rb/LLM/Function.html)
- and
- [LLM::Tool](https://0x1eef.github.io/x/llm.rb/LLM/Tool.html)
- classes can define a local function or tool that can be called by
- a provider on your behalf, and the
- [LLM::ServerTool](https://0x1eef.github.io/x/llm.rb/LLM/ServerTool.html)
- class represents a tool that is defined and implemented by a provider, and we can
- request that the provider call the tool on our behalf. That's the primary difference
- between a function implemented locally and a tool implemented by a provider. The
- available tools depend on the provider, and the following example uses the
- OpenAI provider to execute Python code on OpenAI's servers:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- res = llm.responses.create "Run: 'print(\"hello world\")'",
- tools: [llm.server_tool(:code_interpreter)]
- print res.output_text, "\n"
- ```
-
- #### Web Search
-
- A common tool among all providers is the ability to perform a web search, and
- the following example uses the OpenAI provider to search the web using the
- Web Search tool. This can also be done with the Anthropic and Gemini providers:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- res = llm.web_search(query: "summarize today's news")
- print res.output_text, "\n"
- ```
-
  ### Files
 
  #### Create
@@ -442,24 +400,32 @@ require "llm"
 
  llm = LLM.openai(key: ENV["KEY"])
  bot = LLM::Bot.new(llm)
- file = llm.files.create(file: "/books/goodread.pdf")
- bot.chat ["Tell me about this file", file]
- bot.messages.select(&:assistant?).each { print "[#{_1.role}] ", _1.content, "\n" }
+ file = llm.files.create(file: "/book.pdf")
+ res = bot.chat ["Tell me about this file", file]
+ res.choices.each { print "[#{it.role}] ", it.content, "\n" }
  ```
 
  ### Prompts
 
  #### Multimodal
 
- It is generally a given that an LLM will understand text but they can also
- understand and generate other types of media as well: audio, images, video,
- and even URLs. The object given as a prompt in llm.rb can be a string to
- represent text, a URI object to represent a URL, an LLM::Response object
- to represent a file stored with the LLM, and so on. These are objects you
- can throw at the prompt and have them be understood automatically.
+ While LLMs inherently understand text, they can also process and
+ generate other types of media such as audio, images, video, and
+ even URLs. To provide these multimodal inputs to the LLM, llm.rb
+ uses explicit tagging methods on the `LLM::Bot` instance.
+ These methods wrap your input into a special `LLM::Object`,
+ clearly indicating its type and intent to the underlying LLM
+ provider.
+
+ For instance, to specify an image URL, you would use
+ `bot.image_url`. For a local file, `bot.local_file`. For an
+ already uploaded file managed by the LLM provider's Files API,
+ `bot.remote_file`. This approach ensures clarity and allows
+ llm.rb to correctly format the input for each provider's
+ specific requirements.
 
- A prompt can also have multiple parts, and in that case, an array is given
- as a prompt. Each element is considered to be part of the prompt:
+ An array can be used for a prompt with multiple parts, where each
+ element contributes to the overall input:
 
  ```ruby
  #!/usr/bin/env ruby
@@ -467,16 +433,17 @@ require "llm"
 
  llm = LLM.openai(key: ENV["KEY"])
  bot = LLM::Bot.new(llm)
+ url = "https://upload.wikimedia.org/wikipedia/commons/c/c7/Lisc_lipy.jpg"
 
- bot.chat ["Tell me about this URL", URI("https://example.com/path/to/image.png")]
- [bot.messages.find(&:assistant?)].each { print "[#{_1.role}] ", _1.content, "\n" }
+ res1 = bot.chat ["Tell me about this URL", bot.image_url(url)]
+ res1.choices.each { print "[#{it.role}] ", it.content, "\n" }
 
- file = llm.files.create(file: "/books/goodread.pdf")
- bot.chat ["Tell me about this PDF", file]
- [bot.messages.find(&:assistant?)].each { print "[#{_1.role}] ", _1.content, "\n" }
+ file = llm.files.create(file: "/book.pdf")
+ res2 = bot.chat ["Tell me about this PDF", bot.remote_file(file)]
+ res2.choices.each { print "[#{it.role}] ", it.content, "\n" }
 
- bot.chat ["Tell me about this image", File.open("/images/nemothefish.png", "r")]
- [bot.messages.find(&:assistant?)].each { print "[#{_1.role}] ", _1.content, "\n" }
+ res3 = bot.chat ["Tell me about this image", bot.local_file("/puffy.png")]
+ res3.choices.each { print "[#{it.role}] ", it.content, "\n" }
  ```
 
  ### Audio
@@ -662,36 +629,10 @@ end
  # Select a model
  model = llm.models.all.find { |m| m.id == "gpt-3.5-turbo" }
  bot = LLM::Bot.new(llm, model: model.id)
- bot.chat "Hello #{model.id} :)"
- bot.messages.select(&:assistant?).each { print "[#{_1.role}] ", _1.content, "\n" }
+ res = bot.chat "Hello #{model.id} :)"
+ res.choices.each { print "[#{it.role}] ", it.content, "\n" }
  ```
 
- ## Reviews
-
- I supplied both Gemini and DeepSeek with the contents of [lib/](https://github.com/llmrb/llm/tree/main/lib)
- and [README.md](https://github.com/llmrb/llm#readme) via [llm-shell](https://github.com/llmrb/llm-shell#readme).
- Their feedback was way more positive than I could have imagined 😅 These are genuine responses though, with no
- special prompting or engineering. I just provided them with the source code and asked for their opinion.
-
- <details>
- <summary>Review by Gemini</summary>
- <img src="https://github.com/llmrb/llm/blob/main/share/llm-shell/examples/gemini.png?raw=true" alt="Gemini review" />
- </details>
-
- <details>
- <summary>Review by DeepSeek</summary>
- <img src="https://github.com/llmrb/llm/blob/main/share/llm-shell/examples/deepseek.png?raw=true" alt="DeepSeek review" />
- </details>
-
- ## Documentation
-
- ### API
-
- The README tries to provide a high-level overview of the library. For everything
- else there's the API reference. It covers classes and methods that the README glances
- over or doesn't cover at all. The API reference is available at
- [0x1eef.github.io/x/llm.rb](https://0x1eef.github.io/x/llm.rb).
-
  ## Install
 
  llm.rb can be installed via rubygems.org:
@@ -702,4 +643,4 @@ llm.rb can be installed via rubygems.org:
 
  [BSD Zero Clause](https://choosealicense.com/licenses/0bsd/)
  <br>
- See [LICENSE](./LICENSE)
+ See [LICENSE](./LICENSE)