langchainrb 0.7.2 → 0.7.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +10 -0
- data/README.md +158 -320
- data/lib/langchain/agent/agents.md +54 -0
- data/lib/langchain/llm/aws_bedrock.rb +216 -0
- data/lib/langchain/llm/openai.rb +26 -9
- data/lib/langchain/llm/response/aws_titan_response.rb +17 -0
- data/lib/langchain/llm/response/base_response.rb +3 -0
- data/lib/langchain/utils/token_length/ai21_validator.rb +1 -0
- data/lib/langchain/utils/token_length/base_validator.rb +6 -2
- data/lib/langchain/utils/token_length/cohere_validator.rb +1 -0
- data/lib/langchain/utils/token_length/google_palm_validator.rb +1 -0
- data/lib/langchain/utils/token_length/openai_validator.rb +20 -0
- data/lib/langchain/vectorsearch/chroma.rb +3 -1
- data/lib/langchain/vectorsearch/milvus.rb +3 -1
- data/lib/langchain/vectorsearch/pgvector.rb +3 -1
- data/lib/langchain/vectorsearch/pinecone.rb +3 -1
- data/lib/langchain/vectorsearch/qdrant.rb +3 -1
- data/lib/langchain/vectorsearch/weaviate.rb +4 -2
- data/lib/langchain/version.rb +1 -1
- metadata +19 -5
- data/lib/langchain/evals/ragas/critique.rb +0 -62
- data/lib/langchain/evals/ragas/prompts/critique.yml +0 -18
- data/lib/langchain/loader_chunkers/html.rb +0 -27
checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f4c388275b83a0e4260f4ae9271f4c164a8d34ea5ea9585916d91e7e9c17c980
+  data.tar.gz: 8daa400de3ed80bb3fb9c53cc19ef4d56f137c2aa157bd268dbda488d0fca432
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4bae87c050be6a8fa011c1ae5de4b119abac498669f2e63ca1829e11b7b5ecca7610330be670d24fd6cb98c2e2599c593e9922378985efc586d76c124efb865e
+  data.tar.gz: 2a39b084c6a239aeb0de22bfc87629d2f2909b23eabfcf71a835a1f1624d84afe3ea106afdafb8f1fb301b7934d73abc7253c9b8bd3f6c9b170231ebb5af0936
data/CHANGELOG.md CHANGED

@@ -1,5 +1,15 @@
 ## [Unreleased]
 
+## [0.7.5] - 2023-11-13
+- Fixes
+
+## [0.7.4] - 2023-11-10
+- AWS Bedrock is available as an LLM provider. Available models from AI21, Cohere, AWS, and Anthropic.
+
+## [0.7.3] - 2023-11-08
+- LLM response passes through the context in RAG cases
+- Fix gpt-4 token length validation
+
 ## [0.7.2] - 2023-11-02
 - Azure OpenAI LLM support
 
data/README.md CHANGED

@@ -1,6 +1,6 @@
 💎🔗 Langchain.rb
 ---
-⚡ Building applications
+⚡ Building LLM-powered applications in Ruby ⚡
 
 For deep Rails integration see: [langchainrb_rails](https://github.com/andreibondarev/langchainrb_rails) gem.
 
@@ -11,21 +11,24 @@ Available for paid consulting engagements! [Email me](mailto:andrei@sourcelabs.i
 [![Docs](http://img.shields.io/badge/yard-docs-blue.svg)](http://rubydoc.info/gems/langchainrb)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/andreibondarev/langchainrb/blob/main/LICENSE.txt)
 [![](https://dcbadge.vercel.app/api/server/WDARp7J2n8?compact=true&style=flat)](https://discord.gg/WDARp7J2n8)
+[![X](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40rushing_andrei)](https://twitter.com/rushing_andrei)
 
-
+## Use Cases
+* Retrieval Augmented Generation (RAG) and vector search
+* Chat bots
+* [AI agents](https://github.com/andreibondarev/langchainrb/tree/main/lib/langchain/agent/agents.md)
 
-##
+## Table of Contents
 
 - [Installation](#installation)
 - [Usage](#usage)
-- [
-- [
-- [
-- [
-- [
-- [Loaders](#loaders-)
-- [Examples](#examples)
+- [Large Language Models (LLMs)](#large-language-models-llms)
+- [Prompt Management](#prompt-management)
+- [Output Parsers](#output-parsers)
+- [Building RAG](#building-retrieval-augment-generation-rag-system)
+- [Building chat bots](#building-chat-bots)
 - [Evaluations](#evaluations-evals)
+- [Examples](#examples)
 - [Logging](#logging)
 - [Development](#development)
 - [Discord](#discord)
@@ -46,264 +49,66 @@ If bundler is not being used to manage dependencies, install the gem by executin
 require "langchain"
 ```
 
-
-
-| Database | Querying | Storage | Schema Management | Backups | Rails Integration |
-| -------- |:------------------:| -------:| -----------------:| -------:| -----------------:|
-| [Chroma](https://trychroma.com/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-| [Hnswlib](https://github.com/nmslib/hnswlib/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | WIP |
-| [Milvus](https://milvus.io/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-| [Pinecone](https://www.pinecone.io/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-| [Pgvector](https://github.com/pgvector/pgvector) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-| [Qdrant](https://qdrant.tech/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-| [Weaviate](https://weaviate.io/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | WIP | :white_check_mark: |
-
-### Using Vector Search Databases 🔍
-
-Choose the LLM provider you'll be using (OpenAI or Cohere) and retrieve the API key.
-
-Add `gem "weaviate-ruby", "~> 0.8.3"` to your Gemfile.
-
-Pick the vector search database you'll be using and instantiate the client:
-```ruby
-client = Langchain::Vectorsearch::Weaviate.new(
-  url: ENV["WEAVIATE_URL"],
-  api_key: ENV["WEAVIATE_API_KEY"],
-  index_name: "",
-  llm: Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
-)
-
-# You can instantiate any other supported vector search database:
-client = Langchain::Vectorsearch::Chroma.new(...) # `gem "chroma-db", "~> 0.6.0"`
-client = Langchain::Vectorsearch::Hnswlib.new(...) # `gem "hnswlib", "~> 0.8.1"`
-client = Langchain::Vectorsearch::Milvus.new(...) # `gem "milvus", "~> 0.9.2"`
-client = Langchain::Vectorsearch::Pinecone.new(...) # `gem "pinecone", "~> 0.1.6"`
-client = Langchain::Vectorsearch::Pgvector.new(...) # `gem "pgvector", "~> 0.2"`
-client = Langchain::Vectorsearch::Qdrant.new(...) # `gem "qdrant-ruby", "~> 0.9.3"`
-```
-
-```ruby
-# Creating the default schema
-client.create_default_schema
-```
-
-```ruby
-# Store plain texts in your vector search database
-client.add_texts(
-  texts: [
-    "Begin by preheating your oven to 375°F (190°C). Prepare four boneless, skinless chicken breasts by cutting a pocket into the side of each breast, being careful not to cut all the way through. Season the chicken with salt and pepper to taste. In a large skillet, melt 2 tablespoons of unsalted butter over medium heat. Add 1 small diced onion and 2 minced garlic cloves, and cook until softened, about 3-4 minutes. Add 8 ounces of fresh spinach and cook until wilted, about 3 minutes. Remove the skillet from heat and let the mixture cool slightly.",
-    "In a bowl, combine the spinach mixture with 4 ounces of softened cream cheese, 1/4 cup of grated Parmesan cheese, 1/4 cup of shredded mozzarella cheese, and 1/4 teaspoon of red pepper flakes. Mix until well combined. Stuff each chicken breast pocket with an equal amount of the spinach mixture. Seal the pocket with a toothpick if necessary. In the same skillet, heat 1 tablespoon of olive oil over medium-high heat. Add the stuffed chicken breasts and sear on each side for 3-4 minutes, or until golden brown."
-  ]
-)
-```
-```ruby
-# Store the contents of your files in your vector search database
-my_pdf = Langchain.root.join("path/to/my.pdf")
-my_text = Langchain.root.join("path/to/my.txt")
-my_docx = Langchain.root.join("path/to/my.docx")
-
-client.add_data(paths: [my_pdf, my_text, my_docx])
-```
-```ruby
-# Retrieve similar documents based on the query string passed in
-client.similarity_search(
-  query:,
-  k: # number of results to be retrieved
-)
-```
-```ruby
-# Retrieve similar documents based on the query string passed in via the [HyDE technique](https://arxiv.org/abs/2212.10496)
-client.similarity_search_with_hyde()
-```
-```ruby
-# Retrieve similar documents based on the embedding passed in
-client.similarity_search_by_vector(
-  embedding:,
-  k: # number of results to be retrieved
-)
-```
-```ruby
-# Q&A-style querying based on the question passed in
-client.ask(
-  question:
-)
-```
-
-## Integrating Vector Search into ActiveRecord models
-```ruby
-class Product < ActiveRecord::Base
-  vectorsearch provider: Langchain::Vectorsearch::Qdrant.new(
-    api_key: ENV["QDRANT_API_KEY"],
-    url: ENV["QDRANT_URL"],
-    index_name: "Products",
-    llm: Langchain::LLM::GooglePalm.new(api_key: ENV["GOOGLE_PALM_API_KEY"])
-  )
+## Large Language Models (LLMs)
+Langchain.rb wraps all supported LLMs in a unified interface, allowing you to easily swap out and test different models.
 
-
-
-
+#### Supported LLMs and features:
+| LLM providers | embed() | complete() | chat() | summarize() | Notes |
+| -------- |:------------------:| :-------: | :-----------------: | :-------: | :----------------- |
+| [OpenAI](https://openai.com/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | ❌ | Including Azure OpenAI |
+| [AI21](https://ai21.com/) | ❌ | :white_check_mark: | ❌ | :white_check_mark: | |
+| [Anthropic](https://www.anthropic.com/) | ❌ | :white_check_mark: | ❌ | ❌ | |
+| [AWS Bedrock](https://aws.amazon.com/bedrock) | :white_check_mark: | :white_check_mark: | ❌ | ❌ | Provides AWS, Cohere, AI21, Anthropic and Stability AI models |
+| [Cohere](https://cohere.com/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | |
+| [GooglePalm](https://ai.google/discover/palm2/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | |
+| [HuggingFace](https://huggingface.co/) | :white_check_mark: | ❌ | ❌ | ❌ | |
+| [Ollama](https://ollama.ai/) | :white_check_mark: | :white_check_mark: | ❌ | ❌ | |
+| [Replicate](https://replicate.com/) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | |
 
-
-```ruby
-# Retrieve similar products based on the query string passed in
-Product.similarity_search(
-  query:,
-  k: # number of results to be retrieved
-)
-```
-```ruby
-# Q&A-style querying based on the question passed in
-Product.ask(
-  question:
-)
-```
-
-Additional info [here](https://github.com/andreibondarev/langchainrb/blob/main/lib/langchain/active_record/hooks.rb#L10-L38).
-
-### Using Standalone LLMs 🗣️
-
-Add `gem "ruby-openai", "~> 4.0.0"` to your Gemfile.
+#### Using standalone LLMs:
 
 #### OpenAI
-```ruby
-openai = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
-```
-You can pass additional parameters to the constructor, it will be passed to the OpenAI client:
-```ruby
-openai = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"], llm_options: {uri_base: "http://localhost:1234"}) )
-```
-```ruby
-openai.embed(text: "foo bar")
-```
-```ruby
-openai.complete(prompt: "What is the meaning of life?")
-```
-
-##### Open AI Function calls support
-
-Conversation support
-
-```ruby
-chat = Langchain::Conversation.new(llm: openai)
-```
-```ruby
-chat.set_context("You are the climate bot")
-chat.set_functions(functions)
-```
 
-qdrant:
-
-```ruby
-client.llm.functions = functions
-```
-
-#### Azure
 Add `gem "ruby-openai", "~> 5.2.0"` to your Gemfile.
 
 ```ruby
-
-  api_key: ENV["AZURE_API_KEY"],
-  llm_options: {
-    api_type: :azure,
-    api_version: "2023-03-15-preview"
-  },
-  embedding_deployment_url: ENV.fetch("AZURE_EMBEDDING_URI"),
-  chat_deployment_url: ENV.fetch("AZURE_CHAT_URI")
-)
-```
-where `AZURE_EMBEDDING_URI` is e.g. `https://custom-domain.openai.azure.com/openai/deployments/gpt-35-turbo` and `AZURE_CHAT_URI` is e.g. `https://custom-domain.openai.azure.com/openai/deployments/ada-2`
-
-You can pass additional parameters to the constructor, it will be passed to the Azure client:
-```ruby
-azure = Langchain::LLM::Azure.new(
-  api_key: ENV["AZURE_API_KEY"],
-  llm_options: {
-    api_type: :azure,
-    api_version: "2023-03-15-preview",
-    request_timeout: 240 # Optional
-  },
-  embedding_deployment_url: ENV.fetch("AZURE_EMBEDDING_URI"),
-  chat_deployment_url: ENV.fetch("AZURE_CHAT_URI")
-)
-```
-```ruby
-azure.embed(text: "foo bar")
-```
-```ruby
-azure.complete(prompt: "What is the meaning of life?")
-```
-
-#### Cohere
-Add `gem "cohere-ruby", "~> 0.9.6"` to your Gemfile.
-
-```ruby
-cohere = Langchain::LLM::Cohere.new(api_key: ENV["COHERE_API_KEY"])
-```
-```ruby
-cohere.embed(text: "foo bar")
-```
-```ruby
-cohere.complete(prompt: "What is the meaning of life?")
-```
-
-#### HuggingFace
-Add `gem "hugging-face", "~> 0.3.2"` to your Gemfile.
-```ruby
-hugging_face = Langchain::LLM::HuggingFace.new(api_key: ENV["HUGGING_FACE_API_KEY"])
-```
-
-#### Replicate
-Add `gem "replicate-ruby", "~> 0.2.2"` to your Gemfile.
-```ruby
-replicate = Langchain::LLM::Replicate.new(api_key: ENV["REPLICATE_API_KEY"])
+llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
 ```
-
-#### Google PaLM (Pathways Language Model)
-Add `"google_palm_api", "~> 0.1.3"` to your Gemfile.
+You can pass additional parameters to the constructor; they will be passed on to the OpenAI client:
 ```ruby
-
+llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"], llm_options: { ... })
 ```
 
-
-Add `gem "ai21", "~> 0.2.1"` to your Gemfile.
+Generate vector embeddings:
 ```ruby
-
+llm.embed(text: "foo bar")
 ```
 
-
-Add `gem "anthropic", "~> 0.1.0"` to your Gemfile.
+Generate a text completion:
 ```ruby
-
+llm.complete(prompt: "What is the meaning of life?")
 ```
 
+Generate a chat completion:
 ```ruby
-
+llm.chat(prompt: "Hey! How are you?")
 ```
 
-
+Summarize the text:
 ```ruby
-
+llm.summarize(text: "...")
 ```
 
+You can use any other LLM by invoking the same interface:
 ```ruby
-
-```
-```ruby
-ollama.embed(text: "Hello world!")
+llm = Langchain::LLM::GooglePalm.new(...)
 ```
 
-###
+### Prompt Management
 
 #### Prompt Templates
 
-Create a prompt with
-
-```ruby
-prompt = Langchain::Prompt::PromptTemplate.new(template: "Tell me a {adjective} joke.", input_variables: ["adjective"])
-prompt.format(adjective: "funny") # "Tell me a funny joke."
-```
-
-Create a prompt with multiple input variables:
+Create a prompt with input variables:
 
 ```ruby
 prompt = Langchain::Prompt::PromptTemplate.new(template: "Tell me a {adjective} joke about {content}.", input_variables: ["adjective", "content"])
|
|
384
189
|
prompt.input_variables #=> ["adjective", "content"]
|
385
190
|
```
|
386
191
|
|
387
|
-
|
192
|
+
|
193
|
+
### Output Parsers
|
388
194
|
|
389
195
|
Parse LLM text responses into structured output, such as JSON.
|
390
196
|
|
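The core idea behind the output parsers introduced above — turning an LLM's free-text response into structured data — can be sketched in plain Ruby (a toy stand-in, not the gem's schema-driven `OutputParsers` implementation):

```ruby
require "json"

# Toy output parser: pull the first {...} block out of an LLM-style
# response and parse it as JSON. Illustrative only; real parsers also
# validate against a schema and can ask the LLM to fix malformed output.
def parse_llm_json(response)
  blob = response[/\{.*\}/m] or raise ArgumentError, "no JSON object found"
  JSON.parse(blob)
end

llm_response = 'Sure! Here is your answer: {"name": "Ada", "interests": ["math", "engines"]}'
puts parse_llm_json(llm_response)["name"]
# => Ada
```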
@@ -484,93 +290,147 @@ fix_parser.parse(llm_response)
 
 See [here](https://github.com/andreibondarev/langchainrb/tree/main/examples/create_and_manage_prompt_templates_using_structured_output_parser.rb) for a concrete example
 
-
-
+## Building Retrieval Augment Generation (RAG) system
+RAG is a methodology that helps LLMs generate accurate and up-to-date information.
+A typical RAG workflow follows the 3 steps below:
+1. Relevant knowledge (or data) is retrieved from the knowledge base (typically a vector search DB)
+2. A prompt, containing the retrieved knowledge above, is constructed.
+3. The LLM receives the prompt above to generate a text completion.
+The most common use case for a RAG system is powering Q&A systems where users pose natural language questions and receive answers in natural language.
 
-
+### Vector search databases
+Langchain.rb provides a convenient unified interface on top of supported vectorsearch databases that makes it easy to configure your index, add data, query and retrieve from it.
 
-
+#### Supported vector search databases and features:
 
-
-
-
+| Database | Open-source | Cloud offering |
+| -------- |:------------------:| :------------: |
+| [Chroma](https://trychroma.com/) | :white_check_mark: | :white_check_mark: |
+| [Hnswlib](https://github.com/nmslib/hnswlib/) | :white_check_mark: | ❌ |
+| [Milvus](https://milvus.io/) | :white_check_mark: | :white_check_mark: Zilliz Cloud |
+| [Pinecone](https://www.pinecone.io/) | ❌ | :white_check_mark: |
+| [Pgvector](https://github.com/pgvector/pgvector) | :white_check_mark: | :white_check_mark: |
+| [Qdrant](https://qdrant.tech/) | :white_check_mark: | :white_check_mark: |
+| [Weaviate](https://weaviate.io/) | :white_check_mark: | :white_check_mark: |
 
-
+### Using Vector Search Databases 🔍
 
-
-  llm: openai,
-  tools: [search_tool, calculator]
-)
-```
+Pick the vector search database you'll be using, add the gem dependency and instantiate the client:
 ```ruby
-
-#=> "Approximately 2,945 soccer fields would be needed to cover the distance between NYC and DC in a straight line."
+gem "weaviate-ruby", "~> 0.8.9"
 ```
 
-
+Choose and instantiate the LLM provider you'll be using to generate embeddings:
+```ruby
+llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
+```
 
-
+```ruby
+client = Langchain::Vectorsearch::Weaviate.new(
+  url: ENV["WEAVIATE_URL"],
+  api_key: ENV["WEAVIATE_API_KEY"],
+  index_name: "Documents",
+  llm: llm
+)
+```
 
+You can instantiate any other supported vector search database:
 ```ruby
-
+client = Langchain::Vectorsearch::Chroma.new(...) # `gem "chroma-db", "~> 0.6.0"`
+client = Langchain::Vectorsearch::Hnswlib.new(...) # `gem "hnswlib", "~> 0.8.1"`
+client = Langchain::Vectorsearch::Milvus.new(...) # `gem "milvus", "~> 0.9.2"`
+client = Langchain::Vectorsearch::Pinecone.new(...) # `gem "pinecone", "~> 0.1.6"`
+client = Langchain::Vectorsearch::Pgvector.new(...) # `gem "pgvector", "~> 0.2"`
+client = Langchain::Vectorsearch::Qdrant.new(...) # `gem "qdrant-ruby", "~> 0.9.3"`
+```
 
-
+Create the default schema:
+```ruby
+client.create_default_schema
 ```
+
+Add plain text data to your vector search database:
 ```ruby
-
-
+client.add_texts(
+  texts: [
+    "Begin by preheating your oven to 375°F (190°C). Prepare four boneless, skinless chicken breasts by cutting a pocket into the side of each breast, being careful not to cut all the way through. Season the chicken with salt and pepper to taste. In a large skillet, melt 2 tablespoons of unsalted butter over medium heat. Add 1 small diced onion and 2 minced garlic cloves, and cook until softened, about 3-4 minutes. Add 8 ounces of fresh spinach and cook until wilted, about 3 minutes. Remove the skillet from heat and let the mixture cool slightly.",
+    "In a bowl, combine the spinach mixture with 4 ounces of softened cream cheese, 1/4 cup of grated Parmesan cheese, 1/4 cup of shredded mozzarella cheese, and 1/4 teaspoon of red pepper flakes. Mix until well combined. Stuff each chicken breast pocket with an equal amount of the spinach mixture. Seal the pocket with a toothpick if necessary. In the same skillet, heat 1 tablespoon of olive oil over medium-high heat. Add the stuffed chicken breasts and sear on each side for 3-4 minutes, or until golden brown."
+  ]
+)
 ```
 
-
-
+Or use the file parsers to load, parse and index data into your database:
+```ruby
+my_pdf = Langchain.root.join("path/to/my.pdf")
+my_text = Langchain.root.join("path/to/my.txt")
+my_docx = Langchain.root.join("path/to/my.docx")
 
-
+client.add_data(paths: [my_pdf, my_text, my_docx])
+```
+Supported file formats: docx, html, pdf, text, json, jsonl, csv, xlsx.
 
-
+Retrieve similar documents based on the query string passed in:
+```ruby
+client.similarity_search(
+  query:,
+  k: # number of results to be retrieved
+)
+```
 
-
-
-
-
-| "ruby_code_interpreter" | Interprets Ruby expressions | | `gem "safe_ruby", "~> 1.0.4"` |
-| "google_search" | A wrapper around Google Search | `ENV["SERPAPI_API_KEY"]` (https://serpapi.com/manage-api-key) | `gem "google_search_results", "~> 2.0.0"` |
-| "weather" | Calls Open Weather API to retrieve the current weather | `ENV["OPEN_WEATHER_API_KEY"]` (https://home.openweathermap.org/api_keys) | `gem "open-weather-ruby-client", "~> 0.3.0"` |
-| "wikipedia" | Calls Wikipedia API to retrieve the summary | | `gem "wikipedia-client", "~> 1.17.0"` |
+Retrieve similar documents based on the query string passed in via the [HyDE technique](https://arxiv.org/abs/2212.10496):
+```ruby
+client.similarity_search_with_hyde()
+```
 
-
+Retrieve similar documents based on the embedding passed in:
+```ruby
+client.similarity_search_by_vector(
+  embedding:,
+  k: # number of results to be retrieved
+)
+```
 
-
+RAG-based querying:
+```ruby
+client.ask(
+  question:
+)
+```
 
-
+## Building chat bots
 
-
+### Conversation class
 
+Choose and instantiate the LLM provider you'll be using:
 ```ruby
-Langchain::
+llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
 ```
-
-or
-
+Instantiate the Conversation class:
 ```ruby
-Langchain::
+chat = Langchain::Conversation.new(llm: llm)
 ```
 
-
+(Optional) Set the conversation context:
+```ruby
+chat.set_context("You are a chatbot from the future")
+```
 
+Exchange messages with the LLM:
+```ruby
+chat.message("Tell me about future technologies")
+```
 
-
-
-
-
-
-
-| JSON | Langchain::Processors::JSON | |
-| JSONL | Langchain::Processors::JSONL | |
-| csv | Langchain::Processors::CSV | |
-| xlsx | Langchain::Processors::Xlsx | `gem "roo", "~> 2.10.0"` |
+To stream the chat response:
+```ruby
+chat = Langchain::Conversation.new(llm: llm) do |chunk|
+  print(chunk)
+end
+```
 
-
-
+Open AI Functions support:
+```ruby
+chat.set_functions(functions)
+```
 
 ## Evaluations (Evals)
 The Evaluations module is a collection of tools that can be used to evaluate and track the performance of the output products by LLM and your RAG (Retrieval Augmented Generation) pipelines.
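The three RAG steps the new README text describes (retrieve relevant knowledge, construct a prompt around it, have the LLM complete it) can be shown end-to-end with a toy sketch. Everything here is an illustrative stand-in: the in-memory `DOCS` store and keyword-overlap scoring replace a vector search database, and `MockLLM` replaces a real provider.

```ruby
# Toy end-to-end RAG loop mirroring the README's three steps.
DOCS = [
  "Stuffed chicken breasts bake at 375F for about 25 minutes.",
  "Spinach wilts in roughly 3 minutes over medium heat."
]

# Step 1: retrieve knowledge (keyword overlap standing in for
# embedding similarity against a vector search DB).
def retrieve(question)
  words = question.downcase.scan(/\w+/)
  DOCS.max_by { |doc| (doc.downcase.scan(/\w+/) & words).size }
end

# Step 2: construct a prompt containing the retrieved knowledge.
def build_prompt(question, context)
  "Context:\n#{context}\n\nQuestion: #{question}\nAnswer:"
end

# Step 3: the LLM generates a completion from the prompt.
class MockLLM
  def complete(prompt:)
    "[completion for a #{prompt.bytesize}-byte prompt]"
  end
end

question = "How long does spinach take to wilt?"
prompt = build_prompt(question, retrieve(question))
puts MockLLM.new.complete(prompt: prompt)
```

Swapping the mocks for a real vectorsearch client and LLM is what the `client.ask(question:)` call above bundles into one step.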
@@ -598,13 +458,16 @@ ragas.score(answer: "", question: "", context: "")
 # }
 ```
 
+## Examples
+Additional examples available: [/examples](https://github.com/andreibondarev/langchainrb/tree/main/examples)
+
 ## Logging
 
 LangChain.rb uses standard logging mechanisms and defaults to `:warn` level. Most messages are at info level, but we will add debug or warn statements as needed.
 To show all log messages:
 
 ```ruby
-Langchain.logger.level = :
+Langchain.logger.level = :debug
 ```
 
 ## Development
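Since the README says the gem "uses standard logging mechanisms" with a `:warn` default, the effect of the `Langchain.logger.level = :debug` line above follows ordinary Ruby `Logger` semantics. A stdlib-only sketch of that level filtering (the messages are made up for illustration):

```ruby
require "logger"
require "stringio"

# Capture log output in memory so the filtering is easy to inspect.
out = StringIO.new
logger = Logger.new(out)

logger.level = Logger::WARN            # the gem's default level
logger.info("embedding 3 documents")   # below :warn — filtered out
logger.warn("rate limited, retrying")  # emitted

logger.level = Logger::DEBUG           # "show all log messages"
logger.info("now visible")

puts out.string
```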
@@ -618,31 +481,6 @@ Langchain.logger.level = :info
 ## Discord
 Join us in the [Langchain.rb](https://discord.gg/WDARp7J2n8) Discord server.
 
-## Core Contributors
-[<img style="border-radius:50%" alt="Andrei Bondarev" src="https://avatars.githubusercontent.com/u/541665?v=4" width="80" height="80" class="avatar">](https://twitter.com/rushing_andrei)
-
-## Contributors
-[<img style="border-radius:50%" alt="Alex Chaplinsky" src="https://avatars.githubusercontent.com/u/695947?v=4" width="80" height="80" class="avatar">](https://github.com/alchaplinsky)
-[<img style="border-radius:50%" alt="Josh Nichols" src="https://avatars.githubusercontent.com/u/159?v=4" width="80" height="80" class="avatar">](https://github.com/technicalpickles)
-[<img style="border-radius:50%" alt="Matt Lindsey" src="https://avatars.githubusercontent.com/u/5638339?v=4" width="80" height="80" class="avatar">](https://github.com/mattlindsey)
-[<img style="border-radius:50%" alt="Ricky Chilcott" src="https://avatars.githubusercontent.com/u/445759?v=4" width="80" height="80" class="avatar">](https://github.com/rickychilcott)
-[<img style="border-radius:50%" alt="Moeki Kawakami" src="https://avatars.githubusercontent.com/u/72325947?v=4" width="80" height="80" class="avatar">](https://github.com/moekidev)
-[<img style="border-radius:50%" alt="Jens Stmrs" src="https://avatars.githubusercontent.com/u/3492669?v=4" width="80" height="80" class="avatar">](https://github.com/faustus7)
-[<img style="border-radius:50%" alt="Rafael Figueiredo" src="https://avatars.githubusercontent.com/u/35845775?v=4" width="80" height="80" class="avatar">](https://github.com/rafaelqfigueiredo)
-[<img style="border-radius:50%" alt="Piero Dotti" src="https://avatars.githubusercontent.com/u/5167659?v=4" width="80" height="80" class="avatar">](https://github.com/ProGM)
-[<img style="border-radius:50%" alt="Michał Ciemięga" src="https://avatars.githubusercontent.com/u/389828?v=4" width="80" height="80" class="avatar">](https://github.com/zewelor)
-[<img style="border-radius:50%" alt="Bruno Bornsztein" src="https://avatars.githubusercontent.com/u/3760?v=4" width="80" height="80" class="avatar">](https://github.com/bborn)
-[<img style="border-radius:50%" alt="Tim Williams" src="https://avatars.githubusercontent.com/u/1192351?v=4" width="80" height="80" class="avatar">](https://github.com/timrwilliams)
-[<img style="border-radius:50%" alt="Zhenhang Tung" src="https://avatars.githubusercontent.com/u/8170159?v=4" width="80" height="80" class="avatar">](https://github.com/ZhenhangTung)
-[<img style="border-radius:50%" alt="Hama" src="https://avatars.githubusercontent.com/u/38002468?v=4" width="80" height="80" class="avatar">](https://github.com/akmhmgc)
-[<img style="border-radius:50%" alt="Josh Weir" src="https://avatars.githubusercontent.com/u/10720337?v=4" width="80" height="80" class="avatar">](https://github.com/joshweir)
-[<img style="border-radius:50%" alt="Arthur Hess" src="https://avatars.githubusercontent.com/u/446035?v=4" width="80" height="80" class="avatar">](https://github.com/arthurhess)
-[<img style="border-radius:50%" alt="Jin Shen" src="https://avatars.githubusercontent.com/u/54917718?v=4" width="80" height="80" class="avatar">](https://github.com/jacshen-ebay)
-[<img style="border-radius:50%" alt="Earle Bunao" src="https://avatars.githubusercontent.com/u/4653624?v=4" width="80" height="80" class="avatar">](https://github.com/erbunao)
-[<img style="border-radius:50%" alt="Maël H." src="https://avatars.githubusercontent.com/u/61985678?v=4" width="80" height="80" class="avatar">](https://github.com/mael-ha)
-[<img style="border-radius:50%" alt="Chris O. Adebiyi" src="https://avatars.githubusercontent.com/u/62605573?v=4" width="80" height="80" class="avatar">](https://github.com/oluvvafemi)
-[<img style="border-radius:50%" alt="Aaron Breckenridge" src="https://avatars.githubusercontent.com/u/201360?v=4" width="80" height="80" class="avatar">](https://github.com/breckenedge)
-
 ## Star History
 
 [![Star History Chart](https://api.star-history.com/svg?repos=andreibondarev/langchainrb&type=Date)](https://star-history.com/#andreibondarev/langchainrb&Date)